Re: Cheap way to tokenize variable array?
Travis wrote:
Given an array (say of ints or bytes if easier to think about), the
begginning of the array tells you the overall length and each index in
the array is formatted as follows:
first byte: length
second / third byte: unique id
optional: data
Also keep in mind this array is never sorted in anyway. So to find a
specific unique ID, I'm forced to traverse the entire thing until I
find it.
What I'm curious about is if there's a more efficient way to do this.
Any STL mechanisms that might make this less expensive. Something that
can tokenize the array or something like (given that i know the status
ID I'm looking for).
It has been already mentioned (by Alf) that this would be
a good candidate for std::map. But what are the id's like?
If you have a two byte id, that means you have 65534 different
chunks. The simplest thing would be then - to create an offset
array:
...
size_t id_offset[65536]; // maximum unsigned short offset
...
and store the chunk offsets at the corresponding
array elements:
[pseudo]
...
typedef unsigned short int unique_id_t; // assert: this has to be 2 bytes!
...
const unsigned char *p = theArray;
for(size_t n=0; n<nchunks; n++) {
// store the unique_id's offsett located at p+1
id_offset[ *(unique_id_t *)(p+1) ] = p - theArray;
// advance to next chunk
p += *p;
}
...
If you want to access your "array", just use this offset
...
// access array chunks by stored offset
size_t id = 10333;
unsigned char *chunk = theArray + id_offset[id];
cout << "id:" << id << "/" << *(unique_id_t*)(chunk+1)
<< ", chunk length: " << size_t(*chunk)
<< " bytes, data:" << string(chunk+3, chunk+*chunk)
<< endl;
...
If this won't be possible, you can use the hash (std::map)
in the same spirit:
...
typedef unsigned short int unique_id_t; // the 2 byte thing
typedef map<unique_id_t, size_t> MyHash;
...
// create a hash map from the data array
void map_chunks_to(MyHash& hash, const unsigned char *theArray, size_t nchunks)
{
const unsigned char *p = theArray;
for(size_t n=0; n<nchunks; n++) {
hash[ *(unique_id_t *)(p+1) ] = p - theArray;
p += *p;
}
}
...
the access will then be similar to the above:
MyHash h;
map_chunks_to(h, theArray, nchunks); // see above
...
size_t id = 10333;
unsigned char *chunk = theArray + h[id];
...
cout << ...
Regards
M.