Re: Memory issues with Map
Well the OP hasn't replied, but I think this is an interesting subject,
having had to solve similar problems myself in the past.
Here's a simpler case: say I have a std::map<int,int>, which I
initialise from a file and subsequently only read from. The memory
required will be substantially more than 2*N*sizeof(int); I'm not sure
exactly what the overhead of a typical map implementation is, but I
guess that each tree node has pointers to two child nodes, a parent
node, and the data; if sizeof(T*)==sizeof(int), then that needs
6*N*sizeof(int) plus any allocator overhead.
So here's one way to improve that: use a std::vector< std::pair<int,int>
>. Read them in and then sort the array, and then use a binary search
to do lookups. That should have the same complexity as using a map, and
needs only 2*N*sizeof(int) storage.
But it becomes more interesting if you want to be able to modify the
data after initially loading it. With the vector, insert will be O(N)
rather than O(log N). So is there a solution that has O(log N) insert
and lookup complexity and ~ 2*N*sizeof(int) storage? For example, can
you group together M of the pair<int,int>s, and store those groups in a
map? The overhead of the map is then amortised over the M elements.
Example (just the flavour, lots of details ommitted - feel free to
finish it if you like!):
template <typename KEY, typename VALUE>
class dense_map {
typedef std::vector<VALUE> vec_t;
typedef std::map<KEY,vec_t> impl_t;
impl_t impl;
public:
VALUE& operator[](KEY k) {
impl_t::iterator i = impl.upper_bound(k); --i;
vec_t& v = i->second;
vec_t::iterator j = std::find(v.begin(),v.end(),k);
return *j;
}
void insert(std::pair<KEY,VALUE> p) {
KEY& k = p.first;
impl_t::iterator i = impl.upper_bound(k); --i;
vec_t& v = i->second;
if (v.size()>max_size_per_chunk) {
// split v into two chunks, and re-start the insert
} else {
v.push_back(p);
}
}
};
Question: can something like this be implemented that has the same
guarantees (complexity, iterator validity, etc) as a std::map? In
particular, I think there's an unavoidable conflict between the split
(and join, in erase) operations and iterator / reference validity. So
it can't be used as a drop-in replacement for a std::map. Any thoughts?
Cheers, Phil.