Re: STL map or hash map using struct as data and find it
On Dec 31, 11:48 am, kl <kjell.ll...@gmail.com> wrote:
I'm trying to learn some STL using map or hash_map would maybe
even better to use but I don't really know how to find a
specific struct depending on a DWORD value that is in range of
two DWORD values (see below for more).
A hash map generally cannot be made to work with a range. In
the case of std::map, it will take a somewhat special ordering
function, but I think it can be made to work. I'd still tend to
favor a sorted std::vector, using lower_bound for the look-up.
So what I trying to achieve is creating a look up table of a
IP adress with a start adress (a DWORD value) and a end
adress (a DWORD value) which I would like to look up to the
get the Country name using a specific IP adress (also in
DWORD) which is random access to the table.
Just a guess, but I would imagine that lookups outnumber
insertions by a very significant margin. That most likely, in
fact, that table is initialized once from a file, before first
use, and never modified afterwards.
In that case, std::vector and lower_bound are definitely the way
to go. In fact, I'd recommend sorting the initialization file
once and for all, and just asserting on the order while reading
it.
It is easy to use vector, linked list but I would like to try
to use map or hash map for more effecient way to look up or at
least test it out to compare some.
A sorted vector and lower_bound is likely to be faster than
std::map, since it will make roughly the same number of
comparisons, and will have a lot better locality. If speed is
really an issue, of course, the solution in this particular case
is probably to use a trie starting with the high order byte. My
guess is that a single table lookup (using direct indexation)
will suffice 99% of the time, or more.
The code is basicly looks like this:
//Struct definition
struct IPCOUNTRY
{
DWORD startIP; //IP start adress in DWORD which is always unique=
DWORD endIP; //IP end adress in DWORD which is always unique
char COUNTRYNAME[45]; //Country name like England, Germany Spain e=
tc...
};
Unless you need static initialization, you should probably use
std::string instead of char[45]. More importantly, you should
generally avoid all caps except for macros, and probably use
more or less standard types, like uint32_t, instead of some
typedef to an unknown type, like DWORD (which a priori I would
expect to be a 64 bit type).
typedef map <DWORD, IPCOUNTRY> mIPC;
mIPC mIPcountry;
That doesn't look right to me. You're not mapping a single IP
address to a country. mIPC.lower_bound will still work, but
only if you assume a dense allocation, with no holes. I'd go
for something like:
typedef std::set< IpCountry, IpCountryCmp >
Map ;
Map myIpCountryMap ;
Defining IpCountryCmp is not going to be that simple, however.
//Initilize the map when and call this function during start up
void initilizeMap()
{
//read data from file and insert it into a map
IPCOUNTRY g_structIP;
FILE *fp=NULL;
fp=fopen("ipcountry.dat", "rb");
if(fp!=NULL)
{
while( !feof( fp ) )
{
if(fread(&g_structIP, sizeof(g_structIP), 1, fp)!=
=0)
{
mIPcountry[g_structIP.startIP] = g_str=
uctIP;
}
}
}
fclose(fp);
}
The above code will not work at all, for several reasons.
First, of course, you haven't told us the format of the file
you're reading, so it's difficult to say how you really should
read it, but fread is only a solution if you're reading the data
into a buffer of raw bytes, which you then parse manually.
(Most likely, you're dealing with a file in CSV format, unless
you're reading directly from a data base connection. So some
parsing will probably be necessary anyway.)
Secondly, feof only tells you whether the last read encountered
EOF or not; if there was some other type of error, it will
remain false. Forever.
And fclose on a null pointer is undefined behavior.
The classical solution for this problem would be to write an
operator>> for IPCOUNTRY, and use it on an ifstream. Using the
FILE* functions of C here is a sure loser.
struct compareIPC
{
bool operator()(const IPCOUNTRY* ipc1, const IPCOUNTRY* icp2) const
{
return strcmp(ipc1, ipc2) < 0;
}
};
strcmp on something which isn't a string (a null terminated
sequence of char) is a sure recepe for disaster.
//This function will be called with IPaddress set and set the
countryname depending which country it belongs
//to using the map look up table
int lookupCountryName(DWORD IPadress, char *countryname)
{
//ok here I'm lost what I should exactly do
mIPC::iterator mIPCbegin = mIPcountry.begin();
mIPC::iterator mIPCend = mIPcountry.end();
return 0;
}
If you're using std::set, you'll need to construct a "key", i.e.
an IPCOUNTRY, such that the comparison function will work
correctly. If you do this, I'd probably use something like:
class IpCountryEntry
{
public:
//! \post
//! country() == ""
//! start() == 0
//! end() == 0
IpCountryEntry() ;
//! \post
//! country() == country
//! start() == start
//! end() == end + 1
//! for half open range...
IpCountryEntry( uint32_t start,
uint32_t end,
std::string const& country ) ;
//! \post
//! country() == ""
//! start() == ipAddress
//! end() == ipAddress
IpCountryEntry( uint32_t ipAddress ) ;
std::string country() const ;
uint32_t start() const ;
uint32_t end() const ;
bool operator<(
IpCountryEntry const& other ) const ;
// ...
} ;
The trick would be to ensure that operator< always returned
false if one of the ranges was completely included in the other
(and probably raised an exception if the ranges overlapped).
You could then use the find() function of std::set.
But as I said, a sorted std::vector and std::lower_bound would
probably be more appropriate. Note that the comparison function
of lower_bound is a template argument, and the value argument's
type is also separate from the value_type of the iterator, so
you can invoke something like:
std::lower_bound( v.begin(), v.end(),
ipAddress,
CmpIpEntry() ) ;
where CmpIpEntry is something like:
struct CmpIpEntry
{
bool operator( IpCountryEntry const& lhs,
IpCountryEntry const& rhs ) const ;
bool operator( IpCountryEntry const& lhs,
uint32_t rhs ) const ;
bool operator( uint32_t lhs,
IpCountryEntry const& rhs ) const ;
} ;
(Formally, at least according to the latest draft, the third
function above should never be called. But it's trivial to
implement, just in case.)
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orient=E9e objet/
Beratung in objektorientierter Datenverarbeitung
9 place S=E9mard, 78210 St.-Cyr-l'=C9cole, France, +33 (0)1 30 23 00 34