Re: Case insensitive set of strings
On Apr 17, 9:30 pm, Adrian <n...@bluedreamer.com> wrote:
I want a const static std::set of strings which is case insensitive
for the values.
So I have the following, which seems to work, but something doesn't seem
right about it. Is there a better way, or are there any gotchas in my code
below?
Your code has undefined behavior.
#include <iostream>
#include <functional>
#include <algorithm>
#include <set>
#include <string>
#include <iterator>
Don't forget:
#include <cctype>
(or <locale>, if you use the toupper functions from there).
class Test
{
public:
void p()
{
std::copy(fields.begin(), fields.end(),
std::ostream_iterator<std::string>(std::cout, ","));
std::cout << std::endl;
}
private:
struct nocase_cmp : public std::binary_function<const
std::string &, const std::string &, bool>
{
struct nocase_char_cmp : public std::binary_function<char,
char, bool>
{
bool operator()(char a, char b)
The function should be const, I think.
{
return std::toupper(a) < std::toupper(b);
Calling the single argument form of toupper with a char as
argument is undefined behavior. The argument type is int, with
the constraint that the value of the int must be either EOF, or
in the range [0...UCHAR_MAX]. If char is signed and the value is
negative, it won't be in range when converted (implicitly) to int.
There are two solutions here: either explicitly convert the char
to unsigned char before calling toupper, e.g.:
return toupper( static_cast< unsigned char >( a ) )
< toupper( static_cast< unsigned char >( b ) ) ;
or use the toupper member function of the std::ctype facet. (In
that case, I would use something like:
class nocase_char_cmp
{
public:
typedef std::ctype< char >
ctype ;
explicit nocase_char_cmp(
std::locale const& l = std::locale() )
: my_ctype( &std::use_facet< ctype >( l ) )
{
}
bool operator()( char a, char b ) const
{
return my_ctype->toupper( a ) < my_ctype->toupper( b ) ;
}
private:
ctype const* my_ctype ;
} ;
)
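For reference, here is a minimal sketch of the first solution (the cast to unsigned char) put together into a complete comparator. The names NoCaseCharLess, NoCaseLess, FieldNames and check_field_names are made up for the illustration and are not from the original code; the sketch assumes the default "C" locale behavior of std::toupper.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical name: orders two chars ignoring case, without ever
// passing a negative value to std::toupper.
struct NoCaseCharLess
{
    bool operator()(char a, char b) const
    {
        return std::toupper(static_cast<unsigned char>(a))
             < std::toupper(static_cast<unsigned char>(b));
    }
};

// Hypothetical name: strict weak ordering on strings that ignores case.
struct NoCaseLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            NoCaseCharLess());
    }
};

typedef std::set<std::string, NoCaseLess> FieldNames;

// Small self-check: keys differing only in case count as equivalent.
inline void check_field_names()
{
    FieldNames fields;
    fields.insert("string1");
    fields.insert("STRIng1");   // equivalent to "string1", not inserted
    fields.insert("string2");
    assert(fields.size() == 2);
    assert(fields.count("STRING2") == 1);
}
```

Both operator() functions are const, so the comparator can be used through the const reference or const copy that the set holds.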
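If you have a lot of case insensitive comparisons, it might be
worth writing a case insensitive collate facet (or there might
even be one available ready-made); in that case, just use an
std::locale with this facet as the comparison object of the set
(std::locale has an operator() which compares two strings using
its collate facet; the fifth argument of lexicographical_compare
compares single elements, so the locale doesn't fit there), and
you're done with it.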
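A case insensitive collate facet along these lines might look like the sketch below. NoCaseCollate, nocase_locale and check_nocase_locale are names made up for the illustration. Only do_compare is overridden, so transform and hash keep their default, case sensitive behavior. Since std::locale::operator() compares two strings through the locale's collate facet, the sketch uses the locale as the comparison object of the set itself.

```cpp
#include <cassert>
#include <cctype>
#include <locale>
#include <set>
#include <string>

// Hypothetical facet: a collate<char> whose comparison ignores case.
class NoCaseCollate : public std::collate<char>
{
protected:
    virtual int do_compare(const char* lo1, const char* hi1,
                           const char* lo2, const char* hi2) const
    {
        for (; lo1 != hi1 && lo2 != hi2; ++lo1, ++lo2) {
            int a = std::toupper(static_cast<unsigned char>(*lo1));
            int b = std::toupper(static_cast<unsigned char>(*lo2));
            if (a != b)
                return a < b ? -1 : 1;
        }
        // Equal so far: the shorter sequence sorts first.
        if (lo1 == hi1 && lo2 == hi2)
            return 0;
        return lo1 == hi1 ? -1 : 1;
    }
};

// Builds a locale which is the classic locale, except for collation.
inline std::locale nocase_locale()
{
    // The locale takes ownership of the facet (refs defaults to 0).
    return std::locale(std::locale::classic(), new NoCaseCollate);
}

// Small self-check: the locale object itself serves as the comparator.
inline void check_nocase_locale()
{
    std::set<std::string, std::locale> fields(nocase_locale());
    fields.insert("string1");
    fields.insert("STRIng1");   // equivalent to "string1", not inserted
    assert(fields.size() == 1);
}
```

The same locale could be passed wherever a string ordering predicate is expected, for example as the third argument to std::sort on a vector of strings.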
}
};
bool operator()(const std::string &a, const std::string &b)
{
return std::lexicographical_compare(a.begin(), a.end(),
b.begin(), b.end(),
nocase_char_cmp());
}
};
typedef std::set<std::string, nocase_cmp> Field_names_t;
static const Field_names_t fields;
};
const char *f[]={
"string1",
"string2",
"string3",
"STRIng1",
"string5"};
Try throwing in some characters whose encoding results in a
negative number, and see what happens. (On my machine, just
about any accented character will do the trick. In my test
suites, I'll generally make sure that there is a ÿ somewhere,
since in the most frequent encoding, it is 0xFF, which, when
stored into a char, becomes -1, or EOF. You'd be surprised how
many programs stop when they encounter this character in a
file.)
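To see what happens with that character, the following sketch compares the value that reaches toupper with and without the cast. The helper names are made up for the illustration; whether plain char is signed is implementation-defined, so the uncast result is only checked loosely.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>   // EOF

// Hypothetical helper: the int value a function like toupper receives
// when a char is passed without a cast.
inline int as_passed_uncast(char c)
{
    return c;
}

// Hypothetical helper: the int value after the recommended cast.
inline int as_passed_cast(char c)
{
    return static_cast<unsigned char>(c);
}

inline void check_0xff()
{
    char c = static_cast<char>(0xFF);

    // With the cast, the value is always in [0, UCHAR_MAX] and can
    // never be confused with EOF, which is negative.
    assert(as_passed_cast(c) == 0xFF);
    assert(as_passed_cast(c) != EOF);

    // Without the cast, the result depends on the signedness of char:
    // 255 where char is unsigned, a negative value where it is signed.
    assert(as_passed_uncast(c) == 0xFF || as_passed_uncast(c) < 0);
}
```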
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34