Re: Caseless String
Le Chaud Lapin wrote:
so I guess at this point I will
retreat, work with my code, and come up with some concrete examples
that I can present for discussion.
Thanks so far...
Update:
After some fiddling, it at least seems possible to have a
caseless string. I don't have the same kind of warm and fuzzy feeling
that I got when developing, say, Priortized_Associative_Set<>, but it's a
start:
My objective was to do caseless string comparisons:
Caseless_String s1 = "Hello";
String s2 = "HeLLo";
s1 == s2; // true.
I am not sure, but it seems so far that a way to do this is to not
define caseless strings, but a caseless character class:
template <typename C> struct Caseless
{
typedef C Type;
C c;
Caseless(C c = 0) : c(c) {}
// operator C & () {return c;}
} ;
Then define, for example,
String<Caseless<wchar_t> > s3 = "World.";
String<wchar_t> s4 = "WORLD.";
s3 == s4; // true
The code for template String<> would be written as it would by anyone
making a string class template, with the exception that most of the
member functions would be templates themselves. More about that in a
moment.
The key for caseless comparisons is to define global operators for
comparisons between a Caseless<> character and any other character.
The rule is simple - whenever a Caseless<> character is involved in a
comparison with any other type of character (including another
Caseless<> character), both get converted using toupper before doing
the comparison:
template <typename C, typename X>
inline bool operator == (const Caseless<C> &c, const X &x) {return toupper(c.c) == toupper(x);}
template <typename C, typename X>
inline bool operator != (const Caseless<C> &c, const X &x) {return toupper(c.c) != toupper(x);}
These are just two of a set of functions that cover the cases where a
Caseless<> character is the left operand, the right operand, or both
operands.
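To make the "left, right, or both" set concrete, here is a minimal sketch of how the full family might look. It is not my actual code: fold() is a hypothetical helper I am introducing here, it only handles char and wchar_t, and it uses std::toupper / std::towupper in place of the bare toupper above. The third overload of each operator is there because, without it, comparing two Caseless<> characters would match both of the other overloads equally well.

```cpp
#include <cassert>
#include <cctype>
#include <cwctype>

// The wrapper from this post: marks a character as case-insensitive.
template <typename C> struct Caseless
{
    typedef C Type;
    C c;
    Caseless(C c = 0) : c(c) {}
};

// Hypothetical helpers: fold a character to upper case and widen the result
// so that char and wchar_t values can be compared with each other.
inline unsigned long fold(char ch)
{
    return static_cast<unsigned long>(std::toupper(static_cast<unsigned char>(ch)));
}
inline unsigned long fold(wchar_t ch)
{
    return static_cast<unsigned long>(std::towupper(static_cast<std::wint_t>(ch)));
}
template <typename C> inline unsigned long fold(const Caseless<C> &x)
{
    return fold(x.c);
}

// Caseless<> on the left.
template <typename C, typename X>
inline bool operator == (const Caseless<C> &a, const X &b) {return fold(a) == fold(b);}
// Caseless<> on the right.
template <typename X, typename C>
inline bool operator == (const X &a, const Caseless<C> &b) {return fold(a) == fold(b);}
// Caseless<> on both sides; more specialized than the two overloads above.
template <typename C1, typename C2>
inline bool operator == (const Caseless<C1> &a, const Caseless<C2> &b) {return fold(a) == fold(b);}

// operator != is written in terms of operator ==, with the same three shapes.
template <typename C, typename X>
inline bool operator != (const Caseless<C> &a, const X &b) {return !(a == b);}
template <typename X, typename C>
inline bool operator != (const X &a, const Caseless<C> &b) {return !(a == b);}
template <typename C1, typename C2>
inline bool operator != (const Caseless<C1> &a, const Caseless<C2> &b) {return !(a == b);}

// Small self-check covering the three operand positions.
inline void caseless_demo()
{
    Caseless<char> h('h');
    assert(h == 'H');                      // Caseless<> on the left
    assert('H' == h);                      // Caseless<> on the right
    assert(h == Caseless<wchar_t>(L'H'));  // Caseless<> on both sides
    assert(h != 'j');
}
```

The case folding here follows the current C locale, so anything beyond plain ASCII letters depends on the locale in effect.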
Furthermore, I noticed that std::string does not allow copy
construction from a narrow string to a wide string or vice-versa:
std::string s6 = "Hallo";
std::wstring s7 = s6; // Construction from a different string type is not permitted.
It seemed reasonable that this should be allowed. The same could be
said for inter-string assignment. I also felt that the programmer
should be left to his own conscience to decide if it is appropriate to
compare, say, a String<char> with a String<wchar_t>. After all, there
are many situations where this is desirable, and where it is known from
context that the character set of that particular String<wchar_t> is
limited to those that could be contained by a char (think IP
addresses), so comparison is guaranteed not to cause any surprises.
To supply these three features:
template <class C1 = char> class String
{
public:
template <typename C2> String (const String<C2> &that);
template <typename C2> bool operator == (const String<C2> &) const;
// template member function for assignment omitted (I lost it somewhere)
};
Then one could write:
String<Caseless<unsigned int> > s8 = "SALUT LE MONDE.";
String<unsigned short> s9 = "Hola, Que Tal?";
String<long double> s10 = s8;
String<int> s11 = s9;
s11 == s9; // true
I have not yet thought about signed-ness for different character types.
I suspect that there is trouble ahead. Still, this seems better than
the other options.
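For anyone who wants to try the three features (converting construction, converting assignment, cross-type comparison), here is a stripped-down sketch. It is not my real String<>: the std::vector storage, size(), operator [], and the constructor from a null-terminated array are all assumptions made to keep the example short, and the assignment operator is simply my guess at what the lost one would look like.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename C1 = char> class String
{
public:
    String() {}

    // Build from a null-terminated array of any character type.
    template <typename C2> String(const C2 *s)
    {
        for (; *s != C2(); ++s) chars.push_back(static_cast<C1>(*s));
    }

    // Converting constructor, e.g. String<char> -> String<wchar_t>.
    template <typename C2> String(const String<C2> &that)
    {
        for (std::size_t i = 0; i < that.size(); ++i)
            chars.push_back(static_cast<C1>(that[i]));
    }

    // Converting assignment, written via the converting constructor.
    template <typename C2> String &operator = (const String<C2> &that)
    {
        String tmp(that);
        chars.swap(tmp.chars);
        return *this;
    }

    // Cross-type comparison: relies only on C1 == C2 being well-formed.
    template <typename C2> bool operator == (const String<C2> &that) const
    {
        if (size() != that.size()) return false;
        for (std::size_t i = 0; i < size(); ++i)
            if (!(chars[i] == that[i])) return false;
        return true;
    }

    std::size_t size() const {return chars.size();}
    const C1 &operator [] (std::size_t i) const {return chars[i];}

private:
    std::vector<C1> chars;
};

// Small self-check using the IP-address case mentioned above.
inline void string_demo()
{
    String<char> narrow = "10.0.0.1";
    String<wchar_t> wide = narrow;       // converting construction
    assert(wide == narrow);              // cross-type comparison
    String<wchar_t> other = L"10.0.0.2";
    assert(!(other == narrow));
    other = narrow;                      // converting assignment
    assert(other == narrow);
}
```

The element-wise static_cast is exactly where the signed-ness trouble would show up: a negative char widened to an unsigned type will not compare equal to what one might expect, so the sketch is only safe for the plain ASCII range.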
Note, finally, that the Caseless<> template could be useful in its own
right. For example, to check whether a string contains the letter 'z'
or 'Z', without regard for case, one could wrap a lower-case 'z' in a
Caseless<>, then supply the packaged character to a bool contains ()
template member function of the string.
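A rough sketch of that idea follows. To keep it self-contained I use std::basic_string and a free function named contains() as a stand-in for the member function; both choices, and the narrow-only operator ==, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

template <typename C> struct Caseless
{
    typedef C Type;
    C c;
    Caseless(C c = 0) : c(c) {}
};

// Case-insensitive comparison, narrow characters only to keep this short.
inline bool operator == (const Caseless<char> &a, char b)
{
    return std::toupper(static_cast<unsigned char>(a.c)) ==
           std::toupper(static_cast<unsigned char>(b));
}

// Stand-in for the contains () member: true if any element compares equal
// to x. Whether the match is caseless depends only on the type of x.
template <typename C, typename X>
bool contains(const std::basic_string<C> &s, const X &x)
{
    for (typename std::basic_string<C>::size_type i = 0; i < s.size(); ++i)
        if (x == s[i]) return true;
    return false;
}

inline void contains_demo()
{
    assert(contains(std::string("Pizza"), Caseless<char>('Z')));   // matches 'z'
    assert(!contains(std::string("Pizza"), 'Z'));                  // exact match only
    assert(!contains(std::string("Hello"), Caseless<char>('z')));  // no z at all
}
```

The point is that contains() itself knows nothing about case; wrapping the argument is what switches the comparison.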
I do have one question:
Caseless<> has only one member, c, whose type is the type of the
character being wrapped. I used Visual C++ to verify that
sizeof(Caseless<char>) == 1. This makes sense. There is only 1 field,
and it has an alignment requirement of 1 byte.
I would like to know if I can rely on this behavior in general. IOW,
when I make an array of Caseless<>, am I guaranteed that packing will be
as optimal as it would have been had I not wrapped the character in
Caseless<>, based on the fact that there is one and only one member of
the struct?
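Whatever the answer turns out to be, a compile-time check would at least make the assumption explicit instead of leaving it silent. The sketch below assumes a compiler with static_assert (C++11 or later); it does not prove anything about other compilers, it only refuses to build where the sizes differ.

```cpp
template <typename C> struct Caseless
{
    typedef C Type;
    C c;
    Caseless(C c = 0) : c(c) {}
};

// Fail the build if wrapping a character changes its size.
static_assert(sizeof(Caseless<char>) == sizeof(char),
              "Caseless<char> is larger than char");
static_assert(sizeof(Caseless<wchar_t>) == sizeof(wchar_t),
              "Caseless<wchar_t> is larger than wchar_t");

// Same check for an array, which is the case I actually care about.
static_assert(sizeof(Caseless<char>[16]) == 16 * sizeof(char),
              "array of Caseless<char> is not packed like an array of char");
```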
-Le Chaud Lapin-