Re: std::string and case insensitive comparison
On Jul 20, 11:04 am, Kai-Uwe Bux <jkherci...@gmx.net> wrote:
[...]
If I had a pound for every time this mistake is made, I would be as
rich as Bill Gates.
tolower( String1[i] )
is undefined since char may be signed and therefore you may
pass a negative number to tolower. tolower is only defined
on integer values in the range of unsigned char and the
value of EOF.
tolower( (unsigned char) String1[i] )
is correct.
This also means that
std::transform(str.begin(), str.end(), str.begin(), tolower)
is undefined for the same reason.
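A minimal sketch of the cast applied to both cases, assuming plain
char strings and the C library tolower. The names lower_char and
lower_in_place are hypothetical, chosen for illustration only:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: converts to unsigned char before calling
// std::tolower, so the argument is never negative.
inline char lower_char( char ch )
{
    return static_cast< char >(
        std::tolower( static_cast< unsigned char >( ch ) ) ) ;
}

// Hypothetical helper: lower-cases a string in place, passing the
// wrapper to std::transform rather than tolower itself.
inline void lower_in_place( std::string& s )
{
    std::transform( s.begin(), s.end(), s.begin(), lower_char ) ;
}
```

Passing the wrapper instead of tolower itself means every element
goes through the cast, whatever the signedness of char.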
That wording is a little too harsh. The above code has perfectly
well-defined behavior for quite a lot of input values.
By "the above code", which example do you mean? "tolower(
String1[i] )" has undefined behavior for roughly half of
all input values if char is signed (as it is by default with
most C++ compilers).
To dismiss it as undefined is like saying *p is undefined
since p might be null.
If p might be null, it is undefined. That's why we generally
check it beforehand, or require the user to do so. If the
specification of his StrLowCompare function specifically says
that the behavior is undefined if e.g. either of the strings
actually contains a character not in the basic execution
character set, then he's off the hook. But then every user must
verify any strings which contain characters from the outside.
And it's a pain, because a lot of normal text does contain
characters outside the basic execution character set.
I agree,
however, that one can and should do better.
For the use in std::transform(), I would suggest a function object like
this:
#include <locale>
#include <string>
#include <iostream>
#include <algorithm>
class to_lower {
std::locale const & loc;
public:
to_lower ( std::locale const & r_loc = std::locale() )
: loc ( r_loc )
{}
The default argument (and probably most of the arguments a user
will pass here) is a temporary, and will leave you with a
dangling reference once the full expression that constructs the
object ends. The loc member should be a copy, not a reference.
template < typename CharT >
CharT operator() ( CharT chr ) const {
return( std::tolower( chr, this->loc ) );
}
}; // class to_lower;
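One way to avoid the dangling reference described above is to hold
the locale by value. This is only a sketch under that assumption;
the class name to_lower_by_value is hypothetical, and the rest
follows the quoted to_lower:

```cpp
#include <locale>

// Variant of the quoted to_lower that stores a copy of the locale,
// so the member cannot outlive the object it refers to.
class to_lower_by_value {
    std::locale loc ;   // a copy, not a reference
public:
    explicit to_lower_by_value( std::locale const& r_loc = std::locale() )
        : loc( r_loc )
    {}

    template < typename CharT >
    CharT operator()( CharT chr ) const {
        // Two-argument std::tolower from <locale>.
        return std::tolower( chr, loc ) ;
    }
} ;
```

With this version, an instance declared on its own line and then
reused for several calls to std::transform no longer depends on the
lifetime of a temporary.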
I'd suggest extracting the ctype facet once up front, since
that's what std::tolower is going to do anyway.
For most applications, using a std::ctype<char> const* as the
member is probably the appropriate solution, e.g. :
template< typename charT >
class toLower
{
public:
typedef std::ctype< charT >
CType ;
explicit toLower( std::locale const& loc =
std::locale() )
: myCType( &std::use_facet< CType >( loc ) )
{
}
charT operator()( charT in ) const
{
return myCType->tolower( in ) ;
}
private:
CType const* myCType ;
} ;
This has a potential problem with the lifetime of the facet if
the user passes it a temporary locale, or changes the locale
while an instance of the class is alive. A perfectly robust
solution requires keeping a copy of the locale in the object as
well (which in turn makes copying it significantly more
expensive).
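A sketch of that more robust variant, assuming the same interface
as toLower above. The name RobustToLower and the member myLocale
are hypothetical:

```cpp
#include <locale>

template< typename charT >
class RobustToLower
{
public:
    typedef std::ctype< charT > CType ;

    explicit RobustToLower( std::locale const& loc = std::locale() )
        : myLocale( loc )
        , myCType( &std::use_facet< CType >( myLocale ) )
    {
    }

    charT operator()( charT in ) const
    {
        return myCType->tolower( in ) ;
    }

private:
    // Declared before myCType so that it is initialized first.
    std::locale myLocale ;   // the copy keeps the facet alive
    CType const* myCType ;   // facet extracted once, up front
} ;
```

The stored locale copy is what guarantees that the facet pointer
stays valid; the price is that every copy of the function object
(and std::transform takes it by value) now copies a locale as well.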
int main ( void ) {
std::string str ( "Hello World!" );
std::transform ( str.begin(), str.end(), str.begin(), to_lower() );
This actually will work with your code, because the temporary
passed to the constructor of to_lower will last until the end of
the full expression. Something like:
to_lower l ;
std::transform( s1.begin(), s1.end(), s1.begin(), l ) ;
won't, however. And it's what I'd naturally write if I wanted
to call transform on a number of strings. e.g.:
to_lower l ;
for ( std::vector< std::string >::iterator it = v.begin() ;
it != v.end() ;
++ it ) {
std::transform( it->begin(), it->end(), it->begin(), l ) ;
}
std::cout << str << '\n';
}
In professional code, I agree that using <locale> is the way to
go. But <locale> seems to have been designed to be particularly
difficult to use. For a beginner, I'd suggest writing your own
functional object with the tolower in <ctype.h>, and casting the
char to unsigned char. While less flexible than a solution based
on <locale>, it's an order of magnitude (or more) simpler to write
and understand.
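Something along the following lines, as a rough sketch only: the
names CToLower, SameIgnoringCase and equalIgnoringCase are
hypothetical, and the comparison is character by character in
whatever the current C locale is, which is an assumption about what
the original poster needs:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Functional object around the C library tolower, with the cast.
struct CToLower
{
    char operator()( char ch ) const
    {
        return static_cast< char >(
            std::tolower( static_cast< unsigned char >( ch ) ) ) ;
    }
} ;

// Predicate: true if two characters are equal once lower-cased.
struct SameIgnoringCase
{
    bool operator()( char lhs, char rhs ) const
    {
        CToLower lower ;
        return lower( lhs ) == lower( rhs ) ;
    }
} ;

// Case insensitive equality of two strings.
inline bool equalIgnoringCase( std::string const& a,
                               std::string const& b )
{
    return a.size() == b.size()
        && std::equal( a.begin(), a.end(), b.begin(),
                       SameIgnoringCase() ) ;
}
```

CToLower can also be passed directly to std::transform; it holds no
state, so copying it costs nothing.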
--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34