Re: iostreams and code conversion
Jens Theisen wrote:
I already posted this question in gnu.g++.help, as it might be
platform specific; it might also not be, so I'm posting it here as I
didn't get any answer.
I experimented with the iostreams locale support and am puzzled with
Locale support (except for locale "C") is more or less
implementation-defined, so don't expect too much portability.
Different versions of g++ have different levels of support; for
a long time, there was almost none, but more recent versions
seem to have improved greatly.
#include <iostream>
#include <locale>
#include <fstream> // for the next example
using namespace std;
int main(int argc, char** argv)
{
    wcout.imbue( locale("") );
    wcout << wchar_t(0xe4); // is the unicode codepoint for `ä'
}
using an en_GB.UTF-8 locale, but the output isn't converted to utf8
(not sure what it actually is that comes out).
A '?' on my implementation. (Under Linux; it's also possible
that the results depend on the underlying system.)
The conversion facet is correctly installed, and indeed
os.imbue( locale("") );
os << wchar_t(0xe4);
does what one would expect. I had a quick scan through the
sources of libstdc++ (gcc's C++ library) and noticed that the
only place where libstdc++ appears to honour this facet is in the
filebufs - which appears to me to be the least sensible place.
It's the only standard streambuf/wstreambuf which is supposed to
do code translation. The standard doesn't indicate what type of
streambuf/wstreambuf cin/wcin uses, so there's no real guarantee
of much of anything from a standards point of view. From a
quality of implementation point of view, I think it is
reasonable to expect either a filebuf/wfilebuf or a custom
streambuf/wstreambuf which behaves in a similar fashion with
regards to code translation. (Ideally, it would be nice if the
standard required a filebuf/wfilebuf. But I think that this
could create some implementation problems on some systems.)
The standard indeed mentions special code conversion with this
facet for fstreams, and appears not to for other streams (in
particular, I can't find anything about cout/cin/cerr in this
Code translation (or at least the code translation here) is
designed to map between external and internal encodings.
Logically, it only makes sense for filebuf; you wouldn't want it
in stringbuf, for example.
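
To make the filebuf/stringbuf distinction concrete, here is a minimal
sketch. It assumes std::codecvt_utf8 from <codecvt> as a stand-in for
whatever facet locale("") would supply (that header is deprecated since
C++17, so newer compilers may warn); the function names and the file
path are hypothetical. The wide file stream converts through the
imbued facet on the way out, while the wide string stream keeps the
wchar_t as it is.

```cpp
#include <codecvt>   // std::codecvt_utf8: deprecated since C++17, illustration only
#include <fstream>
#include <locale>
#include <sstream>
#include <string>

// Writes one wide character through a wofstream whose locale carries a
// UTF-8 codecvt facet, then returns the raw bytes that reached the file.
std::string bytes_written_by_wfilebuf(const std::string& path, wchar_t ch)
{
    {
        std::wofstream os;
        // Imbue before opening, so the facet is in place before any I/O.
        os.imbue(std::locale(std::locale::classic(),
                             new std::codecvt_utf8<wchar_t>));
        os.open(path.c_str(), std::ios::binary);
        os << ch;
    }   // the destructor flushes and closes the file

    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream raw;
    raw << in.rdbuf();
    return raw.str();
}

// Does the same with a wostringstream: the locale is imbued, but a
// stringbuf has no external encoding, so the character is stored as is.
std::wstring chars_held_by_wstringbuf(wchar_t ch)
{
    std::wostringstream os;
    os.imbue(std::locale(std::locale::classic(),
                         new std::codecvt_utf8<wchar_t>));
    os << ch;
    return os.str();
}
```

Calling bytes_written_by_wfilebuf("demo.tmp", wchar_t(0xe4)) should
give the two-byte UTF-8 sequence for that code point, whereas
chars_held_by_wstringbuf returns a one-element wide string.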
And the standard doesn't say anything about the type of
streambuf cin, cout, etc. use. Logically, at least on systems
where the standard in, etc. are pre-defined "files" (e.g. Unix,
Unix-like and Windows), it should be a filebuf, but the standard
doesn't require it, and in the end, it is a quality of
implementation issue. (Practically speaking, of course, if the
actual streambuf type is a custom streambuf which acts exactly
like a filebuf, except that it doesn't support is_open, open and
close, that might be acceptable.)
However, it seems very reasonable to have them at least for the
stdio_sync_filebuf as well,
It might seem reasonable, but the standard cannot require it, as
the stdio_sync_filebuf is not in the standard; the standard
expects synchronization to be done with a flag in ios_base,
which can be set or unset for each file.
I don't have any g++ sources currently installed on my machines
here, to see exactly what this streambuf does. There is a valid
question as to what an stdio_filebuf---a filebuf which simply
forwards requests to a C FILE*---should do in this case,
because the FILE* is presumably also handling code translation.
Have you tried setting the global C locale to ""? That did the
trick on my Linux machine:-).
as one would rather want the above code
for output to "just work", and after all, the following works for
me in C:
char const* narrow = "\xc3\xa4";
wchar_t const* wide = L"\xe4";
printf("printf narrow: %s\n", narrow);
printf("printf wide: %ls\n", wide);
So it should in C++ with iostreams, shouldn't it?
Logically. But there's not much logic in the relationship
between the C locales and those in C++; you can expect
implementations to vary. But with g++ under Linux, at least,
setlocale( LC_ALL, "" ) ;
std::locale::global( std::locale( "" ) ) ;
before the first output apparently make your code work.
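
For what it's worth, the two calls can be wrapped so that an
unsupported environment locale doesn't abort the program. This is only
a sketch: the function name is hypothetical, and whether the standard
stream objects then pick up the new locale remains implementation
dependent, as discussed here.

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>

// Adopts the user's environment locale on both the C and the C++ side.
// Returns the name of the resulting global C++ locale.
std::string adopt_environment_locale()
{
    // C side: returns a null pointer and changes nothing on failure.
    std::setlocale(LC_ALL, "");

    try {
        // C++ side: std::locale("") throws std::runtime_error if the
        // environment names a locale the implementation doesn't support.
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        // Keep the current global locale in that case.
    }
    return std::locale().name();
}
```

It would be called once at the start of main, before the first output
to wcout.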
In practice; I don't think this is standard, and in fact, I
think it contradicts the standard; changing the global locale
should not change the locale imbued in streams that are already
constructed. On the other hand, it's a useful and logical
behavior, probably preferable to what the standard does
require. I think that the ideal behavior, from a user point of
view, is probably something along the lines of the standard IO
objects tracking the global locale until the first output, OR
until imbue is called on them (and then, that they be required
to respect the imbue). But there's certainly no justification
for that in the standard.
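
The rule that a stream keeps the locale it was constructed with can be
checked without depending on any named locale. The sketch below uses
ordinary string streams rather than the standard IO objects, and a
hypothetical facet (GroupingPunct) invented purely so that the two
locales format numbers differently.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical facet, for illustration only: groups digits in threes with ','.
class GroupingPunct : public std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats n with a stream built before the global locale changes and with
// one built after it; returns {before, after}.
std::pair<std::string, std::string> format_across_global_change(long n)
{
    const std::locale saved = std::locale::global(std::locale::classic());

    std::ostringstream early;   // copies the global locale (classic) now
    std::locale::global(std::locale(std::locale::classic(), new GroupingPunct));
    std::ostringstream late;    // copies the new global locale

    early << n;                 // still formatted with the classic locale
    late << n;                  // formatted with the grouping facet

    std::locale::global(saved); // restore whatever was global before
    return std::make_pair(early.str(), late.str());
}
```

With n = 1234567, the first string comes out ungrouped and the second
with separators, which is the behavior the standard describes for
streams in general.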
Does someone know what the language requires? Does someone
know what platforms generally implement?
I know that what is implemented varies greatly from one platform
to the next. What the standard says is pretty much nothing. It
specifies the types of the standard iostream object (and I'm not
sure that this is intentional---historically, all that was
guaranteed in the classic IO streams is that they be either an
[io]stream OR a class derived from an [io]stream). Obviously,
the associated streambuf is a derived class, since streambuf is
abstract. But the standard doesn't say what kind.
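
Since the kind of streambuf is unspecified, about all one can do is ask
the implementation at run time. A small sketch (the function name is
hypothetical, and the answer for cout will differ between libraries and
may depend on sync_with_stdio):

```cpp
#include <fstream>
#include <iostream>

// True if the buffer behind `os` is a std::filebuf (or derived from it).
// A stream with no buffer at all yields false.
bool uses_filebuf(std::ostream& os)
{
    return dynamic_cast<std::filebuf*>(os.rdbuf()) != nullptr;
}
```

uses_filebuf(std::cout) then tells you whether this particular
implementation happens to use a filebuf; nothing in the standard lets
you predict the answer.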
James Kanze (Gabi Software) email: firstname.lastname@example.org
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]