Re: Convert CString to LONG.
"Control Freq" <nick@nhthomas.freeserve.co.uk> ha scritto nel messaggio
news:bb766698-a861-4f6d-8bd5-c0840c6cba47@y21g2000hsf.googlegroups.com...
> I am confused by the UNICODE or non-UNICODE aspects.
It is a "history" problem.
In the past, people used ASCII to store strings. Each character was stored
in a 1-byte 'char'.
For Americans, having 256 (i.e. 2^8) "symbols" is fine; the letters of the
English alphabet are few (just A-Z and a-z).
But, for example, we in Italy have "accented letters", too, like à è ù ...
And people from Germany have other symbols, and people from Norway others,
etc.
And the Chinese and Japanese have *thousands* of symbols to write their
texts...
So, a single ASCII byte is absolutely not enough in these contexts...
Then, other encodings were defined. There were code-pages and other messy
stuff.
For example, I recall that under MS-DOS I had the Italian code page, but the
text-based user interfaces of American programs were drawn in a meaningless
way, because characters with the most-significant bit set (i.e. >= 128) were
code-page dependent. So, let's say that in the American code page the value
200 was associated with a straight vertical double line, e.g. || (used to draw
window borders in text mode), while in the Italian code page 200 was "È", so
instead of having a border like this:
||
||
||
the Italian code page rendered that as:
È
È
È
(or something similar).
At a certain point in time, Unicode became the standard way to represent
text.
But for backward compatibility with non-Unicode platforms like Windows 9x, the
Microsoft APIs introduced this Unicode-awareness scheme, i.e. each API is
available in both an ANSI and a Unicode version. Typically, the ANSI version
ends with A and the Unicode version ends with W, e.g.
DoSomethingA
DoSomethingW
and the Windows header files use the UNICODE preprocessor trick (the C runtime
and MFC headers use _UNICODE), e.g.
#ifdef UNICODE
#define DoSomething DoSomethingW /* Unicode */
#else
#define DoSomething DoSomethingA /* ANSI */
#endif
So, in your code you just write DoSomething, but it actually expands to
DoSomethingA or DoSomethingW, depending on the Unicode build mode.
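For instance, the real MessageBox API follows exactly this pattern
(MessageBox is defined as MessageBoxA or MessageBoxW in the Windows headers);
here is a minimal sketch, where the message and title strings are just sample
text of mine:
<code>
#include <windows.h>
#include <tchar.h>

int _tmain()
{
    // MessageBox expands to MessageBoxA (ANSI builds) or MessageBoxW
    // (Unicode builds); _T() adapts the string literals accordingly,
    // so this single source compiles and runs in both build modes.
    MessageBox( NULL, _T("Hello, Unicode-aware world!"), _T("Demo"), MB_OK );
    return 0;
}
</code>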
The same trick exists for strings: you can have ANSI strings (const char *)
and Unicode strings (const wchar_t *); using the TCHAR preprocessor macro you
can write code that expands to char* or wchar_t* depending on the Unicode
build mode:
#ifdef _UNICODE
#define TCHAR wchar_t
#else
#define TCHAR char
#endif
String literals in ANSI are written as "something", while in Unicode they take
the L prefix: L"something".
So there is the _T() decorator, which leaves the literal unchanged in ANSI
builds and adds the L prefix in Unicode builds:
_T("something") --> "something" (in ANSI builds)
_T("something") --> L"something" (in Unicode builds)
> How should I make my code correct (compiling and running) in a UNICODE
> and non-UNICODE aware way. I presume I should be coding so that a
> #define _UNICODE would produce unicode aware binary, and without that
> definition it will produce a non-unicode aware binary, but the actual
> code will be the same.
It is easy to code in a Unicode-aware way if you use CString and Microsoft's
non-standard extensions, like the Unicode-aware versions of the C library
routines: e.g. instead of sscanf, you should use _stscanf (from <tchar.h>).
For example:
<code>
int ParseInt( const CString & str )
{
    int n = 0;
    // _stscanf expands to sscanf (ANSI builds) or swscanf (Unicode builds).
    if ( _stscanf( str, _T("%d"), &n ) != 1 )
    {
        // ... handle the parsing error here
    }
    return n;
}

...

CString s = _T("1032");
int n = ParseInt( s );
</code>
The above code is Unicode-aware: it compiles and runs fine in both ANSI/MBCS
builds and Unicode builds (i.e. when _UNICODE and UNICODE are #defined).
In ANSI builds, the _T() decorator leaves the literal as a plain "something",
_stscanf expands to sscanf(), and CString is a char-based string (or CStringA,
in modern Visual C++).
So, in ANSI builds the code *automatically* (thanks to preprocessor
#define's or typedef's) becomes something like this:
<code>
int ParseInt( const CStringA & str )
{
    int n = 0;
    if ( sscanf( str, "%d", &n ) != 1 )
    {
        // ... handle the parsing error here
    }
    return n;
}

...

CStringA s = "1032";
int n = ParseInt( s );
</code>
Instead, in Unicode builds (UNICODE and _UNICODE #defined), the code
becomes:
TCHAR --> wchar_t
CString --> CStringW
_T("something") --> L"something"
_stscanf() --> swscanf()
<code>
int ParseInt( const CStringW & str )
{
    int n = 0;
    if ( swscanf( str, L"%d", &n ) != 1 )
    {
        // ... handle the parsing error here
    }
    return n;
}

...

CStringW s = L"1032";
int n = ParseInt( s );
</code>
So, if you use CString, the _T() decorator, and Microsoft's Unicode-aware
extensions to the C standard library, you can write code that compiles in both
ANSI and Unicode builds.
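And since the original question was about converting a CString to a LONG, here
is a possible Unicode-aware sketch based on _tcstol (the <tchar.h> mapping of
strtol/wcstol); the ParseLong helper name and its error-reporting convention
are just my own choices:
<code>
#include <tchar.h>
#include <cstdlib>
#include <cerrno>
#include <atlstr.h>   // CString (use the MFC headers instead, in an MFC project)

// Hypothetical helper: returns true on success and stores the value in 'result'.
bool ParseLong( const CString & str, long & result )
{
    const TCHAR * begin = str.GetString();
    TCHAR * end = NULL;

    errno = 0;
    // _tcstol expands to strtol (ANSI builds) or wcstol (Unicode builds).
    long value = _tcstol( begin, &end, 10 );

    // Reject empty input, trailing garbage and out-of-range values.
    if ( end == begin || *end != _T('\0') || errno == ERANGE )
        return false;

    result = value;
    return true;
}

...

CString s = _T("1032");
long n = 0;
if ( ParseLong( s, n ) )
{
    // use n
}
</code>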
The problem is with the C++ standard library and I/O streams, which do not
have this TCHAR idea.
So, if you really want to use the C++ standard library, you may choose to
write Unicode code from the start, i.e. use wchar_t, and wcout, wcerr and wcin
instead of the narrow cout, cerr and cin...
(The UNICODE and _UNICODE macros are Windows-only; they are not portable C++.)
It is possible to do a similar trick for the C++ standard library, too:
e.g. for the STL string, you can simulate the CString behaviour with something
like this:
typedef std::basic_string< TCHAR > TString;
In ANSI builds, TCHAR expands to char, so TString becomes
std::basic_string< char > == std::string.
Instead, in Unicode builds, TCHAR expands to wchar_t, so TString becomes
std::basic_string< wchar_t > == std::wstring.
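Here is a sketch of that idea, also adding a matching console-stream alias
(the TString and tcout names are my own conventions, not standard ones):
<code>
#include <iostream>
#include <string>
#include <tchar.h>

// std::string in ANSI builds, std::wstring in Unicode builds.
typedef std::basic_string< TCHAR > TString;

// Matching output stream: std::cout in ANSI builds, std::wcout in Unicode builds.
#ifdef _UNICODE
    #define tcout std::wcout
#else
    #define tcout std::cout
#endif

int _tmain()
{
    TString greeting = _T("Hello from TString");
    tcout << greeting << std::endl;
    return 0;
}
</code>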
HTH,
Giovanni