Re: How to convert from UTF-8 or ASCII to UTF-16 and back.
"Tom Serface" <tom.nospam@camaswood.com> wrote in message
news:umaX3e0sHHA.4476@TK2MSFTNGP03.phx.gbl...
Hi Jeff,
Are you asking how to do this or offering up a solution? I looked at your
.cpp file and I can't testify to whether or not it works (I'll assume it
does), but why not just use the following?
If you are using ATL/MFC you may find these macros handy:
http://msdn2.microsoft.com/en-us/library/87zae4a3(VS.80).aspx
Otherwise take a look at MultiByteToWideChar() and WideCharToMultiByte()
functions.
I didn't click on the .EXE link (wouldn't do that in a newsgroup), but
like I said, I'll assume it works.
BTW, the "magic" number you're referring to is called a BOM (Byte Order
Mark) and you'll find it at the start of most UTF-16 files and many UTF-8
files. It does make it easier to figure out the file type.
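For what it's worth, checking for a BOM takes only a few lines. This is a minimal sketch, not anything taken from the posted .cpp file; the names `Bom` and `detect_bom` are made up for illustration, and it only looks at the UTF-8 and UTF-16 marks.

```cpp
#include <cstddef>

// Hypothetical helper: report which BOM, if any, starts a buffer.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

Bom detect_bom(const unsigned char* data, std::size_t size) {
    // UTF-8 BOM: EF BB BF
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Bom::Utf8;
    // UTF-16 little-endian BOM: FF FE (U+FEFF stored low byte first).
    // Note: a UTF-32LE BOM (FF FE 00 00) would also match here;
    // this sketch does not try to tell the two apart.
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return Bom::Utf16LE;
    // UTF-16 big-endian BOM: FE FF
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return Bom::Utf16BE;
    return Bom::None;
}
```

A file with no BOM comes back as `Bom::None`, which tells you nothing by itself: plain ASCII and BOM-less UTF-8 both land there.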
Thanks for your post. The code was an interesting read.
Tom
"Jeff.Relf" <Jeff_Relf@Yahoo.COM> wrote in message
news:Jeff_Relf_2007_Jun_20__6_1_A0@Cotse.NET...
Hi Tom_Serface Mr. Z.K. and David Lowndes,
This is my line-wrapper for .HTM files and the like:
www.Cotse.NET/users/jeffrelf/Wrap_HTML.EXE
www.Cotse.NET/users/jeffrelf/Wrap_HTML.CPP ( VC++ 8 )
www.Cotse.NET/users/jeffrelf/Wrap_HTML.VCProj
Pass Wrap_HTML.EXE the file you want to wrap.
( e.g. run " Wrap_HTML index.HTM " )
It's a simple example of how to convert from UTF-8 or ASCII to UTF-16
and then back to the original encoding ( UTF-8 or ASCII ).
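The round trip itself can be sketched in portable C++ as below. This is not the code in Wrap_HTML.CPP and it does not use the Win32 calls; the function names `utf8_to_utf16` and `utf16_to_utf8` are hypothetical, and the decoder assumes well-formed input ( no checks for overlong, truncated, or otherwise invalid sequences ).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units.
// Assumption: the input is well-formed UTF-8; nothing is validated.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Encode UTF-16 code units back into UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Recombine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Pure ASCII passes through unchanged in both directions, since every ASCII byte is already a one-byte UTF-8 sequence.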
UTF-16 files begin like this:
" const wchar_t Magic_UTF_16 = 0xFeFF ; ";
UTF-8 like this:
" const unsigned char Magic_UTF_8[] = { 0xeF, 0xbb, 0xbF }; ".
Basically, Unicode here is just wchar_t ( a 16-bit unsigned short in VC++ )
instead of char ( i.e. a signed byte, __int8, of which ASCII uses 7 bits ).
Intel is little byte first,
so a memory dump of the " space glyph " ( ASCII 32, 20 hex )
shows " 20 00 " ( hex ).
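To make the byte order concrete, here is a small sketch that writes UTF-16 code units out as little-endian bytes explicitly, so the result does not depend on the machine it runs on. The helper name `to_utf16le_bytes` is an assumption for illustration only.

```cpp
#include <string>
#include <vector>

// Serialize UTF-16 code units as little-endian bytes (low byte first),
// regardless of the host CPU's own byte order.
std::vector<unsigned char> to_utf16le_bytes(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

Passing `u" "` gives the two bytes 20 00, and passing the BOM character U+FEFF gives FF FE, which is why a little-endian UTF-16 file starts with those two bytes.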
Some UTF-16 code units ( unpaired surrogates ) never appear in valid text,
allowing custom control codes like this
( used to color-code differences between 2 files ):
const wchar_t
Ch_Default = 0xD801 , Ch_Hi = 0xD802 , Ch_Klld = 0xD803
, Ch_Born = 0xD804 , Ch_Klld_Swapd = 0xD805
, Ch_Born_Swapd = 0xD806 ;
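One caveat with values like these: 0xD801 through 0xD806 sit in the high-surrogate range ( 0xD800 to 0xDBFF ), so they are fine as in-memory markers but would presumably need to be removed before the text is written back out as UTF-16 or UTF-8. A rough sketch of such a filter follows; `strip_lone_surrogates` and the two predicates are hypothetical names, not something from Dif.CPP.

```cpp
#include <cstddef>
#include <string>

constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Drop surrogates that are not part of a valid high/low pair.
// Assumption: a marker is never immediately followed by a real low
// surrogate; if it were, the two would be kept together as a pair.
std::u16string strip_lone_surrogates(const std::u16string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char16_t u = in[i];
        if (is_high_surrogate(u) && i + 1 < in.size() && is_low_surrogate(in[i + 1])) {
            out.push_back(u);         // keep a genuine surrogate pair intact
            out.push_back(in[++i]);
        } else if (!is_high_surrogate(u) && !is_low_surrogate(u)) {
            out.push_back(u);         // ordinary code unit
        }
        // otherwise: an unpaired surrogate (e.g. a marker), skipped
    }
    return out;
}
```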
For more on that, search for " Dif.CPP " at my website:
" www.Cotse.NET/users/jeffrelf ".
Don't you love it when you look at the "example code" (having looked up
"MultiByteToWideChar" and followed the example code link to "Looking Up a
User's Full Name") and find:
MultiByteToWideChar( CP_ACP, 0, UserName,
strlen(UserName)+1, wszUserName,
sizeof(wszUserName)/sizeof(wszUserName[0]) );
MultiByteTOWideChar( CP_ACP, 0, Domain,
strlen(Domain)+1, wszDomain,
sizeof(wszDomain)/sizeof(wszDomain[0]) );
Was it ever compiled, let alone tested?