Re: 32-bit characters in Java string literals
On 2009-12-23 03:30:29 -0500, Roedy Green
On Tue, 22 Dec 2009 18:01:17 -0800, Roedy Green
<email@example.com> wrote, quoted or indirectly quoted
someone who said :
I started to think about what would be needed to make this less
If you had only a few, you could create library of named constants for
them, and glue them together with compile time concatenation. With
only a little cleverness, a compiler would avoid embedding constants
it did not use.
Is any OS, JVM, utility, browser etc. capable of rendering a code
point above 0xffff? I get the impression all we can do is embed them
in UTF-8 files.
OS X comes with fonts that contain glyphs for some (but not all)
characters above U+FFFF out of the box, and can render them anywhere
they appear. Their visibility in Swing apps depends heavily on the L&F;
if you don't force it, Java will default to the Aqua L&F and render
most things correctly.
Webapps, obviously, render nothing; they send encoded characters to
other things, which may render them. Safari, Chrome, and Firefox can
all render U+1D360 (COUNTING ROD UNIT DIGIT ONE).
In the interests of science, what characters do you see on the next line?
???? ???? ???? ???? ???? ???? ????
This message is encoded as UTF-8, and those should be, in order,
Codepoint (UTF-8 representation) NAME
U+10100 (F0 90 84 80) AGEAN WORD SEPARATOR LINE
U+10140 (F0 90 85 80) GREEK ACROPHONIC ATTIC ONE QUARTER
U+10190 (F0 90 86 90) ROMAN SEXTANS SIGN
U+10300 (F0 90 8C 80) OLD ITALIC LETTER A
U+10400 (F0 90 90 80) DESERET CAPITAL LETTER LONG I
U+10450 (F0 90 91 90) SHAVIAN LETTER PEEP
U+1D121 (F0 9D 84 A1) MUSICAL SYMBOL C CLEF
with spaces between.