Re: regular expressions

From:

"Oliver Wong" <owong@castortech.com>

Newsgroups:

comp.lang.java.programmer

Date:

Tue, 25 Apr 2006 20:25:05 GMT

Message-ID:

<BAv3g.397$aI4.228@edtnps89>

"Roedy Green" <my_email_is_posted_on_my_website@munged.invalid> wrote in
message news:hhts4219t18ckkq38aohngblqi919goprg@4ax.com...

On Tue, 25 Apr 2006 19:08:12 GMT, "Oliver Wong" <owong@castortech.com>
wrote, quoted or indirectly quoted someone who said :

All of the categories are mutually exclusive except for "Unicode
characters". And any character that you can get in memory via a program
written in Java is a "Unicode character", so that last category seems
pretty redundant. Perhaps you mean something like a character within
Unicode, but outside of ASCII?

I think that is what he meant, something like ó or ⇒. You
just want to mix up the categories to foil a simple dictionary search.

You could do it pretty easily with a giant switch. Unfortunately
switches don't implement ranges, so you have to code that
manually if you don't want to spell it out longhand.

To test whether a given Unicode character is outside of (or inside of,
for that matter) ASCII, you could serialize it to ASCII, then re-read the
ASCII data back into an in-memory Java string, and check if you still have
the same character that you started with. I believe what most ASCII
encoders do for characters outside of ASCII is replace them with the '?'
character.
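The round trip described above might be sketched roughly as follows. The class and method names are made up for illustration, and it assumes a Java version that has java.nio.charset.StandardCharsets. As far as I know, String.getBytes(Charset) substitutes the charset's replacement byte for unmappable characters, which for US-ASCII is '?'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: names are illustrative, not from any library.
class AsciiRoundTrip {
    /** Encodes to US-ASCII, decodes again, and reports whether the text survived unchanged. */
    static boolean survivesAsciiRoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII); // unmappable chars become '?'
        String back = new String(bytes, StandardCharsets.US_ASCII);
        return back.equals(s);
    }

    public static void main(String[] args) {
        System.out.println(survivesAsciiRoundTrip("abc"));     // prints true
        System.out.println(survivesAsciiRoundTrip("\u00f3"));  // prints false (o with acute accent)
    }
}
```

A literal '?' in the input still round-trips correctly, since it is itself an ASCII character. Comparing each code point against 0x7F would give the same answer without any encoding step.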

The default case handles the Unicode. You might add a control character
category and reject such passwords. Putting whitespace on either end of a
password is not a wise idea.

    I suspect whitespace isn't that big of a problem, because any password
validation system which performs a trim() on the password before processing
it is probably very poorly designed. Control characters (e.g. backspace,
EOF, etc.) are probably a very bad idea, because different systems will
handle them differently. Using outside-of-ASCII characters is also a bit
risky for web-based authentication, because one day you might be trying to
access your site from a terminal which only supports ASCII. As Unicode
support becomes more widespread, this will probably be less of an issue.
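For what it's worth, the three risks above could be flagged with something like the sketch below. The class and method names are hypothetical, it assumes Java 8 or later for String.codePoints(), and whether to reject or merely warn on each check is a policy decision this sketch doesn't make.

```java
// Hypothetical checks: names and policy are assumptions for illustration only.
class PasswordCharCheck {
    /** True if any code point is an ISO control character (backspace, tab, etc.). */
    static boolean hasControlChar(String password) {
        return password.codePoints().anyMatch(Character::isISOControl);
    }

    /** True if any code point lies outside 7-bit ASCII. */
    static boolean hasNonAscii(String password) {
        return password.codePoints().anyMatch(cp -> cp > 0x7F);
    }

    /** True if a trim() somewhere in the system would silently alter this password. */
    static boolean changedByTrim(String password) {
        return !password.equals(password.trim());
    }
}
```

For example, hasControlChar("ab\bcd") and changedByTrim(" secret") both return true, while hasNonAscii("secret") returns false.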

    One particularly bad password system implementation is Microsoft's ".NET
Passport" (which actually has very little to do with the .NET platform, to
which C# usually compiles). When you create your Passport account, your
password is silently truncated to something like 12 or 14 characters; but
when you validate your password, it doesn't get truncated.

    So if I create a new account with the password "1234567890ABCDEF", the
database will be updated to say that my password is "1234567890AB", but the
website never mentions that truncation has occurred. Then when I try to log
on with the password "1234567890ABCDEF", it compares "1234567890ABCDEF"
(what I wrote) against "1234567890AB" (what's in the DB), sees that they are
not equal, and tells me that my password is incorrect.
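The mismatch can be reproduced with a small sketch. Everything here is assumed for illustration: the 12-character limit, the names, and the plain string comparison, which simply mirrors the description above rather than how any real system stores passwords.

```java
// Hypothetical model of the bug: the limit and all names are assumptions.
class TruncationMismatch {
    static final int MAX_STORED = 12; // assumed limit ("something like 12 or 14")

    /** What account creation does: silently cut the password to MAX_STORED characters. */
    static String store(String password) {
        return password.length() > MAX_STORED
                ? password.substring(0, MAX_STORED)
                : password;
    }

    /** The buggy login: compares the untruncated attempt against the truncated value. */
    static boolean buggyLogin(String stored, String attempt) {
        return stored.equals(attempt);
    }

    /** A consistent login: applies the same truncation to the attempt before comparing. */
    static boolean consistentLogin(String stored, String attempt) {
        return stored.equals(store(attempt));
    }
}
```

With stored = store("1234567890ABCDEF"), buggyLogin(stored, "1234567890ABCDEF") returns false while consistentLogin(stored, "1234567890ABCDEF") returns true. Rejecting over-long passwords at account creation, with a visible error, would avoid the silent truncation altogether.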

    It took me several days to figure out why my 20-character password
wasn't working.

    - Oliver