[idn] Re: Unicode and Security
In a message dated 2002-02-09 13:00:59 Pacific Standard Time,
larsga@garshol.priv.no writes:
> It seems to me that this problem really needs some other fix than the
> merging of all similar-looking characters in all character sets. I
> just can't see that working.
Even the "merging" part wouldn't work. Let's say that I, like Ken Sakamura
or Bernard Miller before me, have decided that I know much more about
character encoding than the Unicode Consortium or WG2, and I am going to
develop my own character encoding that will solve the problem of confusables
once and for all.
OK, we start with the easy ones. Latin A, Greek Alpha, and Cyrillic A all
get unified. Latin E, Greek Epsilon, Cyrillic E, unified. Hey, this is
easier than I thought. Latin B, Greek Beta, Cyrillic Ve. Ha! I'm smart
enough to know that Ve gets unified with B and Beta, even though it
represents a different sound. Just like Han unification! Boy, those Unicode
dolts really missed something there.
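For anyone who wants to see what is being "unified" here, a minimal Python sketch follows. It only lists the existing Unicode code points for the three triples above, using the standard unicodedata module; the names TRIPLES and describe are mine, made up for illustration. I'm assuming that "Cyrillic E" means the letter that looks like Latin E, which Unicode calls IE.

```python
import unicodedata

# The three "easy" triples: (Latin, Greek, Cyrillic).
# Assumption: "Cyrillic E" is taken to be U+0415, the E look-alike.
TRIPLES = [
    ("\u0041", "\u0391", "\u0410"),  # A, Alpha, A
    ("\u0045", "\u0395", "\u0415"),  # E, Epsilon, Ie
    ("\u0042", "\u0392", "\u0412"),  # B, Beta, Ve
]

def describe(ch):
    """Return the code point and formal Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Each row looks like one glyph on screen but is three separate characters.
for triple in TRIPLES:
    print(" | ".join(describe(ch) for ch in triple))
```

Nine look-alike glyphs, nine distinct characters, which is exactly what the hypothetical encoding sets out to collapse.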
Let's keep going. Latin Y, Greek Upsilon, Cyrillic U. Wait a minute, that
Cyrillic U doesn't look *quite* the same. Oh well, it's close enough, right?
Let's try some lower-case letters. Latin a, Greek alpha, Cyrillic a. That
Greek alpha looks kinda cursive, doesn't it? Should we unify it or not?
Hmmm...
How about Latin n and Greek eta? Is that descender on the eta significant or
not? Hey, you could stick an eta in the middle of a Web address and really
fool somebody. Better unify. How about Latin v and Greek nu? Different
glyphs or not? In 9-point MS Sans Serif, they're pretty close, aren't they?
(And don't forget Armenian vo!) Same goes for Latin y and Greek gamma.
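To make the judgment calls concrete, here is a rough Python sketch of what such a folding table might look like. Everything in it is hypothetical: the CLEAR and EDGE tables, the fold function, and the choice of which pairs go where are my own assumptions for illustration, not anything defined by Unicode or any other standard.

```python
# Hypothetical folding tables; membership is a judgment call,
# not something taken from any standard.
CLEAR = {
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
}
EDGE = {
    "\u03b7": "n",  # GREEK SMALL LETTER ETA (descender and all)
    "\u03bd": "v",  # GREEK SMALL LETTER NU
    "\u03b3": "y",  # GREEK SMALL LETTER GAMMA
}
BOTH = {**CLEAR, **EDGE}

def fold(text, table):
    """Replace every character found in the table with its Latin stand-in."""
    return "".join(table.get(ch, ch) for ch in text)

# "event" spelled with a Greek nu and a Greek eta.
spoof = "e\u03bde\u03b7t"

print(fold(spoof, CLEAR) == "event")  # only the obvious pairs are folded
print(fold(spoof, BOTH) == "event")   # the edge cases are folded too
```

Whether the spoofed string gets caught depends entirely on which of the "close enough" pairs somebody decided to put in the table.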
Well, you get the point. The world of alphabetic confusables is just not
that simple or that 1-to-1. There are more edge cases, in fact, than obvious
cases such as the a/alpha or o/omicron that we keep hearing about. And if I
were trying to design this hypothetical "Uniglyph" encoding to get rid of
those pesky confusables, and still provide support for alphabetic scripts
besides Latin, I would eventually have to face the fact that it *can't be
done*. Oh, sure, it can be done for a/alpha and o/omicron, so I can make a
sales presentation or a picket sign. But a complete technical solution, uh,
no.
-Doug Ewell
Fullerton, California
(address will soon change to dewell at adelphia dot net)