
Re: [idn] Where will we see bad domain names?





On 10 Jan 2001, D. J. Bernstein wrote:
> The obvious solution is to improve the address-typing support in the
> operating system's Japanese keyboard interface. That's the only piece of
> software that has to worry about bad domain names. It will provide a
> friendly Japanese-specific environment for typing good domain names. It
> will deal with the dot problem.
> 
> Done.

I am not sure I am following you. Are you suggesting that the
Input Method Editor be rewritten to do the right thing, or the
domain name text widget?

> In contrast, Patrik and Bill want us to add bad->good conversion code
> (and Japanese dot handling) to Outlook, Messenger, Eudora, Mutt, Pine,
> Sendmail (which will have to check whether incoming IDNs match UTF-8
> domain names in user-typed configuration files), qmail (same reason),
> Exchange, IMail, Exim, Post.Office, Explorer, Communicator, Opera, Lynx,
> w3m, Apache, IIS, Enterprise, WebLogic, Zeus, BIND, djbdns, Webmin, etc.

Something like nameprep is already in place for case:
U+0041 == U+0061, i.e. A == a
U+0041 LATIN CAPITAL LETTER A
U+0061 LATIN SMALL LETTER A

How do we handle the same characters at separate code points?
U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
U+FF41 FULLWIDTH LATIN SMALL LETTER A

Should code point location be preserved or mapped?
U+0061 == U+0041 == U+FF21 == U+FF41
 or
(U+FF21 -> U+0041) == (U+FF41 -> U+0061)

Then how should this be handled? 
U+30F2 KATAKANA LETTER WO
U+FF66 HALFWIDTH KATAKANA LETTER WO
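
One way to explore these questions is Unicode compatibility
normalization (NFKC) followed by case folding. The sketch below is only
an illustration using Python's standard unicodedata module. The helper
name fold_label is hypothetical, and this is not the nameprep algorithm
itself, just the "map everything to one code point" option applied to
the characters listed above.

```python
import unicodedata

def fold_label(label: str) -> str:
    """Hypothetical helper: NFKC-normalize a label, then case-fold it."""
    return unicodedata.normalize("NFKC", label).casefold()

# ASCII and fullwidth A/a all collapse to a single code point.
samples = ["\u0041", "\u0061", "\uFF21", "\uFF41"]
assert len({fold_label(s) for s in samples}) == 1

# HALFWIDTH KATAKANA LETTER WO maps to KATAKANA LETTER WO.
assert fold_label("\uFF66") == "\u30F2"
```

Under this mapping the comparison happens after folding, so the
original code point is not preserved in the compared form.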

> Of course, if every one of those programs has to be changed, then every
> one of them has to be redeployed. Patrik claims that this is necessary
> for full IDN support. But my solution will allow some of these programs
> to be left alone: they can already handle UTF-8 IDNs without trouble.

My cynical view is that it is better to rewrite the applications and
protocols to support UTF-8. Until then, every other language will be
second to those that map cleanly into ASCII. I don't see things changing
after they get implemented.

-Bill