Re: [idn] Where will we see bad domain names?
- To: "D. J. Bernstein" <djb@cr.yp.to>
- Subject: Re: [idn] Where will we see bad domain names?
- From: William Morris <me@williammorris.com>
- Date: Tue, 9 Jan 2001 23:01:28 -0500 (EST)
- cc: idn@ops.ietf.org
- Delivery-date: Tue, 09 Jan 2001 20:02:14 -0800
- Envelope-to: idn-data@psg.com
On 10 Jan 2001, D. J. Bernstein wrote:
> The obvious solution is to improve the address-typing support in the
> operating system's Japanese keyboard interface. That's the only piece of
> software that has to worry about bad domain names. It will provide a
> friendly Japanese-specific environment for typing good domain names. It
> will deal with the dot problem.
>
> Done.
I am not sure I follow you. Are you suggesting that the
Input Method Editor be rewritten to do the right thing? Or the
domain name text widget?
> In contrast, Patrik and Bill want us to add bad->good conversion code
> (and Japanese dot handling) to Outlook, Messenger, Eudora, Mutt, Pine,
> Sendmail (which will have to check whether incoming IDNs match UTF-8
> domain names in user-typed configuration files), qmail (same reason),
> Exchange, IMail, Exim, Post.Office, Explorer, Communicator, Opera, Lynx,
> w3m, Apache, IIS, Enterprise, WebLogic, Zeus, BIND, djbdns, Webmin, etc.
Something like nameprep is already in place.
U+0061 == U+0041 aka A == a
U+0041 LATIN CAPITAL LETTER A
U+0061 LATIN SMALL LETTER A
How do we handle the same characters at separate code points?
U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
U+FF41 FULLWIDTH LATIN SMALL LETTER A
Should code point location be preserved or mapped?
U+0061 == U+0041 == U+FF21 == U+FF41
or
(U+FF21 -> U+0041) == (U+FF41 -> U+0061)
Then how should this be handled?
U+30F2 KATAKANA LETTER WO
U+FF66 HALFWIDTH KATAKANA LETTER WO
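To make the mapping question concrete, here is a minimal sketch of what the
"mapped" option could look like. It assumes Unicode compatibility
normalization (NFKC) followed by case folding, using Python's standard
unicodedata module. The helper name fold() is hypothetical, and this is an
illustration of the idea, not nameprep itself.

```python
import unicodedata


def fold(label: str) -> str:
    """Hypothetical helper: NFKC-normalize, then case-fold.

    NFKC replaces compatibility characters (fullwidth Latin,
    halfwidth katakana) with their ordinary equivalents; casefold()
    then removes case distinctions.
    """
    return unicodedata.normalize("NFKC", label).casefold()


# The four code points for "A"/"a" discussed above.
for cp in (0x0041, 0x0061, 0xFF21, 0xFF41):
    mapped = fold(chr(cp))
    print(f"U+{cp:04X} -> U+{ord(mapped):04X}")

# HALFWIDTH KATAKANA LETTER WO versus KATAKANA LETTER WO.
mapped = fold(chr(0xFF66))
print(f"U+FF66 -> U+{ord(mapped):04X}")
```

Under this assumption all four Latin forms compare equal after folding, and
the halfwidth WO compares equal to U+30F2. Whether the original code point
should instead be preserved is exactly the open question.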
> Of course, if every one of those programs has to be changed, then every
> one of them has to be redeployed. Patrik claims that this is necessary
> for full IDN support. But my solution will allow some of these programs
> to be left alone: they can already handle UTF-8 IDNs without trouble.
My cynical view is that it is better to rewrite the applications and
protocols to support UTF-8. Until then, every other language will come
second to those that map cleanly into ASCII. I don't see things changing
after they get implemented.
-Bill