Re: [idn] UTF-8 / RACE
- To: idn@ops.ietf.org
- Subject: Re: [idn] UTF-8 / RACE
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: 28 May 2001 05:09:43 -0000
- Delivery-date: Sun, 27 May 2001 22:09:36 -0700
- Envelope-to: idn-data@psg.com
- Mail-Followup-To: idn@ops.ietf.org
Keith Moore writes:
> gethostbyname should not be upgraded, because this would cause
> harm to applications that expect ASCII domain names.
Sorry, Keith, but the real world doesn't care about your delusions. The
current gethostbyname() release already works with UTF-8 domain names.
> UTF-8 is causing subtle bugs in software that was written to expect ASCII
Obviously those programs should be fixed as soon as possible. I've
already proposed that the WG publish a document requiring this. The only
known example is sendmail, and the fix in that case is straightforward.
> each protocol must be "fixed" by people who understand that protocol.
You're absolutely right, Keith. This WG can't handle a job as tricky as
changing ``sequence of bytes 1-127'' to ``sequence of bytes 1-255.''
Those Internet text protocols are amazingly subtle.
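As a rough illustration of the size of that change, here is a minimal Python sketch. The function names and the sample name are hypothetical, chosen only for this example; a real protocol implementation would apply the check wherever it currently rejects 8-bit input.

```python
def is_7bit_clean(data: bytes) -> bool:
    """Old rule: every byte must lie in the range 1-127."""
    return all(1 <= b <= 127 for b in data)


def is_8bit_clean(data: bytes) -> bool:
    """Relaxed rule: every byte must lie in the range 1-255 (only NUL is excluded)."""
    return all(1 <= b <= 255 for b in data)


# Hypothetical sample name; the u-umlaut encodes to the two bytes 0xC3 0xBC in UTF-8.
utf8_name = "b\u00fccher.example".encode("utf-8")
ascii_name = b"example.com"
```

Under these assumptions, `ascii_name` passes both checks, while `utf8_name` fails only the 7-bit one. The difference between the two functions is a single constant.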
> UTF-8 guarantees that the data is unreadable for users of
> today's non UTF-8 software.
Programs that don't support UTF-8 are being fixed. We have, for example,
terminals and editors that work with UTF-8 text. UTF-8 is much bigger
than IDN; it is going to be available everywhere no matter what this WG
does. See http://www.cl.cam.ac.uk/~mgk25/unicode.html.
In contrast, ACE is not a general-purpose text format. Nobody is writing
an ACE version of xterm or vi. There is no movement to convert all
textual data to ACE; the idea is absurd. ACE is never going to be
readable the way that UTF-8 is.
> reliably compare two UTF-8 representations of an IDN
There is only one good (``canonical'') UTF-8 representation of an IDN.
With fast nameprep, the keyboard interface helps the user type this good
name, so applications don't have to worry about bad names.
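The division of labour described here can be sketched in Python. This is an illustration under stated assumptions, not nameprep itself: case folding followed by NFKC normalization is used as a rough stand-in for the nameprep mapping, and the function names are hypothetical.

```python
import unicodedata


def input_layer_canonicalize(typed: str) -> bytes:
    """Run once, where the user types the name.

    Assumption: casefold + NFKC approximates nameprep; the real
    profile has additional mapping and prohibition tables.
    """
    folded = typed.casefold()
    normalized = unicodedata.normalize("NFKC", folded)
    return normalized.encode("utf-8")


def application_compare(a: bytes, b: bytes) -> bool:
    """Applications only ever see canonical names, so a byte comparison suffices."""
    return a == b


# Two ways a user might enter the same name:
typed_precomposed = "B\u00dcCHER.example"   # uppercase, precomposed U-umlaut
typed_decomposed = "bu\u0308cher.example"   # lowercase, u + combining diaeresis

name_a = input_layer_canonicalize(typed_precomposed)
name_b = input_layer_canonicalize(typed_decomposed)
```

The raw UTF-8 encodings of the two typed strings differ, but after the input layer has done its work `application_compare(name_a, name_b)` is true without the application knowing anything about Unicode.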
This is just like the current handling of dots. Yes, there are bad dots,
but the keyboard interface helps the user type domain names with the
ASCII dot, so applications don't have to worry about bad dots. Are you
going to demand that we change and redeploy thousands of programs to
accept non-canonical dots in domain names?
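The dot analogy can be sketched the same way. The set of look-alike dots below (U+3002, U+FF0E, U+FF61) and the function names are assumptions made for this example; the point is only that the mapping lives in the input layer, while the application keeps splitting on the ASCII dot.

```python
# Assumed set of non-ASCII characters a keyboard interface might map to ".".
DOT_LOOKALIKES = {
    "\u3002": ".",  # ideographic full stop
    "\uff0e": ".",  # fullwidth full stop
    "\uff61": ".",  # halfwidth ideographic full stop
}
_DOT_TABLE = str.maketrans(DOT_LOOKALIKES)


def input_layer_fix_dots(typed: str) -> str:
    """Input layer: replace look-alike dots with the ASCII dot."""
    return typed.translate(_DOT_TABLE)


def split_labels(name: str) -> list[str]:
    """Application: unchanged, splits on the ASCII dot only."""
    return name.split(".")


typed = "www\u3002example\uff0ecom"
labels = split_labels(input_layer_fix_dots(typed))
```

Here `labels` holds the three labels `www`, `example`, and `com`, and `split_labels` never had to learn about any dot other than the ASCII one.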
---Dan