Re: [idn] UTF-8 / RACE
- To: "D. J. Bernstein" <djb@cr.yp.to>
- Subject: Re: [idn] UTF-8 / RACE
- From: Keith Moore <moore@cs.utk.edu>
- Date: Mon, 28 May 2001 01:56:09 -0400
- cc: idn@ops.ietf.org
- Delivery-date: Sun, 27 May 2001 22:56:26 -0700
- Envelope-to: idn-data@psg.com
> Keith Moore writes:
> > gethostbyname should not be upgraded, because this would cause
> > harm to applications that expect ASCII domain names.
>
> Sorry, Keith, but the real world doesn't care about your delusions.
And it regards your delusions as the irrefutable truth?
I'm not interested in whose delusions the world cares about, I'm
interested in what works well.
> The current gethostbyname() release already works with UTF-8 domain names.
delusion #1.
In the current DNS world, neither the client nor the server implements
any kind of name canonicalization before comparing two names. thus
there is no way for any current implementation of gethostbyname() to "work"
for any useful definition of "work".
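To make the comparison point concrete, here is a minimal Python sketch. It assumes that DNS-style comparison means an exact byte match in which only ASCII letters are case-folded (the rule later written up in RFC 4343). The helper names `dns_names_equal` and `nfc_utf8` are hypothetical, and Unicode NFC is used only as a simplified stand-in for nameprep, which does considerably more.

```python
import unicodedata


def dns_names_equal(a: bytes, b: bytes) -> bool:
    """Compare two names byte for byte, except that ASCII letters A-Z and
    a-z are treated as equal. Bytes >= 0x80 get no special treatment."""
    def fold(data: bytes) -> bytes:
        # Fold only ASCII upper-case letters (0x41-0x5A) to lower case.
        return bytes(c + 32 if 0x41 <= c <= 0x5A else c for c in data)
    return fold(a) == fold(b)


def nfc_utf8(name: str) -> bytes:
    # NFC is a simplified stand-in for nameprep here, not nameprep itself.
    return unicodedata.normalize("NFC", name).encode("utf-8")


precomposed = "caf\u00e9.example"   # e-acute as one code point, U+00E9
decomposed = "cafe\u0301.example"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Same text on screen, different UTF-8 bytes: no match.
print(dns_names_equal(precomposed.encode("utf-8"), decomposed.encode("utf-8")))  # False

# ASCII case differences, by contrast, are already ignored.
print(dns_names_equal(b"Example.COM", b"example.com"))  # True

# Only if something canonicalizes the names first do the two spellings match.
print(dns_names_equal(nfc_utf8(precomposed), nfc_utf8(decomposed)))  # True
```

Nothing in the byte comparison itself knows that the two spellings are "the same name"; that knowledge has to come from a canonicalization step that today's resolvers and servers do not perform.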
> > UTF-8 is causing subtle bugs in software that was written to expect ASCII
>
> Obviously those programs should be fixed as soon as possible.
obviously. and everyone should upgrade all of their software immediately,
regardless of whether the people involved are actually likely to use
IDNs, because Dan said so. (delusion #2 is that this is going to happen)
> I've already proposed that the WG publish a document requiring this.
as if that were in this WG's purview (delusion #3)
> The only known example is sendmail,
(delusion #4)
> and the fix in that case is straightforward.
(delusion #5)
> > each protocol must be "fixed" by people who understand that protocol.
>
> You're absolutely right, Keith. This WG can't handle a job as tricky as
> changing ``sequence of bytes 1-127'' to ``sequence of bytes 1-255.''
(delusion #6 is that the change is as simple as making the tool 8-bit
transparent.)
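A small Python sketch illustrates why 8-bit transparency alone is not the whole change: a program that accepts any byte from 1 to 255 still has no idea which character encoding those bytes are in. The helper names (`is_ldh_label`, `is_8bit_clean`, `decodes_as_utf8`) are hypothetical, and the LDH rule is simplified to letters, digits and hyphen with no leading or trailing hyphen.

```python
# Hypothetical helpers contrasting the traditional host-name rule with a
# rule that merely accepts any byte in the range 1-255.
LDH_BYTES = set(b"abcdefghijklmnopqrstuvwxyz"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"0123456789-")


def is_ldh_label(label: bytes) -> bool:
    """Letters, digits and hyphen; no leading or trailing hyphen."""
    return (0 < len(label) <= 63
            and all(c in LDH_BYTES for c in label)
            and not label.startswith(b"-")
            and not label.endswith(b"-"))


def is_8bit_clean(label: bytes) -> bool:
    """The relaxed rule: any byte from 1 to 255."""
    return 0 < len(label) <= 63 and all(1 <= c <= 255 for c in label)


def decodes_as_utf8(label: bytes) -> bool:
    """True if the bytes form valid UTF-8."""
    try:
        label.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "caf\u00e9"
utf8 = name.encode("utf-8")      # b'caf\xc3\xa9'
latin1 = name.encode("latin-1")  # b'caf\xe9'

print(is_ldh_label(utf8), is_ldh_label(latin1))    # False False
print(is_8bit_clean(utf8), is_8bit_clean(latin1))  # True True
print(utf8 == latin1)                              # False
print(decodes_as_utf8(latin1))                     # False
```

Both byte strings pass the relaxed check, yet they are different bytes for the same name, and the Latin-1 form is not valid UTF-8 at all; a program that is merely 8-bit transparent cannot tell which encoding it has been handed.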
> > UTF-8 guarantees that the data is unreadable for users of
> > today's non UTF-8 software.
>
> Programs that don't support UTF-8 are being fixed.
delusion #7. yes, more and more programs support UTF-8, but that doesn't
justify the generalization that "programs...are being fixed". the programs
that are being upgraded might or might not be the ones that people
want to use. After all, people are still using VMS MAIL and ucbmail.
> We have, for example,
> terminals and editors that work with UTF-8 text. UTF-8 is much bigger
> than IDN; it is going to be available everywhere no matter what this WG
> does.
wow...you actually made a defensible statement. I was starting to worry.
yes, UTF-8 is a very useful encoding that will eventually be supported
on all platforms. that does not mean it will replace all other encodings,
nor that all applications will support it.
> In contrast, ACE is not a general-purpose text format. Nobody is writing
> an ACE version of xterm or vi. There is no movement to convert all
> textual data to ACE; the idea is absurd. ACE is never going to be
> readable the way that UTF-8 is.
>
> > reliably compare two UTF-8 representations of an IDN
>
> There is only one good (``canonical'') UTF-8 representation of an IDN.
> With fast nameprep, the keyboard interface helps the user type this good
> name, so applications don't have to worry about bad names.
let's see - so you reject the approach that the IDN portions of a
plain UTF-8 text file can be encoded differently than the rest of
the text file (as if plain text files were a dominant text format
anymore), but you somehow think that everybody's keyboard interface
should have a separate IDN input mode.
nothing in the ACE approach suggests that IDNs in plain text
should be in ACE format. the only constraint is that they be
nameprepped and converted to ACE format before they are passed
as protocol elements.
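The constraint described above (display text stays as-is; the name is nameprepped and converted to ACE only when it becomes a protocol element) can be sketched with Python's built-in `idna` codec. Note the assumption: that codec implements the later IDNA2003 specification (RFC 3490, nameprep plus Punycode with the `xn--` prefix), not the RACE encoding discussed in this thread, so the exact ACE strings differ from what RACE would produce. The function name `to_protocol_element` is hypothetical.

```python
def to_protocol_element(display_name: str) -> bytes:
    """Nameprep each non-ASCII label and convert it to ACE just before the
    name is used as a protocol element; all-ASCII labels pass through."""
    return display_name.encode("idna")


# The text the user sees stays in its native form...
display = "Bücher.example"

# ...but what goes on the wire is plain ASCII.
wire = to_protocol_element(display)
print(wire)  # b'xn--bcher-kva.example'

# Nameprep folds case and normalizes, so these spellings give the same ACE.
assert to_protocol_element("bu\u0308cher.example") == wire
assert to_protocol_element("bücher.example") == wire

# Converting back for display yields the nameprepped form.
print(wire.decode("idna"))  # bücher.example
```

Applications that only ever see the wire form keep receiving the ASCII names they were written to expect.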
> This is just like the current handling of dots. Yes, there are bad dots,
> but the keyboard interface helps the user type domain names with the
> ASCII dot, so applications don't have to worry about bad dots. Are you
> going to demand that we change and redeploy thousands of programs to
> accept non-canonical dots in domain names?
no. I'm just going to demand that we don't break existing applications
by sending them UTF-8 domain names when they quite reasonably expect ASCII.
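The "bad dots" analogy has a concrete counterpart in a later specification: RFC 3490, section 3.1, requires that U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP be recognized as label separators alongside the ASCII dot. A minimal sketch of that mapping, using the hypothetical helpers `split_labels` and `canonical_dots`, shows what accepting non-canonical dots involves when it is done once, at the point where user input becomes a protocol element.

```python
import re
from typing import List

# Label separators that RFC 3490, section 3.1, says must be recognized as dots.
DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def split_labels(user_input: str) -> List[str]:
    """Split a domain name typed by a user into labels, treating the
    non-ASCII full stops above the same as the ASCII dot."""
    return DOTS.split(user_input)


def canonical_dots(user_input: str) -> str:
    """Rejoin the labels with the ASCII dot before the name is used
    as a protocol element."""
    return ".".join(split_labels(user_input))


print(canonical_dots("example\u3002com"))  # example.com
print(canonical_dots("example\uff0ecom"))  # example.com
```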
Keith