Re: [idn] Reality Check



"D. J. Bernstein" <djb@cr.yp.to> wrote:

> Are you saying that there are mail readers today that can't reply to a
> UTF-8 address?  Please identify the specific software bugs that you're
> talking about.

I'm not talking about any bugs.  I'm talking about software that
conforms to the standards.

> As far as I know, there are only two relevant bugs anywhere in the
> entire mail path: the UNIX gethostbyname() 8-bit bug, which in
> current versions is trivially worked around with a one-line change to
> /etc/resolv.conf, and the sendmail 8-bit bug.

Out of dozens (if not hundreds) of mail user agents, and dozens of mail
transfer agents, and various resolver and DNS server implementations,
you think there are only two that rely on hostnames being what the
standard says they are?

I think that's very unlikely, but suppose these were indeed the only
two.  Then the only people who can't reply to my IDN address are UNIX
users who haven't customized their /etc/resolv.conf and anyone whose
mail would have to pass through sendmail on its way to me.  Hmmm, if I
care about getting replies, I think I'd stick with ASCII names rather
than stuff UTF-8 into message headers.  But with ACE, everyone can
reply, even people behind "buggy" (standard-conformant) software, so I'd
use that.

> I believe that the IDN WG has consensus on the following point: ``The
> long-term IDN solution will encode Unicode characters as UTF-8 on the
> wire.'' I find the other possibilities absurd, and I am dismayed at
> the amount of time that has been wasted discussing them on the IDN
> list.

I think we have consensus on "New protocols should use UTF-8 for
hostnames."  We might have strong support for "Existing application
protocols should try to migrate toward UTF-8."  I don't think there is
anything approaching consensus on "ACE will eventually be eradicated
from all protocols."

Here is one end-game that I don't find absurd:  UTF-8 everywhere
except in DNS messages, which use ACE.  DNS already uses a different
representation of domain names than everything else does:  Everything
else uses dots to separate the labels, while DNS messages prefix each
label with a length octet, so the resolver already has to translate
back and forth.  Very few things speak DNS directly (DNS servers,
resolvers, and MTAs, for example), and everything else could be
oblivious to ACE.
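
To make the "translate back and forth" point concrete, here is a
minimal sketch of the conversion a resolver already performs between
the dotted text form and the length-prefixed labels used in DNS
messages.  The function names to_wire and from_wire are made up for
illustration, and the sketch ignores name compression, so it is not a
complete codec.

```python
def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII name as length-prefixed DNS labels."""
    stripped = name.rstrip(".")
    out = bytearray()
    if stripped:
        for label in stripped.split("."):
            raw = label.encode("ascii")
            # A label is 1..63 octets; empty labels are not allowed mid-name.
            if not 0 < len(raw) <= 63:
                raise ValueError(f"bad label length: {label!r}")
            out.append(len(raw))
            out += raw
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)


def from_wire(data: bytes) -> str:
    """Decode uncompressed length-prefixed labels back to dotted text."""
    labels = []
    pos = 0
    while True:
        length = data[pos]
        pos += 1
        if length == 0:
            break
        labels.append(data[pos:pos + length].decode("ascii"))
        pos += length
    return ".".join(labels)


wire = to_wire("www.example.com")
dotted = from_wire(wire)
```

Nothing outside the resolver ever sees the wire form, which is why an
ACE step at the same boundary could be equally invisible.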

Why don't I think it's important to eliminate ACE from DNS?  Because
I don't think it's even practical.  We wouldn't be able to remove ACE
support from DNS until all other message/file formats support UTF-8 IDNs
and all ACEs in all data files have been decoded.  Even if that ever
came to pass, how would we know?

Why don't I think it's important to add a UTF-8 option to DNS?  Because
ACE gets the job done, and adding a UTF-8 option would increase the
complexity of DNS servers (which would then have to handle both UTF-8
and ACE) while helping only a few entities (the ones that speak DNS
directly).  While many applications might appreciate a UTF-8 resolver
interface, most have no way to know how DNS messages are encoded, and no
reason to care.
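
As a rough sketch of that kind of resolver interface: the application
hands over a Unicode name and gets a Unicode name back, and the ACE
form exists only on the DNS side.  This uses Python's built-in "idna"
codec as a stand-in for ACE; that codec implements a Punycode-based
encoding that was standardized after this discussion, so treat it as
one possible ACE rather than the one the WG will choose.  The function
names are hypothetical.

```python
def to_dns_name(hostname: str) -> str:
    """Convert an application-side Unicode name to its ACE form.

    Assumption: Python's "idna" codec stands in for the ACE.
    """
    return hostname.encode("idna").decode("ascii")


def to_display_name(ace_name: str) -> str:
    """Convert an ACE name from the DNS side back to Unicode."""
    return ace_name.encode("ascii").decode("idna")


# The application only ever handles the Unicode form.
unicode_name = "b\u00fccher.example"
ace_name = to_dns_name(unicode_name)
round_trip = to_display_name(ace_name)
```

Plain ASCII names pass through both functions unchanged, so existing
callers would not notice the extra step.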

AMC