Re: [idn] Reality Check



Dan Oscarsson <Dan.Oscarsson@trab.se> wrote:

> What about DNS?

What about DNS?  How DNS works just doesn't matter very much, because
it's invisible to everything except the resolver, which can offer any
combination of interfaces to applications (including ASCII, ACE, UTF-8,
and the local charset) regardless of what actually appears on the wire
between the resolver and the DNS server.  We can define a way for DNS to
use UTF-8 now, or later, or never, and I really don't think it will make
much difference to anything.
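
To make the resolver point concrete, here is a rough Python sketch of a
resolver front end whose wire format is a private choice.  Everything
in it is hypothetical: the class name SketchResolver, the dictionary
standing in for a DNS server, and the use of Python's built-in "idna"
codec, which implements the later RFC 3490 encoding rather than any of
the ACE candidates being discussed here.

```python
class SketchResolver:
    """Toy resolver front end (hypothetical; no network access).

    Callers may pass a name in Unicode or in ACE form.  The form used
    "on the wire" (here, the key into a dict that stands in for the
    server) is an internal detail that callers never see.
    """

    def __init__(self, zone, wire_form="ace"):
        self._zone = zone            # maps wire-form name (bytes) to address
        self._wire_form = wire_form  # "ace" or "utf-8"

    def _to_wire(self, name: str) -> bytes:
        # Round-trip through the idna codec so that Unicode and ACE
        # input both end up as the same prepared Unicode string.
        canonical = name.encode("idna").decode("idna").lower()
        if self._wire_form == "ace":
            return canonical.encode("idna")
        return canonical.encode("utf-8")

    def lookup(self, name: str):
        return self._zone.get(self._to_wire(name))


# Same application-facing interface, two different wire formats.
ace_backend = SketchResolver({b"xn--bcher-kva.example": "192.0.2.1"}, "ace")
utf8_backend = SketchResolver({"b\u00fccher.example".encode("utf-8"): "192.0.2.1"}, "utf-8")
```

An application calling lookup() cannot tell which of the two back ends
it is talking to, which is the sense in which the wire format hardly
matters.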

What matters much more than DNS are all the other protocols that
embed domain names in messages (RFC 2822, SMTP, URIs, HTTP cookies,
SSL certificates, SSH keys, PGP keys, etc).  Maybe these should all
be upgraded to support UTF-8 domain names, and maybe they all will be
eventually, but that could take a long time.  We need ACE to be an
option (in addition to UTF-8), at least for a while.

> Why should it be quicker to implement DNS in the applications and get
> them to be updated faster?

Huh?  I don't think anyone is suggesting this.  DNS should be
implemented in the resolver, as it always has been.  Applications are
going to need to be updated to support nameprep if they want to compare
names, and they are going to need to be updated to support UTF-8 if they
want to display IDNs returned from a UTF-8 resolver or pass user-entered
names to a UTF-8 resolver.  This is true regardless of whether the
protocols (email etc.) use ACE or UTF-8.

> MIME was introduced very long time ago and still there are a lot of
> mail programs that do not handle it.

Good point.  Do you think an extension to RFC 2822 to allow UTF-8 domain
names is going to be supported much more quickly?  I wouldn't count on
it.

> What does the normal human beings of the real world expect from DNS?
> The semantics of current DNS says:
> - a name can have mixed upper and lower case letters
> - names will be matched case insensitively
> IDNA says:
> - a name can only be in lower case
> - names will be matched exactly
> So IDNA breaks current semantics of DNS.

No, IDNA does not say that.  Application protocols can transmit
names in non-nameprep'd mixed-case form.  Applications can store
and display names in non-nameprep'd mixed-case form.  The only
time it is necessary to apply nameprep is just before comparing
two names.  Nameprep is mainly just a way to do case-insensitive,
alternate-representation-insensitive comparisons.
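
A minimal sketch of "nameprep only at comparison time", assuming
Python's standard-library encodings.idna.nameprep (an implementation of
the later RFC 3491 profile, which may differ in detail from the draft
under discussion).  The helper name idn_equal and the sample names are
made up for illustration.

```python
from encodings.idna import nameprep


def idn_equal(a: str, b: str) -> bool:
    """Compare two dotted names label by label after nameprep.

    The inputs are left untouched; only temporary prepared copies
    are compared.
    """
    labels_a = a.split(".")
    labels_b = b.split(".")
    if len(labels_a) != len(labels_b):
        return False
    return all(nameprep(x) == nameprep(y) for x, y in zip(labels_a, labels_b))


# Mixed case, with "u" followed by a combining diaeresis ...
stored = "Bu\u0308cher.Example"
# ... versus lower case with the precomposed letter.
typed = "b\u00fccher.example"

print(idn_equal(stored, typed))  # comparison through nameprep
print(stored == typed)           # naive comparison of the raw strings
```

The stored name keeps its mixed-case, non-nameprep'd form for display
and transmission; nameprep is applied only to the copies being compared.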

In DNS, in order to perform case-insensitive name comparisons, the
server must perform case-folding, or it must rely on the client to do
it.  It makes no semantic difference whether it's done on the server or
the client, but doing it on the client is better for load balancing,
since case-folding is expensive and there are more clients than servers.

As for the names returned from the DNS server, there are two ways to get
the full mixed-case name, either of which will work.  First, DNS could
return both the nameprep'd ACE name and the non-nameprep'd UTF-8 name,
and the application could use whichever one it prefers; IDN-unaware
applications would automatically get the nameprep'd ACE name.  As an
optimization, there could be an optional flag in the request telling
the server not to bother with the ACE name.  Second, all the ACEs
currently under consideration can represent mixed-case Unicode strings
as mixed-case ACE strings.

> Why could we not use something simple like having the entire line
> encoded in UTF-8?

We could, eventually.  But if you want to send mail to your friend with
an IDN address and also to your other friend whose ISP/employer/school
does not yet use a UTF-8-aware MTA, then you need ACE.
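
As a sketch of what "you need ACE" means for a sending application:
pick the form of the domain according to what the next hop can handle.
The function name domain_for_peer and the peer_supports_utf8 flag are
hypothetical, and the ACE shown is the one produced by Python's built-in
"idna" codec (the RFC 3490 encoding), not necessarily any of the
candidates currently on the table.

```python
def domain_for_peer(domain: str, peer_supports_utf8: bool) -> bytes:
    """Return the bytes to put in the protocol for this domain name.

    peer_supports_utf8 is an assumed capability flag; how a real
    application would learn it is outside the scope of this sketch.
    """
    if peer_supports_utf8:
        return domain.encode("utf-8")
    # Fall back to an ASCII-compatible encoding that contains only
    # letters, digits, hyphens and dots, so legacy software passes it
    # through unchanged.
    return domain.encode("idna")


upgraded = domain_for_peer("b\u00fccher.example", peer_supports_utf8=True)
legacy = domain_for_peer("b\u00fccher.example", peer_supports_utf8=False)
print(upgraded)
print(legacy)
```

The same user-visible name goes out in two forms, so both friends can
be reached from the same message-sending code.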

Martin Duerst <duerst@w3.org> wrote:

> Practical books about a programming language often contain examples
> of how to process your mail, and so on.  All these examples are based
> on the fact that the characters can be parsed directly.  Grepping for
> a word in the subject headers in a mail spool file is easy as long as
> the subject is in ASCII.  For anything beyond ASCII, it just fails.

Yes, and this argues for some other working group to define an extension
to RFC 2822 that uses UTF-8 instead of ACE and MIME encoded-words.
And one of the first places this extension would get used would be, not
on the wire, but in mailbox files, because the interoperability issues
are less severe there.  Eventually this format might get deployed for
on-the-wire use.  This extension is already motivated by Subject fields,
but the demand hasn't been enough for the extension to appear.  Maybe
after IDNs become common the demand for this RFC 2822 extension will
become great enough, but I don't see how IDNs can get off the ground
without ACE.

In summary, I think applications should use UTF-8 when possible, but
ACE must be an option to allow interoperability with existing protocols
that require LDH characters.  It hardly matters what DNS uses.  Nameprep
should be thought of as the mechanism for comparing IDNs, analogous to
the case-insensitive ASCII comparison currently used for LDH names.

AMC