
Re: [idn] time to move



Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net> wrote:

> > Please remember that DNS is not the problem.  DNS is just one of
> > many protocols that embed domain names.  Others include message
> > headers,
>
> How does dns embed domain names?

Domain names appear in message headers, SMTP commands, URIs, ssh keys,
SSL certificates, and DNS requests and responses.  All I meant was that
DNS is just one of many protocols that exchange messages containing
domain names.  And virtually all such protocols (except DNS, actually)
assume that domain names contain only LDH characters and dots.  What
happens in these protocols if domain names contain other characters?  I
think most of the protocol specifications either don't say (in which
case implementations might do anything) or they say it's an error.
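
To make the LDH (letters, digits, hyphen) rule concrete, here is a minimal
Python sketch of the check such software effectively performs; the function
name and regex are mine, purely illustrative, not taken from any protocol
spec:

    import re

    # A label is LDH: ASCII letters, digits, and hyphens, at most 63 octets,
    # neither starting nor ending with a hyphen.
    LDH_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

    def is_ldh_hostname(name):
        labels = name.rstrip(".").split(".")
        return all(LDH_LABEL.match(label) for label in labels)

    print(is_ldh_hostname("example.com"))     # True
    print(is_ldh_hostname("bücher.example"))  # False: non-LDH character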

> Is the resolution question (assuming it still exists) equivalent to
> the transparent to code-set-dependent apps question?

I'm not exactly sure what you mean by either one, but they sound
different.  Resolving IDNs is going to be pretty easy no matter what
we do, because DNS is pretty versatile.  ACE makes resolution trivial,
but I do *not* see that as an important argument in favor of ACE.  The
argument for ACE is that it allows IDNs to flow through all the untold
existing protocols and software unharmed.
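
For concreteness, here is a Python sketch of what ACE buys you.  It uses the
ACE that ended up being standardized for IDNA (Punycode, the "xn--" form
exposed by Python's built-in "idna" codec); no particular ACE had been chosen
when this was written, so treat it purely as an illustration of the property,
not an endorsement of a specific encoding:

    label = "bücher"
    ace = label.encode("idna")    # ACE form of the label
    print(ace)                    # b'xn--bcher-kva' -- pure LDH ASCII
    print(ace.decode("idna"))     # 'bücher' -- decoding back is trivial

Because the ACE form is plain LDH ASCII, it looks like an ordinary host name
to every existing protocol and program, which is the point above.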

> I fail to see the point of your second note.

Which was:

    Also note that if ACE is a transition to UTF-8, then there are
    really two transitions: one from LDH-only to ACE, and another from
    ACE to UTF-8.  If ACE is seen as the solution, then there is only
    one transition.

Is the point not obvious?  Transitions are typically difficult and
disruptive.  Two transitions are twice as difficult and disruptive as
one.

> The issue of encoding efficiency is interesting.
> [long defense of UTF-8, XML, and inefficient encodings in general]

I did not mean to imply that UTF-8 is bad in general, or that encoding
efficiency is important in general.  UTF-8 has many amazingly wonderful
properties: all ASCII strings are UTF-8 strings, substrings cannot be
misinterpreted, lexicographic sorting of code points is preserved, and
no look-ahead is required for decoding.  I am very impressed by the
elegance of UTF-8, and would endorse it for almost any situation where
you want to use a single character encoding.
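
A quick Python sketch (mine, purely illustrative) of two of those properties:

    # Every ASCII string is already valid UTF-8, byte for byte.
    assert "example".encode("utf-8") == b"example"

    # Byte-wise sorting of the UTF-8 encodings matches code-point order
    # of the original strings.
    words = ["zebra", "abc", "ärger", "日本"]
    assert sorted(words) == [b.decode("utf-8")
                             for b in sorted(w.encode("utf-8") for w in words)]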

Encoding efficiency is usually not a top concern.  Bandwidth grows, disks
grow.  Making things easy and versatile is usually more important than
saving bits.

But in this case, we have a hard limit of 63 bytes per label.  That's
why encoding efficiency matters in this case.
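
For example (an invented label, just to show the arithmetic): a typical CJK
character is 3 octets in UTF-8, so anything beyond 21 such characters already
overflows a label:

    label = "例" * 22                   # 22 CJK characters, hypothetical label
    print(len(label))                   # 22 characters
    print(len(label.encode("utf-8")))   # 66 octets -- over the 63-octet limit

A more compact encoding packs more characters of the original name into the
same 63 octets, which is all that encoding efficiency means here.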

Although I suppose that if you already expect to update every protocol and
every piece of software that uses domain names, you might as well remove
the 63-byte limit.  In that case, I would have no problem with UTF-8.
But I'm extremely skeptical of that whole approach.

AMC