
Re: [idn] Moving forward



C C Magnus Gustavsson <mag@lysator.liu.se> wrote:

> we need to realize that there is only one solution that we might reach
> consensus about: arch-2 or "ACE now - UTF-8 eventually".

I agree if you mean that any solution with a chance of fostering
consensus must allow ACE now and must encourage UTF-8 in the future.  I
suspect that we will be even more likely to reach consensus if we allow
UTF-8 to coexist with ACE as soon as possible, and if we don't require a
complete ACE phase-out any time soon.

How about this model:

Just as there are now multiple ways to represent domain names (ASCII,
EBCDIC, ink on paper, ...) and one especially customary way (ASCII),
there will be many ways to represent IDNs (ACE, UTF-8, UTF-16,
iso-2022-jp, koi8, ink on paper, ...) and two especially customary ways
(ACE and UTF-8).

Obviously any existing protocol that defines a hostname as a sequence
of dot-separated LDH labels has no choice but to use ACE.  New versions
of those protocols may opt to support UTF-8 in addition.  New protocols
must support UTF-8 domain names (I think that follows from an existing
IETF policy regarding UTF-8), which implies that they can also handle
ASCII names, including ACE names.

Senders must not send UTF-8 domain names unless they know that the
receiver can handle them (based on the protocol or protocol version).
Any program that might sometimes send a domain name to a receiver that
cannot handle UTF-8 names must be prepared to perform ACE encoding.
Notice that this does not apply to new protocols, since new protocols
are required to support UTF-8 names.
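
A rough sketch of the sender rule above, in Python. The message does not
name a particular ACE, so this borrows Python's built-in "idna" codec
(Punycode with an "xn--" prefix, standardized after this discussion) purely
as a stand-in. The function name and the receiver_handles_utf8 flag are
hypothetical; how a sender learns the receiver's capability is left to the
protocol or protocol version.

```python
def encode_for_receiver(name: str, receiver_handles_utf8: bool) -> bytes:
    """Pick the wire form of a domain name for one receiver.

    Assumption: the "idna" codec stands in for whichever ACE is chosen.
    """
    if receiver_handles_utf8:
        # New protocols, or new versions known to accept UTF-8 names.
        return name.encode("utf-8")
    # Receivers limited to dot-separated LDH labels get the ACE form.
    return name.encode("idna")


legacy = encode_for_receiver("bücher.example", receiver_handles_utf8=False)
modern = encode_for_receiver("bücher.example", receiver_handles_utf8=True)
print(legacy)  # b'xn--bcher-kva.example'
print(modern == "bücher.example".encode("utf-8"))  # True
```

A plain ASCII name comes out the same on either path, which is why a
UTF-8-capable receiver can also handle ASCII names, including ACE names.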

Programs that send domain names to receivers known to handle UTF-8 names
may perform ACE decoding on ACEs before sending, but are not required or
even encouraged to do so (because maybe someday ACE will be rare and it
would be useless functionality).

Programs that display domain names to users should perform ACE decoding.
Maybe someday, if ACE becomes rare, this could be weakened from "should"
to "may".
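
The display rule could look something like the following sketch. Again the
"idna" codec and its "xn--" prefix are assumptions standing in for an ACE
that this message leaves unspecified, and for_display is a hypothetical
helper name. Labels that are not ACEs, or that fail to decode, are shown
unchanged.

```python
def for_display(name: str) -> str:
    """Decode any ACE labels in a domain name before showing it to a user."""
    shown = []
    for label in name.split("."):
        # Assumption: ACE labels are recognizable by an "xn--" prefix.
        if label.lower().startswith("xn--"):
            try:
                label = label.encode("ascii").decode("idna")
            except UnicodeError:
                # Not a valid ACE after all; show the label as received.
                pass
        shown.append(label)
    return ".".join(shown)


print(for_display("xn--bcher-kva.example"))  # bücher.example
print(for_display("www.example.com"))        # www.example.com
```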

DNS servers must accept both UTF-8 and ACE queries (how they accept
UTF-8 queries is an open question).  Maybe someday DNS will be the only
protocol left that uses ACE.  If/when that happens, we can revisit the
phase-out question.

"D. J. Bernstein" <djb@cr.yp.to> wrote:

> We can, for example, agree that sendmail's destruction of 8-bit
> characters is a bug

Given that RFC 821 specifies that messages shall be 7-bit, I don't think
it's fair to characterize sendmail's handling of 8-bit characters as a
bug.  We might agree that it should change anyway, but I think that's a
matter for a different working group.

AMC