
Re: [idn] time to move



Eric Brunner-Williams in Portland Maine <brunner@nic-naa.net> wrote:

> I'm still surprised that ACE could be considered as anything other
> than a "temporary", and controversial fix

Well, sometimes good ideas are surprising. :)  I admit that when I first
heard about ACE, it struck me as an ugly hack.  But once I realized what
an interoperability nightmare it would be to have domain names with
no LDH representation, my opinion changed.

New and updated protocols are free to support UTF-8 if they wish, but
the ACE representation will need to continue to work for a long time,
because domain names can pass through old protocols and old software
along the way.  What compelling reason is there to set a deadline beyond
which ACEs will stop working?
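To make the compatibility point concrete, here is a minimal sketch in Python. It assumes the ACE that the IDN work later standardized (Punycode with the "xn--" prefix, RFCs 3490 and 3492), as implemented by Python's built-in "idna" codec. The helper names is_ldh, to_ace, and from_ace are hypothetical. The point is that the ACE form of a name is plain LDH, so old software that only understands LDH can carry it unchanged, and the original name can be recovered at the far end.

```python
import string

# Characters allowed in a traditional host-name label: letters, digits, hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh(name: str) -> bool:
    """True if every dot-separated label is non-empty and purely LDH."""
    return all(label and set(label) <= LDH for label in name.split("."))


def to_ace(name: str) -> str:
    """Unicode name -> ASCII-compatible form.

    Assumption: uses Python's built-in IDNA 2003 codec (Punycode + "xn--"),
    i.e. the ACE that was eventually standardized, not any draft under
    discussion at the time of this message.
    """
    return name.encode("idna").decode("ascii")


def from_ace(name: str) -> str:
    """ASCII-compatible form -> Unicode name, via the same codec."""
    return name.encode("ascii").decode("idna")


unicode_name = "bücher.example"
ace_name = to_ace(unicode_name)
print(ace_name, is_ldh(ace_name), from_ace(ace_name))
```

The Unicode name fails the LDH check, while its ACE form passes it and round-trips back to the original.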

> Thus far, the router, host, link, and even last-mile vendors haven't
> stopped them with stakes, nooses, or other bits of constructive
> criticism, and argued forcefully for "efficiency".

I don't think there is any shortage of bytes on links or in routers.
It's only in domain name labels where there's a limited supply.

> True, the CNNIC presentation and oral comments at -50 did cite label
> octet limits as inconvenient when the encoding is utf8, but they
> obviously made a choice -- an engineering trade-off, and rolled out a
> utf8 architecture.

UTF-8 is not bad for Chinese.  A label is limited to 63 octets and each
Chinese character takes 3 octets in UTF-8, so a label can hold 21 of them.
That is plenty of room, because there's a lot of information per character
(most words are two characters).  But UTF-8 would impose the same
21-character limit on most Indian and southeast Asian scripts, whose
characters also take 3 octets each, and those characters are comparable
to Latin letters in terms of information content: it takes several of
them to make a word or a name.
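The arithmetic behind the 21-character figure can be sketched in a few lines of Python. It assumes the 63-octet DNS label limit and labels carried as raw UTF-8 (no ACE). The sample characters and the function name max_chars_per_label are illustrative choices, not anything taken from CNNIC's deployment.

```python
# DNS limits each label to 63 octets (RFC 1035, section 2.3.4).
MAX_LABEL_OCTETS = 63


def max_chars_per_label(sample_char: str, encoding: str = "utf-8") -> int:
    """How many characters like sample_char fit in one label, assuming
    (hypothetically) that labels carried raw octets in the given encoding."""
    octets_per_char = len(sample_char.encode(encoding))
    return MAX_LABEL_OCTETS // octets_per_char


# One representative character per script, chosen for illustration.
SAMPLES = {
    "Latin": "a",            # U+0061, 1 octet in UTF-8
    "Chinese": "\u4e2d",     # U+4E2D, 3 octets in UTF-8
    "Devanagari": "\u0915",  # U+0915, 3 octets in UTF-8
    "Thai": "\u0e01",        # U+0E01, 3 octets in UTF-8
}

for script, ch in SAMPLES.items():
    print(f"{script}: {max_chars_per_label(ch)} characters per label")
```

Latin letters get 63 per label, while Chinese, Devanagari, and Thai all get 21, because every code point from U+0800 to U+FFFF takes 3 octets in UTF-8.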

The greatest thing about UTF-8 is its compatibility with ASCII.  But in
the case of IDN, if you care about compatibility, you need ACE.  If you
don't care about compatibility, and you segregate the IDNs from the LDH
domain names, then UTF-16 is probably better than UTF-8, because then
everyone would get 31 characters per label (63 octets at 2 octets per
character), which even I would concede
ought to be enough.  I wonder why CNNIC didn't go with UTF-16.  Maybe
they don't plan to segregate the IDNs from the LDH domain names, and
are counting on software that uses UTF-8 internally to "just work" by
accident.  Of course, some things will just break, too.
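The UTF-16 trade-off can be sketched the same way. This assumes big-endian UTF-16 without a byte-order mark and characters from the Basic Multilingual Plane only (characters outside it take 4 octets in UTF-16); the function names are hypothetical. It shows both halves of the argument: every script gets the same 31-character budget, but existing LDH names no longer keep their octets, which is why this only works if IDNs are segregated from LDH names.

```python
MAX_LABEL_OCTETS = 63  # DNS label limit (RFC 1035)


def utf16_chars_per_label(sample_char: str) -> int:
    """Characters per label if labels carried UTF-16 (big-endian, no BOM)."""
    return MAX_LABEL_OCTETS // len(sample_char.encode("utf-16-be"))


def ascii_compatible(encoding: str, ldh_name: str = "example-1.com") -> bool:
    """True if an existing LDH name keeps exactly the same octets."""
    return ldh_name.encode(encoding) == ldh_name.encode("ascii")


# Every BMP character costs 2 octets, so every script gets the same budget.
for ch in ("a", "\u4e2d", "\u0915", "\u0e01"):
    print(repr(ch), utf16_chars_per_label(ch))

print("UTF-8 keeps LDH names intact: ", ascii_compatible("utf-8"))
print("UTF-16 keeps LDH names intact:", ascii_compatible("utf-16-be"))
```

Under UTF-16 an ASCII name like "ab" becomes the octets 00 61 00 62, so any software that expects LDH octets sees something it does not recognize.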

AMC