
Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check)



Martin Duerst <duerst@w3.org> wrote:

> I'm very sure browser vendors prefer UTF-8 with nameprep over ACE with
> nameprep.

If that's true, then they can opt to rely on updated HTML and HTTP
standards that support UTF-8 domain names (using nameprep to compare
them), deploy browsers and servers that conform to the new standards,
and not bother with ACE at all.  (Of course, browsers that include mail
user agents would also need to rely on updated standards for message
headers and SMTP.)  The only time they would need to use ACE would be
to use a legacy resolver, but if they can convince the operating system
providers to add a UTF-8 API to the resolver then they don't have to use
ACE anywhere.
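
To make "using nameprep to compare them" concrete, here is a minimal sketch in Python.  It assumes the nameprep function in Python's standard encodings.idna module, which implements the profile later published as RFC 3491 and may differ in detail from the draft under discussion here.  The helper names (prepared, names_match, wire_form) are made up for illustration.

```python
from encodings.idna import nameprep  # stdlib nameprep (RFC 3491 profile)


def prepared(name: str) -> str:
    """Apply nameprep to each dot-separated label of a domain name."""
    return ".".join(nameprep(label) for label in name.split("."))


def names_match(a: str, b: str) -> bool:
    """Two names are treated as equal if their nameprepped forms are equal."""
    return prepared(a) == prepared(b)


def wire_form(name: str) -> bytes:
    """Hypothetical 8-bit wire form: nameprep first, then encode as UTF-8."""
    return prepared(name).encode("utf-8")


if __name__ == "__main__":
    print(names_match("BÜCHER.example", "bücher.Example"))  # case differences fold away
    print(names_match("bücher.example", "bucher.example"))  # distinct characters stay distinct
    print(wire_form("BÜCHER.example"))
```

Nothing in this sketch touches ACE; that step only comes in when the name has to pass through a 7-bit protocol or a legacy resolver.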

This argument applies to any application or protocol.  ACE is there
if you want to interoperate with 7-bit protocols/formats, but if you
don't want to, then don't; just go ahead and stick to your new 8-bit
protocols.

There are obviously people here who think ACE is important, and
people who think nameprep is important, and people who think UTF-8 in
application protocols is important, and people who think UTF-8 in DNS
is important.  There are obviously people who think other people are
wasting their time on some of these.

Perhaps we should stop trying to convince people that what they're
working on is a waste, and just get out of each other's way.  We can
end up with a standard for each of the above four items, and they need
not all be ready at the same moment.  As an implementor or application
protocol designer, if you think one of the standards is worthless to
you, then don't use it.  (Except for nameprep--I think everyone needs to
use that one.)

AMC

P.S.

> Not an ACE, but probably worth mentioning anyway:
> 
> UTF-8       21

UTF-8 has a slight edge for Han (AMC-ACE-Z can almost always fit at
least 19 Han characters) and Latin, but it's terrible for Indian scripts
(AMC-ACE-Z can fit about twice as many characters) and pretty bad for
other non-Latin alphabetic scripts (AMC-ACE-Z can fit about 1.5 to 1.7
times as many characters).
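
The UTF-8 side of this comparison is simple arithmetic, sketched below in Python.  It assumes the usual 63-octet limit on a DNS label and that every character in the label comes from the same script; the sample characters and the function name max_chars are chosen only for illustration.  The AMC-ACE-Z figures above are not reproduced here, since they depend on that encoding's compression.

```python
MAX_LABEL_OCTETS = 63  # DNS label length limit, assumed to apply to UTF-8 labels too


def max_chars(sample: str, limit: int = MAX_LABEL_OCTETS) -> int:
    """How many copies of one sample character fit in a label as UTF-8."""
    return limit // len(sample.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "Latin": "a",        # 1 octet per character
        "Cyrillic": "я",     # 2 octets per character
        "Devanagari": "क",   # 3 octets per character
        "Han": "漢",         # 3 octets per character
    }
    for script, ch in samples.items():
        print(f"{script:<12}{max_chars(ch)}")
```

Han characters in the Basic Multilingual Plane take three octets each in UTF-8, which is consistent with the quoted figure of 21.  Indian scripts also take three octets per character, and most other non-Latin alphabets take two.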