
Re: [idn] UNIX moving to UTF-8



I am not at all arguing against using an ACE. I agree that if the particular
ACE in question is well-defined, and fits DNS constraints better than UTF-8,
it is perfectly appropriate to use it.

All I meant to point out is that
(a) Unicode is in wide and rapidly increasing use, and
(b) the particular form of Unicode that is used depends on the environment;
it is not exclusively UTF-8.

Mark

----- Original Message -----
From: "Keith Moore" <moore@cs.utk.edu>
To: "Mark Davis" <mark@macchiato.com>
Cc: "Eric A. Hall" <ehall@ehsco.com>; "Keith Moore" <moore@cs.utk.edu>; "D.
J. Bernstein" <djb@cr.yp.to>; <idn@ops.ietf.org>
Sent: Thursday, January 25, 2001 21:45
Subject: Re: [idn] UNIX moving to UTF-8


> > People are clearly moving to Unicode. Exactly which UTF they choose (8, 16,
> > 32) is not as important, since they all can be converted to each other very
> > efficiently and without loss.
>
> And since an ACE is just another encoding of Unicode, you can add ACE to
> that set.
>
> > It is however an overstatement to say that all environments are headed
> > towards UTF-8.
>
> And if there's not a single preferred encoding of Unicode that's being
> widely supported, using ACE for IDNs makes about as much sense as anything.
> The fact that ACE doesn't happen to use the 0x80 bit doesn't strike me
> as a particularly good reason to rule it out  - especially when (for a
> carefully-designed ACE) the encoding will often be more efficient than
> either UTF-16 or UTF-8.
>
> If UTF-8 is optimized for ASCII-compatibility, the C language (so
> that for instance NUL-terminated strings still work), and to
> minimize the amount of state that must be maintained while decoding;
> ACE can be optimized for space-efficiency, compatibility with protocols
> that were designed for ASCII-only DNS names, and (with nameprep)
> ease of comparison.  It's not as if one is right and the other wrong.
> They are different encodings to accommodate different sets of
> transition issues.
>
> Keith


...

> A huge number of people in non-English speaking countries are using Unicode,
> *and don't know it*. Anyone using Windows NT/2000 or Microsoft Office is,
> as are users of many other products. Many websites use Unicode internally,
> and just convert to the user's codepage, etc.

Sure. But just because someone is using Microsoft Office, which happens
to use some encoding of Unicode, doesn't mean they can use it to edit
a "plain text" config file that expects IDNs to be in UTF-8 format.
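
The encoding claims quoted above (the UTFs convert to each other without
loss, and an ACE is just another encoding of Unicode that avoids the 0x80
bit) can be illustrated with a minimal sketch. Assumptions: Python's
built-in "punycode" codec stands in for "an ACE" here; it is a later ACE
and not one of the candidates discussed in this thread. The label "bücher"
and the helper name round_trip are hypothetical, chosen only for
illustration. The sketch does not apply nameprep and says nothing general
about relative sizes.

```python
# Sketch: one Unicode string, several lossless encodings of it.
# "punycode" is used as a stand-in ACE (an assumption, see above).

label = "bücher"  # hypothetical example label

ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be", "punycode"]


def round_trip(text, encoding):
    """Encode text, decode it again, and return (bytes, decoded text)."""
    data = text.encode(encoding)
    return data, data.decode(encoding)


def is_seven_bit(data):
    """True if no byte has the 0x80 bit set."""
    return all(byte < 0x80 for byte in data)


for name in ENCODINGS:
    data, back = round_trip(label, name)
    # Report size, whether the round trip was lossless, and 7-bit safety.
    print(name, len(data), back == label, is_seven_bit(data))
```

Every encoding round-trips to the same string, but only the ACE form is
guaranteed to stay within 7-bit ASCII, which is the property that matters
for protocols designed around ASCII-only DNS names.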