Re: [idn] IDNs do not use ASCII



"D. J. Bernstein" <djb@cr.yp.to> wrote:

> The ASCII standard specifies, for example, that the byte string
> bq--blah (i.e., \142\161\055\055\142\154\141\150) represents
> characters bq--blah.  Under a typical ACE IDN proposal, however, that
> byte string represents a different sequence of characters, and there's
> another byte string that represents the characters bq--blah.
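
The octal escapes in the quoted example can be checked mechanically.
A minimal sketch; the variable names are illustrative and not from the
thread:

```python
# The eight bytes from the quoted message, written with octal escapes.
raw = b"\142\161\055\055\142\154\141\150"

# Decode them strictly as ASCII; any byte above 0x7F would raise an error.
decoded = raw.decode("ascii")

print(decoded)
```

Read as plain ASCII, those bytes spell bq--blah, which is the reading
the quoted text attributes to the ASCII standard.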

Strings beginning with the ACE prefix are never ACE-encoded.

Here's the way I think of it now:  In the current system, FOO and
foo and Foo and fOo etc. are all equivalent.  We don't say that FOO
represents foo, or that foo represents FOO.  They are distinct ASCII
strings, but they are equivalent labels.

The new system merely expands the set of strings and expands the notion
of equivalence.  Now <lowercase-pi> and <uppercase-pi> and bq--blah
and BQ--BLAH etc. are all equivalent.  Sometimes we like to think that
bq--blah represents <lowercase-pi>, and sometimes we like to think
that <lowercase-pi> represents bq--blah, but actually it's a symmetric
equivalence relation.  They are distinct Unicode strings, but they
are equivalent labels, just like FOO and foo above.
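
One way to picture this is a function that maps every label to a single
representative of its equivalence class, so that two labels are
equivalent exactly when their representatives match.  The sketch below
rests on loud assumptions: the table TOY_ACE_TO_UNICODE is hypothetical
(bq--blah is only the placeholder used in this thread, not the output of
any real ACE algorithm), and Python's str.lower() stands in for whatever
case folding and name preparation a real proposal would specify.

```python
ACE_PREFIX = "bq--"

# Hypothetical table: pretend the ACE string "bq--blah" corresponds to the
# one-character label consisting of lowercase pi (U+03C0).
TOY_ACE_TO_UNICODE = {"bq--blah": "\u03c0"}


def canonical(label: str) -> str:
    """Return one representative of the label's equivalence class."""
    # Stand-in for real case folding / name preparation (an assumption).
    folded = label.lower()
    # Labels with the ACE prefix map to their Unicode counterpart when the
    # toy table knows them; otherwise they represent themselves.
    if folded.startswith(ACE_PREFIX):
        return TOY_ACE_TO_UNICODE.get(folded, folded)
    return folded


def equivalent(a: str, b: str) -> bool:
    """Two labels are equivalent when their representatives are equal."""
    return canonical(a) == canonical(b)
```

Under these assumptions, equivalent("FOO", "fOo") and
equivalent("\u03a0", "BQ--BLAH") both hold, and neither member of a pair
is privileged: the choice of the Unicode form as the representative is
arbitrary, and the ASCII form would serve equally well.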

Having each equivalence class include at least one element containing
only ASCII characters is convenient for interoperating with legacy
software, protocols, and interfaces.

AMC