Re: [idn] Document Status?
"JFC (Jefsey) Morfin" <jefsey@jefsey.com> wrote:
> I must say that with my limited French speaking IQ I tried to figure
> out the meaning of "ACE" in the proposed text: sorry, but I was
> totally unable to grasp it.
It is defined in the terminology section:
    An "internationalized label" is a label composed of characters from
    the Unicode character set; note, however, that not every string of
    Unicode characters can be an internationalized label.
That much is clear, yes?
    To allow internationalized labels to be handled by existing
    applications, IDNA uses an "ACE label" (ACE stands for ASCII
    Compatible Encoding), which can be represented using only ASCII
    characters but is equivalent to a label containing non-ASCII
    characters.
In other words, internationalized labels can contain non-ASCII
characters, which can't be handled directly by existing applications
that expect domain labels to be ASCII. Therefore, we instead use an
"ACE label", which is an ASCII label that is equivalent to a non-ASCII
label.
    More rigorously, an ACE label is defined to be any label that the
    ToUnicode operation would alter.
That one sentence is the full and exact rigorous definition of the term
"ACE label". The rest of the explanation is there only to provide
intuition.
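
As a rough illustration of that one-sentence definition (this is not text from the draft), the test can be sketched in Python. The sketch assumes the standard library's encodings.idna module as a stand-in for the draft's ToUnicode. That module uses the prefix "xn--", which RFC 3490 later fixed, where this message writes IESG--. The names to_unicode and is_ace_label are made up for the example.

```python
from encodings import idna  # standard-library IDNA helpers; ACE prefix is "xn--"


def to_unicode(label: str) -> str:
    """ToUnicode in the draft's sense: on any failure, return the input unaltered."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        # The stdlib helper raises where the draft says to return the original label.
        return label


def is_ace_label(label: str) -> bool:
    """An ACE label is any label that ToUnicode would alter."""
    return to_unicode(label) != label


if __name__ == "__main__":
    for candidate in ("xn--d9juau41awczczp", "helloworld"):
        print(candidate, is_ace_label(candidate))
```

Wrapping the library call is a design choice of the sketch: the library signals failure with an exception, whereas the definition only cares whether the output differs from the input.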
    For every internationalized label that cannot be directly
    represented in ASCII, there is an equivalent ACE label. An ACE
    label always begins with the ACE prefix defined in section 5.
Those are clear, yes?
By the way, the notion of "equivalent label" is also defined in the
terminology section:
    In IDNA, equivalence of labels is defined in terms of the ToASCII
    operation, which constructs an ASCII form for a given label. Labels
    are defined to be equivalent if and only if their ASCII forms
    produced by ToASCII match using a case-insensitive ASCII comparison.
    Traditional ASCII labels already have a notion of equivalence: upper
    case and lower case are considered equivalent. The IDNA notion of
    equivalence is an extension of the old notion. Equivalent labels in
    IDNA are treated as alternate forms of the same label, just as "foo"
    and "Foo" are treated as alternate forms of the same label.
Is that clear enough?
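
The equivalence rule quoted above is short enough to sketch directly. This is only an illustration, assuming Python's standard encodings.idna.ToASCII as the ToASCII operation (it uses the later-standardised prefix "xn--" rather than the IESG-- placeholder); labels_equivalent is a hypothetical name.

```python
from encodings import idna  # standard-library IDNA helpers; ACE prefix is "xn--"


def labels_equivalent(a: str, b: str) -> bool:
    """Two labels are equivalent iff their ToASCII forms match, ignoring ASCII case.

    idna.ToASCII returns bytes and raises UnicodeError for a label that has no
    ASCII form; the sketch lets that error propagate.
    """
    ascii_a = idna.ToASCII(a)
    ascii_b = idna.ToASCII(b)
    # bytes.lower() only touches ASCII letters, which is the comparison required.
    return ascii_a.lower() == ascii_b.lower()


if __name__ == "__main__":
    print(labels_equivalent("foo", "Foo"))
    print(labels_equivalent("foo", "bar"))
```

With this, "foo" and "Foo" compare equal under the old ASCII rule, and a non-ASCII label compares equal to its ACE form under the extended rule.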
Getting back to "ACE", maybe some examples would help:
The Japanese phrase <sono><supiido><de> (pretend I wrote it using kana,
which are non-ASCII characters) could be an internationalized label. It
is not an ACE label, because it cannot be represented in ASCII. If you
feed it to ToUnicode, it will not be altered, because the check for the
ACE prefix will fail.

There exists a label equivalent to <sono><supiido><de> that can be
represented in ASCII, namely IESG--d9juau41awczczp (where IESG-- means
the ACE prefix, whatever is eventually chosen). This is an ACE label
because it can be represented in ASCII and it is equivalent to a label
containing non-ASCII characters. If you feed IESG--d9juau41awczczp to
ToUnicode, it will be altered (it will become <sono><supiido><de>).

The label helloworld is not an ACE label, because it is not equivalent
to any non-ASCII label. If you feed it to ToUnicode, it will not be
altered, because the check for the ACE prefix will fail.
Those are the three normal cases. There are also a few corner cases,
labels that begin with the ACE prefix but are not ACE labels:
The label IESG--foo-bar-2 is not an ACE label, even though it begins
with the ACE prefix, because it is not equivalent to any non-ASCII label
(because the Punycode part is invalid). If you feed it to ToUnicode, it
will not be altered, because the Punycode decoding step will fail.

The label IESG--3ba is not an ACE label, even though it begins with the
ACE prefix and the Punycode part is valid, because it is not equivalent
to any non-ASCII label (because it is not nameprepped; it decodes to a
capital A with grave accent). If you feed it to ToUnicode, it will not
be altered, because the comparison in step 7 will fail.
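
The normal and corner cases above can be traced step by step with a small sketch. It is not the draft's algorithm verbatim: it assumes Python's built-in "punycode" codec and encodings.idna.ToASCII, substitutes the later-standardised prefix "xn--" for the IESG-- placeholder, and uses a hypothetical function name, classify_label.

```python
from encodings import idna  # standard-library IDNA helpers

ACE_PREFIX = "xn--"  # assumption: stands in for the IESG-- placeholder


def classify_label(label: str) -> str:
    """Say whether ToUnicode would alter a label, and which check decides it.

    Intended for all-ASCII input; a non-ASCII label that carries the prefix
    would be reported under the Punycode branch.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return "not altered: no ACE prefix"
    try:
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return "not altered: Punycode decoding failed"
    try:
        round_trip = idna.ToASCII(decoded).decode("ascii")
    except UnicodeError:
        return "not altered: ToASCII failed"
    # Case-insensitive comparison of the re-encoded form with the input.
    if round_trip.lower() != label.lower():
        return "not altered: round-trip comparison failed"
    return "altered: ACE label"


if __name__ == "__main__":
    for candidate in ("helloworld", "xn--d9juau41awczczp",
                      "xn--foo-bar-2", "xn--3ba"):
        print(candidate, "->", classify_label(candidate))
```

The last branch is the interesting one: the Punycode part of the 3ba label decodes cleanly, but re-encoding goes through nameprep, which lowercases the decoded character, so the result no longer matches the input.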
AMC