Re: [idn] comments on IDNA-04
David Hopwood <david.hopwood@zetnet.co.uk> wrote:
> An ACE label means a type 00 domain label that consists of the ACE tag
> and an output of the ACE encoding algorithm.
Not quite. The simplest fully-precise definition is: An ACE label is a
label that gets altered when ToUnicode is applied to it.
The definition in the IDNA draft precedes the definition of ToUnicode,
so it's a little more vague: An ACE label is a label that contains only
ASCII characters but represents a label containing non-ASCII characters.
Actually, that definition is a tad sloppy regarding equivalence of
Unicode strings. For example, if X is an all-ASCII ACE label, then the
fullwidth equivalent of X is also an ACE label, even though it's not
all-ASCII. If Y is a simple all-ASCII label (without the ACE prefix),
and Z is the fullwidth equivalent of Y, then Y might appear to satisfy
the definition (Y is an all-ASCII label representing Z, which contains
non-ASCII characters), but Y is not an ACE label.
So what the definition is trying to say is something like: An ACE label
is an ASCII (or ASCII-equivalent) label that represents a non-ASCII (and
non-ASCII-equivalent) label. But ASCII-equivalence would be defined in
terms of nameprep, so we can't very well put that in the Terminology
section.
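
To make the X/Y/Z example concrete, here is a rough Python sketch. It
is not the draft's algorithm: NFKC plus lowercasing stands in for
nameprep, the "xn--" prefix and the "punycode" codec are the ones that
were standardized later (assumed here for illustration only), and the
function names are made up.

```python
import unicodedata

# Assumption: the prefix standardized later in RFC 3490, not necessarily
# the one used by the draft under discussion.
ACE_PREFIX = "xn--"

def ascii_form(label):
    """Rough stand-in for nameprep: NFKC normalization plus lowercasing."""
    return unicodedata.normalize("NFKC", label).lower()

def is_ascii_equivalent(label):
    """True if the label becomes all-ASCII after the stand-in mapping."""
    return ascii_form(label).isascii()

def is_ace_label(label):
    """ASCII(-equivalent) label that represents a non-ASCII(-equivalent) one."""
    folded = ascii_form(label)
    if not folded.isascii() or not folded.startswith(ACE_PREFIX):
        return False
    try:
        decoded = folded[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    return not is_ascii_equivalent(decoded)

def to_fullwidth(label):
    """Map printable ASCII to its fullwidth equivalent (U+FF01..U+FF5E)."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if "!" <= ch <= "~" else ch for ch in label
    )

X = "xn--bcher-kva"   # all-ASCII ACE label
Y = "example"         # simple all-ASCII label, no ACE prefix
Z = to_fullwidth(Y)   # fullwidth equivalent of Y
```

With these definitions, is_ace_label(X) and is_ace_label(to_fullwidth(X))
both hold, while is_ace_label(Y) and is_ace_label(Z) do not, because Z
is non-ASCII but still ASCII-equivalent.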
> In fact the ToASCII algorithm appears to be incorrect, i.e. not what
> was intended, for two reasons:
> - by calling nameprep, it disallows general domain names (i.e. not
> hostnames) that contain both octets >= 0x80 and octets that
> represent non-LDH ASCII.
Currently, nameprep includes one of the host restrictions (the
prohibition of non-LDH ASCII characters) but not the other one (the
prohibition of leading/trailing hyphens). I have been arguing for the
removal of the ASCII characters from nameprep's prohibition table. The
way it is now, if you want to use nameprep with the host restrictions,
it doesn't get the job done, and if you want to use nameprep without
the host restrictions, you can't. ToASCII provides a place (step 3,
immediately following nameprep) where the full host restrictions can be
applied (if applicable).
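
As an illustration of the two host restrictions mentioned above, here
is a minimal Python sketch of a check that could run at that step. The
function name and return shape are hypothetical; only the two rules
themselves come from the discussion.

```python
import string

# Letters, digits, hyphen: the ASCII characters permitted in host names.
LDH = set(string.ascii_letters + string.digits + "-")

def host_restriction_violations(label):
    """Return (has_non_ldh_ascii, has_edge_hyphen) for one label.

    Non-ASCII characters are ignored here; only ASCII characters outside
    LDH count toward the first restriction.
    """
    has_non_ldh_ascii = any(ch.isascii() and ch not in LDH for ch in label)
    has_edge_hyphen = label.startswith("-") or label.endswith("-")
    return has_non_ldh_ascii, has_edge_hyphen
```

A caller that does not want the host restrictions (general domain
names) would simply skip this check, which is the flexibility that
keeping the ASCII prohibition out of nameprep would allow.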
> The only way to specify case annotation rigourously is for the case
> bit of each character to be an output of nameprep.
The only way to *implement* mixed-case annotation so that it records
the case of the original string is for the case flags to be output from
the nameprep implementation. Yes indeed. There's no way around that,
because nameprep can alter the length of the string. (Of course an
implementation of nameprep can compute and output extra information
while still conforming to the nameprep spec.)
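
A small Python sketch of why the flags have to come out of the mapping
step itself. It uses str.casefold() as a stand-in for nameprep's case
mapping (an assumption; real nameprep does more), and the function name
is hypothetical.

```python
def fold_with_case_flags(text):
    """Case-fold text and return (folded, flags), one flag per output char.

    Each output character inherits the uppercase-ness of the input
    character that produced it, so the flags stay aligned even when the
    mapping changes the length of the string.
    """
    folded = []
    flags = []
    for ch in text:
        out = ch.casefold()           # may expand, e.g. U+00DF -> "ss"
        folded.append(out)
        flags.extend([ch.isupper()] * len(out))
    return "".join(folded), flags

# Flags computed from the original string alone, by position.
naive_flags = [ch.isupper() for ch in "\u00dfA"]
```

For the input U+00DF followed by "A", the folded string has three
characters but naive_flags has only two entries, so flags computed
outside the mapping no longer line up with the output.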
However, I think mixed-case annotation is *specified* rigorously in the
AMC-ACE-Z spec. It tells exactly what the annotations mean, in terms
of what they are asking the decoder to do. There are no requirements
for creating the annotations; they do not necessarily record the
original case; they record recommendations for how the string should be
displayed. Presumably, in most cases, encoders will want to recommend
that the string be displayed in a way that looks identical to the
original string. But that is not required, and no particular method for
creating annotations to accomplish that goal is required.
AMC