Re: [idn] comments on IDNA-05
First let me point out a typo in IDNA-05: "resolv" should be "resolve".
James Seng/Personal <jseng@pobox.org.sg> wrote:
> An ACE label that decodes to ASCII label should be by definition an
> invalid ACE.
Indeed it is. In fact, I wouldn't even call it an ACE label; I'd call
it an ASCII label that begins with the ACE prefix but is not an ACE
label. Another example would be a label that decodes to a Unicode
string that is not nameprepped.
Any existing labels that coincidentally begin with the ACE prefix are
likely to fall into this category (though they might also turn out to be
valid ACE labels).
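The distinction can be checked mechanically with a ToUnicode-style round trip: strip the prefix, decode, re-apply ToASCII, and see whether the original label comes back. The following is a minimal Python sketch under stated assumptions, not IDNA-05's own procedure. It uses the RFC 3490 `ToASCII` from Python's standard `encodings.idna` module, the "xn--" prefix that was eventually adopted, and Python's "punycode" codec, which grew out of AMC-ACE-Z. The helper `is_ace_label` is hypothetical.

```python
from encodings.idna import ToASCII

# Assumption: the "xn--" prefix later chosen for IDNA; IDNA-05 left the prefix open.
ACE_PREFIX = "xn--"


def is_ace_label(label: str) -> bool:
    """Hypothetical helper: True only if `label` is a genuine ACE label."""
    if not label.lower().startswith(ACE_PREFIX):
        return False  # no prefix at all: an ordinary ASCII label
    try:
        # Decode the part after the prefix with Punycode (stand-in for AMC-ACE-Z).
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Re-encode with ToASCII (which namepreps non-ASCII input). A genuine
        # ACE label must come back unchanged, up to ASCII case.
        return ToASCII(decoded).decode("ascii").lower() == label.lower()
    except UnicodeError:
        return False  # undecodable, or the decoded string is not a valid label


# Decodes to "bücher" and round-trips: a real ACE label.
print(is_ace_label("xn--bcher-kva"))  # → True
# Decodes to the all-ASCII "abc"; ToASCII("abc") is "abc", not the original.
print(is_ace_label("xn--abc-"))  # → False
# Decodes to "straße", which nameprep maps to "strasse": not nameprepped.
print(is_ace_label("xn--" + "straße".encode("punycode").decode("ascii")))  # → False
```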
IDNA-05 says:
> It is imperative that there be only one ASCII encoding for a
> particular host name. ACE is an encoding for host name labels that
> use non-ASCII characters. Thus, a primary master name server MUST NOT
> contain an ACE-encoded label that decodes to an ASCII label.
Hmmm, that requirement could (with small probability) cause an existing
all-ASCII name that happens to begin with the ACE prefix to become
forbidden.
This requirement does not follow from the rules in section 3, and I
don't see why it's necessary. If a master server contains a label X
beginning with the ACE prefix that decodes to an all-ASCII string Y,
then X is not an ACE label. ToUnicode will not alter X, and ToASCII
will not alter Y, and X and Y will not compare equal. They are simply
two distinct all-ASCII labels, neither of which is an ACE label (despite
the fact that one of them happens to begin with the ACE prefix).
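To make the X/Y argument concrete, here is a minimal Python sketch. It assumes that rule 3 amounts to "two labels match when their ToASCII outputs are equal under case-insensitive ASCII comparison", and it uses the RFC 3490 `ToASCII` from Python's standard `encodings.idna` module (prefix "xn--", Punycode) as a stand-in for the IDNA-05 operation. `labels_match` is a hypothetical name.

```python
from encodings.idna import ToASCII


def labels_match(a: str, b: str) -> bool:
    """Hypothetical helper: compare labels by their ToASCII forms, ignoring ASCII case."""
    return ToASCII(a).lower() == ToASCII(b).lower()


X = "xn--abc-"  # begins with the ACE prefix; "abc-" Punycode-decodes to "abc"
Y = "abc"

# ToASCII leaves both all-ASCII labels exactly as they are...
print(ToASCII(X), ToASCII(Y))  # b'xn--abc-' b'abc'
# ...so they are simply two distinct labels.
print(labels_match(X, Y))  # False
# For comparison, a genuine ACE label does match its Unicode form.
print(labels_match("bücher", "XN--BCHER-KVA"))  # True
```

Note that ToASCII hands X back untouched, ACE prefix and all.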
IDNA-05 goes on to say:
> The ToASCII operation assures that no such names are ever output from
> the operation.
Oops, that is simply not true. The ToASCII operation can indeed output
such a label; simply feed it such a label as input, and it will return
it unchanged.
Paul, we need to fix this. I think a fixed version would say something
like this:
It is imperative that there be only one ASCII encoding for a
particular host name. ACE is an encoding for host name labels that
use non-ASCII characters. An ACE-encoded label X that decodes to an
all-ASCII label Y is not an alternate form of Y; in fact, X is not
the ACE form of anything, and applying ToUnicode to X will not alter
it. ToASCII will leave both X and Y unchanged, and hence X and Y
are not matching labels (by rule 3).
But this revised paragraph no longer contains any requirements, nor
does it particularly relate to DNS servers, so maybe it should just be
removed.
James said:
> Don't think we should make a mistake to define zone file format. No
> reasons Zone file cant be in UTF-8 so long ToASCII/ToACE is applied
> before using it.
That's true for private zone files read by DNS server implementations.
But for master files used as an interchange format between DNS servers
there needs to be a standard format, and that's why RFC 1035 defines
one. These master files are in fact protocol messages, containing
generic domain name slots, and so rule 1 of section 3 applies (ToASCII
must be applied before insertion).
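As an illustration of applying ToASCII before a name is inserted into a generic domain name slot, here is a minimal Python sketch that produces one RFC 1035 master file line. It assumes Python's built-in "idna" codec (RFC 3490 ToASCII applied label by label, "xn--" prefix) in place of IDNA-05's ToASCII. The helper `a_record_line` and the TTL and address values are hypothetical.

```python
def a_record_line(owner: str, address: str, ttl: int = 3600) -> str:
    """Hypothetical helper: build one A record line for an RFC 1035 master file.

    The owner name may be supplied in Unicode, but what goes into the file is
    the all-ASCII form produced by the built-in "idna" codec (ToASCII per label).
    """
    ascii_owner = owner.encode("idna").decode("ascii")
    return f"{ascii_owner}\t{ttl}\tIN\tA\t{address}"


# The owner field is written as xn--bcher-kva.example.
print(a_record_line("bücher.example.", "192.0.2.1"))
```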
Paul, by the way, IDNA-05 says "zone files (as specified by section 5 of
RFC 1035)", but actually section 5 of RFC 1035 defines "master files",
not "zone files".
> Suggest to define ASCII Compatible Encoding (ACE) before using it.
The first appearance of "ACE" in the draft is the definition of "ACE
label" in the Terminology section.
> I could define ASCII string as a subset of ACE, therefore, ASCII is
> also an ACE. e.g. An ACE without ACE tag is a valid ACE which decodes
> to ASCII range
That's one way to define it, but not the way the IDNA draft defines it.
I think it's more useful to have a term referring to the new non-WYSIWYG
labels being introduced by IDNA, so we use the term "ACE label" for
that.
> s/ToASCII/ToACE/.
As David said, the output of ToASCII is not necessarily an ACE label.
For example, if the input is entirely ASCII, it will be unchanged by
ToASCII.
> ToASCII consists of the following steps:
>
> 1. If all code points in the sequence are in the ASCII range (0..7F)
> then skip to step 3.
>
> JS>> Step 1 seem to be optimization, but not a required step.
It avoids doing nameprep on all-ASCII labels. This is not only a
performance win, but it also prevents case-folding of all-ASCII
labels. Existing standards already recommend that case be preserved
for all-ASCII labels. This recommendation is not being extended to
internationalized labels (because nameprep does case-folding), but if
step 1 were omitted, IDNA-conformant applications would be required
to alter the way they treat all-ASCII labels, contrary to existing
recommendations.
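The case-preservation effect of step 1 can be seen with the RFC 3490 functions in Python's `encodings.idna` module, used here as a stand-in for IDNA-05's ToASCII (an assumption; the drafts differ in detail). `ToASCII` has the same all-ASCII shortcut as step 1, while `without_step_1` is a hypothetical variant that always runs nameprep first.

```python
from encodings.idna import ToASCII, nameprep


def without_step_1(label: str) -> bytes:
    """Hypothetical ToASCII variant that nameprepped every label, even all-ASCII ones."""
    return ToASCII(nameprep(label))


label = "Example"
print(ToASCII(label))  # b'Example'  (step 1 shortcut: case preserved)
print(without_step_1(label))  # b'example'  (nameprep case-folds)
```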
> Suggest s/ACE prefix/ACE tag/. An "ACE tag" could be a uniquely
> defined prefix and/or suffix defined by IANA and not neccessary in the
> form of xx--.
It needs to be a prefix, not a suffix, otherwise an ACE label (using
AMC-ACE-Z) could begin with a hyphen. (The AMC-ACE-Z draft contains an
incorrect statement on this subject, which will be fixed in the next
version.)
I thought there was already consensus on the form xx--, but we could
revisit that question.
David Hopwood <david.hopwood@zetnet.co.uk> wrote:
> using a prefix means that an ACE label can end with a digit
AMC-ACE-Z always ends with a letter. (It could also end with a hyphen
if the input were all-ASCII, but IDNA forbids that.)
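The reason is in how the variable-length integers end: the final digit of each integer must be below a threshold that never exceeds 26, and with digit values a-z = 0..25 and 0-9 = 26..35, that final digit is always a letter. A quick check with Python's built-in "punycode" codec, assuming it behaves like AMC-ACE-Z in this respect (Punycode grew out of AMC-ACE-Z); `ends_with_letter` is a hypothetical helper.

```python
def ends_with_letter(s: str) -> bool:
    """True if the Punycode encoding of s ends with an ASCII letter."""
    encoded = s.encode("punycode")  # Python's built-in codec emits lowercase digits
    return encoded[-1:].isalpha()


# Any input containing at least one non-ASCII code point.
for s in ["bücher", "ü", "日本語", "café"]:
    print(s, s.encode("punycode"), ends_with_letter(s))

# All-ASCII input: just the basic code points plus the "-" delimiter.
print("abc".encode("punycode"))  # b'abc-'
```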
AMC