Re: [idn] Re: Fwd: Unicode letter ballot
Kenneth Whistler <kenw@sybase.com> wrote:
> And you can't escape the problem by just adding the 5 obsolete code
> points to the stringprep prohibited list,
True, but...
> because that, *too*, would have destabilized your specification: a
> string that was valid before you did that would be invalid after you
> did so.
Actually, adding compatibility characters to the Stringprep prohibited
list would have absolutely no effect, because Stringprep performs
normalization before prohibition.
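A minimal sketch of that ordering, assuming a toy pipeline that does only the normalize and prohibit steps (real Stringprep also has a mapping step first and a bidi check last). The function name `toy_prep` is hypothetical, and U+F900 is used as a stand-in compatibility ideograph, not one of the five characters under discussion:

```python
import unicodedata

def toy_prep(text, prohibited):
    """Toy pipeline: normalize with NFKC first, then check the prohibited set."""
    normalized = unicodedata.normalize("NFKC", text)
    for ch in normalized:
        if ch in prohibited:
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized

COMPAT = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900 (stand-in example)
UNIFIED = "\u8C48"  # the unified ideograph it normalizes to

# Prohibiting the compatibility character changes nothing: normalization
# has already replaced it before the prohibition check sees the string.
result = toy_prep(COMPAT, prohibited={COMPAT})
assert result == UNIFIED
```

Under this ordering, only prohibiting the normalized form (here U+8C48) would cause a rejection, which is a different and much broader change.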
Let's consider the possible scenarios:
1. The decomposition mappings are changed.
1a. Stringprep/Nameprep track the update, breaking their promise of
backward compatibility.
If someone registers a name using a CNS 11643 string in
combination with the old Nameprep, and later someone tries to look
up the name using the very same CNS 11643 string in combination
with the new Nameprep, it won't match (if it contains any of the
five characters in question). As more clients upgrade to the new
Nameprep, the name will become less and less accessible.
But the old-Nameprepped form of the name (the one that actually
got stored in the database) will look visibly wrong, won't it? If
the registrant had been shown the Nameprepped form and asked for
confirmation, the registration would probably have been aborted.
So maybe this lookup failure would turn out to never happen in
practice.
On the other hand, if someone registers a name using a CNS 11643
string in combination with the new Nameprep, and later someone
tries to look up the name using the very same CNS 11643 string in
combination with the old Nameprep, it won't match. But this time
the Nameprepped form shown to the registrant will look correct, so
names causing this failure might be more likely to be registered
than names causing the previous failure. For this failure, as
more clients get upgraded, the name will become more and more
accessible.
1b. Stringprep/Nameprep do not track Unicode updates; they remain
frozen at a version containing the old mappings.
If someone registers a name using a CNS 11643 string, and later
someone tries to look up the name using the very same CNS 11643
string, it will match. But if a modern normalization operation
gets inserted somewhere (for the heck of it, or because some
other protocol that carries the domain name requires text to be
normalized), it won't match.
Names using the broken compatibility characters might not be
registered in practice, because the Nameprepped form will look
wrong.
Since Nameprep never changes again, there is no transition as
software gets upgraded. Names involving the five characters in
question remain just as broken and undesirable in the far future
as in the near future. It is likely that these five characters
would never be used in domain names in practice, forever.
1c. Stringprep/Nameprep track Unicode updates, but require the use of
NormalizationCorrections.txt to undo any changes to decompositions
since Unicode 3.2.
This is exactly like case 1b except that IDNA can take advantage of
other updates to Unicode, like new characters.
As software gets upgraded, the mappings of the five characters
in question remain the same (broken), so it is likely that these
five characters would never be used in domain names in practice,
forever, same as case 1b.
2. The decomposition mappings are not changed; the characters are
deprecated, and new characters added with the correct decompositions.
2a. Stringprep/Nameprep eventually allow the use of a version of
Unicode that includes the added characters.
If someone registers a name using a CNS 11643 string in
combination with the old CNS-to-Unicode tables, and later someone
tries to look up the name using the very same CNS 11643 string in
combination with the new CNS-to-Unicode tables, it won't match.
As more clients upgrade to the new CNS-to-Unicode tables, the name
will become less and less accessible.
If the registrant is shown the Nameprepped form of the name, it
will look visibly wrong, so the name will probably not end up
getting registered, so this lookup failure might never happen in
practice.
If someone registers a name using a CNS 11643 string in
combination with the new CNS-to-Unicode tables, and later someone
tries to look up the name using the very same CNS 11643 string in
combination with the old CNS-to-Unicode tables, it won't match,
but this time the Nameprepped form shown to the registrant will
look correct, so names causing this failure might be more likely
to be registered than names causing the previous failure. For
this failure, as more clients get upgraded, the name will become
more and more accessible.
2b. Stringprep/Nameprep freeze on a version of Unicode that does not
include the added characters, never to be updated again.
If someone registers a name using a CNS 11643 string in
combination with the old CNS-to-Unicode tables, and later someone
tries to look up the name using the very same CNS 11643 string in
combination with the new CNS-to-Unicode tables, it won't match.
As more clients upgrade to the new CNS-to-Unicode tables, the name
will become less and less accessible.
If the registrant is shown the Nameprepped form of the name, it
will look visibly wrong, so the name will probably not end up
getting registered, so this lookup failure might never happen in
practice.
If someone tries to register a name using a CNS 11643 string in
combination with the new CNS-to-Unicode tables, it will fail,
because the new CNS-to-Unicode tables will use the new characters,
which are unassigned in the version of Unicode used by Nameprep,
and therefore disallowed in stored strings.
It is likely that these five characters would never be used in
domain names in practice, forever.
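The rejection of unassigned code points in case 2b can be sketched with Python's standard `stringprep` module, whose `in_table_a1` function tests membership in RFC 3454 Table A.1 (code points unassigned in Unicode 3.2). The helper `check_stored_string` is hypothetical, and U+0221 is only a stand-in for "a character added after Unicode 3.2"; the replacement characters discussed in case 2 are themselves hypothetical:

```python
import stringprep

def check_stored_string(text):
    """Reject any code point that is unassigned in Unicode 3.2 (Table A.1)."""
    for ch in text:
        if stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned in Unicode 3.2: U+{ord(ch):04X}")
    return text

# Stand-in for a newly added character: U+0221 was not assigned in Unicode 3.2.
NEWER_CHAR = "\u0221"

try:
    check_stored_string("name" + NEWER_CHAR)
    accepted = True
except ValueError:
    accepted = False
```

With a frozen Nameprep, any conversion table that emits such a character produces a string that cannot be stored, so the registration fails outright instead of mismatching later.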
It's interesting that cases 1a and 2a are basically identical, and cases
1b and 2b are extremely similar, although they correspond to opposite
outcomes of the vote.
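To make the equivalence of 1a and 2a concrete, here is a toy model, with every name and table entry a hypothetical placeholder. A client is modeled as a legacy-to-Unicode conversion followed by a decomposition step; `"WRONG"` and `"RIGHT"` stand for the erroneous and corrected target ideographs:

```python
def make_client(cns_to_unicode, decomposition):
    """Build a toy client: convert legacy tokens, then apply decompositions."""
    def prepare(cns_tokens):
        unicode_chars = [cns_to_unicode[t] for t in cns_tokens]
        return tuple(decomposition.get(u, u) for u in unicode_chars)
    return prepare

# Case 1a: same conversion table, but the decomposition itself is changed.
old_1a = make_client({"cns-x": "COMPAT"}, {"COMPAT": "WRONG"})
new_1a = make_client({"cns-x": "COMPAT"}, {"COMPAT": "RIGHT"})

# Case 2a: decompositions are stable, but the conversion table switches
# to a newly added character that has the correct decomposition.
decomp_2a = {"COMPAT": "WRONG", "COMPAT-NEW": "RIGHT"}
old_2a = make_client({"cns-x": "COMPAT"}, decomp_2a)
new_2a = make_client({"cns-x": "COMPAT-NEW"}, decomp_2a)

def lookup_succeeds(register_client, lookup_client, name):
    """A lookup matches only if both clients prepare the name identically."""
    return register_client(name) == lookup_client(name)

name = ["cns-x"]
mismatch_1a = not lookup_succeeds(old_1a, new_1a, name)
mismatch_2a = not lookup_succeeds(old_2a, new_2a, name)
```

In this model the old clients of both cases produce the same prepared form, as do the new clients, so the registration and lookup failures are indistinguishable from the outside.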
The biggest difference I can see between case 1 and case 2 is that if
the deprecate/add approach is taken (case 2), IDNA's only choices are
to track Unicode updates (2a) or not track them (2b), all or nothing.
But if the fix-decompositions approach is taken (case 1), IDNA has a
third option (1c) of tracking all Unicode updates except changes to
decompositions, via the NormalizationCorrections.txt file.
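Option 1c amounts to overlaying the original decompositions on top of a newer table. A rough sketch, assuming the corrections have already been loaded into a dict mapping each affected character to an (original, corrected) pair; the function name and all table entries are hypothetical placeholders, and the actual layout of NormalizationCorrections.txt is not reproduced here:

```python
def frozen_decompositions(current, corrections):
    """Return a copy of the current table with corrected entries reverted."""
    frozen = dict(current)
    for code_point, (original, _corrected) in corrections.items():
        frozen[code_point] = original
    return frozen

# Placeholder data: one corrected mapping and one newly added character.
current = {"COMPAT": "RIGHT", "NEWCHAR": "NEWBASE"}
corrections = {"COMPAT": ("WRONG", "RIGHT")}

frozen = frozen_decompositions(current, corrections)
# frozen keeps the old mapping for COMPAT but still knows about NEWCHAR.
```

The point of the overlay is that existing names keep matching while everything else in the newer Unicode version, such as new characters, remains available.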
I find this result to be counter-intuitive. My initial gut feeling was
that the deprecate/add approach was more conservative, and safer. But
apparently not.
AMC