
Re: [idn] ToUnicode output can be longer than input



Dan Oscarsson <Dan.Oscarsson@kiconsulting.se> wrote:

> IDNA defines a way to compare names in ASCII context because it
> requires names to be in IDNA ACE format.

It requires names to be in ASCII format in IDN-unaware contexts.  It
does not require names to be in ASCII format when they are compared.  It
says:

    Whenever two labels are compared, they MUST be considered to match
    if and only if they are equivalent, that is, their ASCII forms
    (obtained by applying ToASCII) match using a case-insensitive ASCII
    comparison.

That doesn't say you must compare the ASCII forms; it says you must
reach the same answer as if you compared the ASCII forms.  And the rule
doesn't say it applies only in certain contexts; it says "whenever"
two labels are compared.  If that's not clear enough, the point is
underscored in the introduction:

    Applications can also define protocols and interfaces that support
    IDNs directly using non-ASCII representations.  IDNA does not
    prescribe any particular representation for new protocols, but it
    still defines which names are valid and how they are compared.

> Comparing names in an international context must be done using UCS
> characters directly.

I assume that's your opinion.  It's certainly not a requirement of any
standard.

IDNA allows you to perform the comparison any way you like, provided
you get the right answer, that is, the same answer a case-insensitive
comparison of the ASCII forms would give.  For example, given any two
valid internationalized labels X and Y,
tolower(ToASCII(X)) == tolower(ToASCII(Y)) iff nameprep(ToUnicode(X)) ==
nameprep(ToUnicode(Y)), so you can use either form as a canonical form
for comparisons.
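
A minimal sketch of that equivalence in Python, assuming the standard
library's encodings.idna module (an IDNA2003 implementation) as a
stand-in for ToASCII, ToUnicode, and nameprep.  The helper names
to_unicode, match_ascii, and match_unicode are hypothetical.  The
wrapper around ToUnicode is there because Python's version raises an
error on some inputs, whereas the operation as specified never fails
and returns its input unchanged in that case:

```python
from encodings import idna


def to_unicode(label: str) -> str:
    """ToUnicode with the spec's never-fails behaviour (assumed wrapper).

    Python's idna.ToUnicode raises UnicodeError for labels that are still
    non-ASCII after nameprep, so fall back to the original label.
    """
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


def match_ascii(x: str, y: str) -> bool:
    """Compare the ASCII forms case-insensitively."""
    # idna.ToASCII returns bytes; bytes.lower() folds ASCII letters only.
    return idna.ToASCII(x).lower() == idna.ToASCII(y).lower()


def match_unicode(x: str, y: str) -> bool:
    """Compare the nameprepped Unicode forms instead."""
    return idna.nameprep(to_unicode(x)) == idna.nameprep(to_unicode(y))


if __name__ == "__main__":
    pairs = [
        ("B\u00fccher", "xn--bcher-kva"),  # non-ASCII form vs. its ACE form
        ("Example", "example"),            # plain ASCII, case differs
        ("b\u00fccher", "bucher"),         # different labels
    ]
    for x, y in pairs:
        print(ascii(x), ascii(y), match_ascii(x, y), match_unicode(x, y))
```

For each pair the two predicates should agree, which is the sense in
which either form can serve as the canonical form.  This only
illustrates the claim on a few valid labels; it is not a proof, and it
says nothing about labels that ToASCII rejects.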

AMC