
Re: [idn] ToUnicode output can be longer than input

To: idn@ops.ietf.org
Subject: Re: [idn] ToUnicode output can be longer than input
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
Date: Thu, 1 May 2003 01:57:27 +0000
In-reply-to: <200304301221.h3UCLOt7008508@valinor.malmo.kicore.net>
References: <200304301221.h3UCLOt7008508@valinor.malmo.kicore.net>
Reply-to: IETF idn working group <idn@ops.ietf.org>
User-agent: Mutt/1.4i

Dan Oscarsson <Dan.Oscarsson@kiconsulting.se> wrote:

> IDNA defines a way to compare names in ASCII context because it
> requires names to be in IDNA ACE format.

It requires names to be in ASCII format in IDN-unaware contexts.  It
does not require names to be in ASCII format when they are compared.  It
says:

    Whenever two labels are compared, they MUST be considered to match
    if and only if they are equivalent, that is, their ASCII forms
    (obtained by applying ToASCII) match using a case-insensitive ASCII
    comparison.

That doesn't say you must compare the ASCII forms; it says you must
reach the same answer as if you had compared the ASCII forms.  And the
rule doesn't say it applies only in certain contexts; it says "whenever"
two labels are compared.  If that's not clear enough, the point is
underscored in the introduction:

    Applications can also define protocols and interfaces that support
    IDNs directly using non-ASCII representations.  IDNA does not
    prescribe any particular representation for new protocols, but it
    still defines which names are valid and how they are compared.
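As a minimal sketch of the comparison rule quoted above, the following
Python uses CPython's built-in encodings.idna module, which implements
the IDNA 2003 ToASCII operation (with UseSTD3ASCIIRules assumed false).
The helper name labels_match is hypothetical, not part of any library.

```python
# Sketch of the IDNA (RFC 3490) label-comparison rule, using CPython's
# built-in encodings.idna module (an IDNA 2003 implementation).
from encodings.idna import ToASCII


def labels_match(a: str, b: str) -> bool:
    """Hypothetical helper: True iff the ASCII forms of a and b
    (obtained by applying ToASCII) match case-insensitively.
    Raises UnicodeError for labels that ToASCII rejects."""
    # ToASCII returns bytes; bytes.lower() folds only A-Z, so this is
    # exactly a case-insensitive ASCII comparison.
    return ToASCII(a).lower() == ToASCII(b).lower()


print(labels_match("Bücher", "BÜCHER"))         # True
print(labels_match("xn--bcher-kva", "bücher"))  # True
print(labels_match("bücher", "bucher"))         # False
```

Because ToASCII leaves an all-ASCII label unchanged, including its case,
the final .lower() is what makes "Example" and "EXAMPLE" match.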

> Comparing names in an international context must be done using UCS
> characters directly.

I assume that's your opinion.  It's certainly not a requirement of any
standard.

IDNA allows you to perform the comparison any way you like, provided
you get the right answer.  For example, given any two valid
internationalized labels X and Y, tolower(ToASCII(X)) == tolower(ToASCII(Y))
iff nameprep(ToUnicode(X)) == nameprep(ToUnicode(Y)), so you can use
either form as a canonical form for comparisons.

AMC

