
Re: [idn] ToUnicode output can be longer than input

To: "IETF idn working group" <idn@ops.ietf.org>
Subject: Re: [idn] ToUnicode output can be longer than input
From: "Edmon Chung" <edmon@neteka.com>
Date: Fri, 25 Apr 2003 09:20:43 -0400
References: <20030424204553.GA5014@nicemice.net>

Hi Adam,

----- Original Message -----
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
> For example, consider the input:
>
> x n - - fi fi - a ffl u e n t - s o u ffl - viii - u i c
>
> The spaces are not really there, they just indicate the clusters, which
> represent single code points (ligatures and roman numerals: U+FB01,
> U+FB04, U+2177).  That's 24 code points.

If I counted it correctly, there are 33 "codepoints" in the above ACE
string. (I agree with your assessment, however; please see below. But the
example doesn't seem to illustrate your point...)
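Both counts in this exchange can be reproduced with a short sketch. It assumes Python 3 and its standard `encodings.idna` module, whose `nameprep()` function implements the Nameprep profile that IDNA uses. The label is spelled with the code points named in the quoted message (U+FB01, U+FB04, U+2177). Counted as typed, it is 24 code points. After Nameprep maps the ligatures and the roman numeral to plain letters, it becomes a 33-character ASCII string, which appears to be the string counted as 33 above.

```python
import encodings.idna

# The quoted example label, using the real code points for the clusters:
# U+FB01 (fi ligature), U+FB04 (ffl ligature), U+2177 (small roman numeral eight).
label = "xn--\ufb01\ufb01-a\ufb04uent-sou\ufb04-\u2177-uic"

# Counted as typed: one code point per cluster.
print(len(label))  # 24

# Nameprep maps each ligature and the roman numeral to plain ASCII letters.
prepped = encodings.idna.nameprep(label)
print(prepped, len(prepped))  # xn--fifi-affluent-souffl-viii-uic 33
```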

> The IDNA spec contains an incidental statement that was intended to be
> helpful, in section 4.2:
>
>     The ToUnicode output never contains more code points than its input.
>
> Oops, that's not true, because Nameprep can cause strings to expand.

I can understand this possibility.
Basically, if the Unicode form of one or more characters in the string is
longer than its ACE form, and the total excess over all the characters in
the string is more than 4 (compensating for the "xn--" prefix), then the
ToUnicode output will be longer than the input.
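Running the complete ToUnicode operation on the quoted example label shows the expansion end to end. This is again a sketch that assumes Python 3's `encodings.idna.ToUnicode()`, which follows the RFC 3490 steps: Nameprep (because the input is not pure ASCII), check and strip "xn--", Punycode-decode, then re-encode with ToASCII as a round-trip check. The output has 26 code points against 24 in the input.

```python
import encodings.idna

label = "xn--\ufb01\ufb01-a\ufb04uent-sou\ufb04-\u2177-uic"

# Nameprep expands the ligatures and the numeral to 33 ASCII characters;
# the Punycode body "fifi-affluent-souffl-viii-uic" then decodes to 25 basic
# code points plus one inserted U+00E9 (e with acute accent).
result = encodings.idna.ToUnicode(label)

print(ascii(result))            # 'fifi-affluent-souffl\xe9-viii'
print(len(label), len(result))  # 24 26
```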

> So the statement needs to be removed or altered if/when the RFC is
> revised.  It would be correct to say that the Punycode decoder cannot
> output more code points than it inputs, but Nameprep can, and therefore
> ToUnicode can.

Seems reasonable.
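The Punycode half of that statement can be illustrated as well. When a Punycode string is decoded, each basic code point before the last delimiter is copied once, the delimiter itself is dropped, and each non-basic code point is produced from at least one digit, so decoding never lengthens the string. The sketch below assumes Python 3's built-in "punycode" codec; the helper `punycode_never_expands` is hypothetical, written only for this illustration, and a randomized spot-check is evidence rather than a proof.

```python
import random

# Punycode body of the example label (the "xn--" prefix already removed).
ace_body = b"fifi-affluent-souffl-viii-uic"
decoded = ace_body.decode("punycode")
print(len(ace_body), len(decoded))  # 29 26


def punycode_never_expands(trials: int = 1000, seed: int = 0) -> bool:
    """Hypothetical spot-check: Punycode decoding never yields more code
    points than it reads, and encode/decode round-trips."""
    rng = random.Random(seed)
    alphabet = "abc-\u00e9\u4e2d\U0001F600"  # ASCII, hyphen, and non-ASCII samples
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(1, 20)))
        encoded = s.encode("punycode")  # bytes: one byte per Punycode code point
        back = encoded.decode("punycode")
        if back != s or len(back) > len(encoded):
            return False
    return True


print(punycode_never_expands())  # True
```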

Edmon
