
Re: [idn] ToUnicode output can be longer than input



Hi Adam,

----- Original Message -----
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
> > > x n - - fi fi - a ffl u e n t - s o u ffl - viii - u i c
> > >
> > > The spaces are not really there, they just indicate the clusters, which
> > > represent single code points (ligatures and roman numerals: U+FB01,
> > > U+FB04, U+2177).  That's 24 code points.
> >
> > If I counted it correctly, there are 33 "codepoints" in the above ACE
> > string.
>
> fi represents one code point (U+FB01), ffl represents one code point
> (U+FB04), and viii represents one code point (U+2177).  Now if you count
> again, you should count 24.  I'm trying to describe a non-ASCII ACE
> string containing 24 code points, some of which are ASCII and some of
> which are compatibility characters.
>
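
To make the two counts above concrete, here is a minimal sketch that writes the quoted string with one code point per cluster and then applies NFKC, the normalization form used by Nameprep. It covers only the normalization step, not the full ToUnicode operation, and the variable names are just for illustration.

```python
import unicodedata

# The quoted string, with each ligature / Roman numeral as a single code point:
# U+FB01 (fi), U+FB04 (ffl), U+2177 (small Roman numeral eight).
ace_like = "xn--\uFB01\uFB01-a\uFB04uent-sou\uFB04-\u2177-uic"

# NFKC replaces compatibility characters with their ASCII expansions.
expanded = unicodedata.normalize("NFKC", ace_like)

print(len(ace_like))  # 24 code points before normalization
print(len(expanded))  # 33 code points after normalization
print(expanded)       # xn--fifi-affluent-souffl-viii-uic
```

So the 24 and the 33 are both right; they count the string before and after the compatibility characters are expanded.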

I understand your intent; however, I think it would be better to find an
example that is a valid Punycode string which, when ToUnicode is performed,
yields more code points than the original.  Right now, the ACE string
provided is not valid because it contains characters outside A-Z, a-z,
0-9, and "-".
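
As a rough illustration of that objection, the sketch below checks a label against the letter-digit-hyphen repertoire only. The helper name is_ldh is hypothetical, and the check says nothing about whether the part after "xn--" actually decodes as Punycode.

```python
import re

# Letters, digits, and hyphen only; an assumption-level check, not a full validator.
LDH = re.compile(r"[A-Za-z0-9-]+")

def is_ldh(label: str) -> bool:
    """Return True if every character of the label is an ASCII letter, digit, or hyphen."""
    return LDH.fullmatch(label) is not None

# The string from the quoted message, one code point per ligature / Roman numeral.
ace_like = "xn--\uFB01\uFB01-a\uFB04uent-sou\uFB04-\u2177-uic"

print(is_ldh(ace_like))                             # False: contains U+FB01, U+FB04, U+2177
print(is_ldh("xn--fifi-affluent-souffl-viii-uic"))  # True: ASCII letters and hyphens only
```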

Edmon