
Re: [idn] ToUnicode output can be longer than input



Edmon Chung <edmon@neteka.com> wrote:

> > x n - - fi fi - a ffl u e n t - s o u ffl - viii - u i c
> >
> > The spaces are not really there, they just indicate the clusters, which
> > represent single code points (ligatures and roman numerals: U+FB01,
> > U+FB04, U+2177).  That's 24 code points.
> 
> If I counted it correctly, there are 33 "codepoints" in the above ACE
> string.

Each cluster represents one code point: fi is U+FB01, ffl is U+FB04, and
viii is U+2177.  If you count clusters rather than letters, you should
get 24; counting every letter separately is what gives 33.  I'm trying
to describe a non-ASCII ACE string containing 24 code points, some of
which are ASCII and some of which are compatibility characters.
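
To make the two counts concrete, here is a minimal Python sketch.  It
assumes each cluster really is the single code point named above, and it
uses NFKC compatibility normalization purely as an illustration of how
those code points expand into plain letters; the variable names are mine
and it is not meant to reproduce ToUnicode itself.

```python
import unicodedata

# Assumption: each cluster in the quoted string is the single code point
# named in the message.
FI = "\ufb01"    # LATIN SMALL LIGATURE FI
FFL = "\ufb04"   # LATIN SMALL LIGATURE FFL
VIII = "\u2177"  # SMALL ROMAN NUMERAL EIGHT

# The string as described: ASCII mixed with compatibility characters.
label = "xn--" + FI + FI + "-a" + FFL + "uent-sou" + FFL + "-" + VIII + "-uic"

# NFKC replaces each compatibility character with its plain-letter expansion.
expanded = unicodedata.normalize("NFKC", label)

print(len(label))     # 24
print(len(expanded))  # 33
print(expanded)       # xn--fifi-affluent-souffl-viii-uic
```

The first count is the one I meant (one per cluster); the second is what
you get if every letter of every cluster is counted on its own.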

AMC