
Re: [idn] ToUnicode output can be longer than input

To: "IETF idn working group" <idn@ops.ietf.org>
Subject: Re: [idn] ToUnicode output can be longer than input
From: "Edmon Chung" <edmon@neteka.com>
Date: Fri, 25 Apr 2003 16:35:10 -0400

Hi Adam,

----- Original Message -----
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
> > > x n - - fi fi - a ffl u e n t - s o u ffl - viii - u i c
> > >
> > > The spaces are not really there, they just indicate the clusters, which
> > > represent single code points (ligatures and roman numerals: U+FB01,
> > > U+FB04, U+2177).  That's 24 code points.
> >
> > If I counted it correctly, there are 33 "codepoints" in the above ACE
> > string.
>
> fi represents one code point (U+FB01), ffl represents one code point
> (U+FB04), and viii represents one code point (U+2177).  Now if you count
> again, you should count 24.  I'm trying to describe a non-ASCII ACE
> string containing 24 code points, some of which are ASCII and some of
> which are compatibility characters.
>
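
To make the two counts concrete, here is a minimal sketch that counts the
code points both ways. It assumes Python's standard unicodedata module and
uses NFKC as a stand-in for the compatibility mapping that Nameprep
applies; the variable names are only illustrative.

```python
import unicodedata

# The example string with each cluster written as a single code point:
# U+FB01 (fi ligature), U+FB04 (ffl ligature), U+2177 (small roman numeral eight).
ace_like = "xn--\ufb01\ufb01-a\ufb04uent-sou\ufb04-\u2177-uic"

# NFKC expands the compatibility characters into plain ASCII letters.
expanded = unicodedata.normalize("NFKC", ace_like)

print(len(ace_like))   # number of code points before expansion
print(len(expanded))   # number of code points after expansion
print(expanded)
```

Counting the unexpanded string gives 24 and counting the expanded string
gives 33, which accounts for both numbers mentioned above.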

I understand your intent; however, I think it would be better to find an
example that is a valid Punycode string and that, when ToUnicode is
performed on it, yields more code points than the original.  Right now, the
ACE string provided is not valid because it contains characters outside
A-Z, a-z, 0-9, and "-".
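
As a rough illustration of the check I have in mind, the sketch below tests
only the character repertoire (letters, digits, hyphen). The helper name
is_ldh_label is hypothetical, and passing this test does not mean the
label decodes as Punycode; it only rules out strings like the one above.

```python
import string

# Letters, digits, and hyphen: the only characters allowed in an ACE label.
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """Return True if the label is non-empty and uses only LDH characters."""
    return bool(label) and all(ch in LDH for ch in label)

# The string as given, with ligatures and a roman numeral as single code points.
with_ligatures = "xn--\ufb01\ufb01-a\ufb04uent-sou\ufb04-\u2177-uic"
# The same string spelled out in plain ASCII.
ascii_only = "xn--fifi-affluent-souffl-viii-uic"

print(is_ldh_label(with_ligatures))
print(is_ldh_label(ascii_only))
```

The first call rejects the string because of U+FB01, U+FB04, and U+2177;
the second passes the repertoire check.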

Edmon

