Re: [idn] ToUnicode output can be longer than input

To: IETF idn working group <idn@ops.ietf.org>
Subject: Re: [idn] ToUnicode output can be longer than input
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
Date: Mon, 28 Apr 2003 20:11:36 +0000
In-reply-to: <E199n6c-0004BA-00@psg.com>
References: <E199n6c-0004BA-00@psg.com>
Reply-to: IETF idn working group <idn@ops.ietf.org>
User-agent: Mutt/1.4i

Edmon Chung <edmon@neteka.com> wrote:

> Right now, the ACE string provided is not valid because it contains
> characters beyond A-z, 0-9, -.

It is valid.  Some labels are ASCII and some are not.  Some labels are
ACE and some are not.  All four combinations are possible (ASCII ACE,
ASCII non-ACE, non-ASCII ACE, non-ASCII non-ACE).

An ACE label is formally defined as a label that ToUnicode would alter.
A (valid) internationalized label is formally defined as a label to
which ToASCII can be applied without failing.  It can be shown that all
ACE labels are (valid) internationalized labels.
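These two definitions can be made concrete with a minimal Python sketch, assuming Python 3.9+ and its built-in "punycode" codec. Nameprep (RFC 3491) is only approximated here by the hypothetical helper approx_nameprep (full case folding followed by NFKC, without the prohibited-character, bidi, or unassigned-code-point checks). The UseSTD3ASCIIRules and AllowUnassigned flags are not modeled. The names to_ascii, to_unicode, is_ace and is_internationalized are illustrative, not any library's API. The sample labels cover all four ASCII/ACE combinations; the non-ASCII ACE one uses a fullwidth "xn" that Nameprep folds to ASCII.

```python
import unicodedata

ACE_PREFIX = "xn--"


def approx_nameprep(label: str) -> str:
    # ASSUMPTION: rough stand-in for Nameprep (RFC 3491): full case folding
    # then NFKC. Prohibited-character, bidi and unassigned checks are omitted.
    return unicodedata.normalize("NFKC", label.casefold())


def to_ascii(label: str) -> str:
    """Simplified ToASCII (RFC 3490, section 4.1); raises ValueError on failure."""
    if not label.isascii():
        label = approx_nameprep(label)
    if not label.isascii():
        # Only labels still non-ASCII after Nameprep get the prefix check and encoding.
        if label.lower().startswith(ACE_PREFIX):
            raise ValueError("label already starts with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise ValueError("label length must be 1..63")
    return label


def to_unicode(label: str) -> str:
    """Simplified ToUnicode (RFC 3490, section 4.2); never fails, returns the input instead."""
    original = label
    try:
        if not label.isascii():
            label = approx_nameprep(label)
        if not label.lower().startswith(ACE_PREFIX):
            return original
        saved = label
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Round-trip check: ToASCII must reproduce the saved label, ignoring case.
        if to_ascii(decoded).lower() != saved.lower():
            return original
        return decoded
    except ValueError:  # UnicodeError is a subclass of ValueError
        return original


def is_ace(label: str) -> bool:
    # An ACE label is a label that ToUnicode would alter.
    return to_unicode(label) != label


def is_internationalized(label: str) -> bool:
    # A (valid) internationalized label is one ToASCII accepts without failing.
    try:
        to_ascii(label)
        return True
    except ValueError:
        return False


samples = {
    "ASCII ACE": "xn--bcher-kva",
    "ASCII non-ACE": "example",
    "non-ASCII ACE": "\uff58\uff4e--bcher-kva",  # fullwidth "xn" prefix
    "non-ASCII non-ACE": "b\u00fccher",
}
for kind, label in samples.items():
    print(f"{kind:18} ace={is_ace(label)} valid={is_internationalized(label)}")
```

Every sample is a valid internationalized label, consistent with the claim that all ACE labels are.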

> I think it would be better to find an example that is a valid Punycode
> string that when ToUnicode is performed will exceed the number of
> codepoints of the original.

The Punycode decoder cannot output more code points than it inputs.
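The bound can be checked with Python's built-in "punycode" codec, which implements RFC 3492. The sample words are arbitrary, and punycode_decode is a hypothetical wrapper name; the comment paraphrases why the bound holds.

```python
def punycode_decode(encoded: str) -> str:
    # Decode a Punycode string (without the "xn--" ACE prefix).
    return encoded.encode("ascii").decode("punycode")


# Every basic code point in the output is copied from one input character,
# and every non-basic code point comes from a delta encoded with at least one
# digit, so the decoder never emits more code points than it reads.
for word in ["abc", "b\u00fccher", "\u00fc", "m\u00fcnchen", "\u4ed6\u4eec\u4e3a\u4ec0\u4e48"]:
    encoded = word.encode("punycode").decode("ascii")
    decoded = punycode_decode(encoded)
    assert decoded == word
    assert len(decoded) <= len(encoded)
    print(f"{encoded!r}: {len(encoded)} code points in, {len(decoded)} out")
```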

If the input to ToUnicode is ASCII, then Nameprep will not be applied,
and therefore the output of ToUnicode cannot contain more code points
than the input.  It's Nameprep that can cause strings to grow, not the
Punycode decoder.
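A rough Python illustration of where the growth comes from. Nameprep is again only approximated by a hypothetical helper (full case folding followed by NFKC, standing in for RFC 3491's mapping and normalization steps, with no prohibition or bidi checks), and tounicode_prepared is a hypothetical name for the label as it stands after the first two ToUnicode steps.

```python
import unicodedata


def approx_nameprep(label: str) -> str:
    # ASSUMPTION: approximates Nameprep's mapping and normalization steps
    # with full case folding followed by NFKC; no prohibition or bidi checks.
    return unicodedata.normalize("NFKC", label.casefold())


def tounicode_prepared(label: str) -> str:
    # First two steps of ToUnicode: Nameprep runs only on non-ASCII input,
    # so an ASCII label passes through unchanged (and cannot grow here).
    return label if label.isascii() else approx_nameprep(label)


# Single code points that the approximation expands into several ASCII ones:
# LATIN SMALL LETTER SHARP S, LATIN SMALL LIGATURE FI, SMALL ROMAN NUMERAL EIGHT.
for ch in ["\u00df", "\ufb01", "\u2177"]:
    mapped = tounicode_prepared(ch)
    assert len(mapped) > len(ch)
    print(f"U+{ord(ch):04X} -> {mapped!r}")

# An ASCII ACE label skips Nameprep entirely.
assert tounicode_prepared("xn--bcher-kva") == "xn--bcher-kva"
```

Since the Punycode decoder cannot lengthen its input, any growth in the final ToUnicode output has to come from this Nameprep step on non-ASCII input.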

AMC
