
Re: [idn] Question about a ToUnicode step

To: 'IETF idn working group' <idn@ops.ietf.org>
Subject: Re: [idn] Question about a ToUnicode step
From: "Adam M. Costello" <idn.amc+0@nicemice.net.RemoveThisWord>
Date: Mon, 7 Apr 2003 23:06:36 +0000
In-reply-to: <000c01c2fd56$6fca7830$2dbc0e44@0202012>
References: <20030407214211.GE7147@nicemice.net> <000c01c2fd56$6fca7830$2dbc0e44@0202012>
Reply-to: IETF idn working group <idn@ops.ietf.org>
User-agent: Mutt/1.4i

jim@mathies.com wrote:

> I'm trying to understand the purpose behind some steps in 
> ToUnicode.  From the IDNA spec:
> 
> 
> 1. If all code points in the sequence are in the ASCII range (0..7F)
>    then skip to step 3.
> 
> 2. Perform the steps specified in [NAMEPREP] and fail if there is an
>    error. (If step 3 of ToAscii is also performed here, it will not
>    affect the overall behavior of ToUnicode, but it is not
>    necessary.) The AllowUnassigned flag is used in [NAMEPREP].
> 
> 3. Verify that the sequence begins with the ACE prefix, and save a
>    copy of the sequence.
> 
> 
> I'm curious about steps 1 & 2.  I don't understand why nameprep
> is being applied to ASCII domain labels.

Nameprep is *not* being applied to ASCII labels.  That's what step 1
does: it prevents nameprep from being applied to ASCII labels.
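As a rough illustration, here is a minimal Python sketch of steps 1 and
2 on their own.  It assumes CPython's `encodings.idna.nameprep` (part of
the standard library's RFC 3490 codec) as a stand-in for [NAMEPREP];
`prepare_label` is a hypothetical name used only here.

```python
import encodings.idna


def prepare_label(label: str) -> str:
    """Steps 1 and 2 of ToUnicode (hypothetical helper for illustration)."""
    # Step 1: a label made only of code points 0..7F goes straight to step 3,
    # so nameprep never sees it.
    if label.isascii():
        return label
    # Step 2: nameprep the non-ASCII label; CPython raises UnicodeError on failure.
    return encodings.idna.nameprep(label)


print(prepare_label("WWW"))     # prints WWW (ASCII: untouched, case kept)
print(prepare_label("Bücher"))  # prints bücher (nameprep case-folds it)
```

The ASCII label keeps its original case because nameprep, which
case-folds, is never applied to it.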

> Wouldn't a simple dns character compatibility check in place of steps
> 1 and 2 suffice?

ToUnicode is intended to be applied not only to DNS-compatible labels,
but to any internationalized label.

Imagine an application wants to display a domain name.  The name might
contain some non-ACE ASCII labels, some ACE ASCII labels, some non-ACE
non-ASCII labels, and some ACE non-ASCII labels.  (A mixture could arise
if the name is constructed from pieces obtained from various places,
like typed user input, the clipboard, a config file, the network, etc.)
The application can simply apply ToUnicode to every label, and the
result will be an equivalent name containing no ACE labels.
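A minimal sketch of the whole per-label ToUnicode procedure (RFC 3490,
section 4.2), plus a helper that applies it to every label of a name,
might look like the following in Python.  It assumes CPython's
`encodings.idna` module for nameprep and ToASCII and the built-in
`punycode` codec; `to_unicode` and `name_to_unicode` are hypothetical
names, and the AllowUnassigned and UseSTD3ASCIIRules flags are not
modeled.

```python
import encodings.idna
import re

ACE_PREFIX = "xn--"
# Label separators recognized by IDNA: U+002E, U+3002, U+FF0E, U+FF61.
SEPARATORS = re.compile("[.\u3002\uff0e\uff61]")


def to_unicode(label: str) -> str:
    """Sketch of ToUnicode for a single label; never raises."""
    original = label
    try:
        # Steps 1-2: nameprep only labels containing non-ASCII code points.
        if not label.isascii():
            label = encodings.idna.nameprep(label)
        # Step 3: a label without the ACE prefix (compared case-insensitively
        # here) is returned unchanged; otherwise keep a copy for step 7.
        if not label.lower().startswith(ACE_PREFIX):
            return original
        saved = label
        # Steps 4-5: drop the prefix and Punycode-decode the remainder.
        decoded = label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Step 6: re-encode with ToASCII (CPython returns bytes).
        reencoded = encodings.idna.ToASCII(decoded).decode("ascii")
        # Step 7: the round trip must match the saved copy, ignoring case.
        if reencoded.lower() != saved.lower():
            return original
        # Step 8: return the decoded label.
        return decoded
    except UnicodeError:
        # ToUnicode never fails: any error yields the original label.
        return original


def name_to_unicode(name: str) -> str:
    # Apply ToUnicode to every label and rejoin with ASCII full stops.
    return ".".join(to_unicode(label) for label in SEPARATORS.split(name))
```

For example, `name_to_unicode("WWW.bücher.xn--bcher-kva.example")`
yields `WWW.bücher.bücher.example`: the non-ACE labels pass through
unchanged and only the ACE label is decoded.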

By the way, if you're wondering what an ACE non-ASCII label is, the
typical example arises when an ACE label is manually typed using an
input method that outputs fullwidth characters (which are non-ASCII, but
which would be mapped to ASCII by nameprep).
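The fullwidth case can be checked with a short Python sketch.  It
assumes CPython's `encodings.idna.nameprep`, and it builds the fullwidth
label by shifting each ASCII character into the Fullwidth Forms block
(U+FF01..U+FF5E mirrors ASCII 0x21..0x7E at an offset of 0xFEE0).

```python
import encodings.idna

# "xn--bcher-kva" as a fullwidth input method might produce it.
typed = "".join(chr(ord(c) + 0xFEE0) for c in "xn--bcher-kva")

# The typed label is non-ASCII, so ToUnicode's step 1 does not skip nameprep.
print(typed.isascii())  # False

# Nameprep's NFKC normalization maps each fullwidth character back to ASCII.
prepared = encodings.idna.nameprep(typed)
print(prepared)  # xn--bcher-kva

# What remains is an ordinary ACE label; Punycode-decode the part after the prefix.
decoded = prepared[len("xn--"):].encode("ascii").decode("punycode")
print(decoded)  # bücher
```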

AMC
