
Re: [idn] Question about a ToUnicode step



jim@mathies.com wrote:

> I'm trying to understand the purpose behind some steps in 
> ToUnicode.  From the IDNA spec:
> 
> 
> 1. If all code points in the sequence are in the ASCII range (0..7F)
>    then skip to step 3.
> 
> 2. Perform the steps specified in [NAMEPREP] and fail if there is an
>    error. (If step 3 of ToAscii is also performed here, it will not
>    affect the overall behavior of ToUnicode, but it is not
>    necessary.) The AllowUnassigned flag is used in [NAMEPREP].
> 
> 3. Verify that the sequence begins with the ACE prefix, and save a
>    copy of the sequence.
> 
> 
> I'm curious about steps 1 & 2.  I don't understand why nameprep
> is being applied to ASCII domain labels.

Nameprep is *not* being applied to ASCII labels.  That is exactly what
step 1 is for: it prevents nameprep from being applied to ASCII labels.

> Wouldn't a simple dns character compatibility check in place of steps
> 1 and 2 suffice?

ToUnicode is not intended to be applied only to DNS-compatible labels.
It is intended to be applied to any internationalized label.

Imagine an application wants to display a domain name.  The name might
contain some non-ACE ASCII labels, some ACE ASCII labels, some non-ACE
non-ASCII labels, and some ACE non-ASCII labels.  (A mixture could arise
if the name is constructed from pieces obtained from various places,
like typed user input, the clipboard, a config file, the network, etc.)
The application can simply apply ToUnicode to every label, and the
result will be an equivalent name containing no ACE labels.
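
As a rough illustration of that per-label approach, here is a minimal
Python sketch.  It is not the spec's reference code: the names
to_unicode and name_to_unicode are made up for this example, the ACE
prefix is assumed to be "xn--", only "." is treated as a label
separator, and nameprep and Punycode come from Python's standard
library.  On any failure the label is returned unchanged.

```python
from encodings import idna

ACE_PREFIX = "xn--"  # assumption: the ACE prefix in use


def to_unicode(label: str) -> str:
    """Sketch of ToUnicode: returns the input unchanged on any failure."""
    try:
        if label.isascii():
            # Step 1: all-ASCII labels skip nameprep entirely.
            prepared = label
        else:
            # Step 2: nameprep only runs on labels with non-ASCII code points.
            prepared = idna.nameprep(label)
        # Step 3: without the ACE prefix there is nothing to decode.
        if not prepared.lower().startswith(ACE_PREFIX):
            return label
        # Remove the prefix and decode the remainder with Punycode.
        decoded = prepared[len(ACE_PREFIX):].encode("ascii").decode("punycode")
        # Re-encode and compare, so only well-formed ACE labels are converted.
        if idna.ToASCII(decoded).decode("ascii") != prepared.lower():
            return label
        return decoded
    except ValueError:
        # UnicodeError is a subclass of ValueError; never fail, just pass through.
        return label


def name_to_unicode(name: str) -> str:
    """Apply to_unicode to every label of a dot-separated name."""
    return ".".join(to_unicode(label) for label in name.split("."))
```

With this sketch, name_to_unicode("www.xn--bcher-kva.example") yields
"www.bücher.example", while labels that are not ACE (ASCII or not) pass
through untouched.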

By the way, if you're wondering what an ACE non-ASCII label is, the
typical example arises when an ACE label is manually typed using an
input method that outputs fullwidth characters (which are non-ASCII, but
which would be mapped to ASCII by nameprep).
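
To make that concrete, here is a small Python sketch of the fullwidth
case.  The helper to_fullwidth is hypothetical and only simulates such
an input method by shifting printable ASCII into the fullwidth block;
nameprep is the one from Python's standard library, and the sample
label assumes the "xn--" ACE prefix.

```python
from encodings import idna


def to_fullwidth(text: str) -> str:
    """Hypothetical helper: map printable ASCII (U+0021..U+007E) to the
    corresponding fullwidth forms (U+FF01..U+FF5E)."""
    return "".join(
        chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in text
    )


ace_label = "xn--bcher-kva"          # an ordinary ACE label
typed = to_fullwidth(ace_label)      # what a fullwidth input method might emit

# The typed label is non-ASCII, so step 1 of ToUnicode does not skip nameprep.
typed_is_ascii = typed.isascii()

# Nameprep's compatibility normalization folds the fullwidth forms to ASCII,
# after which step 3 can recognize the ACE prefix.
prepared = idna.nameprep(typed)
```

Here typed_is_ascii is False, and prepared compares equal to the
original ASCII label, which is why step 2 has to run before the prefix
check in step 3.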

AMC