
Re: [idn] ToUnicode output can be longer than input



Dan Oscarsson <Dan.Oscarsson@kiconsulting.se> wrote:

> > An ACE label is formally defined as a label that ToUnicode would alter.
> > A (valid) internationalized label is formally defined as a label to
> > which ToASCII can be applied without failing.  It can be shown that all
> > ACE labels are (valid) internationalized labels.
> 
> No, that is wrong.
> 
> An IDNA ACE label is defined as above.  Not ACE in general.

When I said "ACE label" above I was obviously talking about an IDNA ACE
label, not some more general ACE concept.  I had started out talking
about whether ToUnicode can output more code points than it inputs,
and Edmon questioned whether the example I gave was a valid ACE.  He
obviously meant IDNA ACE, and my statement above was explaining why it
was in fact a valid IDNA ACE.
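
The two definitions can be sketched in code.  This is only an
illustration, assuming Python's standard encodings.idna module as the
implementation of ToASCII and ToUnicode.  The helper names (to_unicode,
is_ace_label, is_valid_internationalized_label) are hypothetical and
come from neither the spec nor the library.  The library's ToUnicode
raises an exception on failure, whereas the spec's ToUnicode never
fails and returns its input unchanged, so the wrapper below restores
that behavior.

```python
from encodings import idna


def to_unicode(label: str) -> str:
    """Spec-style ToUnicode: on any failure, return the input unchanged."""
    try:
        return idna.ToUnicode(label)
    except UnicodeError:
        return label


def is_ace_label(label: str) -> bool:
    """An ACE label is a label that ToUnicode would alter."""
    return to_unicode(label) != label


def is_valid_internationalized_label(label: str) -> bool:
    """A valid internationalized label is one ToASCII accepts without failing."""
    try:
        idna.ToASCII(label)
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    for label in ["xn--bcher-kva", "example", "b\u00fccher", ""]:
        print(repr(label), is_ace_label(label),
              is_valid_internationalized_label(label))
```

Under these assumptions, "xn--bcher-kva" is an ACE label (ToUnicode
alters it) and is also a valid internationalized label, which agrees
with the claim that every ACE label is a valid internationalized label.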

I think the term ACE was introduced before I arrived on this mailing
list, so I have no memory of its origin, but I do remember the group
deciding that after a particular ACE was selected, it would be known
simply as "ACE".  Now that IDNA is a proposed standard, I would argue
that "ACE" means "IDNA ACE".

> There are several domain names that will fail when ToASCII is used,
> but are still domain names.  They just cannot be handled by IDNA.

There are non-text domain labels to which ToASCII cannot even be
applied, because ToASCII can be applied only to labels that are text.
But among text labels, ToASCII defines which ones are valid.

> IDNA does not define the world and does not define the basic semantics
> of ACE or domain names with non-ASCII characters.  IDNA only defines
> a way to encode domain names so they can be sent over the legacy
> ASCII DNS protocol.  It does not define what domain names work in an
> international context.

The IDNA spec disagrees.  It says:

    This document defines internationalized domain names (IDNs)...

    If an application wants to use non-ASCII characters in domain names,
    IDNA is the only currently-defined option.

    Applications can also define protocols and interfaces that support
    IDNs directly using non-ASCII representations.  IDNA does not
    prescribe any particular representation for new protocols, but it
    still defines which names are valid and how they are compared.
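
As a rough sketch of "how they are compared": under IDNA, two labels
match if and only if their ToASCII forms are equal under a
case-insensitive ASCII comparison.  The helper name labels_match is
hypothetical, and Python's standard encodings.idna module is assumed
as the ToASCII implementation.

```python
from encodings import idna


def labels_match(a: str, b: str) -> bool:
    """Compare two labels by their ToASCII forms, ignoring ASCII case.

    Labels that ToASCII rejects are not valid, so they match nothing.
    """
    try:
        ascii_a = idna.ToASCII(a)
        ascii_b = idna.ToASCII(b)
    except UnicodeError:
        return False
    return ascii_a.lower() == ascii_b.lower()


if __name__ == "__main__":
    print(labels_match("B\u00fccher", "xn--bcher-kva"))
    print(labels_match("EXAMPLE", "example"))
    print(labels_match("b\u00fccher", "bucher"))
```

The comparison works on the ASCII forms, so a label in a non-ASCII
representation and its ACE form are treated as the same name.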

AMC