Re: [idn] ToUnicode output can be longer than input



Dan Oscarsson <Dan.Oscarsson@kiconsulting.se> wrote:

> > The IDNA spec disagrees.  It says:
> >
> >    This document defines internationalized domain names (IDNs)...
> >
> >    If an application wants to use non-ASCII characters in domain
> >    names, IDNA is the only currently-defined option.
> >
> >    Applications can also define protocols and interfaces that
> >    support IDNs directly using non-ASCII representations.  IDNA does
> >    not prescribe any particular representation for new protocols,
> >    but it still defines which names are valid and how they are
> >    compared.
>
> Maybe IDNs, as that is a construct of IDNA, but not domain names in
> general.  Domain names in an international context is not defined by
> IDNA,

The *representation* of domain names in an international context is not
defined by IDNA, but IDNA does define a mechanism for deciding which
names are valid in that context, and does define a way to compare names
in that context.
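
As a rough illustration of what "a way to compare names" means in
practice, here is a minimal sketch using Python's built-in "idna" codec,
which implements the RFC 3490 ToASCII operation.  The helper names
to_ace and same_name are made up for this example, and the final ASCII
case-folding step is my own reading of the comparison rule, not a quote
from the spec.

```python
def to_ace(name: str) -> bytes:
    """Convert a domain name to its ASCII-compatible (ACE) form.

    Uses Python's built-in "idna" codec (RFC 3490 ToASCII, applied
    label by label).  Raises UnicodeError for names the codec rejects.
    """
    return name.encode("idna")


def same_name(a: str, b: str) -> bool:
    """Compare two names by their ACE forms, ignoring ASCII case.

    Assumption for this sketch: two names match when their ToASCII
    results are equal under a case-insensitive ASCII comparison.
    """
    return to_ace(a).lower() == to_ace(b).lower()


if __name__ == "__main__":
    print(to_ace("bücher.example"))
    print(same_name("Bücher.example", "bücher.EXAMPLE"))
    print(same_name("bücher.example", "bucher.example"))
```

Whatever representation a protocol uses on the wire, a comparison along
these lines gives the same answer for the same pair of names.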

The IETF could produce a second set of definitions in the future, but
until that happens, the IDNA definitions provide the only standard way
to use non-ASCII characters in domain names, even in an international
context.

> I can see no reason to limit the international world due to limits of
> ASCII and the solution selected by IDNA.

The reason is backward compatibility, so that all domain names can be
accessed by all protocols.  That was the primary motivation behind the
whole IDNA approach.  Even as new protocols are introduced that can
represent IDNs directly without using ACE, old protocols will continue
to be used, and it would be a mess if some names worked with some
protocols and not with other protocols.

AMC