[idn] IDNA problem statement
Back in May, when I sent AD review comments on the IDNA specification,
I asked that the problem IDNA claims to solve be stated more clearly,
and some relatively recent discussion on the IDN mailing list
has re-emphasized the need for this. I think the text below is
not very controversial, in that it tries to summarize what IDNA
does and what it does not do. If you see a particular problem with the
text, please suggest alternative text.
The intent is to add this text to draft-ietf-idn-idna in section 1.
Erik
1.1 Problem Statement
The IDNA specification solves the problem of extending the repertoire
of characters that can be used in domain names to include the Unicode
repertoire.
IDNA does not extend the service offered by DNS to the applications. Instead,
the applications (and, by implication, the users) continue to see an
exact-match lookup service. Either there is a single exactly-matching name
or there is no match. This model has served the existing applications well,
but it requires that users know the exact spelling of the domain names
that they type into applications such as web browsers and mail user
agents. The introduction of the larger repertoire of characters potentially
makes the set of misspellings larger, especially given that in some
cases the same appearance, for example on a business card, might visually match
several Unicode code points or several sequences of code points.
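As a rough illustration of the visual-match problem described above, the
sketch below compares two single-character labels that typically render
identically but are different code points. It uses Python's built-in
"idna" codec, which implements the later published form of IDNA
(RFC 3490) and may differ in detail from the draft discussed here; the
choice of characters and the helper name describe() are assumptions
made only for this example.

```python
import unicodedata

# Two labels that usually look the same in print but are distinct code points.
latin = "a"          # U+0061 LATIN SMALL LETTER A
cyrillic = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A


def describe(label):
    """Return a code point and character name for each character in the label."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in label]


# Convert each label to its ASCII form with Python's RFC 3490 codec.
ace_latin = latin.encode("idna")
ace_cyrillic = cyrillic.encode("idna")

print(describe(latin), ace_latin)
print(describe(cyrillic), ace_cyrillic)
print("same name in the DNS?", ace_latin == ace_cyrillic)
```

Because the lookup is exact-match, the two labels name different
domains even though a reader of a business card could not tell them
apart.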
IDNA allows the graceful introduction of IDNs not only by avoiding
upgrades to existing DNS infrastructure (servers, caches, stub resolvers),
but also by allowing some rudimentary use of IDNs in applications by
using the ASCII representation of the non-ASCII name labels.
While such names are very user-unfriendly to read and type, and hence are
not suitable for user input, they allow (for instance) replying to email
and clicking on URLs even though the domain name displayed
is incomprehensible to the user. In order to allow user-friendly
input and output of the IDNs, the applications need to be modified to
conform to this specification.
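The ASCII representation mentioned above can be sketched with Python's
built-in "idna" codec. This is only an illustration: the codec
implements IDNA as later published in RFC 3490, including the "xn--"
prefix, which may not match the encoding in the draft discussed here,
and the name "bücher.example" is a made-up example.

```python
# An updated application shows the Unicode form to the user and hands
# the ASCII form to the unmodified DNS infrastructure.
unicode_name = "bücher.example"

# Convert each label to its ASCII form (Python's RFC 3490 codec).
ace = unicode_name.encode("idna")
print(ace)   # b'xn--bcher-kva.example'

# An updated application can convert the ASCII form back for display.
back = ace.decode("idna")
print(back)  # bücher.example
```

An application that has not been updated only ever sees the ASCII form,
which still works for replying to mail or following a link but is not
meaningful to the user.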
IDNA uses the Unicode character repertoire, which avoids the significant
delays that would be inherent in waiting for a different
and specific character set to be defined for IDN purposes by some other
standards developing organization.
1.2 Limitations of IDNA
<EXISTING section 6.6 moved to here>
The IDNA protocol does not solve all linguistic issues with users
inputting names in different scripts. Many important language-based and
script-based mappings are not covered in IDNA and need to be handled
outside the protocol. For example, names that are entered in a mix of
traditional and simplified Chinese characters will not be mapped to a
single canonical name. As another example, Scandinavian names that are
entered with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).
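The Scandinavian example above can be checked with a short sketch. It
again relies on Python's built-in "idna" codec, which follows the later
RFC 3490 and is an assumption relative to the draft discussed here; the
labels "björn" and "bjørn" are chosen only for illustration. The last
comparison shows, for contrast, that upper and lower case are folded
together by that codec, while the two letters are not.

```python
# Same name spelled with the two letters that IDNA keeps distinct.
with_diaeresis = "bj\u00f6rn"  # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
with_stroke = "bj\u00f8rn"     # U+00F8 LATIN SMALL LETTER O WITH STROKE

ace_diaeresis = with_diaeresis.encode("idna")
ace_stroke = with_stroke.encode("idna")

# No language-based mapping is applied, so the ASCII forms differ.
print(ace_diaeresis, ace_stroke)
print("mapped to one name?", ace_diaeresis == ace_stroke)

# For contrast: case differences are folded by the codec's preparation step.
print("case folded?", "Bj\u00f6rn".encode("idna") == ace_diaeresis)
```

Any equivalence between the two spellings therefore has to be handled
outside the protocol, for example by registering both names.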
<ADDED>
An example of an important issue that is not considered in
detail in IDNA is how to provide a high probability that a user who
is entering a domain name based on visual information (such as from a
business card or billboard) or aural information (such as from a
telephone or radio) would correctly enter the IDN.
This is a complex issue relating to languages, input methods on
computers, and so on. Furthermore, the kind of matching and
searching necessary for a high probability of success would not fit
the role of the DNS and its exact matching function.
---