[idn] Internet Draft uname.txt
- To: <idn@ops.ietf.org>
- Subject: [idn] Internet Draft uname.txt
- From: "James Seng/Personal" <James@Seng.cc>
- Date: Mon, 19 Mar 2001 19:07:44 +0800
- Delivery-date: Mon, 19 Mar 2001 03:29:41 -0800
- Envelope-to: idn-data@psg.com
Here is a preliminary I-D on Internationalized Domain Names and Unique
Identifiers/Names. It missed the cut-off date, but a presentation slot at
the meeting has been requested from Marc, as an alternative view to the
current Nameprep-ACE approach taken by the WG.
Please give your comments.
-James Seng
Internet Draft                                     Authors: Li Ming Tseng
<draft-ietf-idn-uname-00.txt>                               Jan Ming Ho
xx Mar 2001                                                 Hua Lin Qian
Expires XXX Sep 2001                                        Kenny Huang
                                                   Editor:  James Seng
Internationalized Domain Names and Unique Identifiers/Names
Status of this Memo
This document is an Internet-Draft and is in full conformance
with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
Abstract
One of the biggest technical challenges of Internationalized Domain Names
(IDN) is determining whether two given domain names match. The current
approach to this problem is a process known as [NAMEPREP].
This document describes an alternative view of, and solution to, the IDN
matching problem.
1. Introduction
The Chinese Domain Name Consortium (CDNC) has taken a keen interest in IDN,
in particular the use of Chinese script in domain names. CDNC was formed by
the regional registries (CNNIC, TWNIC, HKNIC and MONIC), which have been
experimenting with Chinese domain name systems for many months.
The primary motivation for this proposal is the lack of support for
Traditional and Simplified Chinese in NAMEPREP. See [HAN] for a discussion of
the Traditional/Simplified Han ideograph problems.
In addition, based on the registries' operational experience and on the
examinations and developments within CDNC, this proposal is expected to
reduce operational and deployment costs from a TLD manager's perspective.
Backward compatibility, interoperability, scalability, security, operation
and deployment must all be considered as criteria when designing an
internationalized domain name system.
2. Background on Legacy Encoding
The most popular Chinese character set used in Taiwan is the industry
standard "BIG5", and the corresponding one in China is "GBK". BIG5 covers
primarily Traditional Chinese characters, and GBK is used mainly for
Simplified Chinese.
In addition, the Chinese government has mandated that all Chinese software
in China must support GB18030, a new standard that supersedes GBK.
Both BIG5 and GBK are widely used in China, Taiwan, Hong Kong and Macao and
are supported by many operating systems, including Windows. Thus, supporting
these encodings in IDN is essential from a geographical perspective.
3. An overview of current proposals and their problems
3.1. ASCII Compatible Encoding (ACE)
The need to support ACE in IDN has been discussed extensively in the IDN
Working Group. Backward compatibility is the strongest advantage of ACE:
deploying ACE neither affects the existing naming infrastructure nor risks
damaging current Internet applications. For moving the current Internet to a
multilingual infrastructure, ACE is the most appropriate bridging solution.
Despite these advantages, most users' systems work in a local encoding.
Users do not want to download special software or upgrade their existing
software in order to handle multilingual domain names. Supporting native
encodings without altering users' software has therefore become an important
issue for TLD managers.
3.2. NAMEPREP
The design goal of NAMEPREP is to allow users to enter host names in
applications and have the highest chance of getting the name correct. The
NAMEPREP process comprises three basic steps, namely "MAP",
"NORMALIZATION" and "PROHIB".
The MAP and NORMALIZATION steps aim to reduce the number of possible
representations of domain names that should be equivalent. They are based on
the Unicode Technical Reports [UTR15] and [UTR21]. However, NAMEPREP fails
when there are multiple representations of the same domain name whose
matching depends on language and context. Of particular interest to us,
Traditional and Simplified Chinese ideographs cannot be handled by
NAMEPREP.
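The limitation can be illustrated with a minimal Python sketch. It assumes
that case folding followed by Unicode NFKC is a fair approximation of the MAP
and NORMALIZATION steps; the function name fold_name is ours, and the PROHIB
step is left out entirely.

```python
import unicodedata

def fold_name(label: str) -> str:
    """Rough stand-in for MAP + NORMALIZATION: case-fold, then apply NFKC."""
    return unicodedata.normalize("NFKC", label.casefold())

# Case variants and compatibility variants collapse to a single form.
print(fold_name("XYZ") == fold_name("xyz"))
print(fold_name("\uff58\uff59\uff5a") == fold_name("xyz"))  # fullwidth "xyz"

# The Traditional and Simplified forms of the same character do not.
traditional = "\u570b"  # Traditional form of "country"
simplified = "\u56fd"   # Simplified form of "country"
print(fold_name(traditional) == fold_name(simplified))
```

The first two comparisons succeed and the last one fails, because Unicode
normalization treats the two ideographs as distinct characters.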
4. Alternative view to the problem space
While the IDN WG has been working very hard on ACE and NAMEPREP for IDN,
there is apparently another view of these problems that may give us a
different approach and solution.
First, NAMEPREP assumes that an IDN is an ISO10646/Unicode string. In
reality, most IDNs are entered in a legacy encoding, and an additional step
has to be taken to convert them to ISO10646/Unicode.
Second, besides providing backward compatibility, ACE also serves as an
identifier string for an IDN, and the NAMEPREP process unifies the various
possible representations of an IDN into a single "unique name" for matching
purposes.
In other words, we have the following conceptual model.
+-------+     +---------+ (ISO10646)
|XYZ.COM|-->--|Transcode|-->------------+
+-------+     +---------+       +----------------+     +---------------+
    :   (Legacy)          ...---|NAMEPREP/Unified|-->--|ACE/unique name|
+-------+     +---------+       +----------------+     +---------------+
|xyz.com|-->--|Transcode|-->------------+
+-------+     +---------+ (ISO10646)
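The first two stages of this model can be sketched in Python as follows. This
is only an illustration: the function name to_unique_name is hypothetical,
NFKC plus case folding again stands in for NAMEPREP, and the final ACE
encoding step is omitted.

```python
import unicodedata

def to_unique_name(raw: bytes, legacy_encoding: str) -> str:
    """Transcode legacy bytes to Unicode, then fold to one comparison form."""
    text = raw.decode(legacy_encoding)                      # "Transcode" box
    return unicodedata.normalize("NFKC", text.casefold())   # NAMEPREP stand-in

name = "\u4e2d\u6587"  # two ideographs that exist in both BIG5 and GBK
as_big5 = name.encode("big5")
as_gbk = name.encode("gbk")

# The same name arrives as different byte strings ...
print(as_big5 != as_gbk)
# ... but converges once each is transcoded from its own encoding.
print(to_unique_name(as_big5, "big5") == to_unique_name(as_gbk, "gbk"))
```

Note that the sketch has to be told which legacy encoding the bytes are in;
the bytes themselves do not say.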
5. Proposal
Given this alternative view of IDN, we can derive another set of solutions
using a directory concept.
+-------+       +---------+
|XYZ.COM|-->----|         |
+-------+       |         |     +---------------+
    :   (Legacy)|Directory|-->--|ACE/unique name|
+-------+       |         |     +---------------+
|xyz.com|-->----|         |
+-------+       +---------+
The purpose of this directory system is to list all the possible
representations of an IDN and unify them into a unique name. This unique name
could be an ACE of the most common representation or a NAMEPREPPED ACE.
The content of the directory is built up at registration time, when the
registrant has to provide a list of equivalent representations of the domain
names being registered.
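A toy in-memory version of such a directory might look like the Python sketch
below. The class name Directory, its methods and the name ace1.example are
all hypothetical; a real deployment would sit behind one of the protocols
discussed in the following sections.

```python
from typing import Dict, List, Optional

class Directory:
    """Toy directory: every registered representation -> one unique name."""

    def __init__(self) -> None:
        self._unique_by_variant: Dict[bytes, str] = {}

    def register(self, unique_name: str, variants: List[bytes]) -> None:
        # Refuse a byte string that is already claimed by a different name.
        for variant in variants:
            owner = self._unique_by_variant.get(variant)
            if owner is not None and owner != unique_name:
                raise ValueError(f"{variant!r} already maps to {owner}")
        for variant in variants:
            self._unique_by_variant[variant] = unique_name

    def lookup(self, variant: bytes) -> Optional[str]:
        return self._unique_by_variant.get(variant)

directory = Directory()
directory.register("ace1.example", [
    "\u4e2d\u570b.example".encode("big5"),   # Traditional form in BIG5
    "\u4e2d\u56fd.example".encode("gbk"),    # Simplified form in GBK
    "\u4e2d\u570b.example".encode("utf-8"),
    "\u4e2d\u56fd.example".encode("utf-8"),
])
print(directory.lookup("\u4e2d\u56fd.example".encode("gbk")))
```

The registrant supplies the list of equivalents, so the directory needs no
knowledge of which characters are variants of one another.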
However, there is still the question of which directory to use. In this
document, we examine a few different solutions.
5.1. LDAP as Directory
Lightweight Directory Access Protocol [LDAP] is one of the most widely used
directory protocols. LDAP has a concept of hierarchy similar to the DNS
hierarchy. Hence, it is possible to distribute the content of the
directory across various LDAP servers for scalability and authority control.
For example, each registry that wishes to deploy IDN may set up an LDAP
server and register it with a "root" LDAP server.
The IDN query process would then look something like this:
a. The user inputs an IDN into an application
b. The application does an LDAP query to look up the unique name
c. The application uses the unique name to do a DNS lookup
Advantages:
- encapsulates the problem in the representation layer and at
registration time
- able to handle unification problems
Disadvantages:
- requires all applications to be upgraded
- additional LDAP lookup overhead
- policy issues with the "root" LDAP server
- requires access to LDAP servers to function, i.e. cannot work offline
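The two-stage query process can be sketched in Python as below. No real LDAP
or DNS calls are made: resolve_idn, fake_directory and fake_dns are
hypothetical stand-ins, chosen so that the example runs without network
access.

```python
from typing import Callable, List, Optional

def resolve_idn(
    user_input: bytes,
    find_unique_name: Callable[[bytes], Optional[str]],
    dns_lookup: Callable[[str], List[str]],
) -> List[str]:
    unique_name = find_unique_name(user_input)   # step b: directory query
    if unique_name is None:
        raise LookupError("directory has no unique name for this input")
    return dns_lookup(unique_name)               # step c: ordinary DNS lookup

# Stand-ins for the directory server and the DNS.
fake_directory = {"\u4e2d\u56fd.example".encode("gbk"): "ace1.example"}
fake_dns = {"ace1.example": ["1.2.3.4"]}

addresses = resolve_idn(
    "\u4e2d\u56fd.example".encode("gbk"),
    fake_directory.get,
    lambda name: fake_dns.get(name, []),
)
print(addresses)
```

Every resolution costs one directory round trip before the DNS lookup can
start, which is the overhead listed among the disadvantages.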
5.2. CNRP as Directory
Common Name Resolution Protocol [CNRP] is a newly developed IETF protocol
that resolves common names. CNRP has no concept of hierarchy, but it does
have a referral scheme. Hence, it is possible to build a distributed
directory system in which servers refer to one another.
The IDN query process would then look something like this:
a. The user inputs an IDN into an application
b. The application does a CNRP query to look up the unique name
c. The application uses the unique name to do a DNS lookup
Advantages:
- encapsulates the problem in the representation layer and at
registration time
- able to handle unification problems
- no policy issues with a "root" CNRP server
Disadvantages:
- requires all applications to be upgraded
- additional CNRP lookup overhead, and no assurance that the unique name
can be located
- requires access to CNRP servers to function, i.e. cannot work offline
5.3. DNS as Directory
Domain Name System [DNS] is a widely established distributed lookup
directory. It has an existing hierarchical structure, and resource records
are distributed across it. In theory, the DNS is able to handle 8-bit binary
strings.
The IDN query process would then look something like this:
a. The user inputs an IDN into an application
b. The application does a DNS query on the name as entered; the response
carries the unique name together with the Resource Records of that
unique name
Advantages:
- encapsulates the problem in the representation layer and at
registration time
- able to handle unification problems
- existing "root" DNS server with existing hierarchy
- does not require all applications to be upgraded
Disadvantages:
- unknown behavior of applications that cannot handle 8-bit names
- unknown behavior of servers/caching software that cannot handle 8-bit names
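To make the 8-bit point concrete, the Python sketch below encodes a name in
the RFC 1035 wire format, where each label is a length octet followed by raw
octets. The function name encode_qname and the example labels are ours.

```python
from typing import List

def encode_qname(labels: List[bytes]) -> bytes:
    """Encode labels as length-prefixed octet strings, ending with the root."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1 to 63 octets")
        out.append(len(label))
        out += label
    out.append(0)  # zero-length root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)

legacy_label = "\u4e2d\u6587".encode("big5")  # two ideographs, four octets
wire = encode_qname([legacy_label, b"tw"])
print(wire.hex())
```

Nothing in this format restricts label octets to ASCII. The open question is
how existing applications, servers and caches treat such octets.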
6. Solution
Given CDNC's operational experience that it is difficult to get application
developers to upgrade, difficult to get users to download new applications,
and so on, using DNS as the directory would be the fastest way to deploy IDN
for our users.
6.1. Zone file
Because there are multiple encodings, and multiple representations of the
same name even within one encoding, a single domain name corresponds to
multiple binary strings (e.g. ML1, ML2, ML3, ML4).
Hence, we would create the following Resource Records in the name server:
ML1 UNAME ACE1
ML2 UNAME ACE1
ML3 UNAME ACE1
ML4 UNAME ACE1
ACE1 IN A 1.2.3.4
IN A 1.2.3.4
A "UNAME" Resource Record type is shown here. In practice, CNAME could be
used instead, except that CNAME is unable to handle MX.
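The intended lookup behaviour can be sketched in Python with an in-memory
zone. This assumes that UNAME is followed the way CNAME is; the ZONE table,
the answer function and the max_hops limit are hypothetical, and the four
encoded names merely play the roles of ML1 to ML4.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (record type, rdata)

ZONE: Dict[bytes, List[Record]] = {
    "\u4e2d\u570b".encode("big5"): [("UNAME", "ace1")],   # role of ML1
    "\u4e2d\u56fd".encode("gbk"): [("UNAME", "ace1")],    # role of ML2
    "\u4e2d\u570b".encode("utf-8"): [("UNAME", "ace1")],  # role of ML3
    "\u4e2d\u56fd".encode("utf-8"): [("UNAME", "ace1")],  # role of ML4
    b"ace1": [("A", "1.2.3.4")],
}

def answer(owner: bytes, rtype: str,
           max_hops: int = 8) -> List[Tuple[bytes, str, str]]:
    """Follow UNAME records, returning them plus the final rtype records."""
    section: List[Tuple[bytes, str, str]] = []
    for _ in range(max_hops):
        records = ZONE.get(owner, [])
        targets = [rdata for kind, rdata in records if kind == "UNAME"]
        if not targets:
            section += [(owner, kind, rdata)
                        for kind, rdata in records if kind == rtype]
            return section
        section.append((owner, "UNAME", targets[0]))
        owner = targets[0].encode("ascii")
    raise RuntimeError("UNAME chain is too long")

for row in answer("\u4e2d\u56fd".encode("gbk"), "A"):
    print(row)
```

A single query thus yields both the unique name and its address record, so
the client needs no separate directory step.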
6.2. Advantages
The strongest advantages of this solution are:
a. It does not require our users to download any special software or
upgrade their software, since it handles the user's native encoding
directly.
b. It will work immediately for ccTLDs that wish to offer ML.ccTLD services,
without any changes at the user client.
c. It remains compatible with the IDNA approach so long as we keep the
unique name equivalent to the NAMEPREPPED ACE.
d. It uses the existing DNS hierarchy.
6.3. Potential Loopholes
There are many loopholes in this solution that we need to take note of:
a. Some "smart" localized browsers will send out the "wrong" binary string
because of differences in localization. For example, English Internet
Explorer will not be able to handle Chinese double-byte legacy encodings
properly.
b. While Chinese has a handful (usually 2 to 3) of representation forms for
a single IDN, other languages may have much more complicated
representations, which may make this approach unsuitable. For
example, if case-folding for Latin characters were done using this
solution, a string of 32 characters would require 2^32
entries in the DNS. But this could be solved by some other means.
c. It might be possible to construct a binary string in some legacy
encoding that has the same binary representation as another domain
name (a.k.a. binary collision).
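The growth described in item b can be checked with a short Python sketch
that enumerates case variants for a small label and then computes the count
for 32 letters. The helper name case_variants is ours.

```python
from itertools import product
from typing import List

def case_variants(label: str) -> List[str]:
    """List every upper/lower-case spelling of a label."""
    choices = [sorted({ch.lower(), ch.upper()}) for ch in label]
    return ["".join(combo) for combo in product(*choices)]

print(len(case_variants("abc")))  # 8, i.e. 2**3
print(2 ** 32)                    # 4294967296 entries for 32 letters
```

Each letter doubles the number of entries, whereas digits and hyphens
contribute only one choice each.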
Acknowledgement
Author(s)
Li Ming TSENG, Prof
National Central University, TWNIC
Email: tsenglm@cc.ncu.edu.tw
Tel: +886-3-490-4421
Jan Ming HO, Prof
Academia Sinica, TWNIC
Email: hoho@iis.sinica.edu.tw
Tel: +886-2-2788-3799 x 1803
Hua Lin QIAN, Prof
Chinese Academy of Science, CNNIC
Email: hlqian@ns.cnc.ac.cn
Tel: +86-10-6256-9960
Kenny HUANG
Asia Infra International Ltd, TWNIC
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510
Editor: James SENG
i-DNS.net International
Email: jseng@i-dns.net
Tel: +65-2486-188
Reference
[IDNREQ] Requirements of Internationalized Domain Names, Zita Wenzel,
James Seng, draft-ietf-idn-requirements
[HAN] Han Ideograph (CJK) for Internationalized Domain Names, J. Seng,
Y. Yoneya, K. Huang, K. Kim, draft-ietf-idn-cjk
[LDAP]
[CNRP]
[DNS] Domain Names – Implementation and Specification, P. Mockapetris,
RFC1035
[CJKV] CJKV Information Processing. ISBN 1-56592-224-7
[UTR15] Unicode Normalization Forms, Mark Davis and Martin Duerst,
Unicode Technical Report 15.
[UTR21] Case Mappings, Mark Davis, Unicode Technical Report 21.