[idn] Internet Draft uname.txt



Here is a preliminary I-D on Internationalized Domain Names and Unique
Identifiers/Names. It could not make the cut-off date, but a request has
been made to Marc for a presentation at the meeting, as an alternative
view to the current Nameprep-ACE approach taken by the WG.

Please give your comments.

-James Seng
Internet Draft                                         Authors: Li Ming Tseng
<draft-ietf-idn-uname-00.txt>                                     Jan Ming Ho
xx Mar 2001                                                      Hua Lin Qian
Expires XXX Sep 2001                                              Kenny Huang
                                                           Editor: James Seng

       Internationalized Domain Names and Unique Identifiers/Names

Status of this Memo

    This document is an Internet-Draft and is in full conformance 
    with all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet 
    Engineering Task Force (IETF), its areas, and its working 
    groups. Note that other groups may also distribute working 
    documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html

Abstract

One of the biggest technical challenges of Internationalized Domain Names 
(IDN) is determining whether two given domain names match. The current 
approach to this problem is a process known as [NAMEPREP]. 

This document describes an alternative view of, and solution to, the IDN 
matching problem.

1. Introduction

The Chinese Domain Name Consortium (CDNC) has taken a very keen interest in 
IDN, in particular the use of Chinese script in domain names. CDNC was formed 
by the regional registries (CNNIC, TWNIC, HKNIC and MONIC) and has been 
experimenting with Chinese domain name systems for many months.

The primary motivation for this proposal is the lack of support for 
Traditional and Simplified Chinese in NAMEPREP. See [HAN] for a discussion of 
the Traditional/Simplified Han ideograph problems. 

In addition, based on the operational experience of the registries and on the 
examinations and developments within CDNC, this proposal will reduce 
operational and deployment costs from a TLD manager's perspective.

Backward compatibility, interoperability, scalability, security, operation 
and deployment must all be considered as criteria when designing an 
internationalized domain name system.

2. Background on Legacy Encoding

The most popular Chinese character set used in Taiwan is the industry 
standard "BIG5", and the corresponding one in China is "GBK". BIG5 has 
primarily Traditional Chinese characters and GBK has Simplified Chinese 
characters. In addition, the Chinese government has mandated that all Chinese 
software in China must support GB18030, a new standard that supersedes GBK.

Both BIG5 and GBK are widely used in China, Taiwan, Hong Kong and Macao and 
are supported by many operating systems, including Windows. Thus, supporting 
these encodings in IDN is essential from a geographical perspective.

3. An overview of current proposals and their problems

3.1. ASCII Compatible Encoding (ACE)

The need to support ACE in IDN has been extensively discussed in the IDN 
Working Group. Backward compatibility is the strongest advantage of ACE: 
deploying ACE neither affects the existing naming infrastructure nor risks 
damaging current Internet applications. For moving the current Internet to a 
multilingual infrastructure, ACE is clearly the most appropriate bridging 
solution.

Although ACE has the advantages mentioned above, most users' systems support 
local encodings. Users do not want to download special software or upgrade 
their software in order to use a multilingual domain name system. Supporting 
native encodings without altering users' software has therefore become an 
important issue for TLD managers.

3.2. NAMEPREP

The design goal of NAMEPREP is to allow users to enter host names in 
applications and have the highest chance of getting the name correct. The 
NAMEPREP process comprises three basic steps, namely "MAP", "NORMALIZATION" 
and "PROHIB".

The MAP and NORMALIZATION steps aim to reduce the number of possible 
representations of a domain name that should be equivalent. They are based on 
Unicode Technical Reports [UTR15] and [UTR21]. However, NAMEPREP fails when a 
domain name has multiple representations whose matching depends on language 
and context. Of particular interest to us, Traditional and Simplified Chinese 
ideographs cannot be handled by NAMEPREP.
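
As a rough illustration of this limitation, the following Python sketch 
approximates the MAP and NORMALIZATION steps with NFKC normalization and case 
folding from the standard library. It is not the NAMEPREP algorithm itself, 
and the function name prep is made up for this example. Compatibility and 
case variants collapse to one form, while a Traditional/Simplified pair stays 
distinct.

```python
import unicodedata


def prep(name: str) -> str:
    """Approximate MAP + NORMALIZATION: NFKC, then case folding.

    This is a stand-in for illustration, not the NAMEPREP specification.
    """
    return unicodedata.normalize("NFKC", name).casefold()


# Fullwidth Latin letters and uppercase ASCII collapse to the same string.
fullwidth = "\uff38\uff39\uff3a.com"   # fullwidth "XYZ" followed by ".com"
assert prep(fullwidth) == prep("XYZ.COM") == "xyz.com"

# Traditional U+570B and Simplified U+56FD have no Unicode normalization
# relationship, so the two names remain different after preparation.
traditional = "\u4e2d\u570b"
simplified = "\u4e2d\u56fd"
assert prep(traditional) != prep(simplified)
```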

4. Alternative view to the problem space

While the IDN WG has been working very hard on ACE and NAMEPREP for IDN, it 
is apparent that there is another view of these problems that may give us a 
different approach and solution.

First, NAMEPREP assumes that an IDN is an ISO10646/Unicode string. In 
reality, IDNs are often encoded in a legacy encoding, and an additional step 
has to be taken to convert them to ISO10646/Unicode.

Beyond its backward compatibility, ACE also serves as an identifier string 
for an IDN, and the NAMEPREP process unifies the various possible 
representations of an IDN into a single "unique name" for matching purposes.

In other words, we have the following conceptual model:

  +-------+     +---------+   (ISO10646)
  |XYZ.COM|-->--|Transcode|-->------------+
  +-------+     +---------+      +----------------+     +---------------+
       :  (Legacy)         ...---|NAMEPREP/Unified|-->--|ACE/unique name|
  +-------+     +---------+      +----------------+     +---------------+
  |xyz.com|-->--|Transcode|-->------------+
  +-------+     +---------+   (ISO10646)
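
The model in the diagram can be sketched in Python as a three-stage pipeline. 
This is only an illustration under stated assumptions: the standard big5 and 
gbk codecs stand in for the Transcode boxes, NFKC plus case folding stands in 
for NAMEPREP, and Python's built-in idna codec stands in for the ACE step (the 
ACE format under discussion in the WG may differ). The function name 
unique_name is hypothetical.

```python
import unicodedata


def unique_name(raw: bytes, legacy_encoding: str) -> bytes:
    """Legacy bytes -> ISO10646/Unicode -> unified form -> ACE stand-in."""
    text = raw.decode(legacy_encoding)                        # Transcode
    unified = unicodedata.normalize("NFKC", text).casefold()  # NAMEPREP stand-in
    return unified.encode("idna")                             # ACE stand-in


label = "\u4e2d\u6587"            # a two-character Chinese label
from_big5 = label.encode("big5")  # bytes a BIG5 client would send
from_gbk = label.encode("gbk")    # bytes a GBK client would send

# The legacy byte strings differ, but both converge on one unique name.
assert from_big5 != from_gbk
assert unique_name(from_big5, "big5") == unique_name(from_gbk, "gbk")
```

Note that the pipeline only works if the legacy encoding of the input is 
known, which is the extra transcoding step described above.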

5. Proposal

Given this alternative view of IDN, we can derive another set of solutions 
using a directory concept.

  +-------+       +---------+
  |XYZ.COM|-->----|         |     
  +-------+       |         |     +---------------+
       :  (Legacy)|Directory|-->--|ACE/unique name|
  +-------+       |         |     +---------------+
  |xyz.com|-->----|         |
  +-------+       +---------+

The purpose of this directory system is to list all the possible 
representations of an IDN and unify them to a unique name. This unique name 
could be an ACE of the most common representation or a NAMEPREPPED ACE.

The content of the directory is built up at registration time, whereby 
registrants have to provide a list of equivalent representations of the 
domain names they register.
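
A minimal in-memory sketch of such a directory is given below. The class name 
Directory, its methods, and the conflict rule (a representation may belong to 
only one unique name) are assumptions made for illustration; "ACE1" is a 
placeholder unique name.

```python
from typing import Dict, List, Optional


class Directory:
    """Maps every registered representation (raw bytes) to one unique name."""

    def __init__(self) -> None:
        self._entries: Dict[bytes, str] = {}

    def register(self, unique: str, representations: List[bytes]) -> None:
        # Refuse the whole registration if any representation is already
        # bound to a different unique name.
        for rep in representations:
            owner = self._entries.get(rep)
            if owner is not None and owner != unique:
                raise ValueError(f"{rep!r} is already registered to {owner}")
        for rep in representations:
            self._entries[rep] = unique

    def lookup(self, rep: bytes) -> Optional[str]:
        return self._entries.get(rep)


directory = Directory()
trad = "\u4e2d\u570b"   # Traditional form
simp = "\u4e2d\u56fd"   # Simplified form

# The registrant lists each equivalent form in each encoding it cares about.
directory.register("ACE1", [
    trad.encode("big5"), simp.encode("gbk"),
    trad.encode("utf-8"), simp.encode("utf-8"),
])

assert directory.lookup(simp.encode("gbk")) == "ACE1"
assert directory.lookup(trad.encode("big5")) == "ACE1"
assert directory.lookup(b"unregistered") is None
```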

However, there is still the question of which directory we should use. In 
this document, we examine a few different solutions.

5.1. LDAP as Directory

Lightweight Directory Access Protocol [LDAP] is one of the most widely used 
directory protocols. In LDAP, there is a concept of hierarchy similar to the 
DNS hierarchy. Hence, it is possible to distribute the content of the 
directory across various LDAP servers for scalability and authority control. 
For example, each registry that wishes to deploy IDN may set up an LDAP 
server and register it with a "root" LDAP server.

The IDN query process would then look something like this:
a.	The user enters an IDN into an application
b.	The application does an LDAP query to look up the unique name
c.	The application uses the unique name to do a DNS lookup
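
Steps a to c can be sketched as follows. No real LDAP library is used here: 
RootDirectory and RegistryDirectory are hypothetical in-memory stand-ins for 
the "root" server and a registry's server, and the DNS lookup is passed in as 
a function so that the example runs without network access. The address used 
is from a documentation range.

```python
from typing import Callable, Dict, List, Optional


class RegistryDirectory:
    """Stand-in for one registry's LDAP server: representation -> unique name."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self.entries = entries

    def search(self, idn: str) -> Optional[str]:
        return self.entries.get(idn)


class RootDirectory:
    """Stand-in for the "root" LDAP server: TLD -> registry directory."""

    def __init__(self) -> None:
        self.registries: Dict[str, RegistryDirectory] = {}

    def search(self, idn: str) -> Optional[str]:
        tld = idn.rsplit(".", 1)[-1]
        registry = self.registries.get(tld)
        return registry.search(idn) if registry else None


def resolve(idn: str, root: RootDirectory,
            dns_lookup: Callable[[str], List[str]]) -> List[str]:
    # Step b: directory query for the unique name.
    unique = root.search(idn)
    if unique is None:
        raise LookupError(f"no unique name registered for {idn!r}")
    # Step c: ordinary DNS lookup using the unique name.
    return dns_lookup(unique)


root = RootDirectory()
root.registries["tw"] = RegistryDirectory({"\u4e2d\u570b.tw": "ace1.tw"})


def fake_dns(name: str) -> List[str]:
    # Placeholder for a real resolver call.
    return ["192.0.2.1"] if name == "ace1.tw" else []


assert resolve("\u4e2d\u570b.tw", root, fake_dns) == ["192.0.2.1"]
```

The extra directory round trip before every DNS lookup is visible in resolve; 
it is the lookup overhead listed among the disadvantages of this approach.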

Advantages:
   - encapsulates the problem in the representation layer and at 
     registration time
   - able to handle unification problems 

Disadvantages:
   - requires all applications to be upgraded
   - additional LDAP lookup overhead
   - policy issues with the "root" LDAP server
   - requires access to LDAP servers to function, i.e. cannot work offline

5.2. CNRP as Directory

Common Name Resolution Protocol [CNRP] is a newly developed IETF protocol for 
common name resolution. CNRP has no concept of hierarchy, but it does have a 
referral scheme. Hence, it is possible to build a distributed directory 
system in which servers refer to one another.

The IDN query process would then look something like this:
a.	The user enters an IDN into an application
b.	The application does a CNRP query to look up the unique name
c.	The application uses the unique name to do a DNS lookup
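
The referral-based lookup in step b can be pictured with the sketch below. It 
does not implement the CNRP protocol: Service is a hypothetical in-memory 
stand-in, and the hop limit max_hops is an assumption. The point is that, 
without a root, a search follows referrals from wherever it starts and may 
end without finding the unique name.

```python
from typing import Dict, List, Optional, Set


class Service:
    """Stand-in for one CNRP service: local entries plus referrals to peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, str] = {}
        self.referrals: List["Service"] = []


def find_unique_name(idn: str, start: Service,
                     max_hops: int = 8) -> Optional[str]:
    """Breadth-first walk over referrals; returns None if nothing is found."""
    seen: Set[str] = set()
    frontier = [start]
    for _ in range(max_hops):
        next_frontier: List[Service] = []
        for service in frontier:
            if service.name in seen:
                continue                  # already asked; avoids referral loops
            seen.add(service.name)
            if idn in service.entries:
                return service.entries[idn]
            next_frontier.extend(service.referrals)
        if not next_frontier:
            break
        frontier = next_frontier
    return None


a, b, c = Service("a"), Service("b"), Service("c")
a.referrals = [b]
b.referrals = [c]
c.referrals = [a]                         # referrals may form a cycle
c.entries["\u4e2d\u570b.tw"] = "ace1.tw"

assert find_unique_name("\u4e2d\u570b.tw", a) == "ace1.tw"
assert find_unique_name("unknown.tw", a) is None
```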

Advantages:
   - encapsulates the problem in the representation layer and at 
     registration time
   - able to handle unification problems
   - no policy issues with a "root" CNRP server

Disadvantages:
   - requires all applications to be upgraded
   - additional CNRP lookup overhead, and no assurance that the unique 
     name can be located
   - requires access to CNRP servers to function, i.e. cannot work offline

5.3. DNS as Directory

The Domain Name System [DNS] is a widely established, distributed lookup 
directory. It has an existing hierarchical structure, and its resource 
records are distributed. In theory, the DNS is able to handle 8-bit binary 
strings.

The IDN query process would then look something like this:
a.	The user enters an IDN into an application
b.	The application does a DNS query to look up the unique name, which 
returns the unique name together with its Resource Records
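
To make step b concrete, the sketch below encodes a name whose first label is 
raw BIG5 bytes into the DNS wire format of [DNS]: each label is a length 
octet followed by its octets, and a zero octet ends the name. The function 
name encode_qname is hypothetical, and the sketch builds only the name, not a 
complete query message.

```python
from typing import List


def encode_qname(labels: List[bytes]) -> bytes:
    """Encode labels in DNS wire format: length octet + raw octets, then 0."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1 to 63 octets long")
        out.append(len(label))
        out += label
    out.append(0)  # the zero-length root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


# The client sends its native BIG5 bytes untouched; no transcoding is done.
native_label = "\u4e2d\u6587".encode("big5")
qname = encode_qname([native_label, b"tw"])
assert qname == bytes([4]) + native_label + b"\x02tw\x00"
```

Nothing in this format restricts label octets to ASCII, which is why the 
approach is possible in theory; whether deployed software tolerates such 
octets is a separate question.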

Advantages:
   - encapsulates the problem in the representation layer and at 
     registration time
   - able to handle unification problems
   - existing "root" DNS servers with an existing hierarchy
   - does not require all applications to be upgraded

Disadvantages:
   - unknown behavior of applications which cannot handle 8-bit 
   - unknown behavior of servers/caching software which cannot handle 8-bit

6. Solution

Given CDNC's operational experience that it is difficult to get application 
developers to upgrade, difficult to get users to download new applications, 
and so on, using DNS as a directory would be the fastest approach to 
deploying IDN for our users.

6.1. Zone file

Because there are multiple encodings, and multiple representations of the 
same name even within one encoding, a single domain name corresponds to 
multiple binary strings (e.g. ML1, ML2, ML3, ML4).

Hence, we would create the following Resource Records within the name server:

ML1		UNAME		ACE1
ML2		UNAME		ACE1
ML3		UNAME		ACE1
ML4		UNAME		ACE1

ACE1		IN 	A	1.2.3.4
		IN	A	1.2.3.4

A "UNAME" Resource Record is shown here. In practice, a CNAME could be used 
instead (except that CNAME is unable to handle MX).
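
How a server might answer from such a zone can be sketched as below. The 
semantics of UNAME are not defined in this document, so the sketch assumes 
that it behaves as an alias to the unique name; the zone dictionary and the 
lookup function are illustrative only and use the ML1 to ML4 placeholders of 
this section.

```python
from typing import Dict, List, Tuple

# (owner, type) -> list of RDATA strings.
Zone = Dict[Tuple[str, str], List[str]]

zone: Zone = {
    ("ML1", "UNAME"): ["ACE1"],
    ("ML2", "UNAME"): ["ACE1"],
    ("ML3", "UNAME"): ["ACE1"],
    ("ML4", "UNAME"): ["ACE1"],
    ("ACE1", "A"): ["1.2.3.4"],
}


def lookup(zone: Zone, owner: str, rrtype: str,
           max_depth: int = 8) -> List[str]:
    """Return records of `rrtype`, following UNAME aliases along the way."""
    for _ in range(max_depth):
        direct = zone.get((owner, rrtype))
        if direct is not None:
            return direct
        alias = zone.get((owner, "UNAME"))
        if alias is None:
            return []          # no data and no alias for this owner
        owner = alias[0]       # assumed semantics: continue at the unique name
    raise RuntimeError("UNAME chain too long")


assert lookup(zone, "ML2", "A") == ["1.2.3.4"]
assert lookup(zone, "ACE1", "A") == ["1.2.3.4"]
assert lookup(zone, "ML9", "A") == []
```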

6.2. Advantages

The strongest advantages of this solution are:
a.	It does not require our users to download any special software or 
upgrade their software, since it handles the native encoding of the 
user directly

b.	It will work immediately for ccTLDs that wish to offer ML.ccTLD 
services, without any changes at the user's client

c.	It remains compatible with the IDNA approach so long as we keep the 
unique name equivalent to the NAMEPREPPED ACE

d.	It uses the existing DNS hierarchy

6.3. Potential Loopholes

There are many loopholes in this solution that we need to take note of:
a.	Some "smart" localized browsers will send out the "wrong" binary 
string due to differences in how they handle encodings. For example, 
English Internet Explorer will not be able to handle Chinese 
double-byte legacy encoding properly

b.	While Chinese has a handful (usually 2 to 3) of representation forms 
for a single IDN, other languages may have much more complicated 
representations which may not be suitable for this approach. For 
example, if case-folding for Latin characters is done using this 
solution, a string of 32 characters will require 2^32 entries in the 
DNS. But this could be solved by some other means.

c.	It might be possible to construct a binary string in some legacy 
encoding which gives the same binary representation as another domain 
name (a.k.a. binary collision).
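
Items b and c can be checked with a short Python sketch. The helper names 
case_variants and reads_differently are made up for this example. The first 
helper enumerates the case spellings of a label made of ASCII letters; the 
second shows that the same octets can be valid text in two legacy encodings 
while meaning different characters.

```python
from itertools import product
from typing import List


def case_variants(label: str) -> List[str]:
    """Enumerate every upper/lower-case spelling of an ASCII-letter label."""
    choices = [(ch.lower(), ch.upper()) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


# Three letters already need 2**3 = 8 entries; 32 letters would need 2**32.
assert len(case_variants("xyz")) == 2 ** 3
assert 2 ** 32 == 4_294_967_296


def reads_differently(raw: bytes, enc_a: str, enc_b: str) -> bool:
    """True if the same octets decode to different text in two encodings."""
    return raw.decode(enc_a) != raw.decode(enc_b)


# The BIG5 octets of U+4E2D also decode under GBK, but to another character.
raw = "\u4e2d".encode("big5")
assert reads_differently(raw, "big5", "gbk")
```

Since a name server that receives only the octets cannot tell which encoding 
the client used, two such names would share a single entry in the zone.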

Acknowledgement

Author(s)

Li Ming TSENG, Prof
National Central University, TWNIC
Email: tsenglm@cc.ncu.edu.tw
Tel: +886-3-490-4421

Jan Ming HO, Prof
Academia Sinica, TWNIC
Email: hoho@iis.sinica.edu.tw
Tel: +886-2-2788-3799 x 1803

Hua Lin QIAN, Prof
Chinese Academy of Science, CNNIC
Email: hlqian@ns.cnc.ac.cn
Tel: +86-10-6256-9960

Kenny HUANG
Asia Infra International Ltd, TWNIC
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510

Editor: James SENG
i-DNS.net International
Email: jseng@i-dns.net
Tel: +65-2486-188

References

[IDNREQ]        Requirements of Internationalized Domain Names, Zita Wenzel,
                James Seng, draft-ietf-idn-requirements

[HAN]           Han Ideograph (CJK) for Internationalized Domain Names,
                J. Seng, Y. Yoneya, K. Huang, K. Kim, draft-ietf-idn-cjk

[LDAP]

[CNRP]

[DNS]           Domain Names - Implementation and Specification,
                P. Mockapetris, RFC1035

[CJKV]          CJKV Information Processing. ISBN 1-56592-224-7

[UTR15]         Unicode Normalization Forms, Mark Davis and Martin Duerst,
                Unicode Technical Report 15.

[UTR21]         Case Mappings, Mark Davis, Unicode Technical Report 21.