
[idn] draft about <draft-ietf-idn-uname-01.txt>



Dear IETF & IDN WG:

Attached is <draft-ietf-idn-uname-01.txt>.

This document adds a practical case showing that using CNAME to
implement UNAME is a workable way for Internet applications to fetch a
unique name. It also shows that the scheme can be treated as a further
step after NAMEPREP and that it is compatible with the IDNA approach.

Thanks for any suggestions and comments!

Erin Chen
TWNIC
 
Internet Draft                                 Authors: Li Ming TSENG
<draft-ietf-idn-uname-01.txt>                           Jan Ming HO
13 Jul 2001                                             Hua Lin QIAN
Expires 13 Jan 2002                                     Kenny HUANG
                                                Editor: James SENG

       Internationalized Domain Names and Unique Identifiers/Names

Status of this Memo

    This document is an Internet-Draft and is in full conformance 
    with all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet 
    Engineering Task Force (IETF), its areas, and its working 
    groups. Note that other groups may also distribute working 
    documents as Internet-Drafts.

    Internet-Drafts are draft documents valid for a maximum of
    six months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-
    Drafts as reference material or to cite them other than as
    "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html


Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].

Abstract

One of the biggest technical challenges of Internationalized Domain
Names (IDN) is how to determine whether two given domain names match.
The current approach to this problem is a process known as
[NAMEPREP].

This document describes an alternative view of, and solution to, the
IDN matching problem. It can be treated as a further step after
NAMEPREP and is compatible with the IDNA approach.

A practical case shows that using CNAME to implement UNAME is a
workable way for Internet applications to fetch a unique name.

1. Introduction

The Chinese Domain Name Consortium (CDNC) has taken a keen interest
in IDN, in particular the use of Chinese script in domain names. CDNC
was formed by the regional registries (CNNIC, TWNIC, HKNIC and MONIC)
and has been experimenting with a Chinese Domain Name System for many
months.

The primary motivation for this proposal is the lack of support for
Traditional and Simplified Chinese in NAMEPREP. See [HAN] for a
discussion of Traditional/Simplified Han ideograph problems.

In addition, based on the registries' operational experience and on
the experiments and development work within CDNC, this proposal will
reduce operational and deployment costs from a TLD manager's
perspective.

Backward compatibility, interoperability, scalability, security,
operation and deployment are all elements that must be considered as
criteria when designing an internationalized domain name system.

2. Background on Legacy Encoding

The most popular Chinese character set in Taiwan is the industry
standard "BIG5", and the corresponding one in China is "GBK". BIG5
contains primarily Traditional Chinese characters and GBK contains
Simplified Chinese characters.
In addition, the Chinese government has mandated that all Chinese
software in China must support GB18030, a new standard that
supersedes GBK.

Both BIG5 and GBK are widely used in China, Taiwan, Hong Kong and
Macao and are supported by many operating systems, including Windows.
Thus, supporting these encodings in IDN is essential from a
geographical perspective.

3. An overview of current proposals and their problems

3.1. ASCII Compatible Encoding (ACE)

The need to support ACE in IDN has been discussed extensively in
the IDN Working Group. Backward compatibility is the strongest
advantage of ACE. Deploying ACE neither affects the existing naming
infrastructure nor risks damaging current Internet applications. For
moving the current Internet to a multilingual infrastructure, ACE is
the most appropriate bridging solution.

Despite these advantages, most users' systems work in a local
encoding. Users do not want to download special software or upgrade
their software in order to use a multilingual domain name system.
Supporting native encodings without altering users' software has
therefore become an important issue for TLD managers.

3.2. NAMEPREP

The design goal of [NAMEPREP] is to allow users to enter host names
in applications and have the highest chance of getting the name
correct. The NAMEPREP process comprises three basic steps, namely
"MAP", "NORMALIZATION" and "PROHIB".

The MAP and NORMALIZATION steps aim to reduce the number of possible
representations of domain names that should be equivalent. They are
based upon Unicode Technical Reports [UTR15] and [UTR21]. However,
when a domain name has multiple representations whose matching
depends on language and context, NAMEPREP fails. Of particular
interest to us, the equivalence of Traditional and Simplified
Chinese ideographs cannot be handled by NAMEPREP.
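
The limitation described above can be illustrated with a small Python
sketch. It is only an approximation: NFKC normalization plus case
folding stands in for the MAP and NORMALIZATION steps, it is not the
actual NAMEPREP profile, and the function name map_and_normalize and
the sample strings are assumptions made for illustration.

```python
import unicodedata


def map_and_normalize(label: str) -> str:
    """Rough stand-in for MAP + NORMALIZATION (not the real NAMEPREP profile)."""
    return unicodedata.normalize("NFKC", label).casefold()


# Full-width and upper-case Latin letters collapse to one representation.
print(map_and_normalize("ＸＹＺ") == map_and_normalize("xyz"))  # True

# Traditional and Simplified forms of the same word stay distinct,
# because they are separate code points with no normalization mapping.
traditional = "中國"
simplified = "中国"
print(map_and_normalize(traditional) == map_and_normalize(simplified))  # False
```

Unifying the last two strings requires language-specific knowledge
that a purely character-level normalization does not carry.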

4. Alternative view to the problem space

While the IDN WG has been working very hard on ACE and NAMEPREP for
IDN, it is apparent that there is another view of these problems
that may give us a different approach and solution.

First, NAMEPREP assumes that an IDN is an ISO10646/Unicode string. In
reality, an IDN is often encoded in a legacy encoding, and an
additional step has to be taken to convert it to ISO10646/Unicode.

Besides providing backward compatibility, ACE also serves as an
identifier string for an IDN, and the NAMEPREP process unifies the
various possible representations of an IDN into a single "unique
name" for matching purposes.

In other words, we have the following conceptual model:

  +-------+     +---------+   (ISO10646)
  |XYZ.COM|-->--|Transcode|-->------------+
  +-------+     +---------+      +----------------+     +---------------+
       :  (Legacy)         ...---|NAMEPREP/Unified|-->--|ACE/unique name|
  +-------+     +---------+      +----------------+     +---------------+
  |xyz.com|-->--|Transcode|-->------------+
  +-------+     +---------+   (ISO10646)
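
The "Transcode" boxes in the model above can be sketched in Python,
whose standard codecs include "big5" and "gbk". The helper name
transcode and the sample name are assumptions made for illustration;
the sketch only shows that one name has different legacy byte strings
that meet again once converted to ISO10646/Unicode.

```python
def transcode(raw: bytes, legacy_encoding: str) -> str:
    """Convert a legacy-encoded name to an ISO10646/Unicode string."""
    return raw.decode(legacy_encoding)


name = "中文"
big5_bytes = name.encode("big5")  # what a BIG5 client would send
gbk_bytes = name.encode("gbk")    # what a GBK client would send

# The two legacy byte strings differ...
print(big5_bytes != gbk_bytes)  # True

# ...but both transcode to the same Unicode string.
print(transcode(big5_bytes, "big5") == transcode(gbk_bytes, "gbk"))  # True
```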

5. Proposal

Given this alternative view of IDN, we can derive another set of
solutions using a directory concept.

  +-------+       +---------+
  |XYZ.COM|-->----|         |     
  +-------+       |         |     +---------------+
       :  (Legacy)|Directory|-->--|ACE/unique name|
  +-------+       |         |     +---------------+
  |xyz.com|-->----|         |
  +-------+       +---------+

As mentioned in section 3.2, some ideographs cannot be handled by
NAMEPREP's "MAP", "NORMALIZATION" and "PROHIB" steps alone. Building
a directory system amounts to adding a further NAMEPREP step, which
solves the remaining matching problems, for example one-to-many and
many-to-one mappings.

The purpose of this directory system is to list all the possible
representations of an IDN and unify them into a unique name. This
unique name could be an ACE of the most common representation or a
NAMEPREPPED ACE.

The content of the directory is built up at registration time:
registrants have to provide a list of the equivalent representations
of the domain names they register.
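
The directory idea described above can be pictured as a many-to-one
table filled at registration time. The following Python sketch is an
illustration only: the class UniqueNameDirectory, its methods and the
names under ".example" are hypothetical and not part of this proposal.

```python
class UniqueNameDirectory:
    """Toy directory mapping every registered representation to one unique name."""

    def __init__(self):
        self._unique_name_of = {}

    def register(self, unique_name, representations):
        # Refuse a representation that already belongs to another unique name.
        for rep in representations:
            owner = self._unique_name_of.get(rep)
            if owner is not None and owner != unique_name:
                raise ValueError(f"{rep!r} is already registered for {owner!r}")
        for rep in representations:
            self._unique_name_of[rep] = unique_name

    def lookup(self, representation):
        # Returns None when the representation was never registered.
        return self._unique_name_of.get(representation)


directory = UniqueNameDirectory()
directory.register("ace1.example", ["中國.example", "中国.example"])
print(directory.lookup("中國.example"))  # ace1.example
print(directory.lookup("中国.example"))  # ace1.example
```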

However, there is still the question of which directory to use. In
this document, we examine several different solutions.

5.1. LDAP as Directory

Lightweight Directory Access Protocol [LDAP] is one of the most
widely used directory protocols. LDAP has a concept of hierarchy
similar to the DNS hierarchy. Hence, it is possible to distribute the
content of the directory across various LDAP servers for scalability
and authority control. For example, each registry that wishes to
deploy IDN may set up an LDAP server and register it with a "root"
LDAP server.

The IDN query process would then look something like this:
   a. The user inputs an IDN into an application
   b. The application does an LDAP query to look up the unique name
   c. The application uses the unique name to do a DNS lookup

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems

Disadvantages:
   - requires all applications to be upgraded
   - additional LDAP lookup overhead
   - policy issues with the "root" LDAP server
   - requires access to LDAP servers to function, i.e. cannot work
     offline

5.2. CNRP as Directory

Common Name Resolution Protocol [CNRP] is a newly developed IETF
protocol for common name resolution. CNRP has no concept of
hierarchy, but it has a referral scheme. Hence, it is possible to
build a distributed directory system whose servers refer to one
another.

The IDN query process would then look something like this:
   a. The user inputs an IDN into an application
   b. The application does a CNRP query to look up the unique name
   c. The application uses the unique name to do a DNS lookup

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems
   - no policy issues with a "root" CNRP server

Disadvantages:
   - requires all applications to be upgraded
   - additional CNRP lookup overhead and no assurance that the unique
     name can be located
   - requires access to CNRP servers to function, i.e. cannot work
     offline

5.3. DNS as Directory

Domain Name System [DNS] is a widely established distributed lookup
directory. It has an existing hierarchical structure, and resource
records are distributed. In theory, the DNS is able to handle 8-bit
binary strings.

The IDN query process would then look something like this:
   a. The user inputs an IDN into an application
   b. The application does a DNS query for the name, which returns
      the unique name together with the Resource Record of the
      unique name

Advantages:
   - encapsulates the problem in the representation layer and at
     registration time
   - able to handle unification problems
   - existing "root" DNS server with existing hierarchy
   - does not require all applications to be upgraded

Disadvantages:
   - unknown behavior of applications that cannot handle 8-bit
     strings
   - unknown behavior of servers/caching software that cannot handle
     8-bit strings
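
As background to the 8-bit claim above: in the wire format of [DNS],
each label is a length octet followed by that many arbitrary octets,
and the name ends with a zero octet. The Python sketch below encodes a
BIG5 label this way. The function name to_wire_format and the sample
name are assumptions made for illustration; the sketch says nothing
about how real resolvers or caches treat such names.

```python
from typing import Iterable


def to_wire_format(labels: Iterable[bytes]) -> bytes:
    """Encode labels as length octet + raw octets, ending with a zero octet."""
    wire = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1 to 63 octets long")
        wire.append(len(label))
        wire += label
    wire.append(0)  # the root label terminates the name
    if len(wire) > 255:
        raise ValueError("a domain name is limited to 255 octets")
    return bytes(wire)


legacy_label = "中文".encode("big5")  # raw 8-bit octets, no ACE applied
wire = to_wire_format([legacy_label, b"tw"])
print(wire.hex())
print(any(octet > 0x7F for octet in wire))  # True
```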

6. Solution

CDNC's operational experience is that it is difficult to get
application developers to upgrade their software and difficult to
get users to download new applications, among other obstacles.
Using DNS as a directory would therefore be the fastest way to
deploy IDN for our users.

6.1. Zone file

Because there are multiple encodings, and multiple representations
of the same name even within one encoding, a single domain name
corresponds to multiple binary strings (e.g. ML1, ML2, ML3, ML4).

Hence, we would create the following Resource Records in the name
server:

ML1		UNAME		ACE1
ML2		UNAME		ACE1
ML3		UNAME		ACE1
ML4		UNAME		ACE1

ACE1		IN 	A	1.2.3.4
		IN	A	1.2.3.4

A "UNAME" Resource Record is shown here. In practice, it could be 
a CNAME (except that a name owning a CNAME cannot also carry MX).
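
The records above follow a regular pattern, so a registry could
generate them from the list of equivalent representations collected
at registration. The Python sketch below shows one possible way; the
function name equivalence_records is hypothetical, and "UNAME" is the
record type proposed here, not one that existing name servers know.

```python
def equivalence_records(variants, unique_name, rrtype="UNAME"):
    """Return one zone-file line per variant, all pointing at one unique name."""
    return [f"{variant}\t\t{rrtype}\t\t{unique_name}" for variant in variants]


for line in equivalence_records(["ML1", "ML2", "ML3", "ML4"], "ACE1"):
    print(line)

# The same helper yields the CNAME form by switching the record type.
for line in equivalence_records(["ML1", "ML2"], "ACE1", rrtype="CNAME"):
    print(line)
```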

6.2. The practical case of implementing UNAME with CNAME

Until the UNAME protocol is defined, the TWNIC IDN testbed
implements IDN unique names with CNAME. Registering a Traditional
Chinese domain name (TCDN) also yields the corresponding Simplified
Chinese domain name (SCDN). The conversion between Traditional and
Simplified Chinese is defined in [TSCONV].

The Resource Records look like this:

TCDN1		CNAME		EDN1
SCDN1		CNAME		EDN1
EDN1		IN	A	IP-of-EDN1

If EDN1 is not in the same domain as TCDN1 and SCDN1, the Resource
Record of EDN1 would be in a different zone file. The left side of
the CNAME Resource Records holds all of the equivalent names (ML1,
ML2, ...); here TCDN1 and SCDN1 are equivalent. The right side of the
CNAME Resource Record is an ACE-compatible unique name. EDN1
(English Domain Name 1) is one kind of ACE-compatible unique name; it
could be substituted with any other kind, such as an xACE encoding
or a random number. Once the xACE is decided by the IETF IDN WG, the
implementation will adopt the standard. The unique name also remains
compatible with the [IDNA] approach.

To obtain the unique name EDN1 rather than the destination
IP-of-EDN1, an intermediate server has to be constructed. In the
TWNIC testbed, a Web DNS or a DNS proxy acts as the intermediate
server. Any application can pass TCDN1 or SCDN1 to the intermediate
server. The intermediate server asks the DNS for the corresponding
right side, which is the unique name, and passes the unique name EDN1
back to the application, which then continues with the current DNS
infrastructure. Once the UNAME protocol is defined, the intermediate
server is no longer needed.
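
The role of the intermediate server can be sketched with an in-memory
stand-in for the zone data. This is an illustration under assumptions,
not the TWNIC implementation: the table ZONE, the two function names
and the address 192.0.2.1 (a documentation address standing in for
IP-of-EDN1) are all hypothetical.

```python
# Toy zone data keyed by (owner name, record type).
ZONE = {
    ("TCDN1", "CNAME"): "EDN1",
    ("SCDN1", "CNAME"): "EDN1",
    ("EDN1", "A"): "192.0.2.1",  # placeholder for IP-of-EDN1
}


def unique_name_for(name, zone):
    """Intermediate server step: return the CNAME target, not the address."""
    # A name without a CNAME is treated as already being the unique name.
    return zone.get((name, "CNAME"), name)


def resolve_address(name, zone):
    """Ordinary resolution, done by the application with the unique name."""
    return zone.get((name, "A"))


for typed_name in ("TCDN1", "SCDN1"):
    unique = unique_name_for(typed_name, ZONE)
    print(typed_name, "->", unique, "->", resolve_address(unique, ZONE))
```

Both variants yield the same unique name, so anything keyed on that
name (caches, logs, certificates) sees a single identity.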

The process can be represented as follows:

                                +------+
                                | User |
                                +------+
                                 |    ^
                   Request to AP |    | Response from AP
                   with MDN      |    |                   End system
    +----------------------------|----|----------------------------+
    |                            v                                 |
    |  +--------------------------------------------------------+  |
    |  |                  Application Client                    |  |
    |  +--------------------------------------------------------+  |
    |      |  ^ Nameprepped               |  ^         |  ^        |
    |  MDN |  | ACE compatible            |  |         |  |        |
    |      |  | unique name               |  | IP of   |  |        |
    |      v  |            Nameprepped    |  | unique  |  |        | 
    |  +--------------+    ACE compatible |  | name    |  |        |
    |  | intermediate |    unique name    v  |         |  |        |
    |  |  server      |             +----------+       |  |        |
    |  +--------------+             | Resolver |       |  |        |
    |      |  ^ Nameprepped         +----------+       |  |        |
    |  MDN |  | ACE compatible        |  ^             |  |        |          
    |      |  | unique name           |  |             |  |        |
    |  +--------------+               |  |     Request |  |Response|
    |  | Directory of |               |  |     for     |  |from    |
    |  | DNS          |               |  |     service |  |server  |
    |  +--------------+               |  |             |  |        |
    |                                 |  |             |  |        |
    +---------------------------------|--|-------------|--|--------+
                       Nameprepped ACE|  | IP of       |  |
                       compatible     |  | unique      |  |
                       unique name    v  | name        v  |
                            +-------------+  +---------------------+
                            | DNS servers |  | Application servers |
                            +-------------+  +---------------------+

6.3. Advantages

The strongest advantages of this solution are:
a. It does not require our users to download any special software
or upgrade their software, since it handles the user's native
encoding directly.

b. It will work immediately for ccTLDs that wish to offer ML.ccTLD
services, without any changes at the user client.

c. It remains compatible with the IDNA approach as long as we keep
the unique name equivalent to the NAMEPREPPED ACE.

d. It uses the existing DNS hierarchy.

6.4. Potential Loopholes

There are several loopholes in this solution that we need to take
note of:

a. Some "smart" localized browsers will send out a "wrong" binary
string because they handle encodings differently. For example,
English Internet Explorer will not be able to handle Chinese
double-byte legacy encodings properly. If double-byte encodings must
be used, an appropriate application environment is necessary.

b. While Chinese has only a handful (usually 2 to 3) of
representation forms for a single IDN, other languages may have much
more complicated representations, for which this approach may not be
suitable. For example, if case folding for Latin characters were
done using this solution, a string of 32 characters would require
2^32 entries in the DNS. This could, however, be solved by other
means.

c. It might be possible to construct a binary string in some legacy
encoding that has the same binary representation as another domain
name (a.k.a. a binary collision). Binary collisions within the same
zone can be avoided by the registration system and policy. If the
left sides of the UNAME records (like ML1, ML2, ML3, ML4 or TCDN1,
SCDN1) are not in the same zone, no binary collision occurs between
them. The intermediate server would have the ability to decide which
zone of the DNS directory it should access.
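
Items b and c above can be made concrete with a short Python sketch.
The first part enumerates the case variants of a short label to show
where the 2^32 figure comes from; the second decodes one byte string
under both BIG5 and GBK to show a binary collision. The function name
case_variants and the sample values are assumptions made for
illustration.

```python
from itertools import product


def case_variants(label):
    """Enumerate every upper/lower-case spelling of a Latin label."""
    choices = [{ch.lower(), ch.upper()} for ch in label]
    return {"".join(combo) for combo in product(*choices)}


# Each letter doubles the number of entries: 3 letters give 2^3.
print(len(case_variants("abc")))  # 8
# A 32-letter label would therefore need 2^32 entries.
print(2 ** 32)  # 4294967296

# Binary collision: the same two octets are a valid character in both
# encodings, but they denote different characters.
raw = b"\xa4\xa4"
as_big5 = raw.decode("big5")
as_gbk = raw.decode("gbk")
print(as_big5 != as_gbk)  # True
```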

Acknowledgement

Author(s)

Li Ming Tseng, Prof
National Central University, TWNIC
Email: tsenglm@cc.ncu.edu.tw
Tel: +886-3-490-4421

Jan Ming Ho, Prof
Academia Sinica, TWNIC
Email: hoho@iis.sinica.edu.tw
Tel: +886-2-2788-3799 x 1803

Hua Lin Qian, Prof
Chinese Academy of Science, CNNIC
Email: hlqian@ns.cnc.ac.cn
Tel: +86-10-6256-9960

Kenny Huang
Asia Infra International Ltd, TWNIC
Email: huangk@alum.sinica.edu
Tel: +886-2-2658-6510

Editor: James SENG
i-DNS.net International
8 Temasek Boulevard
Suntec Tower Three #24-02
Singapore 038988
Email: jseng@i-dns.net
Tel: +65-2486-188

Editor: Erin Chen
Taiwan Network Information Center (TWNIC)
4F-2, No. 9, Sec. 2, Roosevelt Rd., Taipei, 100 Taiwan.
Email: erin@twnic.net.tw
Tel: +886-2-23411313#502

References

[IDNREQ]	Requirements of Internationalized Domain Names, Zita Wenzel, 
                James Seng, draft-ietf-idn-requirements

[NAMEPREP]	Preparation of Internationalized Host Names, P. Hoffman, 
                M. Blanchet, draft-ietf-idn-nameprep

[HAN]		Han Ideograph (CJK) for Internationalized Domain Names, 
                J. Seng, Y. Yoneya, K. Huang, K. Kim, draft-ietf-idn-cjk

[LDAP]	        Lightweight Directory Access Protocol (v3), M. Wahl, 
                T. Howes, S. Kille, rfc2251.txt

[CNRP]	        Common Name Resolution Protocol, N. Popp, M. Mealing, 
                M. Moseley, draft-ietf-cnrp

[DNS]		Domain Names - Implementation and Specification, 
                P. Mockapetris, RFC1035

[CJKV]          CJKV Information Processing ISBN 1-56592-224-7

[UTR15] 	Unicode Normalization Forms, Mark Davis and Martin Duerst,
                Unicode Technical Report 15.

[UTR21]         Case Mappings, Mark Davis, Unicode Technical Report 21.

[TSCONV]	Traditional and Simplified Chinese Conversion, XiaoDong LEE,
		HSU NAI-WEN, Erin Chen, GuoNian SUN, CNNIC, TWNIC, CDNC,
		draft-ietf-idn-tsconv

[IDNA]		Internationalizing Host Names In Applications (IDNA),
		Patrik Faltstrom, Paul Hoffman, draft-ietf-idn-idna