
alpha v0.3



Here is version 0.3. 

Bill's comments will be integrated in the next version, as I just received them.

Thanks to Paul Hoffman who has done a lot of editing on the doc.

-James Seng
             Requirements of Internationalized Domain Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.


Abstract

This document describes the requirements for encoding international
characters into DNS names and records. It provides guidance for
developing protocols for internationalised domain names.


1. Introduction

At present, the encoding of Internet domain names is restricted to a
subset of 7-bit ASCII (ISO/IEC 646). HTML, XML, IMAP, FTP, and many
other text based items on the Internet have already been
internationalised. It is important for domain names to be similarly
internationalised.

This document is being discussed on the "idn" mailing list. To join the
list, send a message to <majordomo@ops.ietf.org> with the words
"subscribe idn" in the body of the message. Archives of the mailing
list can also be found at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Definitions and Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].

"IDN" is used in this document as an abbreviation for "internationalized 
domain name". This is defined as a domain name that contains one or more 
characters that are outside the set of characters specified as legal 
characters for domain names in [RFC1034] Section 3.5.

A master server for a zone holds the main copy of that zone. This copy
is sometimes stored in a zone file. A slave server for a zone holds a
complete copy of the records for that zone. A caching server holds
temporary copies of DNS records; it uses records to answer queries
about domain names. Further explanation of these terms can be found in
[RFC1034] and [RFC1996].

Characters mentioned in this document are identified by their position
in the Unicode character set. The notation U+12AB, for example,
indicates the character at position 12AB (hexadecimal) in the Unicode
character set. Note that the use of this notation is not an indication
of a requirement to use Unicode.
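
As an illustration of the notation only, the following minimal Python
sketch converts between the U+XXXX form and the character it
identifies. The function names are hypothetical and are not part of
any requirement in this document.

```python
def from_u_notation(notation: str) -> str:
    """Return the character identified by a string such as 'U+0041'."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not U+ notation: {notation!r}")
    # The digits after "U+" are the hexadecimal position of the character.
    return chr(int(notation[2:], 16))


def to_u_notation(char: str) -> str:
    """Format a single character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(char):04X}"
```

For example, from_u_notation("U+0041") yields the Latin capital letter
A, and to_u_notation applied to that letter yields "U+0041" again.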

Examples quoted in this document should be considered as a method to
further explain the meanings and principles adopted by the document. It
is not a requirement for the protocol to satisfy the examples.

[JS: Need comments on Paul's suggested additions below. Thanks!]

A character is the smallest component of written language that has
semantic value. A character has a single abstract meaning and/or shape,
but not a specific shape.

A glyph is the specific shape that a character can have when it is
rendered or displayed. A single glyph may correspond to a single
character, or it may correspond to many characters; for example, the
same glyph is used to represent the Latin capital letter "P" and the
Greek capital letter "Rho". Similarly, a single character may
correspond to multiple glyphs due to font, formatting style, national
differences, and other reasons.
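
The "P" and "Rho" example can be checked with a short Python sketch
that uses the standard unicodedata module. It is an illustration, not
part of the requirements: the two characters usually look the same
when displayed but are distinct characters with distinct positions.

```python
import unicodedata

LATIN_P = "\u0050"    # Latin capital letter P
GREEK_RHO = "\u03A1"  # Greek capital letter Rho


def describe(char: str) -> str:
    """Return the position and Unicode name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# Same glyph in most fonts, but the characters do not compare equal.
same_character = LATIN_P == GREEK_RHO
```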

A character set (more precisely called a "coded character set" or
"CCS") is a mapping from a set of abstract characters to a set of
integers. Examples of coded character sets include ISO 10646, US-ASCII,
and the ISO 8859 series.

A character encoding scheme or "CES" is a mapping from one or more
coded character sets to a set of octets. Some CESs are associated with
a single CCS; for example, UTF-8 applies only to ISO 10646. Other CESs,
such as ISO 2022, are associated with many CCSs.

A charset is a method of mapping a sequence of octets to a
sequence of abstract characters. A charset is, in effect, a combination
of one or more CCS with a CES. Charset names are registered by the IANA
according to procedures documented in RFC 2278.
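
The distinction between a CCS and a CES can be seen in a small Python
sketch. The character chosen (U+00E9) is only an example: its integer
position comes from the coded character set, while the octets depend
on the charset used to encode it.

```python
char = "\u00E9"  # Latin small letter e with acute

# CCS view: the character maps to an integer.
code_point = ord(char)

# Charset view: the same character becomes different octet sequences.
utf8_octets = char.encode("utf-8")         # two octets
latin1_octets = char.encode("iso-8859-1")  # one octet

# Decoding with the matching charset recovers the same character.
round_trip_ok = (
    utf8_octets.decode("utf-8") == char
    and latin1_octets.decode("iso-8859-1") == char
)
```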

A language is a way that humans interact. In written form, a language
is expressed in characters. The same set of characters can often be
used in many languages, and many languages can be expressed using
different scripts. A particular charset may have different glyphs
(shapes) depending on the language being used.

2. General Requirements

2.1 Compatibility and Interoperability

The DNS is essential to the entire Internet. Therefore, the protocol 
must not damage present DNS interoperability. It must make the minimum 
number of changes to existing protocols on all layers of the stack. It 
must continue to allow any system anywhere to resolve any domain name.

The protocol must preserve the basic concept and facilities of domain 
names as described in [RFC1034]. It must maintain a single, global, 
universal, and consistent hierarchical namespace.

The same name resolution request must generate the same response,
regardless of the location or localisation settings in the resolver, in
the master server, and in any slave or caching servers involved in the
resolution process.

The protocol should also allow creation of caching servers that do not
understand the charset in which a request or response is encoded. Such
caching servers should work as well for IDNs as they do for current
domain names. The caching server performs correctly if it gives
essentially the same answer (without the authoritative bit) as the
master server would have given if presented with the same request.

The protocol may modify the DNS protocol [RFC1035] and other related 
work undertaken by the DNSEXT WG. However, these changes should be as 
small as possible and any changes must be approved by the DNSEXT WG.

The protocol should be as simple as possible from the user's perspective.
Ideally, users should not realize that IDN was added on to the existing
DNS.

The best solution is one that maintains complete compatibility with
current DNS standards as long as it meets the other requirements in
this document.

The protocol should be able to be upgraded at any time with new features 
and retain backwards compatibility with the current specification.

2.2 Internationalization

Internationalized characters must be allowed to be represented and used
in DNS names and records. The protocol must specify what character 
encoding is used when resolving domain names and how characters are 
encoded in DNS records.

This document does not recommend any character set for I18N. If 
multiple character sets are used in the protocol, then the protocol 
must specify all the character sets being used and for what purpose.

In order to simplify the requirements, it is assumed that characters
from the Unicode character set will be used for the protocol on the
wire. The protocol may use multiple character sets if doing so meets 
all the other requirements in this document. However, this should 
not constrain which character set or sets an IDN implementation may 
use for its user interface, or for the storage of records in a master 
file.

The protocol should not make any assumptions about where in the domain 
name internationalization might appear. In other words, it should not
differentiate between any parts of a domain name, because this may impose
a restriction on future internationalization efforts.

The protocol should also not impose any cultural restrictions. For 
example, an IDN implementation which only allows domain names to use 
a single local script would immediately restrict multinational 
organisations.

Because of the wide range of devices that use the DNS and the wide
range of characteristics of international scripts, the protocol should 
allow more than one method of domain name input and display. However, 
there has to be a single way of encoding an internationalized domain 
name within the core of the DNS.

2.3 Localization

The protocol must be able to handle the localized requirements of different
languages. For example, IDN must be able to handle bidirectional
writing for scripts such as Arabic.

Historically, "." has been the separator of labels in the domain names.
The protocol should not (but may) use different separators for different
languages.

Most localization can be handled by the user interface. It should not
matter how the domain names are input or presented, such as in a
reverse order or bidirectional, or with the introduction of a new
separator. However, the final wire format must be in canonical order.

2.4 Canonicalization

Matching rules for IDN are complicated. Canonicalization of
characters must follow precise and predictable rules to ensure
consistency. [CHARREQ] is recommended as a guide on canonicalization.

The DNS has to match a domain name in a request with a domain name held
in one or more zones. It also needs to sort names into order. It is
expected that some sort of canonicalisation algorithm will be used as
the first step of this process. This section discusses some of the
properties which will be required of that algorithm.

The canonicalization algorithm might specify operations for case,
ligature, and punctuation folding.

In order to retain backwards compatibility with the current DNS, the 
protocol must retain the case-insensitive comparison for US-ASCII as 
specified in [RFC1035]. For example, Latin capital letter A (U+0041) 
must match Latin small letter A (U+0061). [UTR21] describes some of 
the issues with case mapping.
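
The US-ASCII rule can be sketched in Python as follows. This is a
minimal illustration with hypothetical function names; it folds only
the letters A-Z and deliberately leaves every other character alone,
since this document does not yet say how non-ASCII characters are to
be compared.

```python
def ascii_fold(label: str) -> str:
    """Lower-case only U+0041..U+005A; leave all other characters as-is."""
    return "".join(
        chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in label
    )


def labels_match(a: str, b: str) -> bool:
    """Compare two labels case-insensitively for US-ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)
```

With this sketch, "EXAMPLE" matches "example", while two non-ASCII
characters that differ only in case are still treated as different.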

Case folding must not be locale dependent. For example, Latin capital
letter I (U+0049) folded to lower case in a Turkish context becomes
Latin small letter dotless I (U+0131), but in an English context it
becomes Latin small letter I (U+0069).
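
The difference can be shown with a short Python sketch. Python's
str.lower() does not consult the locale, so it stands in here for a
locale-independent mapping; the Turkish table is a hypothetical,
deliberately incomplete tailoring written only for contrast.

```python
def lower_untailored(text: str) -> str:
    """Locale-independent lower-casing (the default Unicode mapping)."""
    return text.lower()


# Hypothetical Turkish tailoring, limited to the two dotted/dotless cases.
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}


def lower_turkish(text: str) -> str:
    """Lower-case using the Turkish mapping for I and dotted capital I."""
    return "".join(TURKISH_LOWER.get(c, c.lower()) for c in text)
```

The two functions disagree on U+0049, which is why a name folded on a
client using one rule might not match a name folded elsewhere using
the other.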

If other canonicalization is done, then it must be done before the
domain name is resolved. Further, the canonicalization must be easily
upgradable as new languages and writing systems are added.

Any conversion (case, ligature folding, punctuation folding, ...) from
what the user enters into a client to what the client asks for
resolution must be done identically on all requests from any
client.
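
As a sketch of what an identical, repeatable conversion could look
like, the Python function below combines Unicode compatibility
normalization (Form KC, see [UTR15]) with locale-independent case
folding. The choice of Form KC and of str.casefold() is an assumption
made for illustration; this document does not select an algorithm.

```python
import unicodedata


def canonicalize(name: str) -> str:
    """Hypothetical canonical form: NFKC, then case fold, then NFKC."""
    # NFKC folds compatibility characters such as ligatures.
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can produce unnormalized text, so normalize again.
    return unicodedata.normalize("NFKC", folded)
```

Under these assumptions, a name typed with the ligature U+FB01 and
upper-case letters produces the same result as the plain lower-case
spelling, and applying the function twice gives the same output as
applying it once.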

If the protocol specifies a canonicalisation algorithm, a caching 
server should perform correctly regardless of how much (or how little) 
of that algorithm it has implemented.

2.5 Operational Issues

Zone files should remain easily editable.

An IDN-capable resolver or server should not generate any more traffic
than a non-IDN-capable resolver or server.

The protocol should add no new centralized administration for the DNS. 
A domain administrator should be able to create internationalized names 
as easily as adding current domain names.

The character set of a signed zone file should be capable of being the same
as the character set of the unsigned zone file. The protocol must allow 
offline DNSSEC signing. It should be possible to look at the signed file 
and see that it is the same as the unsigned one.

2.6 Others

The protocol may implement internationalised text in TXT resource records.

--- need comments ---
Must allow characters outside the currently-acceptable range
for domain name parts in:
- DNS queries
- DNS RR response
- DNS TXT records
- DNS CNAME records
- DNS PTR records
---


3. Specific Requirements

3.1 Client Requirements

3.2 Server Requirements

3.3 Zone file Requirements


4. Technical Analysis

There are many standard protocols and RFCs which depend on domain names
and have made various assumptions about the characters in them always
conforming to [RFC1034]. We expect the protocols listed below to
be affected:

<...list the sets of RFCs for which we would like to have a summary...>

The proposed protocol must contain a summary of the technical opinion 
of the IDN working group. (This is stated in the Charter)

5. Security Considerations

Any solution that meets the requirements in this document must not
be less secure than the current DNS. Specifically, the mapping of
internationalized host names to and from IP addresses must have the
same characteristics as the mapping of today's host names.

Specifying requirements for internationalized domain names does not
itself raise any new security issues. However, any change to the DNS
may affect the security of any protocol that relies on the DNS or on
DNS names. A thorough evaluation of those protocols for security
concerns will be needed when they are developed. In particular, IDNs
must be compatible with DNSSEC.

6. References

[RFC2119]   "Key words for use in RFCs to Indicate Requirement
             Levels", rfc2119.txt, March 1997, S. Bradner.

[RFC1034]   "Domain Names - Concepts and Facilities", rfc1034.txt,
             November 1987, P. Mockapetris

[RFC1035]   "Domain Names - Implementation and Specification",
             rfc1035.txt, November 1987, P. Mockapetris

[RFC1996]   "A Mechanism for Prompt Notification of Zone Changes
             (DNS NOTIFY)", rfc1996.txt, August 1996, P. Vixie

[CHARREQ]   "Requirements for string identity matching and String
             Indexing", http://www.w3.org/TR/WD-charreq, July 1998,
             World Wide Web Consortium

[UTR15]     "Unicode Normalization Forms", Unicode Technical Report
             #15, http://www.unicode.org/unicode/reports/tr15/,
             Nov 1999, M. Davis & M. Duerst, Unicode Consortium

[UTR21]     "Case Mappings", Unicode Technical Report #21,
             http://www.unicode.org/unicode/reports/tr21/, Dec 1999,
             M. Davis, Unicode Consortium

[DNSEXT]    "IETF DNS Extensions Working Group",
             namedroppers@internic.net, Olafur Gudmundsson, Randy Bush


Appendix A. Acknowledgements

The editor gratefully acknowledges the contributions of:

Harald Tveit Alvestrand <Harald@Alvestrand.no>
Martin Duerst <duerst@w3.org>
Patrik Faltstrom <paf@swip.net>
Andrew Draper <ADRAPER@altera.com>
Bill Manning <bmanning@ISI.EDU>
Paul Hoffman <phoffman@imc.org>
James Seng <jseng@pobox.org.sg>
Randy Bush <randy@psg.com>
Alan Barret <apb@cequrux.com>
Olafur Gudmundsson <ogud@tislabs.com>
Karlsson Kent <keka@im.se>
Dan Oscarsson <Dan.Oscarsson@trab.se>