[idn] My draft for internationalisation of DNS
Hi
As I was asked, in one of the replies to my comments on this list,
to write an Internet-Draft, I have tried to do that.
It is my first attempt at writing an Internet-Draft,
so I am sure there is more work to be done before it is ready.
Attached is a basic draft specifying
how I think internationalisation of DNS could be done, based on all
the discussions and suggestions we have had on the list.
Hopefully it matches what many of us want and have suggested.
Maybe it could be the basis of one of the drafts/RFCs that will
result from this working group.
Dan
Internet Draft Dan Oscarsson
draft-oscarsson-idn-i18ndns.txt Telia ProSoft
Updates: RFC 2181, 1035, 1034, 2535
February 2000
Expires August 2000
Internationalisation of the Domain Name Service
Status of this memo
This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
There is a very strong world-wide desire to use characters other than
ASCII in the DNS, especially in domain names. Domain names have become
the equivalent of business or product names for many services on the
Internet, so there is a need to make them usable by people whose native
scripts are not representable by ASCII.
This document updates the Domain Name System standard (DNS) [RFC1035] and
specifies how international characters are handled. It is completely
compatible with the current DNS (RFC 1034, 1035, 2181, 2535 etc).
1. Introduction
There is an immediate need to use international (non-ASCII) characters
in the DNS. This means that the DNS cannot be extended with new
mechanisms, as that would take too long; instead the current ASCII-only
handling needs to be extended to non-ASCII in a way that can be used
without updating current software.
The basic handling of character data in DNS has several properties
that need to be preserved:
- The DNS itself places only one restriction on the particular labels
that can be used to identify resource records. That one restriction
relates to the length of the label and the full name. The length of
any one label is limited to between 1 and 63 octets. A full domain
name is limited to 255 octets (including the separators).
[RFC 2181]
- Any binary string whatever can be used as the label of any
resource record. Similarly, any binary string can serve as the value
of any record that includes a domain name as some or all of its value
(SOA, NS, MX, PTR, CNAME, and any others that may be added).
Implementations of the DNS protocols must not place any restrictions
on the labels that can be used. In particular, DNS servers must not
refuse to serve a zone because it contains labels that might not be
acceptable to some DNS client programs.
[RFC 2181]
- Names must be compared with case-insensitivity.
[RFC1035]
- The original case should be preserved when possible as data is entered
into the system. This also implies that responses should preserve case
when possible.
[RFC1035]
- The characters in the ASCII character set must still be encoded
as ASCII.
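The length limits in the first property can be checked mechanically. The following is a minimal Python sketch and not part of the draft; the function name check_name is made up for illustration, and it assumes the 255-octet limit is counted in wire format (one length octet per label plus the terminating root label), which is only one reading of "including the separators".

```python
def check_name(labels):
    """Return True if a list of labels (bytes) fits the DNS length limits."""
    # Each label must be between 1 and 63 octets long.
    if any(not 1 <= len(label) <= 63 for label in labels):
        return False
    # Assumption: the 255-octet limit is counted in wire format, i.e. one
    # length octet per label plus one octet for the terminating root label.
    wire_length = sum(1 + len(label) for label in labels) + 1
    return wire_length <= 255
```

Because the limits are in octets, a label written with non-ASCII characters and encoded in UTF-8 uses up the 63 octets faster than its character count suggests.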
This document specifies the update needed to the DNS protocol, user
interface issues and the effect on other protocols.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
2. The DNS Protocol
The DNS protocol is used when communicating between DNS servers and
other DNS servers or DNS clients. User interface issues like the format
of zone files or how to enter or display domain names are not part
of the protocol.
The update of the protocol defined here can be used immediately as
it is fully compatible with the DNS of today.
2.1 Internationalisation aware software
Internationalisation aware DNS software (i18n aware) is software that
follows the rules for handling international text as defined here. Only
i18n aware software will get all requirements fulfilled. Non-i18n aware
software will lose the case preserving requirement. Also, only i18n aware
software may perform zone transfers.
I18n aware software identifies itself in a query or a response by
setting the IN bit in the DNS query/response format header. This
bit is the last unallocated bit in the header.
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|IN|AD|CD|   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
This bit is zero in old servers and resolvers. Thus they identify
themselves as non-i18n aware.
I18n aware software MUST set the IN bit in both queries and responses.
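As an illustration of the header layout, the following Python sketch packs a header with the IN bit set and tests for it on receipt. It is only a sketch; the names IN_BIT, build_header and is_i18n_aware are invented here, and the bit position (bit 9 of the flags word) is read off the diagram in this section.

```python
import struct

# Bit 0 is the most significant bit of the 16-bit flags word, so bit n
# has the mask 1 << (15 - n).
IN_BIT = 1 << (15 - 9)   # 0x0040, the bit this draft calls IN
RD_BIT = 1 << (15 - 7)   # 0x0100, recursion desired

def build_header(query_id, flags, qdcount=1, ancount=0, nscount=0, arcount=0):
    """Pack the 12-octet DNS header, always setting the IN bit."""
    return struct.pack("!6H", query_id, flags | IN_BIT,
                       qdcount, ancount, nscount, arcount)

def is_i18n_aware(header):
    """Return True if the sender of this header set the IN bit."""
    (flags,) = struct.unpack_from("!H", header, 2)
    return bool(flags & IN_BIT)
```

A receiver would call is_i18n_aware on the first 12 octets of a message to decide which of the rules for character data applies to the peer.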
2.2 Character data
Character data needs to be able to represent as many as possible of
the characters in the world, as well as being compatible with ASCII.
It must also be well defined so that it can easily be compared
in both case-sensitive and case-insensitive matching, and it should be
compact, as only 63 octets are available per label without an extension
of the protocol.
Therefore character data MUST:
- Be ISO 10646 (UCS) [UCS].
- Be normalised using form KC as defined in Unicode technical
report #15 [UTR15].
If the character data is in a text string that is not used in
character matching, normalisation form C of [UTR15] may be used.
- Be encoded using UTF-8 [RFC2279].
Case-insensitive matching MUST:
- Be done by folding the case to lower case using the CaseFolding.txt
mapping as defined in Unicode technical report #21 [UTR21] and
then comparing the data.
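Note: Normalisation form KC merges compatibility characters into
their ordinary equivalents (for example fullwidth Latin A into
Latin A). This results in less user confusion, as such characters
look alike and many users will assume they are the same. It does
not merge look-alike characters from different scripts, such as
Greek capital alpha and Latin A.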
Note: Case folding to lower case using UTR#21 is not perfect. For
example, in Turkish I is lower-cased into a dotless i, but UTR#21
does it in the old ASCII way (I -> i). This way we get a well
defined lower-casing that can be used in matching, but it will
not be correct according to the local rules of every language.
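The preparation and matching steps of this section could look roughly as follows in Python. This is a sketch under assumptions, not the draft's definition: the function names are invented, and str.casefold() is used as a stand-in for the CaseFolding.txt mapping, so it applies the full case folding of whatever Unicode version ships with the Python in use.

```python
import unicodedata

def to_wire(text):
    """Normalise to form KC and encode as UTF-8 for use in the protocol."""
    return unicodedata.normalize("NFKC", text).encode("utf-8")

def fold(text):
    """Key for case-insensitive matching.

    Assumption: str.casefold() stands in for the CaseFolding.txt mapping;
    its table may differ from the one the draft refers to.
    """
    folded = unicodedata.normalize("NFKC", text).casefold()
    # Folding can produce text that is no longer in form KC, so the
    # result is normalised once more before encoding.
    return unicodedata.normalize("NFKC", folded).encode("utf-8")

def names_match(a, b):
    """Compare two names case-insensitively."""
    return fold(a) == fold(b)
```

With these definitions names_match("MALMÖ", "malmö") is true, and a decomposed "o" followed by a combining diaeresis produces the same octets as the precomposed "ö".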
2.3 Rules for character data in queries and responses
There is only one area which non-i18n aware software cannot
handle: case-insensitive matching of i18n data.
Because of this, the IN bit is defined and character data
MUST be handled as follows:
- In all queries, all character data that will be used by the DNS-server
to look up records MUST be in lower case.
- A request containing an update of the data in the database of the
DNS-server (for example a DNS update) MUST send data in the
original case.
- A DNS-server MUST NOT send a zone transfer if the server is
i18n aware and the client is not.
- A DNS-server getting a request from an i18n aware client MUST
return data using original case, just like old software does.
- An i18n aware DNS-server getting a request from a non-i18n aware
client MUST return all character data that can be used in character
matching in lower case.
As a result of the above rules, old non-i18n aware
DNS software only gets lower-cased character data, so that it can
still perform character data matching. I18n aware software will
get data as before, preserving case, but can still optimise
character matching as all normal queries will have their data
lower-cased.
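The server-side rules could be sketched in Python as below. The function names are hypothetical, and str.lower() is only an approximation of the lower-casing that the draft defines through the case folding table.

```python
def name_for_response(stored_name, client_is_i18n_aware):
    """Choose the form of a name to return to a client.

    Assumption: str.lower() stands in for the lower-casing the draft
    defines through the case folding table.
    """
    if client_is_i18n_aware:
        # i18n aware clients get the data in its original case.
        return stored_name
    # Old clients only ever see lower-cased data, so their ASCII-only
    # matching keeps working.
    return stored_name.lower()

def may_send_zone_transfer(server_is_i18n_aware, client_is_i18n_aware):
    """An i18n aware server must not transfer a zone to a non-aware client."""
    return not (server_is_i18n_aware and not client_is_i18n_aware)
```

The client_is_i18n_aware argument would come from the IN bit of the incoming request.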
3. Characters allowed in domain names
The DNS protocol does not place any restriction on the characters used
in a domain name. However, applications that make use of DNS
data may have restrictions imposed on what particular values are
acceptable in their environment. If the client has such restrictions,
it is solely responsible for validating the data from the DNS to ensure
that it conforms before it makes any use of that data. [RFC 2181]
For example domains, hosts and e-mail addresses are represented in DNS
and may have different rules.
As the whole idea of internationalisation of DNS is to get domain names
with non-ASCII characters, the original recommendation in DNS [RFC 1035]
for host/domain names needs to be updated.
It is recommended that domains, hosts and e-mail addresses all are
extended to allow all letters, digits and some separators of UCS.
[ Should the recommended set based on the Unicode character properties
be included here? ]
4. User interface issues
Locally on a system or in a user interface, a different character set
than the one defined for the DNS protocol may be in use. Software must
therefore map between the local character set and the character set of
the protocol, so that human beings can understand the data.
This means that a zone file that is edited in a text editor by a person
before being loaded into a DNS server must be allowed to be in the local
character set. Software may not assume that the user can edit text
encoded in UTF-8. A zone file that is transmitted between DNS software
and not handled by a human can be transmitted using any format.
When character data is presented to a human or entered by a human,
software must, as far as possible, present it using the local character
set and allow it to be entered using the local character set.
It is the responsibility of the software, not the human, to convert
between the local character set and the one used in the protocol.
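A minimal Python sketch of such a conversion is shown below. The function names are invented for illustration, and replacing unrepresentable characters with "?" on output is an assumption; the draft does not specify a fallback.

```python
import unicodedata

def local_to_wire(data, local_charset):
    """Convert octets in the local character set to the protocol form."""
    text = data.decode(local_charset)
    return unicodedata.normalize("NFKC", text).encode("utf-8")

def wire_to_local(data, local_charset):
    """Convert protocol octets for display in the local character set.

    Assumption: characters the local set cannot represent are replaced
    with '?'; the draft does not define what to do in that case.
    """
    return data.decode("utf-8").encode(local_charset, errors="replace")
```

For example, a zone file line typed in ISO 8859-1 would pass through local_to_wire(line, "iso-8859-1") when it is loaded, and names shown to the same user would pass through wire_to_local.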
5. Effect on other protocols
As a domain name may now include non-ASCII characters, many other
protocols that include domain names need to be updated. Examples
are SMTP, HTTP and URLs.
In many protocols domain names are used in headers. It is recommended
that they are updated to use UCS normalised using form C
or KC of UTR#15 and encoded using UTF-8, and to use the same format for
other character data of the protocols. This way ugly things like
quoted-printable can be made obsolete.
We can now expect users to want to have e-mail addresses with
non-ASCII characters both before and after the @-sign.
Software needs to be updated to follow the user interface recommendations
given above, so that a human will see the characters in their local
character set, if possible.
6. Security Considerations
As always with data, if software does not check for data that can
be a problem, security may be affected. As more characters
than ASCII are now allowed, software that expects only ASCII and performs
no checks may now have security problems.
7. References
[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
STD 13, RFC 1034, November 1987.
[RFC1035] Mockapetris, P., "Domain Names - Implementation and
Specification", STD 13, RFC 1035, November 1987.
[RFC2279] F. Yergeau, "UTF-8, a transformation format of
ISO 10646," RFC 2279, Alis Technologies, January 1998.
[RFC 2181] Elz, R. and R. Bush, "Clarifications to the DNS
Specification", RFC 2181, July 1997.
[RFC 2535] D. Eastlake, "Domain Name System Security Extensions".
RFC 2535, March 1999.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[UTR15] Mark Davis and Martin Duerst, "Unicode Normalization Forms",
Unicode Technical Report #15,
<http://www.unicode.org/unicode/reports/tr15/>.
[Unicode3] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Described at
<http://www.unicode.org/unicode/standard/versions/Unicode3.0.html>.
[UnicodeData] The Unicode Character Database,
<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.
The database is described in
<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html>.
8. Acknowledgements
Paul Hoffman: draft-hoffman-idn-cidnuc-00.txt
Stuart Kwan, James Gilroy: draft-skwan-utf8-dns-02.txt
Kent Karlsson: Draft on domain name internationalisation.
Discussions by the members of the IDN working group.
9. Author's Address
Dan Oscarsson
Telia ProSoft AB
Box 85
201 20 Malmö
Sweden
E-mail: Dan.Oscarsson@trab.se