
[idn] Using DNS for canonicalization data



Dear all,

From: James Seng <James@Seng.cc>
Subject: [idn] Presentations & RFC2026 
Date: Wed, 26 Jul 2000 07:23:10 +0800

> 1) Try to send your preliminary I-D to mailing list ASAP. It is okay if
>    it is not complete. You can send an update later.

Attached is my preliminary I-Draft; it is not a real I-Draft yet.
Any comments or suggestions are very welcome. See you at the IDN WG
meeting!

Best regards,

-- 
Yoshiro YONEYA <yone@po.ntts.co.jp>
Internet Draft                                        Yoshiro YONEYA
draft-XXXXXXXXXXXX-00.txt                               NTT Software
July XX, 2000
Expires in six months

                Using DNS for canonicalization data

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."


     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.


Abstract

This document describes how to provide, in the DNS, canonicalization
data for the characters in internationalized host names. The basic
idea is to list the usable characters in a zone file that depends on
the Top Level Domain (TLD). This document also describes how to look
up that data and canonicalize characters with it.

1. Introduction

The definition of the characters usable in internationalized host
names is basically region dependent. It should therefore be an online
table that can be referred to dynamically. In other words, it is not
appropriate to build regional dependence into the Internationalized
Domain Name (IDN) standard itself. Providing such a table as a DNS
zone may suit IDN. On the view that regional dependence is closely
tied to the TLD, that is, to registration policy, this document
considers how to define the usable characters in a zone based on the
TLD.

1.1 Terminology

The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", and
"MAY" in this document are to be interpreted as described in RFC 2119
[RFC2119].


2. Domain name space and character table definition

2.1 Domain name space definition

A domain name space that holds the table of characters usable in
internationalized host names under a certain `TLD' MAY be located at:

    `TLD'.idn.arpa.

Here, `TLD' is replaced by the name of the TLD that provides the
table.
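
As a minimal sketch of the naming rule above, the helper below builds
the table zone name for a TLD. The function name `table_zone' is
hypothetical, and lower-casing the TLD is an assumption based on the
case-insensitive matching of ASCII TLDs that this document assumes.

```python
def table_zone(tld: str) -> str:
    """Return the zone that would hold the character table for `tld`."""
    # Assumption: the TLD is a single ASCII label; dots around it are
    # ignored and case is folded, since TLD matching is case insensitive.
    label = tld.strip(".").lower()
    if not label or "." in label:
        raise ValueError(f"expected a single TLD label, got {tld!r}")
    return f"{label}.idn.arpa."
```

For example, table_zone("some-tld") gives the origin used in the
sample zone file of Appendix A.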

2.2 Character table definition

The table of characters is defined in the zone file as:

    `CHAR' IN TXT "`CANON_CHAR'"

Here, `CHAR' is a character that can be used in an internationalized
host name label, and `CANON_CHAR' is the canonicalized form of
`CHAR'. For example,

    A IN TXT "a"

means that `A' in an internationalized host name MUST be
canonicalized to `a'.

In addition, `.' in `CANON_CHAR' has a special meaning. When
`CANON_CHAR' is `.', it stands for `CHAR' itself. For example,

    a IN TXT "."

means that `a' in an internationalized host name SHOULD be
canonicalized to `a'.

Furthermore, `.' followed by certain characters in `CANON_CHAR' has
further special meanings. `.^' means `CHAR' itself, except at the
beginning of an internationalized host name. `.$' means `CHAR' itself,
except at the end of an internationalized host name. For example,

    - IN TXT ".^"
      IN TXT ".$"

means that `-' in an internationalized host name SHOULD NOT appear
at the beginning or end of a label, and otherwise SHOULD be
canonicalized to `-'.

When `CHAR' is `*', it represents a wildcard and MUST be used with
`.' as `CANON_CHAR'. This notation SHOULD NOT be used, but MAY be
useful for a fake DNS server that has no direct connection to the
Internet.
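
The `CANON_CHAR' rules of this section can be sketched as a small
function. This is an illustration under stated assumptions, not part
of the specification: the name `canonicalize_char', its parameters,
and the use of None to signal "not allowed at this position" are all
hypothetical, and the position tests are applied per label as in the
`-' example.

```python
def canonicalize_char(char, txt_values, at_start, at_end):
    """Apply the TXT values of one table entry to a single character.

    Returns the canonical character, or None when the character is
    not allowed at this position (an assumption of this sketch).
    """
    if not txt_values:
        raise ValueError("a table entry needs at least one TXT value")
    result = None
    for value in txt_values:
        if value == ".":
            result = char            # `.' stands for CHAR itself
        elif value == ".^":
            if at_start:
                return None          # not allowed at the beginning
            result = char
        elif value == ".$":
            if at_end:
                return None          # not allowed at the end
            result = char
        else:
            result = value           # explicit canonical form
    return result
```

With the entries shown above, canonicalize_char("A", ["a"], ...)
yields "a", and a `-' carrying both ".^" and ".$" is rejected at
either end of a label but kept in the middle.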

2.3 Meta information labels definition

Clients MAY need additional information to handle internationalized
host names. The following labels define commonly used information
that characterizes a certain TLD.

2.3.1 norm-form

This label indicates the normalization form and is expressed as follows:

    norm-form IN TXT "`NORMALIZATION_FORM'"

Here, `NORMALIZATION_FORM' is a combination of the following
normalization forms:

    unicode-form-C    Unicode normalization form C described in
                      [UTR15]
    unicode-form-KC   Unicode normalization form KC described in
                      [UTR15]
    downcase          Convert letters to lower case
    upcase            Convert letters to upper case
    "" (Null string)  No normalization
    +                 Combination mark

For example,

    norm-form IN TXT "downcase+unicode-form-KC"

means that lower-casing is applied first, followed by normalization
form KC.

A combination of normalization forms MUST be written with the
combination mark so that the order of application is explicit.
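
The left-to-right application of a `norm-form' value might look like
the sketch below. It is only an illustration: `apply_norm_form' is a
hypothetical name, Python's unicodedata module stands in for an
implementation of [UTR15], and mapping `downcase' and `upcase' to
Python's Unicode-wide str.lower and str.upper is an assumption, since
this document does not define which alphabets they cover.

```python
import unicodedata

# Assumed mapping from the tokens of section 2.3.1 to Python calls.
_STEPS = {
    "unicode-form-C": lambda s: unicodedata.normalize("NFC", s),
    "unicode-form-KC": lambda s: unicodedata.normalize("NFKC", s),
    "downcase": str.lower,
    "upcase": str.upper,
}

def apply_norm_form(value: str, name: str) -> str:
    """Apply the steps of a norm-form TXT value from left to right."""
    if value == "":
        return name                      # null string: no normalization
    for step in value.split("+"):        # `+' is the combination mark
        func = _STEPS.get(step)
        if func is None:
            raise ValueError(f"unknown normalization form: {step!r}")
        name = func(name)
    return name
```

Splitting on `+' and applying each step in turn keeps the order of
application explicit, as required above.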


2.3.2 unicode-form-version

This label indicates the version of [UTR15] to be applied and is
expressed as follows:

    unicode-form-version IN TXT "`VERSION_NUMBER'"

Here, `VERSION_NUMBER' is the version number of [UTR15].

2.3.3 unicode-form-url

This label indicates the URL of the [UniData] file corresponding to
the version number given in unicode-form-version and is expressed as
follows:

    unicode-form-url IN TXT "`URL_OF_UNIDATA'"


4. Canonicalization algorithm

DNS clients that use the table for canonicalization SHOULD adopt the
following algorithm.

1) Confirm that an NS RR exists for the TLD under .idn.arpa.
   If it does not exist, end the lookup.
2) Look up the `norm-form' meta-information label. If it does not
   exist, apply no normalization. If it exists, apply early
   normalization according to its TXT RR value.
   If the value of `norm-form' contains unicode-form-C or
   unicode-form-KC, clients MAY look up the version in the
   `unicode-form-version' TXT RR, compare it with the version the
   client knows, and, if they differ, report a version mismatch
   together with the value of the `unicode-form-url' TXT RR.
3) Look up each character of each domain component except the TLD
   and substitute it with the obtained TXT RR. If a lookup fails
   with NXDOMAIN, leave the original character as is.

As a result, the number of queries increases greatly, which conflicts
with requirement [#33-03] of [IDNReq].
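
The three steps, and the query count they imply, can be sketched as
follows. Everything named here is an assumption made for
illustration: `lookup(owner, rrtype)' is a hypothetical resolver hook
that returns a list of RR values or None for a nonexistent name,
`make_stub' replaces real DNS with an in-memory table, and the
optional version check of step 2 is omitted. The stub table is case
sensitive, which the `A' and `a' entries of section 2.2 appear to
presuppose.

```python
import unicodedata

NORMALIZERS = {
    "unicode-form-C": lambda s: unicodedata.normalize("NFC", s),
    "unicode-form-KC": lambda s: unicodedata.normalize("NFKC", s),
    "downcase": str.lower,
    "upcase": str.upper,
}

def _normalize(norm_form, text):
    # Apply each `+'-separated step in the order written.
    for step in norm_form.split("+"):
        text = NORMALIZERS[step](text)
    return text

def _substitute(ch, values, at_start, at_end):
    # Enforce the `.^' / `.$' position rules, then pick the result.
    for v in values:
        if (v == ".^" and at_start) or (v == ".$" and at_end):
            raise ValueError(f"{ch!r} is not allowed at this position")
    first = values[0]
    return ch if first.startswith(".") else first

def canonicalize_host(name, lookup):
    """Canonicalize `name` with the hypothetical `lookup` hook."""
    suffix = "." if name.endswith(".") else ""
    *components, tld = name.rstrip(".").split(".")
    zone = f"{tld.lower()}.idn.arpa."

    # Step 1: without an NS RR there is no table; end the lookup.
    if lookup(zone, "NS") is None:
        return name

    # Step 2: apply early normalization if norm-form is present.
    norm = lookup(f"norm-form.{zone}", "TXT")
    if norm and norm[0]:
        components = [_normalize(norm[0], c) for c in components]

    # Step 3: one query per character of every component but the TLD.
    result = []
    for label in components:
        out = []
        last = len(label) - 1
        for i, ch in enumerate(label):
            values = lookup(f"{ch}.{zone}", "TXT")
            if values is None:           # NXDOMAIN: leave as is
                out.append(ch)
            else:
                out.append(_substitute(ch, values, i == 0, i == last))
        result.append("".join(out))
    return ".".join(result + [tld]) + suffix

def make_stub(records):
    """Return (lookup, queries); `queries` logs every lookup made."""
    queries = []
    def lookup(owner, rrtype):
        queries.append((owner, rrtype))
        return records.get((owner, rrtype))
    return lookup, queries
```

With this stub, canonicalizing a nine-character label costs eleven
lookups (one NS, one `norm-form', nine characters), which illustrates
the growth in query count.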

5. Issues

5.1 Canonicalization of TLDs

This document assumes that TLDs are ASCII-only and that the matching
rule is case insensitive. Once IDN is deployed, however, there would
be no technical objection to internationalized TLDs. The name space
and canonicalization algorithm defined in this document SHOULD
therefore be extended to cover internationalized TLDs.

5.2 Reduction of query traffic

As mentioned above, the canonicalization algorithm defined in this
document increases DNS traffic. Even allowing for the effect of DNS
caching servers, caching would not be a substantial solution to this
issue. To reduce the number of queries, a method that works on
combinations of characters rather than character by character SHOULD
be defined. Appendix B shows one such method.


6. IANA Considerations

This document intends to create a new second-level domain named IDN
under ARPA.


7. Security Considerations

The lookup method described in this document works character by
character and therefore causes many DNS queries. Denial of Service
attackers can exploit this method to attack DNS servers.


8. References

[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.

[UTR15] Mark Davis and Martin Duerst. Unicode Normalization Forms.
Unicode Technical Report #15.
<http://www.unicode.org/unicode/reports/tr15/>.

[UniData] The Unicode Consortium. UnicodeData File.
<ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt>.

[IDNReq] James Seng, "Requirements of Internationalized Domain Names",
draft-ietf-idn-requirement.


9. Acknowledgements

The author received much advice from David Conrad, James Seng, and
the JPNIC IDN-TF members, especially Izuru Sato and Yasuhiro
Morishita.


10. Author Contact Information

Yoshiro YONEYA
NTT Software Corporation
2-15-2 Kohnan, Minato-ku Tokyo 108-6113 Japan
TEL: +81 3 5782 7291
FAX: +81 3 5782 7222
E-Mail: yone@po.ntts.co.jp


Appendix A. Sample of zone file

$ORIGIN some-tld.idn.arpa.
@ IN SOA ns.some-tld.idn.arpa. hostmaster.ns.some-tld.idn.arpa. ( ... )
  IN NS  ns.nic.some-tld.
;
norm-form            IN TXT "unicode-form-C"
unicode-form-version IN TXT "3.0"
unicode-form-url     IN TXT "ftp://ftp.nic.some-tld/pub/Unicode/"
;
A IN TXT "a"
;         :
Z IN TXT "z"
a IN TXT "."
;         :
z IN TXT "."
0 IN TXT "."
;         :
9 IN TXT "."
- IN TXT ".^"
  IN TXT ".$"

Appendix B. Using DNAME to reduce queries

If both the DNS server and the client are DNAME RR capable, and the
DNS server operates in recursive mode, then using DNAME can reduce
the number of queries. In the sample above, change the definitions of
A-Z as follows:

A IN CNAME a
  IN DNAME a
;          :
Z IN CNAME z
  IN DNAME z

And add a DNAME definition after each of a-z, 0-9, and - as follows:

a IN TXT   "."
  IN DNAME some-tld.idn.arpa.
;          :
- IN TXT   ".^"
  IN TXT   ".$"
  IN DNAME some-tld.idn.arpa.

That is, characters that have a canonical form are defined with CNAME
and DNAME, and characters that are already in canonical form are
defined as themselves with a DNAME to `TLD'.idn.arpa.

The client can then canonicalize all characters in one query. For
example, canonicalization of `some-name.some-tld.' can be done with
the following query name:

  s.o.m.e.-.n.a.m.e.some-tld.idn.arpa
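
Building that query name is mechanical. The sketch below assumes a
host name of exactly one label plus a TLD, as in the example; the
function name `dname_query_name' is hypothetical, and this appendix
does not say how names with more labels would be handled.

```python
def dname_query_name(host: str) -> str:
    """Build the single query name for a `label.tld.' host name."""
    labels = host.rstrip(".").split(".")
    if len(labels) != 2 or not all(labels):
        raise ValueError("this sketch handles one label plus a TLD")
    label, tld = labels
    # One DNS label per character, followed by the table zone.
    return ".".join(list(label) + [tld, "idn", "arpa"])
```

Called with "some-name.some-tld.", it returns the query name shown
above.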