[idn] draft-liana-idn-stone is submitted
A New Internet-Draft is available from the on-line Internet-Drafts
directories.
This draft is a work item of the Internationalized Domain Name Working
Group of the IETF.
Title : Establishing a Rosetta Stone of Internet
Author(s) : Liana Ye
Filename : draft-liana-idn-stone-00.txt
Pages : 11
Date : 28-Sep-01
Internet Draft Liana Ye
draft-liana-idn-stone-00.txt Y&D ISG
Sept. 29, 2001
Expires in six months (March 2002)
Establishing a Rosetta Stone of Internet
Status of this memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed
at http://www.ietf.org/shadow.html.
Abstract
Spoken language is part of human nature, so an IDN system that users
want is always closely tied to linguistic issues. Representing the
symbols of a script consistently, recognizably, and accessibly, so
that they gain the widest acceptance around the world, has been the
wish of many Internet engineers. For IDN identifiers to be sensible,
and thus consistent and lasting, these symbols have to be indexed by
their phonetics, the basic element of linguistics. A joint effort by
the Unicode Consortium, the Library of Congress, the International
Phonetic Association, and the IETF IDN working group is necessary to
codify a transliteration scheme.
Table of Contents
1. Introduction
2. Available Resources and Issues to be Addressed
2.1 Unicode Table
2.2 ALA-LC Transliteration Table
2.3 The IPA table
2.4 CJK Romanization tables
3. Limits of IDN Identifiers
4. Scope of Transliteration
1. Introduction
Blanket treatment of Unicode is technically feasible, but it is
difficult for humans to use and hard to guard against confusion among
similar symbols of different scripts. For example, when a Bopomofo
symbol appears in a string of Chinese characters, it is very difficult
for a program or a human to say that the name is not a Japanese name.
This common type of confusion would make an IDN implementation
unusable.
However, the above example becomes trivial to resolve if the context
of the name registration, such as the input language module, is
retained by the IDN program: there is then little chance of
interpreting the input as Japanese. The input context can be retained
in many formats. This document suggests a format tied to the user's
spoken language and written in the Latin alphabet and numerals. It
therefore proposes Romanized transliteration of non-Roman scripts,
extending the accumulated work of libraries, dictionary publishers,
and foreign language educators.
1.1 Context
The worldwide desire to use characters other than plain ASCII in
hostnames is growing and accelerating. Hostnames have become the
equivalent of business or product names for many services on the
Internet, and are also referred to here as trade names; for some
oriental users they are in fact nationwide trademarks. Several needs
have to be addressed: the need to make hostnames usable by people
whose native scripts are not directly representable in ASCII; the
need for network support workers to diagnose URIs [RFC2396]; the need
for an expanded and diverse name server network to sort and manage
zone files; the need of a growing number of non-native readers, who
do not use their native scripts to refer to trade names in daily
activities; and the need to minimize possible security leaks when
internationalized domain names are implemented in the Domain Name
System (DNS). "One aspect of the challenge is to decide how to
represent the names users want in the DNS in a way that is clear,
technically feasible, and ensures that a name always means the same
thing." The problem is addressed in [RFC 2825], written as users
pushed the Internet community to face this fundamental issue. More
detailed requirements on internationalizing hostnames are described
in the IDN Working Group's requirements document [IDNReq].
1.2. Reality of Romanization
To provide one DNS symbol set for users of different languages
under the above technical and security considerations, a Romanization
process from non-Roman scripts to US-ASCII is unavoidable. Language
Romanization has been a fact around the globe since Russia
standardized Cyrillic for many eastern European languages in the
1920s, Turkey changed from Arabic to Latin script in 1928, and
China adopted Pinyin as a supplemental phonetic system for Han script
in 1958. The steady development of transliteration schemes for
non-Roman scripts led to the joint publication of the "Romanization
Tables" by the American Library Association and the Library of
Congress in 1997 [Translit 97].
In the past three decades, software implementation of such a
process has extended from the user to the QWERTY keyboard, from the
keyboard to text editors of various kinds, from text editors
to mail services, and from mail services to Internet address
resolvers. To unify today's fragmented Romanization implementations
for use in IDN hostname identifiers, written documentation is overdue.
It must address issues as basic as the one stated in [DeFrancis 1989]:
"The adaptation of Latin alphabet to represent a great variety
of spoken languages means of course that the value of specific
symbols varies from language to language. This is true both of
the European adaptations, which in most cases came about rather
haphazardly, and of the more recent creations based on more
carefully thought-out linguistic principles. So it is that the
French 'u' has a different value from that in English. The letter
'j' represents one sound in English 'jam', another in German 'ja'.
The initial sound of English 'sure' is written 'sz', in Polish,
Czech. The sound represented by English 'ts' is written in 'c'
in Polish, Czech, Hungarian, Serbo-Croatian, and Chinese."
Unifying symbol systems always brings some loss from the original
systems, especially in this fast-growing Internet era, when the
native language of a household can be lost within one generation in
a localized bilingual environment. To retain the colorful heritage
of the world, a means of easy reference to the original sound system
should be provided too.
1.3 Author's Disclaimer
This document is the author's suggestion, which grew out of
discussions in the IETF IDN online discussion group. It does not
intend to question the current working goals of any group or
standards body, and it cannot force any working group in any
direction.
1.4 Terminology
The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED",
and "MAY" in this document are to be interpreted as described in
[RFC2119].
Examples in this document use the notation from the Unicode Standard
[Unicode3] as well as the ISO 10646 names. For example, the letter
"a" may be represented as either "U+0061" or "LATIN SMALL LETTER A".
A non-Roman character is also denoted by its Romanized form followed
by its English equivalent word in <>. For example, "zhong <heavy>".
1.5 IDN summary
Transliterating Unicode symbols into DNS-compatible identifiers for
IDN is inevitable, both for friendliness to foreign users and for
ease of zone file management. Because resources are already in
place, the additional work of simplifying and reconciling their
differences is limited in scope. It would benefit Internet
communication in the long term and may benefit IDN system design
immediately.
2. Available Resources and Issues to be Addressed
Character standardization has been a constant effort ever since
human civilization has had written languages. The Unicode Consortium,
the Library of Congress, the International Phonetic Association, and
various national education and standardization bodies, as well as
dictionary publishers, are all part of this effort. As the IDN
Working Group examines the available resources in order to provide a
feasible technical design for an easily accessible and unambiguous
internationalized domain name system, a systematic, technically
oriented study of the current resources appears necessary.
2.1 Unicode Table
Unicode is an ongoing effort for the Internet era. Unicode
concentrates on the graphic features of a character, and some issues
cannot be dealt with efficiently in the current Unicode structure
[Unicode 3]. For example, the large set of traditional Han characters
that are equivalent to simplified characters is not currently
addressed by the Unicode Consortium [Unicode]. This phenomenon is more
prominent when multiple scripts are used in the same context. For
example, the following equivalence set for LATIN SMALL LETTER H has
already become a 9-to-1 mapping in the current [nameprep]
specification:
0048; 0068; Case map
210B; 0068; Additional folding
210C; 0068; Additional folding
210D; 0068; Additional folding
1D407; 0068; Additional folding
1D43B; 0068; Additional folding
1D46F; 0068; Additional folding
1D4D7; 0068; Additional folding
1D573; 0068; Additional folding
Known equivalence sets, such as those addressed in [nameprep] and the
Chinese Traditional and Simplified character sets [SC Table86][Tsconv],
are well studied, with standards in place to work from.
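The nine-entry equivalence set quoted above can be applied
mechanically. The following is a minimal Python sketch, not the
[nameprep] algorithm itself: it parses only the nine lines shown in
this section, and the names FOLDING_LINES, parse_table, and fold are
hypothetical, chosen for illustration.

```python
# The nine mapping lines quoted above from [nameprep]:
# "source code point; target code point; reason".
FOLDING_LINES = """\
0048; 0068; Case map
210B; 0068; Additional folding
210C; 0068; Additional folding
210D; 0068; Additional folding
1D407; 0068; Additional folding
1D43B; 0068; Additional folding
1D46F; 0068; Additional folding
1D4D7; 0068; Additional folding
1D573; 0068; Additional folding
"""


def parse_table(text):
    """Build a {source character: target character} dict from the lines."""
    table = {}
    for line in text.splitlines():
        source, target = (field.strip() for field in line.split(";")[:2])
        table[chr(int(source, 16))] = chr(int(target, 16))
    return table


FOLD = parse_table(FOLDING_LINES)


def fold(label):
    """Replace every character that has a table entry; keep the rest."""
    return "".join(FOLD.get(ch, ch) for ch in label)
```

With this table, fold("Host") and fold("\u210Bost") both give "host",
while a character outside the table, such as any Armenian letter,
passes through unchanged.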
A more serious question has been raised about symbols that look
alike but have unrelated semantics in the context of a domain name,
analogous to the many look-alike Chinese characters. Because IETF
draft documents are restricted to plain text files, let us take a
simple example from Armenian glyphs:
The Armenian small letters that look like "h", "g", "f", "o", "n",
"u" can be considered identical to the Latin letters. Should these
letters be mapped to the above Latin set or not? If they are not
mapped to Latin letters, as in the current [nameprep] specification,
what will happen when an Armenian user picks up letters from the
Latin set? Does this lead to a correct match in a compressed ASCII
Compatible Encoding (ACE)? If it is not a match, how do we know what
went wrong with the ACE?
If the above Armenian letters are mapped to the Latin set, then the
"0048; 0068; Case map" set would grow to a 10-to-1 mapping. If so,
what about the rest of the letters in the Armenian character set?
Should they be mapped to look-alike characters too? For example, the
Armenian upper case "n" (Armenian capital letter VO) looks like
Thai Letter KO KAI,
Lao Letter DO, and
Georgian Letter GHAN.
If writing style variations are also considered, as they indeed are
in the Latin mapping of the "0048; 0068; Case map" example above,
then the same Armenian capital letter VO is also similar to
Bopomofo Letter M,
Hiragana Letter RI, and
Katakana Letter WA.
How can we sort out the original intent of a registrant?
While the IDN working group may be forced to adopt language tagging
[ISO639][IDNmap] to retain the user's language context and to
separate different scripts, questions remain. Which scripts may be
mixed with which others? Is there any desire for such a mix, as is
reported for current American English in [Alphabet]? For example,
can Azerbaijani users choose freely among the Arabic, Cyrillic, and
Latin scripts [Translit 97, P.24][Mercury 2001-7-30]? Can Azerbaijani
use a mix of the three scripts, as the Japanese language does? If two
scripts are mixed, where should the equivalent symbol set be defined?
2.2 ALA-LC Transliteration Table
Romanization of non-Roman scripts has been an effective method for
cataloging documents in libraries of the United States. However,
because of practical limitations and the need for historical
consistency in recording this material, the transliteration of some
symbols departs from popular usage in one or more locations. For
example, the ALA-LC Arabic transliteration uses more diacritics than
the transliteration used in popular Arabic teaching texts. Take the
Arabic Letter Alef from the Unicode table: it is called Alif in
[Translit 97] and transliterated to U+0101, the letter a with a bar
on top, while the same Alef is transliterated to "aa" in the textbook
[Nichlas 86].
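The difference between the two transliterations matters for host
names, because only one of them is plain ASCII. The short Python
sketch below uses the standard unicodedata module to look at the two
code points named above. The helper strip_marks is hypothetical; it
only illustrates what is lost when a diacritic is dropped and is not
a rule taken from [Translit 97].

```python
import unicodedata

ALEF = "\u0627"      # the Arabic letter, as named in the Unicode table
A_MACRON = "\u0101"  # the [Translit 97] transliteration cited above


def strip_marks(text):
    """Decompose (NFD) and drop combining marks, leaving base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(unicodedata.name(ALEF))      # official Unicode character name
print(unicodedata.name(A_MACRON))  # official Unicode character name
print(strip_marks(A_MACRON))       # the base letter without its macron
```

Dropping the macron reduces U+0101 to a single "a", so the length
information that the textbook form "aa" carries would be lost unless
the transliteration rule encodes it in some other way.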
In addition, Romanization has to be consistent with the original
phonetics of the spoken language. For example, until recently the
Library of Congress transliterated Chinese material with the
Wade-Giles system, an English-based phonetic system built on a
regional Chinese dialect [PinyinConv]. Are there similar cases that
should be examined before they are implemented in an IDN system?
Furthermore, transliteration follows rules specific to a particular
script and/or language. When these rules are applied to
transliteration into host names, they have to be simplified. For
example, most white space may be omitted for alphabetic languages,
and the same rule works for most Arabic languages, with a few
exceptions that cannot be ignored. The same rule does not apply
directly to CJK under the current [Translit 97] treatment. The IDN
working group needs to cooperate with the Library of Congress to
specify the transliteration rules for individual scripts.
2.3 The IPA table
The scheme followed in [Translit 97] appears to be very close to the
IPA specification. However, each transliteration scheme uses only a
subset of the IPA symbols, chosen for the characteristics of the
particular language under treatment. This may imply that a limited
number of diacritic marks is needed for each language, similar to the
26-letter limit of Latin. For ease of use by ordinary users of a
language, a small number of diacritic marks needs to be defined, and
each mark allowed to represent different diacritic values in
different languages, so that the French macron, "-", may have a
different value from the one it has in Chinese transliteration.
The number of phonetic and diacritic elements that can be represented
in the domain names of one language is limited by [STD13] and the
recent ICANN decisions on a stable DNS. The symbols available for
transliteration are ASCII [a-z0-9] and the hyphen "-".
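The symbol limit stated above is easy to check by program. The
following Python sketch tests whether a transliterated label uses
only [a-z0-9] and the hyphen. The 63-character length limit and the
ban on a leading or trailing hyphen are the customary host name
rules associated with [STD13] and are added here as assumptions; the
function name is_transliteration_label is hypothetical.

```python
import re

# Letters, digits, and hyphen only; 1 to 63 characters;
# no hyphen at the start or at the end of the label.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_transliteration_label(label):
    """Return True if the whole label fits the symbol set above."""
    return LABEL_RE.fullmatch(label) is not None
```

For instance, is_transliteration_label("an-gang") is true, while a
label containing U+0101 or an upper case letter is rejected and
would first have to be mapped into the allowed symbol set.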
2.4 CJK Romanization Tables
CJK [CJK] phonetics are syllable based, where a code point in Unicode
is an independent semantic unit, called "zi" in Chinese, "ji" in
Japanese, and "ja" in Korean, and somewhat similar to a "stem" in
English. These syllable-based symbols are the characters of the CJK
script and are traditionally treated as such in computer processing,
as well as in the UCS standard. In transliteration schemes, however,
the symbols are often treated in groups of characters, called "ci" in
Chinese, "shi" in Japanese, and "word" or "phrase" in English.
In past computer processing, the treatment of "zi" appeared to be a
field disjoint from the treatment of "ci": "zi" was "only an input"
processing problem, while processing "ci" belonged to computational
linguistics, a sub-field of Artificial Intelligence, and was too
complex to be considered for flat treatment of any type. Granted,
these were two separate research fields in the early days of software
development. Today it is difficult to separate the two, since many of
the techniques for treating "ci" benefit the treatment of "zi".
However, this does not mean that the techniques used in CJK character
input processing cannot be extended to treat a much larger character
set such as UCS.
The transliteration schemes in [Translit 97] also treat the three CJK
languages differently. For Chinese, the transliteration is character
based, while Korean and Japanese use word-based transliteration. Each
has its limits when used as an identifier in an IDN system. Word
transliteration is sufficient to identify a multi-syllabic hostname
in the DNS, but insufficient to identify a glyph in Unicode. It is
also difficult to reverse a word into a sequence of characters, since
much of its character content has not been preserved. The character
content of a CJK character here refers to its composition from
several radicals, also referred to as its CJK glyph content. This
glyph content must be preserved explicitly for glyph identification,
in a way similar to that described in [UAX 15]. There are about 1,000
such characters and radicals [Mao 87] in the GB [GB] standard, which
have code points in UCS without a definitive guide to their use. A
consistent list of transliterated radicals, for composing the
transliterated glyph content of CJK code points, has yet to be
confirmed by a standards body.
Character-based transliteration easily retains the character
boundary, but has the same problem of identifying a CJK character
unless its original glyph content is preserved explicitly. The
solution proposed by StepCode [StepCode] provides a progressive
transliteration scheme that preserves the CJK glyph content of each
CJK character in Unicode, and a method to extract a DNS hostname
consistently from such a character encoding to obtain a word-like DNS
identifier.
The progressive character transliteration scheme can give only one
encoding per character per language, so a primary encoding has to be
elected from each set of equivalent transliterations. For example,
the Kanji <business> has two Romaji, "gyo1" and "go1", and one of the
two has to be elected as its primary transliteration.
3. Limits in IDN identifiers
IDN identifiers are names of an entity in a script familiar to the
user. An IDN identifier gives just one name per entity. It is just
one code per character under one language tag, with no second code
or second guess for the same identifier. The final goal is that a
single table lookup suffices to convert an IDN name to a DNS name,
with all the ambiguities that an IDN domain name may imply already
eliminated. If the concept of "one" is not defined on a solid
symbolic representation in the DNS, then there is no "two" or "three"
to build on, no matter what types of keyword searching system there
may be. In other words, IDN encoding is the process of eliminating
the ambiguities associated with a spoken language, with a script, and
with the registrant's intention.
The IDN system design process is the process of peeling off all the
ambiguities to arrive at a unique identification of a domain name.
The peeling obviously starts at the outermost layer.
3.1 One Language per domain name
This layer falls off readily, like the dried and cracked skin of an
onion. However, it is necessary to spell it out: one spoken language,
as defined in [ISO 639], per domain name. It is REQUIRED that only
one language tag be possible for one host name.
3.2 One Romanized transliteration per language tag
As we all know, many spoken languages may use one script. Japanese
and Korean, which use CJK characters, are not in this category: they
are different languages as defined in [ISO 639]. However, there are
other spoken languages that use the same CJK script, such as
Cantonese, which is considered a dialect of Chinese ([ISO 639] gives
it no language tag, so it will not have one in IDN [IDNMap]). Only
one phonetic system per language tag is therefore allowed in IDN, so
that Cantonese from China and Kun from Japan may not be considered.
It is REQUIRED that there be only one Romanized transliteration
scheme per language tag.
3.3 One Presentation Direction for All DNS Identifiers
It is also a fact that human scripts may be read in different
directions. Latin is read from left to right, Arabic is read from
right to left, Mongolian is read from top to bottom, and Chinese may
be read in all three directions. Romanized transliterations used as
DNS identifiers are all read from left to right. It is REQUIRED that
transliterated scripts follow the left-to-right reading convention of
Latin script.
3.4 One break per word
Word boundaries are neither the same nor presented consistently
across all languages. Many transliterations use a hyphen, an
apostrophe, a space, or a letter to indicate a break between two
phonetic units or two semantic units. For example, transliteration of
Chinese has one character per break, while Korean and Japanese have
several characters per break [Translit 97]. A domain name label is a
limited space. While it is possible to include several English words
without any break indicators, many non-Roman scripts need breaks so
that the transliterated form can be reversed correctly to the
original script. For example, the name "an'gang" refers to a large
steel entity in China. It has to be interpreted as two characters
with a break at the apostrophe, so that "an-gang" does not become
"ang-ang".
The definition of a word in the IDN context is a semantic or sound
unit that is independent of its neighboring characters. Two disjoint
semantic or sound parts of a domain name in transliterated form are
REQUIRED to use only a hyphen to preserve the original script form
during transcription of an IDN name.
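The "an-gang" example can be reproduced with a few lines of Python.
The sketch below enumerates every way to cut an unbroken string into
known syllables. The syllable list is a tiny illustrative subset
chosen for this example, not a complete Pinyin inventory, and the
function names are hypothetical.

```python
# Illustrative subset of syllables; a real table would be much larger.
SYLLABLES = {"an", "ang", "gang"}


def segmentations(text, syllables=SYLLABLES):
    """Return every split of text into syllables from the given set."""
    if not text:
        return [[]]
    results = []
    for cut in range(1, len(text) + 1):
        head = text[:cut]
        if head in syllables:
            for rest in segmentations(text[cut:], syllables):
                results.append([head] + rest)
    return results


def split_label(label):
    """With an explicit hyphen, the break is read directly off the label."""
    return label.split("-")
```

Here segmentations("angang") finds two readings, "an" + "gang" and
"ang" + "ang", whereas split_label("an-gang") has only one. This is
the ambiguity that the required hyphen removes.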
3.5 One IDN character per equivalent character set
Characters may be defined as semantically equivalent. For example,
the "0048; 0068; Latin Case map" set in Section 2.1 and the Chinese
traditional to simplified character mapping in [Tsconv] cover scripts
that have been examined carefully and have standing standards to
refer to [Nameprep][SC Table86].
Similar cases exist in other languages too, and they have to be
studied carefully before an IDN implementation is deployed. For
example, "kan <can, tin>" in Kanji has two semantically equivalent
forms, a traditional Han form and a simplified Kanji form. In the
same language and script, another Japanese Kanji, "so <contest>", has
exactly the same traditional and simplified forms as the Chinese
character "zheng <contest>" of the same semantics. If such cases are
not well studied and treated before an IDN system is deployed, then
controversial uses of the same IDN identifier will foreseeably lead
to an unusable system. An inter-language set of semantically
equivalent IDN characters is REQUIRED to be defined, and only one IDN
identifier per such set is permitted.
As to Korean, most Hangul characters correspond to one or more Hanja
characters from the CJK character set. If equivalent character set
mapping is permitted for all scripts, treated similarly to Latin case
mapping, decisions have to be made on the pros and cons of such a
mapping for the users of different languages. For example, a Korean
Hanja character may either be placed in an equivalent set with Hangul
or kept out of such a set. A careful evaluation of equivalent sets is
necessary by the users of Korean, Indian, Azerbaijani, etc. as well.
It is REQUIRED that only one IDN identifier be permitted from a set
of semantically equivalent IDN characters within one language tag.
3.6 One explicit phonetic value per character
In addition, an individual character, especially a Han character,
may be pronounced differently in different contexts, with different
semantics. For example, the character "chong <double>" is also
pronounced "zhong <heavy>", but IDN can allow only one of the
pronunciations to be used as an identifier. It is REQUIRED that there
be one transliteration per character per language tag, and it is
RECOMMENDED that the pronunciation often used to denote a name be
adopted for the IDN identifier.
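The election of one reading per character can be pictured as
reducing a table of candidate readings to a table of single entries.
The Python sketch below is only an illustration under assumptions:
the character is written as the escape U+91CD, the choice of "zhong"
as primary is arbitrary and not a decision of this document, and the
name elect_primary is hypothetical.

```python
def elect_primary(readings, preferred):
    """Reduce {character: [readings]} to {character: one reading}.

    'preferred' names the elected reading for some characters; any
    character it does not mention falls back to its first reading.
    """
    table = {}
    for char, options in readings.items():
        choice = preferred.get(char, options[0])
        if choice not in options:
            raise ValueError("%r is not a known reading" % choice)
        table[char] = choice
    return table


# The character discussed above, with both of its readings.
READINGS = {"\u91cd": ["chong", "zhong"]}

# Arbitrary example election; the real choice is left to the language
# community and the standards process.
PRIMARY = elect_primary(READINGS, {"\u91cd": "zhong"})
```

After the election, a lookup such as PRIMARY["\u91cd"] yields exactly
one string, which is the property the requirement above asks for.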
3.7 One Delimiter Digit per Character
A character is a code point in UCS. A character in transliterated
form is an alphanumeric string with at least one Latin letter.
When several characters of a particular language appear in sequence
in transliterated form, a numeral serves two functions where
applicable: diacritic value and character delimiter. If a Latin
letter has two diacritics, for example one on top of the letter and
the other below it, the limit of one digit per character is still
enforced. The limit is observed either by treating the two diacritics
together as one special mark or by omitting one of them.
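The delimiter role of the digit can be shown with a short Python
sketch. It assumes that every transliterated character consists of
lower case letters followed by exactly one digit, as in the "gyo1"
and "go1" examples of Section 2.4; cases where no digit applies are
not handled. The name split_characters is hypothetical.

```python
import re

# One transliterated character: one or more letters, then one digit.
UNIT_RE = re.compile(r"[a-z]+[0-9]")


def split_characters(label):
    """Cut a label into per-character units at each delimiter digit."""
    units = UNIT_RE.findall(label)
    if "".join(units) != label:
        # Something in the label did not fit the letters+digit shape.
        raise ValueError("label is not a sequence of letters+digit units")
    return units
```

For example, split_characters("gyo1go1") returns the two units "gyo1"
and "go1", while a string such as "gyo12" is rejected because the
second digit would break the one-digit limit.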
4. Scope of Transliterations
The Internet Rosetta Stone for human script transliteration is to
1) codify CJK character transliteration;
2) codify the [Translit 97] transliteration rules into routines for
easy word assembly and disassembly;
3) suggest an implementation of the language tag list that considers
the following questions:
a) What does the user group of the language tag expect an IDN domain name to look like?
b) Is the script of this language tag used interchangeably with another script? Are they
used as a mixed script?
c) How does the script concerned deal with foreign concepts?
d) How does the script concerned deal with foreign sounds?
e) Is there any wish for the script to be mixed with other scripts? Which ones?
4) provide easily accessible documentation of the embedded differences
between a script and its transliteration. The documentation may be
similar to the ALA-LC Romanization Tables, with the addition of
universal on-line access,
language tag link and included scripts,
phonetics of each symbol in IPA notation, and
its sound file links into the existing IPA language sound database,
diacritic value of each digit for different languages,
hyphenation for embedded word breaks.
It is RECOMMENDED to name this document "idn-mnemonic".
5. Security Considerations
Much of the security of the Internet relies on the DNS. Thus, any
change to the characteristics of the DNS can change the security of
much of the Internet. Transliteration, however, makes no changes to
the DNS itself, so no security changes to the DNS are foreseen.
6. Internationalization Considerations
The proposal affects all domain name users who use non-Latin
scripts, as well as users of Latin scripts if diacritic marks are to
be part of IDN identifiers.
7. References
[Alphabet] "Repertoires of characters used to write the indigenous languages
of Europe", A CEN Workshop Agreement, Version 2.8, TECHNICAL REPORT,
Draft: 1998-12-14. http://www.egt.ie/alphabets/#1.3
[ASCII] American National Standards Institute (formerly United States
of America Standards Institute), X3.4, 1968, "USA Code for
Information Interchange". (ANSI X3.4-1968)
[CJK] James Seng, et al., "Han Ideograph (CJK) for Internationalized
Domain Names", draft-ietf-idn-cjk-01.txt, 11th Apr 2001.
[DeFrancis 1989] John DeFrancis, "Visible Speech - The Diverse
Oneness of Writing Systems", 1989, ISBN 0-8248-1207-7.
[GB] China National character code exchange standard.
[IDNmap] Liana Ye, "IDN Code Exchange Mapping Structure",
draft-liana-idn-map-00.txt, Sept. 2001.
[IDNReq] Zita Wenzel and James Seng, "Requirements of Internationalized
Domain Names", draft-ietf-idn-requirements, May 2001.
[IPA] The International Phonetic Alphabet,
http://www2.arts.gla.ac.uk/IPA 1996.
[ISO639][ISO639-2/T] ISO/IEC 639-2 2001 Codes for the Representation of
Names of Languages.
[Mao 87] Mao, Yuhang, "Direct Radical-Consonant Coding of Chinese
Characters", Proceedings 1987 International Conference on Chinese and
Oriental Language Computing, Chicago, USA, 1987.
[Nameprep] Paul Hoffman and Marc Blanchet, "Preparation of
Internationalized Host Names", draft-ietf-idn-nameprep, July 2001.
[Nichlas 86] Nicholas Awde & Putros Samano, "The Arabic Alphabet", 1986,
ISBN 0-8184-0430-2.
[PinyinConv] Library of Congress Pinyin Conversion Project, "New
Chinese Romanization Guidelines",
http://lcweb.loc.gov/catdir/pinyin/romcover.html#7
[RFC 2026] S. Bradner, "The Internet Standards Process -- Revision 3",
1996, RFC 2026.
[RFC2119] Scott Bradner, "Key words for use in RFCs to Indicate
Requirement Levels", March 1997, RFC 2119.
[RFC2396] Tim Berners-Lee, et. al., "Uniform Resource Identifiers (URI):
Generic Syntax", August 1998, RFC 2396.
[RFC2825] L. Daigle, Ed., "A Tangled Web: Issues of I18N, Domain
Names, and the Other Internet protocols", May 2000, RFC 2825.
[SC Table86] "A Dictionary of Chinese Character Information",
Scientific Publishing, 1988, ISBN 7-03-000869-3/H.3
[StepCode] Liana Ye, "StepCode - A Mnemonic Internationalized Domain
Name Encoding", draft-ietf-idn-step-01.txt.
[STD13] Paul Mockapetris, "Domain names - implementation and
specification", November 1987, STD 13 (RFC 1035).
[Translit 97] Barry, Randall K. 1997. ALA-LC romanization tables:
transliteration schemes for non-Roman scripts. Washington: Library
of Congress Cataloging Distribution Service. ISBN 0-8444-0940-5
[Translit 97, P.24][Mercury 2001-7-30] ALA-LC romanization tables,
"Azerbaijani" Arabic to Latin table, Page 24.
Aida Sultanova, "Azerbaijan Mandates use of Latin alphabet", San
Jose Mercury News, July 30, 2001.
[Tsconv] XiaoDong LEE, et al., "Traditional and Simplified Chinese Conversion",
draft-ietf-idn-tsconv-00.txt, June 2001.
[UAX15] Mark Davis and Martin Duerst, Unicode Standard Annex #15: "Unicode
Normalization Forms", Version 3.1.0.
<http://www.unicode.org/unicode/reports/tr15/tr15-21.html>
[UCS][UNICODE] The Unicode Consortium, "The Unicode Standard".
Described at http://www.unicode.org/unicode/standard/versions/.
[UNICODE30] The Unicode Consortium, "The Unicode Standard -- Version
3.0", ISBN 0-201-61633-5. Same repertoire as ISO/IEC
10646-1:2000. Described at http://www.unicode.org/unicode/
standard/versions/Unicode3.0.html.
8. Acknowledgements
The author has benefited from energetic discussions regarding IDN
system design issues. Among many comments, special arguments and
instructions that helped inspire this document came from James Seng,
Eric Brunner, Mark Davis, Patrik Faltstrom, L.M.Tseng, Soobok Lee,
Martin Duerst, Harald Tveit Alvestrand, Xiaodong Lee, Roozbeh
Pournader, Deven Kalra, Adam M. Costello, Paul Hoffman, Bruce
Thomson, and John C Klensin.
9. IANA Considerations
This document requires IANA action for availability of script tag,
and registration for each tag and possibly its sub-field for
phonetic system used.
10. Authors' Contact Information
Liana Ye
Y&D ISG
2607 Read Ave.
Belmont, CA 94002, USA.
(650) 592-7092
liana.ydisg@juno.com