[idn] Comments on Nameprep: incomplete preparation
Last call comments on draft-ietf-idn-nameprep-07.txt:
In Unicode 3.1, unified Han ideographs account for about 75%
of all encoded characters, so CJK is a major subset of Unicode.
Since Nameprep uses Unicode 3.1 as the repertoire for IDN, CJK domain
names (CJK-DN) are also a part of IDN.
( http://www.unicode.org/unicode/reports/tr27/ )
As demonstrated in:
http://www.iis.sinica.edu.tw/~wuch/idn/examples/mixinput.htm
(referenced by http://www.imc.org/idn/mail-archive/msg05795.html )
http://www.iis.sinica.edu.tw/~wuch/idn/examples/variant.htm
many CJK variants in Unicode 3.1 look alike and are easily confused
by typical users, but Nameprep does not adequately consider them.
The CJK sub-space of the IDN Identifiable Namespace that Nameprep
implicitly defines will therefore make IDN ambiguous and open to
dispute. Even if variants are handled by registration policy, a
standardized preparation for CJK-DN is still required to keep IDN
global and unique and to keep the IDN namespace trustworthy.
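
The gap can be seen in a minimal sketch. It assumes that Python's
standard unicodedata module is an adequate stand-in for the NFKC step
that Nameprep applies (the module tracks a newer Unicode version than
3.1), and the variant pair U+56FD / U+570B is an illustrative choice
made here, not one taken from the pages above.

```python
import unicodedata


def nfkc(label: str) -> str:
    """Apply Unicode normalization form KC, the form Nameprep relies on."""
    return unicodedata.normalize("NFKC", label)


# A CJK Compatibility Ideograph is folded into its unified counterpart.
compat = "\uF900"   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"  # the unified ideograph it decomposes to
compat_is_folded = nfkc(compat) == unified

# Two unified ideographs that readers treat as variants of each other
# (illustrative pair, assumed here) have no decomposition, so NFKC
# leaves them as two different code points.
simplified = "\u56FD"
traditional = "\u570B"
variants_stay_distinct = nfkc(simplified) != nfkc(traditional)

print("compatibility ideograph folded:", compat_is_folded)
print("unified variants still distinct:", variants_stay_distinct)
```

Under these assumptions, two labels that differ only in such a variant
remain two separate names after normalization.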
So draft-ietf-idn-nameprep-07.txt should be revised either
to state explicitly that the draft is incomplete and that another
preparation step for the CJK-DN parts is required to obtain a
completely prepared IDN,
or to state explicitly that the draft
does not process the Han ideographs of Unicode 3.1.
Chun-Hsin Wu
** Identifier

                    Encoded/Represented       Prohibited
  Language             -->  Encoding Set   -->  DNS Identifier Set

  What seen/heard
  (glyph/script/font)       char                Identifier
  Human name                String              Domain Name (label)
  (Infinite)                (Limited)           (Restricted)

  DNS: English-Only    -->  US-ASCII       -->  LDH [\-A-Za-z0-9]
  IDN: Multi-lingual   -->  Unicode 3.1    -->  Nameprep-prohibited

  Language-level            Encoding-level      Identifier-level
  conversion                conversion          conversion
** DNS Namespace: based on DNS Identifier Set

                            Equivalent
  DNS Syntactic Namespace  ===========>  DNS Identifiable Namespace
  Size ~= Sigma (26*2+10+1)^i, i<=63     Size ~= Sigma (26+10+1)^i, i<=63

  DNS:      Case mapping
  NamePrep: Mapping/Normalization ?????
            KC form: CJK Compatibility Ideographs
                     (U+F900 ~ U+FA2D Compatibility Table)
            +TSConv: Unified CJK Equivalence
                     (TSConv: context-free, identifier-level)

                  Equivalent(IDN1,IDN2)?
  Valid(IDN)?  =======================>  Resolver(IDN)
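
The two size estimates above can be worked out directly. The sketch
below assumes label lengths run from 1 to 63 and, like the estimate
itself, ignores the rule that a label may not begin or end with a
hyphen. The function names are hypothetical.

```python
MAX_LABEL_LENGTH = 63  # upper bound on the length of one DNS label


def namespace_size(alphabet_size: int, max_len: int = MAX_LABEL_LENGTH) -> int:
    """Sum of alphabet_size**i over label lengths i = 1..max_len."""
    return sum(alphabet_size ** i for i in range(1, max_len + 1))


# Syntactic namespace: upper case, lower case, digits and hyphen are distinct.
syntactic = namespace_size(26 * 2 + 10 + 1)

# Identifiable namespace: case mapping merges A-Z with a-z.
identifiable = namespace_size(26 + 10 + 1)


def same_dns_label(a: str, b: str) -> bool:
    """DNS equivalence for LDH labels: compare after ASCII case mapping."""
    return a.lower() == b.lower()


print("syntactic labels:   ", syntactic)
print("identifiable labels:", identifiable)
print(same_dns_label("Example", "eXAMPLE"))
```

The equivalence rule (case mapping) is what shrinks the syntactic
namespace to the identifiable one; a CJK equivalence would play the
same role for IDN.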
Trustable IDN Identifiable Namespace:
. Finite (Stable)
. Global (Consistent)
. Unique (Unambiguous)
Design constraints:
. Backward compatibility => ACE-ed
. Forward compatibility
A CJK equivalence table is therefore necessary for IDN to define
its identifiable namespace so that the namespace can be trusted and
is globally unique, regardless of where the table is
consulted (client or server).
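
As a rough illustration of what a context-free, identifier-level
equivalence could look like, here is a minimal sketch. Everything in
it is an assumption: the two-entry table, the direction of the mapping
and the function names are hypothetical, str.casefold() only
approximates Nameprep's case mapping, and a real table would be far
larger and would have to be standardized.

```python
import unicodedata

# HYPOTHETICAL table: each variant maps to one representative code point.
# The entries and the mapping direction are illustrative assumptions.
VARIANT_TO_REPRESENTATIVE = {
    "\u56FD": "\u570B",
    "\u5B66": "\u5B78",
}


def prepare(label: str) -> str:
    """Normalize to NFKC, fold case, then map variants one character at a time."""
    normalized = unicodedata.normalize("NFKC", label).casefold()
    return "".join(VARIANT_TO_REPRESENTATIVE.get(ch, ch) for ch in normalized)


def equivalent(idn1: str, idn2: str) -> bool:
    """Two labels are equivalent when their prepared forms are identical."""
    return prepare(idn1) == prepare(idn2)


# Labels that differ only in a listed variant collapse to one identifier.
print(equivalent("\u4E2D\u56FD", "\u4E2D\u570B"))
# Labels that differ in unrelated characters remain distinct.
print(equivalent("\u4E2D\u56FD", "\u4E2D\u5B66"))
```

Because the mapping looks at each character in isolation, client and
server produce the same prepared form as long as both consult the same
table.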