
[idn] Comments on Nameprep: incomplete preparation



Last-call comments on draft-ietf-idn-nameprep-07.txt:

   In Unicode 3.1, unified Han ideographs occupy slightly more than 75%
   of the total encoded characters, so CJK is a major subset of Unicode.
   Since Nameprep uses Unicode 3.1 as the repertoire for IDN, CJK domain
   names (CJK-DN) are necessarily part of IDN.
   ( http://www.unicode.org/unicode/reports/tr27/ )

   As demonstrated in:
      http://www.iis.sinica.edu.tw/~wuch/idn/examples/mixinput.htm
      (referenced by http://www.imc.org/idn/mail-archive/msg05795.html )
      http://www.iis.sinica.edu.tw/~wuch/idn/examples/variant.htm
   Many CJK variants in Unicode 3.1 look alike and are easily confused
   by typical users, but Nameprep does not adequately consider them. As a
   result, the CJK sub-space of the IDN Identifiable Namespace implicitly
   defined by Nameprep will make IDN ambiguous and open to dispute. Even
   if registration policy addresses variants, a standardized preparation
   for CJK-DN is still required to keep IDN global and unique, so that
   the IDN namespace can be trusted.

So draft-ietf-idn-nameprep-07.txt should be revised either
    to state explicitly that the draft is incomplete and that another
           preparation step for the CJK-DN parts is required to achieve
           a completely prepared IDN,
    or to state explicitly that the draft
          does not process the Han ideographs of Unicode 3.1.


Chun-Hsin Wu
** Identifier

                  Encoded/Represented       Prohibited
        Language           -->   Encoding Set   -->  DNS Identifier Set
        What seen/heard
        (glyph/script/font)        char                 Identifier
        Human name                 String             Domain Name(label)
        (Infinite)                 (Limited)           (Restricted)

  DNS:  English-Only       -->     US-ASCII     -->   LDH [\-A-Za-z0-9]
  IDN:  Multi-lingual      -->   Unicode  3.1   -->  Nameprep-prohibited

        Language-level           Encoding-level       Identifier-level
         conversion                conversion            conversion


** DNS Namespace: based on DNS Identifier Set

                                  Equivalent
          DNS Syntactic Namespace  =======>  DNS Identifiable Namespace
Size~=  Sigma (26*2+10+1)^i, i<=63      ~= Sigma (26+10+1)^i, i<=63
  DNS:                          Case mapping

  NamePrep:                Mapping/Normalization          ?????
                    KC form: CJK Compatibility Ideographs
                    (U+F900 ~ U+FA2D Compatibility Table)

  +TSConv:                Unified CJK Equivalence
                     (TSConv: context-free, identifier-level)

                            Equivalent(IDN1,IDN2)?
              Valid(IDN)?  =======================>  Resolver(IDN)
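
The size estimates in the diagram above can be worked out directly. The
following is only a rough sketch: it assumes the sum runs over label
lengths i = 1..63 and, like the estimate itself, ignores the rule that a
label may not begin or end with a hyphen. The function name
namespace_size is introduced here for illustration.

```python
# Rough size of a single-label namespace: sum of alphabet_size**i
# over label lengths i = 1..max_len (assumed lower bound of 1).
MAX_LABEL_LEN = 63


def namespace_size(alphabet_size, max_len=MAX_LABEL_LEN):
    return sum(alphabet_size ** i for i in range(1, max_len + 1))


# Syntactic namespace: A-Z, a-z, 0-9 and hyphen (26*2 + 10 + 1 = 63 symbols).
syntactic = namespace_size(26 * 2 + 10 + 1)

# Identifiable namespace after case mapping: 26 + 10 + 1 = 37 symbols.
identifiable = namespace_size(26 + 10 + 1)

print(f"syntactic    ~= {syntactic:.3e}")
print(f"identifiable ~= {identifiable:.3e}")
```

Case mapping alone shrinks the namespace by many orders of magnitude,
which is the same kind of reduction a CJK equivalence step would apply
to the CJK sub-space.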


   Trustable IDN Identifiable Namespace:
      . Finite (Stable)
      . Global (Consistent)
      . Unique (Unambiguous)

   Design constraints:
      . Backward compatibility => ACE-ed
      . Forward compatibility

   A CJK equivalence table is therefore necessary for IDN to define
   its identifiable namespace so that the namespace can be trusted and
   is globally unique, regardless of where the table is consulted
   (client or server).
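
As a minimal sketch of the Equivalent(IDN1,IDN2) test, assuming a
context-free, per-character equivalence table: the table CJK_VARIANTS
below is hypothetical and holds only two illustrative
traditional/simplified pairs; it is not the TSConv table. The helper
names prepare and equivalent are also introduced here. Python's
unicodedata module follows a newer Unicode version than 3.1, and
str.lower() only approximates Nameprep's case mapping.

```python
import unicodedata

# Hypothetical, illustrative equivalence table (variant -> representative).
# A real table would have to be standardized; these two pairs are examples.
CJK_VARIANTS = {
    "\u570b": "\u56fd",  # U+570B -> U+56FD
    "\u81fa": "\u53f0",  # U+81FA -> U+53F0
}


def prepare(label, variants=CJK_VARIANTS):
    """Case-map, apply NFKC, then map each character through the table."""
    normalized = unicodedata.normalize("NFKC", label.lower())
    return "".join(variants.get(ch, ch) for ch in normalized)


def equivalent(label1, label2, variants=CJK_VARIANTS):
    """Two labels are equivalent if they prepare to the same string."""
    return prepare(label1, variants) == prepare(label2, variants)


# NFKC already folds the CJK Compatibility Ideographs (U+F900 ...).
print(prepare("\uf900") == "\u8c48")
# Unified variants are only folded by the extra table.
print(equivalent("\u570b", "\u56fd"))
print(equivalent("\u570b", "\u56fd", variants={}))
```

With an empty table the two unified ideographs stay distinct, which is
the gap in Nameprep that the comments above describe.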