
Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)



In a message dated 2001-12-03 2:01:56 Pacific Standard Time, 
liana.ydisg@juno.com writes:

> I see that you have not read the I-D yet, and Deng Xiang 
> has replied your Chinese vs. Japanese arguement, I 
> will wait for your comment on the language-tag issue,
> or anything not up to your standard.

Here are some specific concerns related to items in 
draft-Liana-idn-map-00.txt.

| The proposed ACE is a mnemonic encoding scheme,
| and is called StepCode [StepCode]. 

Hasn't AMC-ACE-Z already been chosen as the standard ACE for IDN?  I would be 
surprised if the decision were made to use two different ACEs depending on 
the language, or script, of the encoded text.

| U-s   U-p A-p
| U+0041  U+0061    a      (Latin Letter A case folding)
| U+2fc2  U+2ee5    yv2    (Han character fish for Chinese case folding)

Several Chinese speakers and other experts have already, repeatedly, claimed 
that SC/TC mapping is NOT a 1-1 operation like Latin case folding.  If you 
think your users will be satisfied with the 1-1 solution only, go right 
ahead, but if this turns out to be inadequate and you need to propose a fix, 
get ready to hear a lot of people say "I told you so."
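
To make the objection concrete, here is a minimal Python sketch. The table holds only two well-known one-to-many cases and is an illustration, not a complete or authoritative SC/TC mapping; the names SC_TO_TC and is_one_to_one are made up for this example.

```python
# Illustrative only: two simplified characters that each correspond
# to more than one traditional character. Not a complete mapping.
SC_TO_TC = {
    "发": ["發", "髮"],        # "to emit" vs. "hair"
    "干": ["乾", "幹", "干"],  # "dry" vs. "trunk / to do" vs. "shield"
}

def is_one_to_one(table):
    """True only if every source character has exactly one target."""
    return all(len(targets) == 1 for targets in table.values())

# Latin case folding passes this check; the SC/TC sample does not.
LATIN_FOLD = {"A": ["a"], "B": ["b"]}
print(is_one_to_one(LATIN_FOLD))  # True
print(is_one_to_one(SC_TO_TC))    # False
```

A single-column fold table like the one in the draft has no way to represent the second and third targets.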

| To facilitate end users for the speed of IDN access as well as
| compatibility with existing applications, it is RECOMMENDED that an IDN
| code exchange table inculdes applicable local display standards
| corresponding with each applicable codepoints in UCS.

Backward mapping tables to convert Unicode to legacy standards, for the 
express purpose of allowing end-user software to delay the transition to 
Unicode?  Does this sound like a solution for the future?

| It is REQURIED to register a language tag with IANA and its
| associated script range whenever it is modified. 

There is already a perfectly good update process in place for both ISO 639 
and RFC 3066.

| To use mixed scripts in one IDN label is NOT RECOMMEMDED for an
| early deployment of IDN.

This immediately shuts out the Japanese, who have every reason to mix 
hiragana, katakana, kanji, and romaji.

|         Alphabet Sys.  Consonant Sys.  Character Sys.
|
| From: 0020            0530            2e80
| to:   052f            1bff            d7af
|
| include:Latin           Armenian        CJK
|         Greek           Hebrew          Kanji
|         Cyrillic        Arabic          Kana
|         IPA             Devanagari      Hangul
|         Vietnamese      Malayalam       Yi
|                         Thai
|                         Lao
|                         Tibetan
|                         ...

Sorry, it's just not that simple.  There are plenty of alphabets and 
alphabetic characters encoded above U+0530.  That's probably why the Unicode 
Consortium, while providing a list of blocks of code points like the 
following:

    # Start Code..End Code; Block Name
    0000..007F; Basic Latin
    0080..00FF; Latin-1 Supplement
    0100..017F; Latin Extended-A

is careful not to imply that ranges of code points are permanently reserved 
for *classifications* of scripts like this.

You can tell that the three ranges listed here are arbitrary and bogus even 
for the CJK scripts, by noting that Korean jamos (alphabetic) are located in 
the "consonant system" block, while the Japanese syllabaries (kana) and 
precomposed Korean syllables are in the "character system" block.
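
A quick way to check this is to run a few code points through the draft's three ranges. The sketch below is only that: the ranges are copied from the table quoted above, the sample characters are my own choice, and the names DRAFT_RANGES and draft_group are invented for the example.

```python
import unicodedata

# The three inclusive ranges from the draft's table, as quoted above.
DRAFT_RANGES = [
    (0x0020, 0x052F, "Alphabet Sys."),
    (0x0530, 0x1BFF, "Consonant Sys."),
    (0x2E80, 0xD7AF, "Character Sys."),
]

def draft_group(ch):
    """Return the draft's group label for ch, or None if no range covers it."""
    cp = ord(ch)
    for start, end, label in DRAFT_RANGES:
        if start <= cp <= end:
            return label
    return None

# Armenian and Georgian letters, a Hangul jamo, a Vietnamese letter
# (Latin Extended Additional), a hiragana, and a precomposed Hangul syllable.
samples = ["\u0561", "\u10D0", "\u1100", "\u1EC7", "\u3042", "\uAC00"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {draft_group(ch)}")
```

The Armenian and Georgian letters and the Hangul jamo all land in the "consonant" group, the kana and the precomposed syllable land in the "character" group, and the Vietnamese letter falls in the gap between U+1BFF and U+2E80, in no group at all.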

| Some cultures often use more than two scripts within the same group,
| such as Japanese, but rarely using another script especially from a
| different group. 

As noted above, the Japanese use four scripts from two different groups.

| The main issue in IDN-Map
| is to identify character equivalent sets, and reduce the number of
| applicable IDN identifiers by 1) limiting the applicable IDN input code
| points to Plane 0 of Unicode table,

Has anyone else so far proposed that supplementary characters be flat-out 
prohibited from occurring in IDN identifiers?  Why should they be singled out 
as a way to "reduce the number of applicable IDN identifiers"?
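
As I read it, the restriction amounts to a filter like the one sketched below. This is my own reading of "Plane 0" as code points up to U+FFFF; the function name outside_plane_0 is made up, and the sample character is simply the first code point of the CJK Extension B block.

```python
def outside_plane_0(label):
    """Return the characters in label whose code points lie above U+FFFF."""
    return [ch for ch in label if ord(ch) > 0xFFFF]

# U+20000 is a supplementary-plane Han character; U+6771 is in Plane 0.
label = "\u6771\U00020000"
rejected = outside_plane_0(label)
print([f"U+{ord(ch):04X}" for ch in rejected])
```

Any label containing such a character would be refused outright under the draft, whatever the registrant's reason for using it.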

| It is RECOMMENDED that reasonable studies are given to each language to
| classify script treatment model, and a cost vs. benifit analysis in select
| a long term script specific processing protocol to be embedded in IDN
| language specific modules.

This won't disrupt the schedule of the working group, will it?

| canonicalization

This word has no clear definition and is carefully avoided by Unicode, as Ken 
Whistler already explained.

| A string mixed with CJK and Kana is Japanese, CJK and Hangul mix is
| Korean. However, an all CJK character string MUST presumed to be in the
| primary language tag, that is Chinese, and registered as the only IDN name,
| unless the registrant requests a second and a third language to access the
| same IDN name.

Nothing prevents an all-Han string of any arbitrary length from being 
Japanese text.  The priority given to Chinese here is not likely to be well 
received by other groups.
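
Here is a small Python sketch of the rule as I understand it from the quoted paragraph. It is my reading, not the draft's algorithm: the function name draft_language_guess is invented, and scripts are guessed from the first word of each character's Unicode name, which is only a heuristic.

```python
import unicodedata

def draft_language_guess(label):
    """Apply the quoted rule: Han+kana is ja, Han+Hangul is ko, all-Han is zh."""
    kinds = {unicodedata.name(ch).split()[0] for ch in label}
    has_han = "CJK" in kinds
    if has_han and kinds & {"HIRAGANA", "KATAKANA"}:
        return "ja"
    if has_han and "HANGUL" in kinds:
        return "ko"
    if kinds == {"CJK"}:
        return "zh"
    return None  # the quoted rule does not cover other combinations

print(draft_language_guess("東京の店"))  # ja
print(draft_language_guess("東京"))      # zh
```

The second label is the name of the Japanese capital written entirely in kanji, and the rule files it under Chinese.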

| Also, it
| introduces more policy decisions, for example, an all CJK character
| trademark registrant may have to registrate in three languages to ensure
| the legitimacy of the trademark.

Wait just a minute.  Wasn't the whole idea of this language-tagging and 
CJK-folding scheme to PREVENT registrants from having to register an IDN 
identifier more than once?

| After all, a useful tool is to let its
| user to make decisions.

Some tools are interactive, others are not.

Finally, it is not yet clear to me whether the "idn-zh-" tag prefix is 
supposed to be embedded within IDN identifiers or specified separately.  But 
between this additional label and the use of the less efficient StepCode 
instead of ACE-Z, it seems that several bytes out of the precious 63-byte 
limit are required as overhead to support this tagging scheme.  If I 
remember correctly, it is CJK users (Soobok Lee is only the most vocal) who 
are most concerned about the space limitation and who want to find (or 
invent) the most efficient encoding system possible.  Will these other CJK 
users agree to this proposal?
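
The arithmetic for the prefix alone, assuming it really is embedded in the label (which, as I said, is not clear to me), looks like this. The names below are made up for the sketch, and it says nothing about the relative size of StepCode and ACE-Z output.

```python
MAX_LABEL_BYTES = 63    # limit on a single label
TAG_PREFIX = "idn-zh-"  # prefix from the draft; embedding is an assumption

def remaining_budget(prefix=TAG_PREFIX, limit=MAX_LABEL_BYTES):
    """Bytes left for the encoded name once the prefix is in the label."""
    return limit - len(prefix.encode("ascii"))

print(len(TAG_PREFIX), remaining_budget())  # 7 56
```

That is 7 of the 63 bytes gone before a single character of the name has been encoded.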

-Doug Ewell
 Fullerton, California