
Re: [idn] opting out of SC/TC equivalence



Hi, James,

Thank you for the implementation analysis.  It is very
helpful for bringing this discussion down to a more concrete
level.  I have several thoughts on this.

The SC character set has been used for decades and has gone
through extensive nationwide testing in China.  SC is stable and is
properly reflected in the Unicode standard.  The question is one of
definition: is TC/SC a case folding?  It seems that this WG has not
yet reached consensus on this definition.

The primary reason, I guess, is that everyone knows CJK is a
very hot sweet potato and it is quite easy to get burned by it, but it
smells so good that none of us wants to give it up at this moment.  :-))
One solution is to push the hot sweet potato to the zone file, but as
you have pointed out, only the Chinese zone will do this.  If I am in
the US, I may never get to see Chinese characters in my domain name.

However, the implementation you have described can be
done in [nameprep] with Unicode as the primary reference
code, provided the character mapping issue has been settled with
your option 1.  And we are back to the case folding definition.
I offer my definition: case folding is a single lookup of a key
in a data table, from which you obtain another key.

Example: 

Unicode		folds to
points
col-1		col-2

A  		a			(case 1)
TC-1  		SC-1			(case 2)
IPA letter ts	t s			(case 3)
Kanji-1		Kana-1			(case 4)
Kanji-1		Kana-2
Kanji-1		Kana-3
Hanja-1		Hangul-1		(case 5)
Hanja-2		Hangul-1
Hanja-3		Hangul-1

Case 1 is the current [nameprep],
Case 2 is the long-running discussion of TC/SC,
Case 3 is a possible mnemonic ACE assignment,
Cases 4 and 5 are where allowing TC/SC as case folding
	in [nameprep] would extend to.

With the above list, it is questionable whether all of the above are
case folding, because from the col-1 key Kanji-1 it takes
three lookups to reach Kana-3 in Case 4, and three
lookups to get from Hangul-1 back to Hanja-3 in Case 5.
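
As a rough illustration of the one-lookup test, here is a minimal Python
sketch.  The row names are the placeholders from the table above, not
real code points, and the helper names (build, ambiguous) are made up
for this example.  It simply reports which keys give more than one
answer in each direction.

```python
from collections import defaultdict

# Placeholder rows copied from the two-column table above.
# The names stand in for real code points; they are not real data.
ROWS = [
    ("A", "a"),                  # case 1
    ("TC-1", "SC-1"),            # case 2
    ("IPA letter ts", "t s"),    # case 3
    ("Kanji-1", "Kana-1"),       # case 4
    ("Kanji-1", "Kana-2"),
    ("Kanji-1", "Kana-3"),
    ("Hanja-1", "Hangul-1"),     # case 5
    ("Hanja-2", "Hangul-1"),
    ("Hanja-3", "Hangul-1"),
]


def build(rows):
    """Return (forward, reverse) lookup tables, each key -> list of results."""
    forward = defaultdict(list)
    reverse = defaultdict(list)
    for src, dst in rows:
        forward[src].append(dst)
        reverse[dst].append(src)
    return dict(forward), dict(reverse)


def ambiguous(table):
    """Keys for which a single lookup does not give a single answer."""
    return {key: hits for key, hits in table.items() if len(hits) > 1}


forward, reverse = build(ROWS)
ambiguous_forward = ambiguous(forward)   # col-1 -> col-2 direction
ambiguous_reverse = ambiguous(reverse)   # col-2 -> col-1 direction

print("not one lookup, col-1 to col-2:", ambiguous_forward)
print("not one lookup, col-2 to col-1:", ambiguous_reverse)
```

With these placeholder rows, only Kanji-1 fails in the forward
direction and only Hangul-1 fails in the reverse direction; cases 1 to 3
pass both ways.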

The problem is not so much with TC/SC, even if they are a
one-to-one mapping; it is with Kanji and Hanja.  Using
transliteration only, as in Case 3, is obviously not good enough,
as you have pointed out in your option 3.  Since what IDN
really wants is an ACE to go into DNS, we can obtain
such a mnemonic ACE by adding another column to
this table, thus:

col-1 		col-2		 col-3 ACE  

A  		a		a			(case 1)
White Space	nil		nil
SC-1				Pinyin-1+part1+part2+part3  (case 2)
TC-1  		SC-1		Pinyin-1+part1+part2+part3  
IPA letter ts	t s		ts			  (case 3)

Kana-1				Romaji-1
Kanji-1		Kana-1		Romaji-1+part1+part2	(case 4)
Kanji-1		Kana-2		Romaji-2+part1+part2	(ruled out)
Kanji-1		Kana-3		Romaji-3+part1+part2	(ruled out)

Hangul-1			Hangul-1
Hanja-1		Hangul-1	Hangul-1+part-1.1	(case 5)
Hanja-2		Hangul-1	Hangul-1+part-2.1
Hanja-3		Hangul-1	Hangul-1+part-3.1

Now any key search from col-1 will get a unique ACE from col-3,
except that the Kanji-1 rows for Kana-2 and Kana-3 have to be ruled
out under the one-lookup definition of case folding.  If we are free
to assign any identifier to it, the user has to accept one sound per
Kanji within its Romaji system.  This is the "language" aspect that
case folding cannot solve.  Well, the alternative is a compressed
ACE.  But from col-3 to col-1, there is always one successful search.

Likewise, any key from col-3 will get a display code from col-1.  This
table and its searches fall within the purely technical "case folding"
definition given above.
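
The three-column table can be sketched the same way.  This is only an
illustration under assumptions: the row names are the placeholders from
the table, "nil" is represented as None, and the function name
build_tables is invented here.  Rows marked "(ruled out)" are dropped
before the tables are built, so each col-1 key has exactly one ACE.

```python
# Rows: (col-1 key, col-2 fold or None, col-3 ACE or None, ruled_out).
# Placeholder names from the three-column table above; "nil" is None.
ROWS = [
    ("A", "a", "a", False),
    ("White Space", None, None, False),
    ("SC-1", None, "Pinyin-1+part1+part2+part3", False),
    ("TC-1", "SC-1", "Pinyin-1+part1+part2+part3", False),
    ("IPA letter ts", "t s", "ts", False),
    ("Kana-1", None, "Romaji-1", False),
    ("Kanji-1", "Kana-1", "Romaji-1+part1+part2", False),
    ("Kanji-1", "Kana-2", "Romaji-2+part1+part2", True),
    ("Kanji-1", "Kana-3", "Romaji-3+part1+part2", True),
    ("Hangul-1", None, "Hangul-1", False),
    ("Hanja-1", "Hangul-1", "Hangul-1+part-1.1", False),
    ("Hanja-2", "Hangul-1", "Hangul-1+part-2.1", False),
    ("Hanja-3", "Hangul-1", "Hangul-1+part-3.1", False),
]


def build_tables(rows):
    """Build col-1 -> ACE and ACE -> col-1 tables, skipping ruled-out rows."""
    to_ace = {}
    to_display = {}
    for key, _fold, ace, ruled_out in rows:
        if ruled_out:
            continue
        if key in to_ace:
            # A second surviving row for the same key would break
            # the one-lookup rule, so treat it as a table error.
            raise ValueError(f"more than one ACE for {key!r}")
        to_ace[key] = ace
        if ace is not None:
            to_display.setdefault(ace, []).append(key)
    return to_ace, to_display


to_ace, to_display = build_tables(ROWS)

print(to_ace["Kanji-1"])
print(to_display["Hangul-1+part-2.1"])
print(to_display["Pinyin-1+part1+part2+part3"])
```

In the table as written, SC-1 and TC-1 share one ACE, so the search from
col-3 succeeds but returns both forms; which of the two to display is
not settled by the table itself.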

What if Hanja-1 = Kanji-1? This can be checked at registration
time in addition to its ACE, which is checked with DNS.  I am
not certain whether IDN registration is part of this WG's charter
goals or not, so I shall stop discussing its relation to
[nameprep] or any related issues, and hope that no one gets a
hot sweet potato burn.

One issue with mnemonic ACE is its value assignment.
Unicode has given each symbol a Latin name, but the name
is not intended to be used in case folding, and as a purely
technical WG, we can assign any value to an identifier.  How
much freedom do we have in such an assignment?

In any case, I agree with you that this WG needs a list to collect
input from each SCRIPT's users and engineers regarding the users'
wish list.  I've come up with the following questionnaire for comments.

1. Your script name (refer to a name defined in Unicode):

2. How familiar are you with the script? How often do you
  use the script?

3. What do you expect your IDN hostname to look like?

4. Is your script used interchangeably with another script?
   If yes, with which ones? Are they used in mixed strings?

5. How does your script deal with foreign concepts?

6. How does your script deal with foreign sounds?

7. Do you wish your script could mix with other scripts?
  If you do, which script do you think is most useful
  for your script to accommodate?

8. As an engineer, what part of the use of your script is
relatively universal, such that fixing it in an IDN implementation
would cause no serious controversy?

9. As an engineer, what part of the use of your script is
often controversial, such that IDN should avoid it in
an IDN implementation?

Liana Ye