Re: [idn] An ignorant question about TC<-> SC
Hi Ken,
Thanks for your input. My responses are below:
> I have to disagree. I am certain that labelling what script an
> IDN is in will just cause problems.
>
It will *never* cause any problems, because the script label of an
IDN does nothing by itself. In the worst case it is simply a part of
the domain name that serves no purpose (just as ".ca" serves no such
purpose). However, if and when people decide to make use of it, it
becomes a very powerful system.
> At the very least, this will introduce an entire new class of
> error conditions, where the label says one thing, but the
> character content of the IDN does not in fact match the label.
>
With a .<traditional> label, if the IDN is entered in simplified
Chinese, then either an error alerts the user or the simplified
characters are converted automatically to traditional Chinese. An IDN
that ends with ".ca" obviously cannot provide such a benefit.
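The error-or-convert behaviour described above can be sketched in a few
lines of Python. This is a minimal, illustrative sketch under loud
assumptions: the literal suffix "traditional" stands in for the
.<traditional> label, and the three-entry SC_TO_TC table, the
ScriptLabelError class and the apply_traditional_label function are all
hypothetical names invented for this example.

```python
# Minimal sketch of the proposed ".<traditional>" rule. Everything here is
# hypothetical: the suffix string, the tiny mapping table and the names.

# Tiny illustrative simplified -> traditional table (a real one is far
# larger and not always one-to-one).
SC_TO_TC = {
    "门": "門",
    "东": "東",
    "国": "國",
}


class ScriptLabelError(ValueError):
    """Raised when a name's characters contradict its script label."""


def apply_traditional_label(name: str, auto_convert: bool = False) -> str:
    """Check (and optionally fix) a name ending in the hypothetical .traditional label."""
    label, sep, suffix = name.rpartition(".")
    if not sep or suffix != "traditional":
        return name  # the rule only applies under the .traditional label

    simplified = [ch for ch in label if ch in SC_TO_TC]
    if not simplified:
        return name  # nothing in the label contradicts the script label

    if not auto_convert:
        # Option 1: alert the user with an error.
        raise ScriptLabelError(
            f"simplified characters {''.join(simplified)} in a .traditional name"
        )

    # Option 2: convert the simplified characters to traditional ones.
    converted = "".join(SC_TO_TC.get(ch, ch) for ch in label)
    return f"{converted}.{suffix}"


print(apply_traditional_label("东门.traditional", auto_convert=True))  # 東門.traditional
print(apply_traditional_label("东门.ca"))  # 东门.ca (no script label, so no check)
```

The plain lookup table is the weak spot of the automatic path: simplified
发, for example, corresponds to both traditional 發 and 髮, so a real
converter would need context or the registrant's choice, not just a table.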
> Furthermore, the example we have been talking about here,
> traditional versus simplified Chinese, is not even a script
> difference in the first place. "Traditional" versus "Simplified"
> in a character set context, and as typically implemented,
> refers to distinctions between Code Page 950 (Big 5) and
> Code Page 936 (GBK, etc.), together with the fonts, input methods,
> message resource files, and such, as needed
> to support them. And either of those character sets is actually
> mixed script, since they both support Latin characters from
> ASCII, as well as the basic Greek alphabet and Bopomofo.
> "Simplified Chinese" also supports the basic Cyrillic alphabet
> and Hiragana and Katakana for Japanese.
>
Despite whatever else certain input methods, tables, encodings, files,
applications, etc. may support, simplified Chinese is simplified Chinese
and traditional Chinese is traditional Chinese: there is no Greek, no
Japanese, no English, and no funny symbols.
> Even if you are just talking about Traditional versus
> Simplified Chinese characters (ideographs) within the
> Han script subparts of Code Page 950 or Code Page 936, the
> distinction is not as clean as you might think it would be.
> The PRC simplified set, even in its earlier forms in GB 2312,
> contains *some* traditional forms for characters. But the
> current extensions, first for GBK (~ Microsoft Code Page 936),
> and now for GB 18030, incorporate *all* of the Han characters
> from the Unicode 3.0 repertoire, which means that a
> "Simplified" code page for China now contains *all* of the
> traditional characters from Code Page 950, as well as all
> the simplified characters from Unicode 3.0.
>
This is a good point. It illustrates how the line between Traditional
Chinese and Simplified Chinese is becoming blurred, because the two are
so often used interchangeably; hence the need for TC<->SC conversion.
However, if you are saying that there is no distinction between them
because GBK/GB 18030 has incorporated everything, then you are
absolutely incorrect. Traditional Chinese is traditional Chinese and
simplified Chinese is simplified Chinese.
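The repertoire point in this exchange can be checked directly with
Python's bundled codecs. This is a sketch that assumes Python's "big5" and
"gbk" codecs are reasonable stand-ins for Code Page 950 and Code Page 936;
the helper name in_code_page and the choice of 門/门 ("door") as the test
pair are illustrative only.

```python
# Check which legacy code pages can encode a given character, using the
# codecs that ship with Python. "big5" stands in for CP950, "gbk" for CP936.


def in_code_page(ch: str, codec: str) -> bool:
    """Return True if `ch` is in the repertoire of the legacy `codec`."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


CODECS = ("big5", "gbk", "gb18030")

# Traditional 門 and its simplified form 门.
for ch in ("門", "门"):
    support = {codec: in_code_page(ch, codec) for codec in CODECS}
    print(f"U+{ord(ch):04X} {ch} {support}")
```

With these codecs, 門 should encode under all three, while 门 should fail
only under big5: the "simplified" GBK family accepts both forms, yet the
traditional Big 5 repertoire still leaves the simplified form out.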
> And of course, Unicode data itself encompasses both simplified
> and traditional forms of Chinese ideographs. So what would the
> IDN distinction between simplified and traditional mean if
> data was encoded in Unicode?
>
> Even the identification of scripts is non-trivial. Many
> characters are *shared* between scripts, or are borrowed
> from one script to the next. Cyrillic and Latin have a long
> history of cross-borrowing forms from one script into the
> other, for example, for special uses. And Japanese got all
> its Chinese characters (kanji) in the first place by
> borrowing them from Chinese.
>
Characters that share the same Unicode code point can certainly be
labeled as belonging to different scripts, whether Chinese, Japanese,
or any other (much like "same.com" and "same.ca"). The benefit of this
IDN distinction should be obvious, since the same character (with the
same Unicode code point) may have a completely different meaning in
different scripts: one name could be labeled <same>.<traditional>
(written in Traditional Chinese, of course) and the other
<same>.<simplified> (written in Simplified Chinese).
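Both sides of this exchange turn on what a code point does and does not
say about script. Python's standard unicodedata module does not expose the
Unicode Script property, so this sketch uses the first word of the
character name as a rough, hypothetical proxy (rough_script is an invented
helper; a real implementation would use the Script property data described
in UAX #24). It shows that Latin a and Cyrillic а are distinct code points
despite looking alike, and that 門 and 门 both come back as plain CJK
ideographs: nothing in the code point itself says "traditional",
"simplified" or "Japanese".

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Crude script guess: the first word of the character's Unicode name.

    This is NOT the Unicode Script property; it is only a quick proxy.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


SAMPLES = [
    ("a", "Latin small a"),
    ("\u0430", "Cyrillic small a, visually identical to Latin a"),
    ("門", "traditional 'door', also used as Japanese kanji"),
    ("门", "simplified 'door'"),
    ("あ", "Hiragana a"),
    ("ㄅ", "Bopomofo b"),
]

for ch, note in SAMPLES:
    print(f"U+{ord(ch):04X} {rough_script(ch):<9} {note}")
```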
Thanks
Ben