The test data, as you describe it, is completely insufficient to draw any conclusions. The purpose of a test would be to determine whether the combinatorics of mixing TC and SC characters were a real issue, not simply whether people choose a particular SC vs. TC character.
To do an accurate survey, you would have to be much more rigorous, doing at least the following:
- Divide all CJK characters into three classes: TC-only, SC-only, TC-or-SC. Post this list so that it can be scrutinized by others.
- Test all names to see which contain at least one character from TC-only and one from SC-only. Post that list so that it can also be scrutinized by others.
- Of those, see how many would match under a TC-SC mapping. Publicize both the mapping and the list so that they can also be scrutinized by others.
- See what the percentages are: mixed / total and matching / mixed.
Note that the accuracy of the test also depends heavily upon the accuracy of the division of characters into the original three classes, and upon the accuracy of the TC-SC mapping.
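The four steps above can be sketched in Python roughly as follows. This is a minimal, hypothetical illustration: the sets TC_ONLY and SC_ONLY and the mapping TC_TO_SC hold only a handful of example characters, not a reviewed classification, and the helper names is_mixed, to_sc, and survey are invented here. Treating "match" as "folds to the same form as another name in the list" is also an assumption; every character not in either set is implicitly in the TC-or-SC class.

```python
from collections import Counter

# Hypothetical toy data; a real survey needs complete, publicly
# reviewed character classes and a reviewed TC-SC mapping.
TC_ONLY = {"國", "學", "華", "灣"}
SC_ONLY = {"国", "学", "华", "湾"}
TC_TO_SC = {"國": "国", "學": "学", "華": "华", "灣": "湾"}


def is_mixed(name: str) -> bool:
    """True if the name has at least one TC-only and one SC-only character."""
    has_tc = any(ch in TC_ONLY for ch in name)
    has_sc = any(ch in SC_ONLY for ch in name)
    return has_tc and has_sc


def to_sc(name: str) -> str:
    """Fold each TC-only character to its SC counterpart; leave others as-is."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in name)


def survey(names: list[str]) -> dict[str, float]:
    """Return the two ratios: mixed / total and matching / mixed.

    Assumes the names in the list are distinct.
    """
    mixed = [n for n in names if is_mixed(n)]
    # Assumed reading of "match": a mixed name matches if its folded form
    # equals the folded form of at least one other name in the list.
    folded_counts = Counter(to_sc(n) for n in names)
    matching = [n for n in mixed if folded_counts[to_sc(n)] > 1]
    return {
        "mixed/total": len(mixed) / len(names) if names else 0.0,
        "matching/mixed": len(matching) / len(mixed) if mixed else 0.0,
    }


if __name__ == "__main__":
    print(survey(["中國学", "中国学", "大學", "華人"]))
    # {'mixed/total': 0.25, 'matching/mixed': 1.0}
```

In the usage example only "中國学" mixes a TC-only and an SC-only character, and it folds to the same form as "中国学", so it counts as matching. As the note above says, the results are only as good as the class lists and the mapping fed into them.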
Mark
—————
----- Original Message -----
Sent: Monday, February 04, 2002 02:41
Subject: Re: [idn] Re: Chinese Domain Name Consortium (CDNC) Declaration
Dear Scott,

Before describing how often Chinese names are written using a mixture of TC & SC, let me explain user behavior and the current input method environment (IME) that users face in applications.
1. In everyday written Chinese, people VERY often use a mixture of TC & SC. I say VERY often because this is a matter of writing custom; it is difficult to produce statistics on exactly how often.
2. In current applications, users face IMEs that let them type a mixture of TC & SC very easily. Because of the custom of mixing TC & SC, the boundary between TC & SC is becoming indistinct, and it is also difficult for users to tell them apart.
Although it is difficult to give general statistics, I can offer some statistics from the TWNIC idn.tw testbed. There are currently 27,665 idn.tw Chinese domain names registered in our testbed.
In the TWNIC idn.tw testbed, we have run an experiment on one TC & SC pair (U+81FA 臺, U+53F0 台). Perhaps our experimental statistics can help explain how often. In the experiment, registrants whose Chinese domain name needs this character may choose either U+81FA or U+53F0. In our statistics, 2,311 Chinese domain names chose TC U+81FA and 3,938 chose SC U+53F0.
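The split between the two choices can be worked out directly from the two counts above. This short Python sketch assumes each counted name chose exactly one of the two characters, so the counts can simply be added; it shows only the preference for one form of this single pair, not how often TC and SC are mixed within one name. The variable names are illustrative.

```python
# Counts reported from the TWNIC idn.tw testbed experiment.
TC_U81FA = 2311  # names that chose TC U+81FA (臺)
SC_U53F0 = 3938  # names that chose SC U+53F0 (台)

# Assumes the two groups are disjoint (each name picked one form).
total = TC_U81FA + SC_U53F0
tc_share = TC_U81FA / total
sc_share = SC_U53F0 / total

print(f"total={total}, TC={tc_share:.1%}, SC={sc_share:.1%}")
# total=6249, TC=37.0%, SC=63.0%
```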
A Complete Set of Simplified Chinese Characters clearly defines more than 2,000 TC & SC pairs that are frequently used in everyday Chinese.
Erin Chen
Scott Bradner wrote:
Assume a Chinese name of 10 glyphs, each of which may take 2 versions, TC or SC.
How often are Chinese names written using a mixture of TC & SC?
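The combinatorics behind this question can be made concrete with a small Python sketch. It assumes the worst case, where every one of the 10 glyphs has a distinct TC and SC form; glyphs that are identical in both scripts would shrink the count. Under that assumption a name has 2^10 = 1024 possible spellings, only 2 of which are pure TC or pure SC.

```python
from itertools import product

N_GLYPHS = 10            # name length from the example above
VARIANTS = ("TC", "SC")  # assumed: every glyph has a distinct TC and SC form

# Every way of choosing TC or SC independently for each glyph.
forms = list(product(VARIANTS, repeat=N_GLYPHS))
pure = [f for f in forms if len(set(f)) == 1]  # all-TC or all-SC
mixed = len(forms) - len(pure)

print(len(forms), len(pure), mixed)  # 1024 2 1022
```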
Scott