
Re: [idn] opting out of SC/TC equivalence





--On 28. august 2001 13:40 -0700 liana.ydisg@juno.com wrote:

> Hi, James and Chinese experts:
>   You are right that TC/SC equivalence is not in Unicode.
> I know they wanted to put it in a long time ago, so I assumed
> it was reflected in there somehow.   I have just read the reason
> it is not in there: they think it is too difficult to
> put it in.   I happen to have an idea that 1100 half-size
> code points may solve part of the problem, and another 200
> TC/SC listings would complete it.  This could be used in [nameprep].
> What do you think?

I would be happy to see a complete proposal.

draft-ietf-idn-tsconv-00 describes a TC/SC mapping for 2064 traditional/
simplified pairs, saying that other tables are needed for single/many
and many/single mappings.

This means that we have a documented proposal on what to do with 4128
characters (2064 pairs, two characters each).
In Unicode 3.0, there are 23,658 *more* characters classified as "Han";
Unicode 3.1 adds 42,711 more, and it has been noted here that because of
the way Chinese linguistics works, it is almost 100% certain that there will
be more added.
I assume (foolishly) that for some large class of these characters, the 
answer is "don't touch them" when mapping TC/SC - but I have no way of 
telling which characters belong in that class.
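
To make the "don't touch them" assumption concrete, here is a rough
sketch of a one-to-one table lookup that passes unlisted characters
through unchanged, followed by the coverage arithmetic from the figures
above. The three-entry table and the names TC_TO_SC and fold_tc_to_sc
are hypothetical stand-ins for illustration; this is not the 2064-pair
table from draft-ietf-idn-tsconv-00, and it says nothing about the
one-to-many and many-to-one cases.

```python
# Hypothetical three-entry table for illustration only; NOT the
# 2064-pair table from draft-ietf-idn-tsconv-00.
TC_TO_SC = {
    "\u570b": "\u56fd",  # traditional "country" -> simplified
    "\u5b78": "\u5b66",  # traditional "study"   -> simplified
    "\u9580": "\u95e8",  # traditional "gate"    -> simplified
}


def fold_tc_to_sc(label, table=TC_TO_SC):
    """Map each character through the table.

    Characters that are not listed are left untouched, which is the
    assumption under discussion, not an established rule.
    """
    return "".join(table.get(ch, ch) for ch in label)


# Coverage arithmetic using the numbers quoted in this message.
PAIRS_IN_DRAFT = 2064
covered = 2 * PAIRS_IN_DRAFT          # characters the draft says something about
han_in_unicode_3_0 = covered + 23658  # plus the Han characters it does not cover
coverage = covered / han_in_unicode_3_0

if __name__ == "__main__":
    print(fold_tc_to_sc("\u570b\u5b78.example"))
    print(f"{covered} of {han_in_unicode_3_0} Han characters covered "
          f"({coverage:.1%})")
```

The lookup itself is trivial; the hard part is knowing whether "leave
it alone" is the right answer for every character missing from the
table.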

If you can come up with a proposal that describes what to do about ALL the 
Han characters in Unicode, I will be very happy to hear it.

Until then, I have to say that I have not seen any complete proposal.
Remember - the implementations of the algorithm for the non-Chinese part of 
the world will mainly be done by non-Chinese-speaking programmers; it's got 
to be simple & complete enough that even I can get it right...

             Harald





> .
> Liana
>
> On Mon, 20 Aug 2001 13:53:38 +0800 "James Seng/Personal" <James@Seng.cc>
> writes:
>> > The SC character set has been used for decades and has gone
>> > through extensive nationwide testing in China.  SC is stable and
>> > is properly reflected in the Unicode standard.   The question is
>> > one of definition: is TC/SC a case folding?  It seems that in
>> > this WG there is no consensus on this definition yet.
>>
>> I am not sure what you mean by "properly reflected" in the Unicode
>> Standard.  If you mean it is in the ISO 10646 code points, then yes,
>> both TC and SC are in the code points.  But if you are saying the
>> Unicode Consortium has a proper definition of TC/SC, then I am
>> afraid there is none.
>>
>
>