Re: [idn] opting out of SC/TC equivalence
--On 28. august 2001 13:40 -0700 liana.ydisg@juno.com wrote:
> Hi, James and Chinese experts:
> You are right that TC/SC equivalence is not in Unicode.
> I know they wanted to put it in a long time ago, so I assumed
> it was reflected in there somehow. I have just read that the reason
> it is not in there is that they think it is too difficult to
> put in. I happen to have an idea that 1100 half-size
> code points may solve part of the problem and another 200
> TC/SC listings complete it. This can be used in [nameprep].
> What do you think?
I would be happy to see a complete proposal.
draft-ietf-idn-tsconv-00 describes a TC/SC mapping for 2064 traditional/
simplified pairs, saying that other tables are needed for single/many
and many/single mappings.
This means that we have a documented proposal on what to do with 4128
characters.
In Unicode 3.0, there are 23,658 *more* characters classified as "Han";
Unicode 3.1 adds 42,711 more, and it has been noted here that because of
the way Chinese linguistics works, it is almost 100% certain that there will
be more added.
I assume (foolishly) that for some large class of these characters, the
answer is "don't touch them" when mapping TC/SC - but I have no way of
telling which characters belong in that class.
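
As a rough illustration of the gap described above, here is a minimal
Python sketch of a one-to-one pair table with pass-through for everything
else. The three pairs are well-known TC/SC examples chosen for
illustration only; they are not taken from draft-ietf-idn-tsconv-00, and
the names SAMPLE_PAIRS, fold_tc_to_sc and covered_characters are
hypothetical.

```python
# Illustrative one-to-one TC -> SC pairs (assumption: NOT copied from
# draft-ietf-idn-tsconv-00, which lists 2064 such pairs).
SAMPLE_PAIRS = {
    "\u570b": "\u56fd",  # traditional "country" -> simplified
    "\u5b78": "\u5b66",  # traditional "study"   -> simplified
    "\u99ac": "\u9a6c",  # traditional "horse"   -> simplified
}


def fold_tc_to_sc(label, pairs=SAMPLE_PAIRS):
    """Map each character through the pair table.

    Any character not listed as a traditional form is passed through
    unchanged, i.e. it is silently treated as "don't touch".
    """
    return "".join(pairs.get(ch, ch) for ch in label)


def covered_characters(pairs):
    """Count the distinct characters a one-to-one pair table speaks about."""
    return len(set(pairs) | set(pairs.values()))


if __name__ == "__main__":
    print(fold_tc_to_sc("\u570b\u5b78"))     # both characters are mapped
    print(fold_tc_to_sc("\u4e00"))           # unlisted Han character, unchanged
    print(covered_characters(SAMPLE_PAIRS))  # two characters per pair
```

With a table of n one-to-one pairs, covered_characters returns 2n, which
is where 4128 comes from for 2064 pairs. The pass-through branch is the
weak spot: it cannot tell a character that should be left alone from one
the table simply does not cover yet.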
If you can come up with a proposal that describes what to do about ALL the
Han characters in Unicode, I will be very happy to hear it.
Until then, I have to say that I have not seen any complete proposal.
Remember - the implementations of the algorithm for the non-Chinese part of
the world will mainly be done by non-Chinese-speaking programmers; it's got
to be simple & complete enough that even I can get it right...
Harald
> .
> Liana
>
> On Mon, 20 Aug 2001 13:53:38 +0800 "James Seng/Personal" <James@Seng.cc>
> writes:
>> > The SC character set has been used for decades and has gone
>> > through extensive nationwide testing in China. SC is stable and
>> > is properly reflected in the Unicode standard. The question is one
>> > of definition: is TC/SC a case folding? It seems that in this WG
>> > there is no consensus on this definition yet.
>>
>> I am not sure what you mean by "properly reflected" in the Unicode
>> Standard. If you mean it is in the ISO 10646 code points, then yes,
>> both TC and SC are in the code points. But if you are saying the
>> Unicode Consortium has a proper definition of TC/SC, then I am
>> afraid there is none.
>>
>
>