
Re: [idn] opting out of SC/TC equivalence

To: liana.ydisg@juno.com, James@Seng.cc
Subject: Re: [idn] opting out of SC/TC equivalence
From: Harald Tveit Alvestrand <harald@alvestrand.no>
Date: Wed, 29 Aug 2001 10:20:08 +0200
Cc: tsenglm@cc.ncu.edu.tw, huangk@alum.sinica.edu, idn@ops.ietf.org,dhc@dcrocker.net

--On 28. august 2001 13:40 -0700 liana.ydisg@juno.com wrote:

> Hi, James and Chinese experts:
>   You are right that the TC/SC equivalence is not in Unicode.
> I know they wanted to put it in a long time ago, so I assumed
> it was reflected in there somehow.   I have just read that the
> reason it is not in there is that they think it is too difficult to
> put in.   I happened to have an idea that 1100 half-size
> code points may solve part of the problem and another 200
> TC/SC listings complete it.  This can be used in [nameprep].
> What do you think?

I would be happy to see a complete proposal.

draft-ietf-idn-tsconv-00 describes a TC/SC mapping for 2064 traditional/
simplified pairs, saying that other tables are needed for single/many
and many/single mappings.
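A minimal sketch of what a table-driven TC-to-SC fold looks like, assuming a one-to-one pair table and assuming (as discussed below) that any character not in the table is left untouched. The three pairs and the names `TC_TO_SC`, `MANY_TO_ONE`, `fold_tc_to_sc` and `ambiguous_reverse` are hypothetical illustrations, not taken from draft-ietf-idn-tsconv-00. The second half shows why the single/many and many/single cases need separate tables: two traditional characters can simplify to the same character, so the reverse direction is ambiguous.

```python
from collections import defaultdict

# Hypothetical excerpt of a one-to-one traditional -> simplified pair table.
# Illustrative only; NOT quoted from draft-ietf-idn-tsconv-00.
TC_TO_SC = {
    "\u570B": "\u56FD",  # 國 -> 国
    "\u9580": "\u95E8",  # 門 -> 门
    "\u6771": "\u4E1C",  # 東 -> 东
}

# Two distinct traditional characters that both simplify to 发.
MANY_TO_ONE = {
    "\u767C": "\u53D1",  # 發 -> 发
    "\u9AEE": "\u53D1",  # 髮 -> 发
}


def fold_tc_to_sc(label: str) -> str:
    """Map each character through the pair table; unlisted characters pass through unchanged."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def ambiguous_reverse(table: dict) -> dict:
    """Return each simplified character that has more than one traditional source in `table`."""
    sources = defaultdict(list)
    for tc, sc in table.items():
        sources[sc].append(tc)
    return {sc: tcs for sc, tcs in sources.items() if len(tcs) > 1}
```

For example, `fold_tc_to_sc("東門")` yields `"东门"`, while `ambiguous_reverse({**TC_TO_SC, **MANY_TO_ONE})` reports that 发 has two possible traditional sources, which a simple pair table cannot resolve when mapping SC back to TC.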

This means that we have a documented proposal on what to do with 4128 
characters.
In Unicode 3.0, there are 23,658 *more* characters classified as "Han"; 
Unicode 3.1 adds 42,711 more, and it has been noted here that because of 
the way Chinese linguistics work, it is almost 100% certain that there will 
be more added.
I assume (foolishly) that for some large class of these characters, the 
answer is "don't touch them" when mapping TC/SC - but I have no way of 
telling which characters belong in that class.
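One way to reproduce the arithmetic above, assuming "classified as Han" is approximated by the standard CJK ideograph blocks and that the 2064 pairs share no characters. This is a reconstruction under those assumptions, not necessarily how the figures were originally derived; the Unicode 3.1 figure matches CJK Unified Ideographs Extension B alone.

```python
# Inclusive code point ranges of the CJK ideograph blocks (assumed approximation of "Han").
UNICODE_3_0_HAN = [
    (0x4E00, 0x9FA5),  # CJK Unified Ideographs (20,902)
    (0x3400, 0x4DB5),  # CJK Unified Ideographs Extension A (6,582)
    (0xF900, 0xFA2D),  # CJK Compatibility Ideographs (302)
]
UNICODE_3_1_EXT_B = [
    (0x20000, 0x2A6D6),  # CJK Unified Ideographs Extension B
]


def count(ranges):
    """Total number of code points in a list of inclusive (low, high) ranges."""
    return sum(high - low + 1 for low, high in ranges)


covered = 2 * 2064                      # characters in the documented TC/SC pairs
uncovered_3_0 = count(UNICODE_3_0_HAN) - covered
added_3_1 = count(UNICODE_3_1_EXT_B)

print(covered, uncovered_3_0, added_3_1)  # 4128 23658 42711
```

Under these assumptions, the documented pairs cover 4,128 of 27,786 Han code points in Unicode 3.0, and none of the 42,711 added in Unicode 3.1.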

If you can come up with a proposal that describes what to do about ALL the 
Han characters in Unicode, I will be very happy to hear it.

Until then, I have to say that I have not seen any complete proposal.
Remember - the implementations of the algorithm for the non-Chinese part of 
the world will mainly be done by non-Chinese-speaking programmers; it's got 
to be simple & complete enough that even I can get it right...

             Harald

> .
> Liana
>
> On Mon, 20 Aug 2001 13:53:38 +0800 "James Seng/Personal" <James@Seng.cc>
> writes:
>> > The SC character set has been used for decades and has gone
>> > through extensive nationwide testing in China.  SC is stable and
>> > is properly reflected in the Unicode standard.  The question is one
>> > of definition: is TC/SC a case folding?  It seems that this WG
>> > has no consensus on this definition yet.
>>
>> I am not sure what you mean by "properly reflected" in the Unicode
>> Standard.  If you mean it is in the ISO 10646 code points, then yes,
>> both TC and SC are in the code points.  But if you are saying the
>> Unicode Consortium has a proper definition of TC/SC, then I am afraid
>> to say there is none.
>>
>
>
