Re: [idn] opting out of SC/TC equivalence
> >> The problem is that such a table doesn't seem to exist. I have seen in
> >> one draft (as Harald points out) a table with approximately 2000
> >> characters out of the 20-40000 which exist. Harald has asked valid
> >> questions about what to do with the other characters.
> >>
> > If you ask an expert in the Chinese language to give you this
> > "completed table", it will never be delivered, because the characters
> > form an open set. Even though we use Chinese, we use no more than about
> > 8000 frequently used characters in our newspapers. So no one can answer
> > this impossible problem.
>
> Please then answer a simpler one:
>
> When the matching algorithm you envision encounters a character where the
> tables do not tell what to do about it, what will the algorithm do to it?
> Pass it through unmodified? Reject it? Do something else which you have
> not defined yet?
>
The 1-1 mapping table first covers the frequently used
characters in TC and SC. I have suggested that CJK people review how
easily the glyphs are confused in the tsconv draft, and remove any pair
that raises concerns across different languages. I think no CJK users
want characters that are easily confused with each other. So this is a
simple character mapping table, as in UNICODE TR 20, whose main concern
is easy-to-confuse characters only. A matching algorithm can be table
driven: it processes its input and the stored data, checks whether each
character is in the equivalence set, and converts it to the defined
standard form. If a character is not in the defined equivalence set, why
should it be converted?
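The table-driven matching described above can be sketched as follows. This is
only an illustration under assumptions: the table entries are a few well-known
TC/SC pairs I chose for the example, not the actual (and still incomplete)
equivalence table under discussion.

```python
# Partial 1-1 mapping table: Traditional Chinese -> Simplified Chinese.
# Entries are illustrative examples, not the real table.
TC_TO_SC = {
    "\u767C": "\u53D1",  # 發 -> 发
    "\u9F8D": "\u9F99",  # 龍 -> 龙
    "\u842C": "\u4E07",  # 萬 -> 万
}

def fold(label: str) -> str:
    """Map each character to its defined standard form; characters
    outside the defined equivalence set pass through unmodified."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)

def matches(a: str, b: str) -> bool:
    """Two labels are treated as equivalent if they fold to the
    same string under the current table."""
    return fold(a) == fold(b)
```

Characters the table does not know about are simply passed through, so a
partial table still yields a well-defined algorithm.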
> If you tell us this, and claim that this is acceptable treatment of the
> 20-40.000 characters not in the tables, we can at least discuss this
> proposal (whether such a partial TC/SC mapping is acceptable), instead of
> asking again and again what to do about them.
>
As Patrik Fältström described, if it is a mechanism in the
Unicode Consortium, then they have version-update procedures, and IDNA
already handles Unicode version updates; why can the table not be updated
step by step to increase its coverage? The most important thing is to
tell users at registration time which characters are treated as
equivalent, which are not, and what the current range of usable
characters is. The procedure and the announcement are what matter, not
the boundary of the range. If the range is monotonically increasing and
every new addition is fully cross-checked, it is on the right track.
The most important thing in the CJK area is that the allowed
character set must be announced by an official government organization.
So each ccNIC needs time to exchange information.
> But it is impossible to implement an algorithm which says "this part will
> be worked out later".
>
The algorithm does not change; only the scale of the data increases. If
the new data still fits the form the algorithm is defined over, why can
it not work?
A version-control problem arises only if you change the function part.
If the same function is still applied to the new data, there is no
problem with the algorithm.
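The separation argued for above — a fixed function over a growing table — can
be sketched as a simple update check. The version tables and the acceptance
rule here are my assumptions for illustration: a new table version is accepted
only if it extends the previous one without altering any existing entry.

```python
# Hypothetical table versions; contents are illustrative only.
TABLE_V1 = {"\u767C": "\u53D1"}                      # 發 -> 发
TABLE_V2 = {"\u767C": "\u53D1", "\u9F8D": "\u9F99"}  # adds 龍 -> 龙

def is_valid_update(old: dict, new: dict) -> bool:
    """A monotonic update may only add entries: every mapping in the
    old table must be preserved unchanged in the new one."""
    return all(new.get(tc) == sc for tc, sc in old.items())

def fold(label: str, table: dict) -> str:
    """The function itself never changes between versions; only the
    table it is applied to grows."""
    return "".join(table.get(ch, ch) for ch in label)
```

Under this rule a label that matched under version 1 still matches under
version 2, which is exactly the "monotonically increasing, fully
cross-checked" property described above.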
L.M.Tseng