Re: [idn] opting out of SC/TC equivalence
> >> The problem is that such a table doesn't seem to exist. I have seen in
> >> one draft (as Harald points out) a table with approximately 2000
> >> characters out of the 20-40000 which exist. Harald has asked valid
> >> questions about what to do with the other characters.
> >>
> > If you ask an expert in the Chinese language to give you this
> > "completed table", it will never be delivered, because the characters
> > form an open set. Even though we use Chinese, we use no more than about
> > 8000 frequently used characters in our newspapers. So no one can answer
> > this impossible problem.
>
> Please then answer a simpler one:
>
> When the matching algorithm you envision encounters a character where the
> tables do not tell what to do about it, what will the algorithm do to it?
> Pass it through unmodified? Reject it? Do something else which you have
> not defined yet?
>
The 1-1 mapping table first covers the frequently used
characters in TC and SC. I have suggested that CJK people review how
easily the glyphs are confused in the tsconv draft, and remove any pair
that raises concerns across different languages. I think no CJK users
want characters that are easily confused with each other. So this is a
simple character mapping table, as in UNICODE TR 20, whose main concern
is easy-to-confuse characters only. A matching algorithm can be table
driven: it processes its input and the stored data, checks whether each
character is in the equivalence set, and converts it to the defined
standard form. If a character is not in the defined equivalence set, why
should it be converted?
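The table-driven matching described above can be sketched as follows. This is
only an illustration under assumptions: the table entries are a few well-known
TC/SC pairs I chose for the example, not the actual (and still incomplete)
equivalence table under discussion.

```python
# Partial 1-1 mapping table: Traditional Chinese -> Simplified Chinese.
# Entries are illustrative examples, not the real table.
TC_TO_SC = {
    "\u767C": "\u53D1",  # 發 -> 发
    "\u9F8D": "\u9F99",  # 龍 -> 龙
    "\u842C": "\u4E07",  # 萬 -> 万
}

def fold(label: str) -> str:
    """Map each character to its defined standard form; characters
    outside the defined equivalence set pass through unmodified."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)

def matches(a: str, b: str) -> bool:
    """Two labels are treated as equivalent if they fold to the
    same string under the current table."""
    return fold(a) == fold(b)
```

Characters the table does not know about are simply passed through, so a
partial table still yields a well-defined algorithm.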
> If you tell us this, and claim that this is acceptable treatment of the
> 20-40.000 characters not in the tables, we can at least discuss this
> proposal (whether such a partial TC/SC mapping is acceptable), instead of
> asking again and again what to do about them.
>
As Patrik Fältström described, if it is a mechanism in the
Unicode Consortium, then they have version-update procedures, and IDNA
already handles Unicode version updates; why can the table not be updated
step by step to increase its coverage? The most important thing is to
tell users at registration time which characters are treated as
equivalent, which are not, and what the current range of usable
characters is. The procedure and the announcement are what matter, not
the boundary of the range. If the range is monotonically increasing and
every new addition is fully cross-checked, it is on the right track.
The most important thing in the CJK area is that the allowed
character set must be announced by an official government organization.
So each ccNIC needs time to exchange information.
> But it is impossible to implement an algorithm which says "this part will
> be worked out later".
>
The algorithm does not change; only the scale of the data increases. If
the new data still fits the form the algorithm is defined over, why can
it not work?
A version-control problem arises only if you change the function part.
If the same function is still applied to the new data, there is no
problem with the algorithm.
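The separation argued for above — a fixed function over a growing table — can
be sketched as a simple update check. The version tables and the acceptance
rule here are my assumptions for illustration: a new table version is accepted
only if it extends the previous one without altering any existing entry.

```python
# Hypothetical table versions; contents are illustrative only.
TABLE_V1 = {"\u767C": "\u53D1"}                      # 發 -> 发
TABLE_V2 = {"\u767C": "\u53D1", "\u9F8D": "\u9F99"}  # adds 龍 -> 龙

def is_valid_update(old: dict, new: dict) -> bool:
    """A monotonic update may only add entries: every mapping in the
    old table must be preserved unchanged in the new one."""
    return all(new.get(tc) == sc for tc, sc in old.items())

def fold(label: str, table: dict) -> str:
    """The function itself never changes between versions; only the
    table it is applied to grows."""
    return "".join(table.get(ch, ch) for ch in label)
```

Under this rule a label that matched under version 1 still matches under
version 2, which is exactly the "monotonically increasing, fully
cross-checked" property described above.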
L.M.Tseng