Re: [idn] opting out of SC/TC equivalence
--On 2001-08-31 18.40 +0800
"tsenglm@計網中心.中大.tw" <tsenglm@cc.ncu.edu.tw> wrote:
> I had asked you a question in our discussion: why not also put
> alphabet case folding in the keyword system?
The simple answer is that the Unicode Consortium has defined "case
folding" and other kinds of equivalence tables. All of them are mechanical
mappings, and we in the IDN group have chosen to include them in the
nameprep algorithm.
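For illustration, Python's standard library ships an implementation of nameprep as `encodings.idna.nameprep`. It follows the profile later published as RFC 3491, so details may differ from the draft under discussion here, but it shows how mechanical case folding is: each character is mapped by table lookup, then the result is Unicode-normalized, with no dictionary or linguistic judgement involved.

```python
from encodings.idna import nameprep

# nameprep maps each character through fixed tables (case folding among
# them) and then applies NFKC normalization; nothing about the meaning
# of the label is consulted when deciding that "A" and "a" match.
for label in ["Example", "EXAMPLE", "ÄBC", "Straße"]:
    print(f"{label!r:10} -> {nameprep(label)!r}")
```

Both spellings of "Example" fold to the same string, and the German sharp s is folded to "ss", purely by table.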
If some of them should be removed, or if some should be added, that is a
valid discussion.
If a table existed which handled mappings between TC and SC, we could of
course use it.
The problem is that such a table doesn't seem to exist. In one draft (as
Harald points out) I have seen a table with approximately 2,000 characters
out of the 20,000-40,000 that exist. Harald has asked valid questions about
what to do with the other characters.
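A table of about 2,000 entries out of 20,000-40,000 characters covers roughly 5 to 10 percent of them. The sketch below uses a hypothetical three-entry TC-to-SC table (not the table from the draft) to show the practical consequence: any character without an entry passes through unchanged, and the mapping cannot tell a character that is identical in SC and TC (such as 中) from one the table simply omits.

```python
# Hypothetical three-entry TC -> SC table, for illustration only;
# it is NOT the table from the draft mentioned above.
TC_TO_SC = {"國": "国", "門": "门", "體": "体"}


def map_tc_to_sc(label):
    """Map a label through the table.

    Returns the mapped label plus the characters that had no entry.
    Those are left unchanged, whether they are identical in SC and TC
    or simply missing from the table; the table alone cannot tell.
    """
    mapped = []
    unmapped = []
    for ch in label:
        if ch in TC_TO_SC:
            mapped.append(TC_TO_SC[ch])
        else:
            mapped.append(ch)
            unmapped.append(ch)
    return "".join(mapped), unmapped


print(map_tc_to_sc("中國"))
```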
Further, you should not compare SC-TC equivalence with case folding inside
one script, for example Latin. Instead, you should compare it with the fact
that the same and similar characters exist in the Greek, Latin and Cyrillic
scripts, for example the character that looks like 'A'.
We do _NOT_ have solutions to that kind of equivalence between different
scripts for non-Chinese characters.
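The cross-script case can be shown with the same stand-in, Python's `encodings.idna.nameprep` (again the later RFC 3491 profile, not necessarily the 2001 draft). The Latin, Greek and Cyrillic capital letters that all look like 'A' are distinct code points, and case folding only lowers each one within its own script, so they remain three different labels.

```python
import unicodedata
from encodings.idna import nameprep

# Three capital letters that look alike but belong to different scripts:
# Latin A, Greek Alpha, Cyrillic A.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]

for ch in LOOKALIKES:
    folded = nameprep(ch)
    # Each letter folds to the lowercase letter of its own script only.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(folded):04X} {unicodedata.name(folded)}")

# After folding there are still three distinct results.
print(len({nameprep(ch) for ch in LOOKALIKES}))
```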
Because of that, I feel that if you don't have a table, then you are asking
for a solution for Chinese characters that we don't provide for any other
script in the world.
Further, you say that the number of permutations will be high. Linguists
I have talked with say that the number of permutations will be low, i.e.
that in most cases a label is written either entirely in SC or entirely in TC.
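The two positions on permutations can be made concrete. If each character in a label has at most two forms, a label with n such characters has up to 2^n mixed SC/TC spellings, but only two "pure" ones. The sketch below counts both, assuming a hypothetical variant table and a label written in TC; it does not settle which count matters in practice, which is exactly the open question.

```python
from itertools import product

# Hypothetical TC -> SC pairs, for illustration only.
TC_TO_SC = {"國": "国", "門": "门", "體": "体"}


def variants(ch):
    """The forms of one character: itself, plus its SC form if known."""
    sc = TC_TO_SC.get(ch)
    return [ch, sc] if sc else [ch]


def all_mixtures(label):
    """Every per-character combination of forms, mixed SC/TC included."""
    return ["".join(p) for p in product(*(variants(ch) for ch in label))]


def pure_forms(label):
    """Only the all-TC spelling (the input) and the all-SC spelling."""
    return {label, "".join(TC_TO_SC.get(ch, ch) for ch in label)}


for label in ["國門", "國門體"]:
    print(label, len(all_mixtures(label)), len(pure_forms(label)))
```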
I don't know, and don't care, what the correct answer is. But what
troubles me is that people who really do know Chinese characters and
scripts say such different things. I don't hear one consistent story. That
makes discussions very problematic.
Regards, Patrik