
Re: [idn] opting out of SC/TC equivalence



Hi Fältström,
> >         I asked you in our discussion: why not also put alphabet case
> > folding in the keyword system?
>
> The simple answer is that the Unicode Consortium have defined "case
> folding", and other kind of equivalence tables. All of them are mechanical
> mappings, and we have in the IDN group chosen to include them in the
> nameprep algorithm.
>
> If some of them should be removed, or if some should be added, that is a
> valid discussion.
>
       Thank you for your careful consideration. I hope that mechanism can be
applied to all scripts to reduce confusion. The work cannot be completed
overnight, but in this way it can be improved step by step.
> If a table existed which handled mappings between TC and SC, we could of
> course use it.
>
> The problem is that such a table doesn't seem to exist. I have seen in one
> draft (as Harald points out) a table with approximately 2000 characters
> out of the 20-40000 which exists. Harald have asked valid questions about
> what to do with the other characters.
>
      If you ask an expert in the Chinese language for such a "complete
table", it will never be delivered, because the characters form an open
set. Even we who use Chinese use no more than about 8000 frequently used
characters in our newspapers, so no one can answer this impossible
problem. As I have mentioned, most PRC SC characters are frequently used,
and some of them are quick-written forms that are very similar to the
formally written forms, so the problem of easily confused characters is
serious. In Hong Kong especially, the mixing of SC and TC will produce more
confusing domain names. We cannot solve all of these cases, but we can
reduce the trouble by doing some character mapping.
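
As a minimal sketch of the partial-mapping idea above (the three-entry table
and the names TC_TO_SC, fold_label and same_name are hypothetical, chosen
only for illustration; no complete or official table is implied):

```python
# Hypothetical, deliberately tiny TC -> SC table. A real table would be far
# larger and would still be incomplete, because the character set is open.
TC_TO_SC = {
    "\u570b": "\u56fd",  # 國 -> 国
    "\u9f8d": "\u9f99",  # 龍 -> 龙
    "\u9580": "\u95e8",  # 門 -> 门
}


def fold_label(label: str) -> str:
    """Map each character through the partial table.

    Characters that are not in the table pass through unchanged, so the
    mapping reduces confusion without claiming to cover every character.
    """
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def same_name(a: str, b: str) -> bool:
    """Treat two labels as equivalent if they fold to the same string."""
    return fold_label(a) == fold_label(b)
```

With this table, a label written in full TC, in full SC, or in a mixture of
the two folds to the same string, while unmapped characters are left alone.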
> Further, you should not compare SC-TC equivalence with case folding inside
> one script, for example Latin. Instead you should compare with the fact
> that the same, and similar, characters exists in Greek, Latin and Cyrillic
> scripts. For example the character which looks like 'A'.
>
> We do _NOT_ have solutions to that kind of equivalence between different
> scripts for non-chinese characters.
>
    I hope the IDN WG can ask the related NICs to propose mapping tables
in Unicode to reduce the confusion in their scripts.
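
To make the quoted cross-script point concrete, here is a small sketch using
Python's standard unicodedata module. The helper names are hypothetical, and
casefold() followed by NFKC is used only as a rough stand-in for
nameprep-style folding, not as the actual nameprep algorithm:

```python
import unicodedata

# Three characters that look like 'A' but belong to different scripts.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]  # Latin, Greek, Cyrillic


def describe(ch: str) -> str:
    """Return the code point and the Unicode character name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def folded(ch: str) -> str:
    """Approximate mechanical folding: case folding, then NFKC."""
    return unicodedata.normalize("NFKC", ch.casefold())


if __name__ == "__main__":
    for ch in LOOKALIKES:
        print(describe(ch), "->", describe(folded(ch)))
```

Case folding unifies 'A' and 'a' within one script, but the three lookalikes
still fold to three different code points, so only an explicit mapping table
could make them equivalent.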
> Because of that, I feel that if you don't have a table, then you ask for a
> solution for chinese characters which we don't solve for other scripts in
> the world.
>
> Further, you say that the number of permutations will be high. Linguistics
> I have talked with say that the number of permutations will be low, i.e.
> that one label is in most cases either in SC or TC.
>
        That is true for GB and BIG5, so the UNAME draft shows examples of
registration in full SC (pure GB, not GBK) and in full TC, not a mixture.
But GBK, as used in Hong Kong, includes some frequently used TC characters
and so is a mixture of SC and TC. The chinese.com does not limit the mixing
of SC/TC in Unicode form.
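
To illustrate how the number of spellings grows once mixing is allowed, here
is a rough sketch. The PAIRS table and the spellings function are
hypothetical; it assumes each listed character has exactly one SC form and
one TC form, which is a simplification:

```python
from itertools import product

# Hypothetical (SC, TC) pairs, for illustration only.
PAIRS = [
    ("\u56fd", "\u570b"),  # 国 / 國
    ("\u95e8", "\u9580"),  # 门 / 門
    ("\u9f99", "\u9f8d"),  # 龙 / 龍
]

# Look up both forms from either member of a pair.
VARIANTS = {}
for sc, tc in PAIRS:
    VARIANTS[sc] = (sc, tc)
    VARIANTS[tc] = (sc, tc)


def spellings(label: str) -> list[str]:
    """List every SC/TC spelling of a label, including mixed ones.

    Characters without a known variant contribute a single choice.
    """
    choices = [VARIANTS.get(ch, (ch,)) for ch in label]
    return ["".join(combo) for combo in product(*choices)]
```

A label with k characters that have variants yields 2**k spellings under
this assumption: two are pure SC and pure TC, and the rest are mixtures.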

> I don't know, and don't care, what the correct answer is. But what
> troubles me is that people which really do know chinese characters and
> scripts say so different things. I don't hear one story. That makes
> discussions very problematic.
>
        The fact is that there are many parts and many cases that apply in
different situations. The culture of Chinese characters has the
characteristics of an open set. That is very different from the Western
model, which is based on simple basic elements.
        It is just as we present the mapping as a table first, while you may
prefer to present it as a transformation formula first. Both are based on
historical experience.

Thank you for your help in solving our problems.

L.M.Tseng