
Re: [idn] opting out of SC/TC equivalence



> There are existing converting table already.

If there are existing conversion tables, please provide a reference.
Such a table must be publicly available and ideally free of IP claims.

> What I am
> proposing is for GB directly exchange with Unicode in
> [nameprep] such that there will be one step search
> for other users who don't want unicode set as well.

What you are suggesting is to do Nameprep directly in the GB charset.
The aim is to avoid what looks like an unnecessary round trip:
GB->Unicode->Nameprep->ACE, then ACE->Unicode->GB.

While it seems to make sense on the surface, I think you have
misunderstood the purpose of Nameprep. Nameprep is for matching.
The matching result remains in ACE. If I give you two GB strings, you do
GB->Unicode->Nameprep->ACE on each and then compare the two ACEs. You
don't reverse the ACE back to GB.
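
A minimal sketch of that comparison, assuming Python's standard
`encodings.idna` module (Nameprep followed by Punycode with an `xn--`
prefix) as a stand-in for whichever ACE is finally chosen. The helper
names `to_ace` and `same_name` are made up for illustration:

```python
import encodings.idna


def to_ace(raw: bytes, charset: str) -> bytes:
    """Local charset -> Unicode -> Nameprep -> ACE, for a single label."""
    label = raw.decode(charset)           # charset -> Unicode
    return encodings.idna.ToASCII(label)  # Nameprep, then ACE encoding


def same_name(a: bytes, a_charset: str, b: bytes, b_charset: str) -> bool:
    """Two labels match when their ACE forms are identical."""
    return to_ace(a, a_charset) == to_ace(b, b_charset)


# The same two characters, stored in two different local charsets.
gb_label = "中文".encode("gb2312")
sjis_label = "中文".encode("shift_jis")

print(gb_label == sjis_label)  # comparison of the raw bytes
print(same_name(gb_label, "gb2312", sjis_label, "shift_jis"))  # comparison of the ACEs
```

Nothing is ever converted back to GB here; the ACE strings are the only
things compared.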

Also, your proposal to do GB->Nameprep->ACE (or whatever) for
comparison would break the matching. GB->Nameprep->ACE may produce the
same ACE as SJIS->Nameprep->ACE even where the GB and SJIS strings were
not the same in the first place.
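
To illustrate the underlying problem: identical bytes can stand for
different characters in GB and SJIS, so a matching step fed with raw
charset bytes cannot tell such names apart. The sketch below searches
for one such byte pair using Python's `gb2312` and `shift_jis` codecs.
The byte ranges searched are my assumption about where the two
encodings overlap, and `find_ambiguous_bytes` is a hypothetical helper:

```python
def find_ambiguous_bytes():
    """Return the first two-byte sequence valid in both charsets but with different meanings."""
    for lead in range(0xE0, 0xF0):
        for trail in range(0xA1, 0xFD):
            raw = bytes([lead, trail])
            try:
                as_gb = raw.decode("gb2312")
                as_sjis = raw.decode("shift_jis")
            except UnicodeDecodeError:
                continue  # not valid in both charsets, keep looking
            if as_gb != as_sjis:
                return raw, as_gb, as_sjis
    return None


found = find_ambiguous_bytes()
if found is not None:
    raw, as_gb, as_sjis = found
    print(raw.hex(), repr(as_gb), repr(as_sjis))
```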

> You are right, most of us are care about this problem and
> contributing to this group for free.  We are discussing this
> sincerely.  However, I have clear feeling, something
> otherwise.  I wish, I am wrong, that we can see real
> solution out of this group.

As I said to another member of the wg, it is okay to form any conspiracy
theory in your own private mind, but to state it in public requires you
to provide substantiated evidence. Give specific instances or stop being
disruptive.

> David, that is the misconception I have referred to in blaming
> what Unicode has been done on Chinese language.  TC/SC
> is the same script and same language.  It is used in a similar
> way with upper/lower case of Latin.  Just like some people
> want to use uppercase / or printing all the time, but most use
> mixed cases.  TC/SC is larger set, so it is natural to have
> more variety of changes.  But the majority is treated like
> Latin cases.  They are not mixed scripts. Japanese is a
> mixed scripts.  Korean is a mixed script depending on who's
> viewpoint you are subscribing.

I shared the misconception you describe above many years back, and blamed
Unicode/ISO10646 for their poor handling of Chinese scripts. (See the
archives of unicore@unicode.org if you wish to see what stupid statements
I made back then.)

A couple of expert Chinese linguists quickly brought me to the light
on the key problems in TC-SC. As one explained to me, the best way to
deal with TC-SC is to treat them as two separate languages altogether,
except that they have some script in common and the grammars are similar.
I know it is weird, but once you are able to accept this concept, you
will find that all the difficulties we have in TC-SC are closer to
language translation and less about codepoint normalization.

I learned my place after that blunder. Chinese language issues are
best dealt with by Chinese linguists, not by computer scientists.

> TC/SC is in dictionaries for kids in China.

This is interesting. Please provide a reference to these dictionaries for
kids which have TC-SC. I would like to see if I can get a copy.

> Some people have experiences that
> Chinese translation always takes two versions, then
> TC/SC must be two different languages, that is wrong too.

Has it occurred to you that they *may* actually be right? And if they are
wrong, can you explain to me why they are wrong?

Would you consider Chinese & Japanese the same? Probably not, but they
are close enough for some Chinese to read a bit of Japanese and vice
versa. Would you consider TC & SC the same language? Probably yes, since
they are very similar (100x more similar than Japanese), but they still
have enough different words to confuse a casual reader.

-James Seng