Re: [idn] Re: Unicode is not usable in international context.
> (In contrast, I think I have learned enough by following the past <n>
> months of discussion to tell that the differences between Traditional
> and Simplified Chinese are analogous to spelling differences, and so
> IDN should not try to unify them.)
>
Hi, Alan Barrett:
It is great to hear you have formed such an impression from
the past months' discussion on the list. It shows that
the discussion of the TC/SC problem in IDN has highlighted
the differences in our understanding of the real-world
problem, and that this list is not dominated by Chinese and
Korean users at all.
As a Chinese user with limited education in Han character
culture, but one who has been interested in computational
processing of Chinese characters for the past decades, I would
like to post a different picture for your reference.
Chinese characters form a much larger character set than
Latin (20,000+ vs. 52, excluding ancient symbols) and
demonstrate a much wider range of the symbolic
representation phenomena of human languages.
These phenomena must be classified into different
levels to be handled by a computational process;
otherwise there is no way you could see any
characters on a computer screen.
The problem of Chinese character processing, which we
called "Chinese information processing" in the 1970s and
which, as "dictionary lookups", dates back centuries, has
faced this classification problem since the Qin
dynasty, when our ancestors tried to communicate
among different kingdoms. Now we are still faced with
the same problem of how to classify these characters
so that they are handled properly in IDN.
To generalize and say that TC/SC is analogous to spelling
differences is wrong, because TC/SC variants are equivalent
in NAMEs as ONE unit of screen display, while spelling
differences in Latin are equivalent in NAMEs as MORE
THAN ONE unit of display, such as "color" vs. "colour".
This difference divides the whole processing of the
computer-human text interface into the two different
categories we have been discussing on this list.
Following the basic concept of ONE display unit on our
screen, we are discussing extending one unit of display
from 52 ASCII characters to 50,000 UCS characters.
1. Do we need all 50,000 UCS identifiers for IDN? Do we
need 50,000 x 3 characters, as some Latin users are trying
to do?
No, we do not need so many identifiers. That is where the
idea of defining equivalent character sets comes in, and that
is the gist of the "TC/SC must be in IDN" debate.
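To make the idea of an equivalent character set concrete, here
is a minimal Python sketch, assuming a hypothetical five-entry
TC to SC table (`TC_TO_SC`) and hypothetical helper names; real
variant tables are far larger and some mappings are one-to-many,
which this sketch ignores. Two labels are treated as one name
when they fold to the same form, and because each substitution
replaces one code point with one code point, the folded label
keeps the same number of display units.

```python
# Hypothetical, tiny TC -> SC equivalence table, for illustration only.
# Real variant tables are much larger, and some mappings are one-to-many.
TC_TO_SC = {
    "國": "国",
    "學": "学",
    "漢": "汉",
    "語": "语",
    "門": "门",
}


def fold_label(label: str) -> str:
    """Replace each character that has a known TC variant with its SC form."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def equivalent(a: str, b: str) -> bool:
    """Treat two labels as the same name if they fold to the same string."""
    return fold_label(a) == fold_label(b)


if __name__ == "__main__":
    print(equivalent("漢語", "汉语"))  # True
    print(equivalent("漢語", "汉学"))  # False
    # One display unit maps to one display unit, so the length is unchanged.
    print(len("漢語") == len(fold_label("漢語")))  # True
```

By contrast, "color" and "colour" differ in length, so no
character-by-character table of this kind can make them
equivalent.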
2. Can we handle more than 50,000 characters?
Yes. That is dealt with one level above, in the localized
user interface.
3. Let us say that by using equivalent character sets we
have reduced 50,000 down to 30,000 UCS symbols. Can
we separate a UCS character's language context to avoid
confusion at the DNS level?
a) Some say: we don't care, that is not an IDN problem, it
is another group's problem. We only want IDN to pass through
the DNS without glitches.
b) Some say: we care, and the UCS code point gives that
symbol back. The only viable solution is to use UTF-8 or
UTF-16 to retain the original glyph.
Your position may be consistent with a). I think we differ
in how to divide Chinese characters into different levels for
processing.
People who agree with b) have not taken the problem of
look-alike symbols across different language boundaries
as seriously as it deserves, in addition to other comments
already repeated here.
c) The Chinese group wants TC/SC in IDN, based on
experience in Chinese character processing and usage
over the last three decades. I do not deny that some of
them also somewhat like the idea of b).
For Internet stability and security maintenance, the only
identity channel has to be in the DNS. Without a character's
language context, it is impossible to separate Greek/Cyrillic/
Latin/Armenian characters consistently across the many levels
of processing in different user applications, and the best
example is CUT & PASTE. The DNS has to carry language
context information for machines, users, and system
administrators alike, transparently in the form of language
tags, not as hidden tags just for machines.
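The look-alike problem can be illustrated with a short Python
sketch. It uses the standard `unicodedata` module and, as a
stated assumption, takes the first word of each character's
Unicode name as a rough script label, since the standard
library does not expose the Unicode Script property; the helper
names are hypothetical. It only detects look-alikes mixed
within a single label and does not implement the language tags
proposed here.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Return a rough script label: the first word of the Unicode name.

    Heuristic only (an assumption, not the Unicode Script property):
    'LATIN SMALL LETTER A' -> 'LATIN', 'CYRILLIC SMALL LETTER A' -> 'CYRILLIC'.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label: str) -> set[str]:
    """Collect the rough scripts of all letters in a label."""
    return {rough_script(ch) for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    plain = "example"
    spoof = "ex\u0430mple"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'
    print(plain == spoof)          # False, although both may render alike
    print(is_mixed_script(plain))  # False
    print(is_mixed_script(spoof))  # True
    print(rough_script("\u03bf"))  # GREEK (omicron, looks like Latin 'o')
```

A label written entirely in Cyrillic that happens to look Latin
would pass this check, which is one reason a code-point-only
view cannot separate these scripts reliably.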
That language context has to be an integral part of a name
label to function correctly through all cut & paste
operations. Sorry, I shall stop here, since this discussion
is far outside the group's scope and my posting rights could
be taken away soon.
Regards,
Liana Ye