Re: [idn] Working direction of IDN



I'm sorry, I don't really understand your point.

We recognize that additional characters are in use, and the national
governments of these countries are continuing to encode more characters in
the IRG. However, the frequency of use of those characters is *extremely*
low, and does not prevent the successful deployment of Unicode/10646
solutions for CJK characters. (For example, anyone using Microsoft Office
for Japanese, Chinese or Korean is already using Unicode internally).

When you say "language-based script", I am guessing that you are in favor
of using different national standards for encoding Chinese characters.
There are huge disadvantages to the approach of using different encodings,
in that there would be *many* different representations for the same
sequence of characters.

While theoretically one could have mapping tables between these sequences,
in practice every platform's mappings differ from the others. Even when you
think that you are using the same charset, on different platforms you will
get different conversions: that means that you will get data corruption
problems. (For example, in IBM we have collected over 700 unique character
conversion mappings that we have found on different platforms -- see
http://oss.software.ibm.com/cvs/icu/charset/data/xml/).
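
To make the conversion problem concrete, here is a minimal sketch, assuming
Python 3 and its built-in "shift_jis" and "cp932" codecs. The codecs and the
sample bytes are an illustrative choice and are not drawn from the IBM
collection above. Both tables are routinely labeled Shift-JIS, yet they can
disagree on the same two bytes:

```python
# Sketch: one byte sequence, two mapping tables that both get called "Shift-JIS".
# The codec names are Python built-ins; the sample bytes are an illustrative choice.
RAW = b"\x81\x60"


def decode_all(raw, codecs):
    """Decode the same bytes with each codec and return {codec: text}."""
    return {name: raw.decode(name) for name in codecs}


results = decode_all(RAW, ["shift_jis", "cp932"])

for name, text in results.items():
    # Each result here is a single character, so ord() is safe.
    print(f"{name:10s} -> U+{ord(text):04X}")

# If the two tables agreed on these bytes, this set would have one element.
distinct = set(results.values())
print("distinct results:", len(distinct))
```

On the Python builds I am aware of, the first codec yields WAVE DASH (U+301C)
and the second yields FULLWIDTH TILDE (U+FF5E), so text converted on one
platform and converted back on another may not survive the round trip.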

However, the last part of your message seems to indicate that you don't want
a national standard, but instead want to use some new system that uses
decomposition of Chinese characters. While Unicode/10646 does provide a way
to *describe* characters via such a mechanism (see Ideographic Description
Characters in the Unicode Standard), it is not recommended for encoding
Chinese characters.

Early in Unicode's history, we looked at the possibility of encoding Chinese
characters in this way. This approach requires a variable number of code
points to represent each Chinese character, and unless far more than 1000
base characters are used, the sequences can get quite long.
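
A rough sketch of the length problem, assuming Python 3. The particular
component breakdowns below are my own illustrative choices, not an official
decomposition; they only show how a description sequence grows as the
components are themselves broken down further:

```python
# Sketch: a directly encoded ideograph versus sequences that describe one
# using Ideographic Description Characters (U+2FF0, U+2FF1).
# The decompositions are illustrative assumptions, not normative data.

hao = "\u597D"                      # one directly encoded character
hao_described = "\u2FF0\u5973\u5B50"  # left-to-right operator + U+5973 + U+5B50

luo = "\u843D"                      # one directly encoded character
# top-to-bottom(U+8279, left-to-right(U+6C35, top-to-bottom(U+5902, U+53E3)))
luo_described = "\u2FF1\u8279\u2FF0\u6C35\u2FF1\u5902\u53E3"


def cost(text):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


for label, text in [
    ("hao, encoded directly", hao),
    ("hao, described", hao_described),
    ("luo, encoded directly", luo),
    ("luo, described", luo_described),
]:
    code_points, utf8_bytes = cost(text)
    print(f"{label:24s} {code_points} code point(s), {utf8_bytes} UTF-8 bytes")
```

The smaller the set of base components, the deeper the nesting has to go, and
each extra level adds an operator plus further components to the sequence.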

If you are interested in further discussion of this topic, I'd suggest you
contact some of the experts in the IRG.

Mark

----- Original Message -----
From: <liana.ydisg@juno.com>
To: <mark@macchiato.com>
Cc: <idn@ops.ietf.org>; <liana.ydisg@juno.com>
Sent: Thursday, May 24, 2001 13:23
Subject: Re: [idn] Working direction of IDN


> You have addressed the very problem of using ISO 10646 / Unicode as day to
> day information/semantic processing of CJK characters.  There is no single
> expert who would use all of the characters in the set, why do we care to
> carry so many characters in our register?  However, there are sparse needs
> for the characters, which the computer does well for human beings.  That is
> an argument for a language based script to be coded as IDN names.
>
> Besides, there are studies from China in the 80's, that about 1000
> frequently used Chinese characters which are composed into the larger set
> of Chinese characters.
>
> Liana Ye
>
>
> On Thu, 24 May 2001 06:51:25 -0700 "Mark Davis" <markdavis34@home.com>
> writes:
> > There is a lot of misinformation floating around about the support of
> > Chinese, Japanese and Korean characters. ISO 10646 / Unicode supports
> > over 70,000 Chinese characters right now, and work is underway to encode
> > further sets. It has the same repertoire as the Chinese GB18030 and GB
> > 13000.
> >
> > This work is being done by the IRG, which includes representatives of
> > the governments of China, Hong Kong, Singapore, Japan, South Korea,
> > North Korea, Taiwan and Vietnam, plus a representative from the Unicode
> > consortium (cf
> > http://www.info.gov.hk/digital21/eng/structure/intro_irg.html).
> >
> > This group is very carefully cataloging, reviewing, and assessing
> > Chinese characters for inclusion into the standard. The only real
> > limitation on the number of Chinese characters in the standard is the
> > ability of this group to process them, because the characters are
> > increasingly obscure (no person knows more than a fraction of the set
> > already encoded).
> >
> > Mark
> >
> > ----- Original Message -----
> > From: <liana.ydisg@juno.com>
> > To: <idn@ops.ietf.org>
> > Cc: <liana.ydisg@juno.com>
> > Sent: Wednesday, May 23, 2001 23:16
> > Subject: [idn] Working direction of IDN
> >
> >
> > > All:
> > >
> > > Answer to James Seng's comment:
> > >
> > > >Correct me if I am wrong...from your email, you are suggesting that
> > > >IDN to be encoded in a strings based on its symbols which makes up
> > > >the script. While this is an intriguing thought, unfortunately, I do
> > > >not think we have sufficient work done in that direction.
> > >
> > > >Our current works basically resolves using ISO10646 as a basis for
> > > >encoding names. Whatever limitation ISO10646 has, it is improving.
> > >
> > > I am suggesting that IDN to be encoded in a strings based on its
> > > symbols which makes up the script.
> > >
> > > Although IDN has no need to concern with trademarks at this time, it
> > > is the idn limitation to support only [ISO10646] bothers me.
> > > [ISO10646] does not support all Chinese characters, since Chinese
> > > character set has more than 100,000 characters already.  There are
> > > cases that we can not find a live person's name on our current
> > > computers, don't mention the dead ones.  At the speed of computer and
> > > network development, when the applications catch up with us, millions
> > > of more data going to be on the net for accessing.  Are we going to
> > > revamp our idn again?  I can suggest an ACE using a recursive rule to
> > > handle any symbols that are not possible to be simply folded onto
> > > ASCII character set.  Such an ACE creates a virtual character set,
> > > which is possible to include any characters or icons requested by a
> > > user, as long as that icon has been encoded by an ACE somewhere on the
> > > net.  When the IDN can take more than 63 octets or v6 or v8, the ACE
> > > can follow without troubling the users.  Let alone the idea to send
> > > Unicode down to the wire.  It is not user friendly, it is make no
> > > sense for the tradename servers.
> > >
> > > Liana Ye
> > >
> >
>
>
>
>