Re: [idn] Working direction of IDN

To: mark@macchiato.com
Subject: Re: [idn] Working direction of IDN
From: liana.ydisg@juno.com
Date: Sat, 26 May 2001 02:52:34 -0700
Cc: idn@ops.ietf.org
Delivery-date: Sat, 26 May 2001 02:41:11 -0700
Envelope-to: idn-data@psg.com
What if we can come up with, and agree to use, basically *one*
representation for the same sequence of characters under the same
language label?

> Early in Unicode's history, we looked at the possibility of encoding Chinese
> characters in this way. This approach requires the use of variable numbers
> of code points to represent Chinese characters, and unless far more
> than 1000 base characters are used, the lengths of the sequences can get
> quite long.

Frequently used sequences can be short, and those are the ones likely to be
used as IDNs. The long ones would be stored in text files, where only
programs need to deal with them.
Can you accept this type of variable number of code points?

Liana Ye
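
To make the trade-off above concrete, here is a minimal sketch comparing one
precomposed code point with a description-style sequence for the same
character. It assumes the Ideographic Description Character U+2FF0
(left-to-right composition) and the example character U+597D, built from
U+5973 and U+5B50; the names PRECOMPOSED, DESCRIPTION and sizes() are
hypothetical and only illustrate the length difference, not any proposed
IDN encoding.

```python
# Illustration only: one code point versus a three-code-point description.
# The constant and function names are hypothetical.

PRECOMPOSED = "\u597d"              # U+597D, a single CJK ideograph
DESCRIPTION = "\u2ff0\u5973\u5b50"  # U+2FF0 (left-to-right) + U+5973 + U+5B50


def sizes(text):
    """Return (number of code points, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


for label, text in (("precomposed", PRECOMPOSED), ("description", DESCRIPTION)):
    code_points, utf8_bytes = sizes(text)
    print(f"{label}: {code_points} code point(s), {utf8_bytes} UTF-8 byte(s)")
```

Even for this simple two-component character the description form is three
times as long, and characters with nested components need more operators and
parts, which is the growth in sequence length that Mark describes.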


On Fri, 25 May 2001 07:16:15 -0700 "Mark Davis" <markdavis34@home.com>
writes:
> I'm sorry, I don't really understand your point.
> 
> We recognize that additional characters are in use, and the national
> governments of these countries are continuing to encode more characters in
> the IRG. However, the frequency of use of those characters is *extremely*
> low, and does not prevent the successful deployment of Unicode/10646
> solutions for CJK characters. (For example, anyone using Microsoft Office
> for Japanese, Chinese or Korean is already using Unicode internally).
> 
> When you say "language-based script", I was guessing that you are in favor
> of using different national standards for encoding Chinese characters.
> There are huge disadvantages to the approach of using different encodings,
> in that there would be *many* different representations for the same
> sequence of characters.
> 
> While theoretically one could have mapping tables between these sequences,
> in practice every platform's mappings differ from the others. Even when you
> think that you are using the same charset, on different platforms you will
> get different conversions: that means that you will get data corruption
> problems. (For example, in IBM we have collected over 700 unique character
> conversion mappings that we have found on different platforms -- see
> http://oss.software.ibm.com/cvs/icu/charset/data/xml/).
> 
> However, the last part of your message seems to indicate that you don't want
> a national standard, but instead want to use some new system that uses
> decomposition of Chinese characters. While Unicode/10646 does provide a way
> to *describe* characters via such a mechanism (see Ideographic Description
> Characters in the Unicode Standard), it is not recommended for encoding
> Chinese characters.
> 
> Early in Unicode's history, we looked at the possibility of encoding Chinese
> characters in this way. This approach requires the use of variable numbers
> of code points to represent Chinese characters, and unless far more
> than 1000 base characters are used, the lengths of the sequences can get
> quite long.
> 
> If you are interested in further discussion of this topic, I'd suggest you
> contact some of the experts in the IRG.
> 
> Mark
> 
> ----- Original Message -----
> From: <liana.ydisg@juno.com>
> To: <mark@macchiato.com>
> Cc: <idn@ops.ietf.org>; <liana.ydisg@juno.com>
> Sent: Thursday, May 24, 2001 13:23
> Subject: Re: [idn] Working direction of IDN
> 
> 
> > You have addressed the very problem of using ISO 10646 / Unicode for
> > day-to-day information/semantic processing of CJK characters. There is no
> > single expert who would use all of the characters in the set, so why do we
> > care to carry so many characters in our register? However, there are sparse
> > needs for the characters, which the computer handles well for human beings.
> > That is an argument for a language-based script to be coded as IDN names.
> >
> > Besides, there are studies from China in the 1980s showing that about 1000
> > frequently used Chinese characters are composed into the larger set
> > of Chinese characters.
> >
> > Liana Ye
> >
> >
> > On Thu, 24 May 2001 06:51:25 -0700 "Mark Davis" <markdavis34@home.com>
> > writes:
> > > There is a lot of misinformation floating around about the support of
> > > Chinese, Japanese and Korean characters. ISO 10646 / Unicode supports over
> > > 70,000 Chinese characters right now, and work is underway to encode further
> > > sets. It has the same repertoire as the Chinese GB18030 and GB 13000.
> > >
> > > This work is being done by the IRG, which includes representatives of the
> > > governments of China, Hong Kong, Singapore, Japan, South Korea, North Korea,
> > > Taiwan and Vietnam, plus a representative from the Unicode consortium (cf.
> > > http://www.info.gov.hk/digital21/eng/structure/intro_irg.html).
> > >
> > > This group is very careful in cataloging, reviewing, and assessing Chinese
> > > characters for inclusion into the standard. The only real limitation on the
> > > number of Chinese characters in the standard is the ability of this group to
> > > process them, because the characters are increasingly obscure (no person
> > > knows more than a fraction of the set already encoded).
> > >
> > > Mark
> > >
> > > ----- Original Message -----
> > > From: <liana.ydisg@juno.com>
> > > To: <idn@ops.ietf.org>
> > > Cc: <liana.ydisg@juno.com>
> > > Sent: Wednesday, May 23, 2001 23:16
> > > Subject: [idn] Working direction of IDN
> > >
> > >
> > > > All:
> > > >
> > > > Answer to James Seng's comment:
> > > >
> > > > >Correct me if I am wrong...from your email, you are suggesting that
> > > > >IDN be encoded in strings based on the symbols which make up the
> > > > >script. While this is an intriguing thought, unfortunately, I do not
> > > > >think we have sufficient work done in that direction.
> > > >
> > > > >Our current work basically resolves to using ISO10646 as a basis for
> > > > >encoding names. Whatever limitation ISO10646 has, it is improving.
> > > >
> > > > I am suggesting that IDNs be encoded as strings based on the symbols
> > > > that make up the script.
> > > >
> > > > Although IDN has no need to concern itself with trademarks at this time,
> > > > it is the IDN limitation of supporting only [ISO10646] that bothers me.
> > > > [ISO10646] does not support all Chinese characters, since the Chinese
> > > > character set already has more than 100,000 characters. There are cases
> > > > where we cannot find a living person's name on our current computers, not
> > > > to mention the dead ones. At the speed of computer and network
> > > > development, when the applications catch up with us, millions more pieces
> > > > of data are going to be on the net for access. Are we going to revamp our
> > > > IDN again? I can suggest an ACE using a recursive rule to handle any
> > > > symbols that cannot simply be folded onto the ASCII character set. Such
> > > > an ACE creates a virtual character set, which can include any characters
> > > > or icons requested by a user, as long as that icon has been encoded by an
> > > > ACE somewhere on the net. When the IDN can take more than 63 octets, or
> > > > v6 or v8, the ACE can follow without troubling the users. Let alone the
> > > > idea of sending Unicode down the wire: it is not user friendly, and it
> > > > makes no sense for the tradename servers.
> > > >
> > > > Liana Ye
> > > >
> > >
> >
> >
> >
> >
> 
>