Re: [idn] reordering strawpoll




----- Original Message ----- 
From: "Bruce Thomson" <bthomson@fm-net.ne.jp>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Sent: Tuesday, November 13, 2001 11:16 AM
Subject: Re: [idn] reordering strawpoll


> Soobok Lee wrote:
> 
> > If you encode each Hangul syllable as 3 jamos in UTF-8,
> > it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets in UTF-8.
> > 3 times more space! If there were any real "compaction" of Hangul
> > syllable code points, that may be just the bare minimum.
> 
> This statement seems quite misleading. 3 jamos is not
> the average Hangul syllable, it is definitely high-end.
> 3 letters is quite short for an English syllable. If you
> want a more impressive syllable example, try "glyphs".
> 
> The best way to solve the question of information content
> per character is to use statistics from actual text examples.
> You recall that this has already been done on the list with
> the Book of Genesis, which gives a completely different
> result from what you are saying here. Do you have another
> statistical example that supports your opinion?

You may have missed my other answer to Kent, which gives 2.6 as the average
found by Adam. My example argument above stands even with 2.6 jamos:
2.6 * 3 octets vs. 2.6 Latin letters (1 octet each) == 3 times more space needed. That was my point.
The 3-jamo syllable was just an example. Okay? :-)
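
A rough sketch of the octet arithmetic above, assuming Python 3 and its
standard unicodedata module. The sample syllable and the helper name
utf8_octets are only illustrative choices, not anything from the drafts:

```python
import unicodedata

def utf8_octets(text: str) -> int:
    """Number of octets needed to encode text in UTF-8 (hypothetical helper)."""
    return len(text.encode("utf-8"))

# Sample precomposed Hangul syllable U+D55C, chosen only for illustration.
syllable = "\ud55c"

# NFD decomposes a precomposed syllable into its conjoining jamos.
jamos = unicodedata.normalize("NFD", syllable)

jamo_count = len(jamos)                # jamos in this particular syllable
jamo_octets = utf8_octets(jamos)       # each conjoining jamo takes 3 octets
latin_octets = utf8_octets("abc")      # each basic Latin letter takes 1 octet

# Per-character ratio: it does not depend on the average syllable length,
# so the same factor applies to 2.6 jamos vs. 2.6 Latin letters.
ratio = jamo_octets // latin_octets

print(jamo_count, jamo_octets, latin_octets, ratio)
```

The ratio comes only from the per-character cost (3 octets per jamo vs.
1 octet per basic Latin letter), so it is the same for 3 or for 2.6
characters on each side.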

Soobok Lee

> 
> From: "Adam M. Costello" 
> 
> > Here are the counts for Genesis chapter 1:
> >
> > King James:     3167 letters
> > Basic English:  3088 letters
> > Chinese Union:   778 ideographs
> > Korean Revised: 1201 Hangul
> 
>