Re: [idn] reordering strawpoll
Soobok Lee wrote:
> If you encode each Hangul syllable as 3 jamos in UTF-8,
> it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets in UTF-8.
> 3 times more space! If there were any real "compaction" on Hangul
> syllable code points, that may be just the bare minimum.
This statement seems quite misleading. 3 jamos is not
the average Hangul syllable; it is definitely at the high end.
3 letters is quite short for an English syllable. If you
want a more impressive syllable example, try "glyphs".
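
For anyone who wants to check the octet arithmetic, here is a minimal sketch in Python. The helper names (octets, jamo_count) and the sample syllables are my own choices for illustration, and it assumes NFD normalization is an acceptable stand-in for "encoding a syllable as jamos".

```python
import unicodedata

def octets(text: str) -> int:
    """Number of octets the text occupies in UTF-8."""
    return len(text.encode("utf-8"))

def jamo_count(syllable: str) -> int:
    """Number of conjoining jamos after canonical decomposition (NFD)."""
    return len(unicodedata.normalize("NFD", syllable))

def decomposed_octets(syllable: str) -> int:
    """UTF-8 octets when the syllable is stored as separate jamos."""
    return octets(unicodedata.normalize("NFD", syllable))

# Illustrative samples (not taken from the thread):
three_jamo = "\ud55c"  # U+D55C, decomposes into 3 jamos
two_jamo = "\uac00"    # U+AC00, decomposes into 2 jamos

for name, s in [("3-jamo syllable", three_jamo), ("2-jamo syllable", two_jamo)]:
    print(name, "precomposed:", octets(s), "octets;",
          jamo_count(s), "jamos;", decomposed_octets(s), "octets decomposed")

# ASCII letters take one octet each, so a long English syllable is still small.
print("glyphs:", octets("glyphs"), "octets")
```

The 9-octet figure only applies to a 3-jamo syllable stored in decomposed form; the same syllable as a single precomposed code point takes 3 octets, and a 2-jamo syllable takes 6 octets decomposed.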
The best way to solve the question of information content
per character is to use statistics from actual text examples.
You recall that this has already been done on the list with
the Book of Genesis, which gives a completely different
result from what you are saying here. Do you have another
statistical example that supports your opinion?
From: "Adam M. Costello"
> Here are the counts for Genesis chapter 1:
>
> King James: 3167 letters
> Basic English: 3088 letters
> Chinese Union: 778 ideographs
> Korean Revised: 1201 Hangul
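
To put the counts above in the same units as the octet argument, here is a rough sketch in Python. It rests on assumptions that are mine, not Adam's: that each Latin letter takes 1 UTF-8 octet, that each ideograph and each Hangul is a single precomposed code point taking 3 UTF-8 octets, and that spaces and punctuation are ignored, as they appear to be in the counts.

```python
# Character counts for Genesis chapter 1, as quoted above.
counts = {
    "King James": 3167,
    "Basic English": 3088,
    "Chinese Union": 778,
    "Korean Revised": 1201,
}

# Assumed UTF-8 octets per character (an assumption, see the note above):
# 1 for ASCII letters, 3 for ideographs and precomposed Hangul syllables.
octets_per_char = {
    "King James": 1,
    "Basic English": 1,
    "Chinese Union": 3,
    "Korean Revised": 3,
}

def estimated_octets(version: str) -> int:
    """Estimated UTF-8 size of the text, ignoring spaces and punctuation."""
    return counts[version] * octets_per_char[version]

baseline = estimated_octets("King James")
for version in counts:
    size = estimated_octets(version)
    print(f"{version}: {size} octets, {size / baseline:.2f}x King James")
```

Under these assumptions the Korean text comes to 3603 octets against 3167 for the King James text, a ratio of about 1.14, which is nowhere near 3 times.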