
Re: [idn] reordering strawpoll




----- Original Message ----- 
From: "Kent Karlsson" <kentk@md.chalmers.se>
 > 
> It's a bit strange that this comes from quarters where there is already quite
> a lot of "compaction" in the representations of text.  A single Han ideograph
> expresses "more" than a single letter in other scripts.  And a single Hangul
> syllable character expresses from 2 to 6 letters in one single character.

In natural Hangul sentences, the average may reach 2.6~2.7 letters per syllable.
In Hangul business names, the average is 2.2 or so, I guess.

> Hangul is fundamentally an alphabetic script, with 17 consonant letters
> and 11 vowel letters, plus some variant (and historic) letters.

If you encode each Hangul syllable as 3 jamos in UTF-8,
it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets in UTF-8.
3 times more space! If there is any real "compaction" in the Hangul
syllable code points, it may be just the bare minimum.
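
The octet counts above can be checked with a small Python sketch. The helper
name utf8_octets and the sample syllable U+D55C are my own choices for
illustration; NFD normalization is used here only as a convenient way to split
a precomposed syllable into its conjoining jamos.

```python
import unicodedata


def utf8_octets(text: str) -> int:
    """Return the number of octets `text` occupies in UTF-8."""
    return len(text.encode("utf-8"))


# One precomposed Hangul syllable (U+D55C), chosen as an example because it
# has an initial consonant, a vowel, and a final consonant.
syllable = "\ud55c"

# NFD decomposes the syllable into its three conjoining jamos.
jamos = unicodedata.normalize("NFD", syllable)

# Three basic Latin letters for comparison.
latin = "han"

print("precomposed syllable:", len(syllable), "code point,", utf8_octets(syllable), "octets")
print("decomposed jamos:    ", len(jamos), "code points,", utf8_octets(jamos), "octets")
print("basic Latin letters: ", len(latin), "code points,", utf8_octets(latin), "octets")
```

The precomposed syllable costs as many octets as the three Latin letters,
while the jamo sequence costs three times as many.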

From what I got from the reordering experiments, it became clear that
a long Han/Hangul code point sequence of length N can be represented
by 2.0~2.2 * N Latin letters. Without reordering, it would be 3.0~3.1 * N.
A 33% improvement is possible! Why should we go without reordering,
which merely requires simple mapping tables and brings so many benefits?
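
The arithmetic behind that percentage, as a minimal sketch. The function name
relative_saving and the sample length of 20 code points are only illustrative;
the ratios are the ones quoted above, and the code does not perform any
reordering itself.

```python
def relative_saving(before: float, after: float) -> float:
    """Fraction of letters saved when the per-code-point cost drops."""
    return (before - after) / before


def encoded_length(code_points: int, letters_per_code_point: float) -> float:
    """Estimated number of Latin letters for a sequence of code points."""
    return code_points * letters_per_code_point


# Ratios quoted above: letters per Han/Hangul code point.
without_reordering = 3.0
with_reordering = 2.0

n = 20  # hypothetical sequence length, for illustration only
print("without reordering:", encoded_length(n, without_reordering), "letters")
print("with reordering:   ", encoded_length(n, with_reordering), "letters")
print("saving: {:.0%}".format(relative_saving(without_reordering, with_reordering)))
```

Taking the upper ends of both ranges instead (3.1 and 2.2) gives a somewhat
smaller saving, so 33% is the favorable end of the estimate.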

Soobok Lee