Re: [idn] reordering strawpoll
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
>
> It's a bit strange that this comes from quarters where there is already quite
> a lot of "compaction" in the representations of text. A single Han ideograph
> expresses "more" than a single letter in other scripts. And a single Hangul
> syllable character expresses from 2 to 6 letters in one single character.
In natural Hangul sentences, the average may reach 2.6 to 2.7 letters per syllable.
In Hangul business names, the average is 2.2 or so, I guess.
> Hangul is fundamentally an alphabetic script, with 17 consonant letters
> and 11 vowel letters, plus some variant (and historic) letters.
If you encode each Hangul syllable as 3 jamos in UTF-8,
it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets in UTF-8.
That is 3 times more space! If there is any real "compaction" in the Hangul
syllable code points, it is just the bare minimum.
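
The octet counts above can be checked with a few lines of Python. This is only a rough sketch: the sample syllable (U+D55C) and the Latin string "han" are my own picks for illustration, and NFD normalization is used here simply as a convenient way to split a precomposed syllable into its conjoining jamos.

```python
import unicodedata

# Sample precomposed Hangul syllable (U+D55C); chosen only for illustration.
syllable = "\ud55c"

# NFD splits the precomposed syllable into its conjoining jamos.
jamos = unicodedata.normalize("NFD", syllable)

jamo_octets = len(jamos.encode("utf-8"))         # octets for the jamo sequence
syllable_octets = len(syllable.encode("utf-8"))  # octets for the precomposed form
latin_octets = len("han".encode("utf-8"))        # octets for 3 basic Latin letters

print(len(jamos), jamo_octets, syllable_octets, latin_octets)
```

For this syllable the jamo sequence has 3 code points, each taking 3 octets, while the precomposed code point takes the same 3 octets as the 3 Latin letters.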
From what I got from the reordering experiments, it became clear that
a long Han/Hangul code point sequence of length N can be represented
by 2.0~2.2 * N Latin letters. Without reordering, it would be 3.0~3.1 * N.
A 33% improvement is possible! Why should we go without reordering,
which merely requires simple mapping tables and brings so many benefits?
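
To show where the 33% comes from, here is the arithmetic on the figures above as a small sketch. The helper name `reduction` is just something made up for this mail; the inputs are the per-code-point letter counts quoted above, nothing more.

```python
def reduction(before: float, after: float) -> float:
    """Fractional saving when the cost per code point drops from before to after."""
    return (before - after) / before

# Letters per code point: 3.0~3.1 without reordering, 2.0~2.2 with it.
headline = reduction(3.0, 2.0)  # 3.0 -> 2.0, the case behind the 33% figure
smallest = reduction(3.0, 2.2)  # least favourable pairing of the two ranges
largest = reduction(3.1, 2.0)   # most favourable pairing of the two ranges

print(f"{smallest:.1%} to {largest:.1%}, headline case {headline:.1%}")
```

So depending on which ends of the two ranges are paired, the saving falls somewhere between roughly a quarter and a bit over a third.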
Soobok Lee