Re: [idn] reordering strawpoll
In a message dated 2001-11-12 14:41:29 Pacific Standard Time,
lsb@postel.co.kr writes:
> If you encode each Hangul syllable as 3 jamos in UTF-8,
> it needs 3 octets * 3 = 9 octets, while 3 basic Latin letters need 3 octets
> in UTF-8.
> 3 times more space! If there were any real "compaction" on Hangul
> syllable code points, that may be just the bare minimum.
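The octet counts in the quoted paragraph can be checked with a short Python sketch. The example syllable U+D55C (HAN) is an arbitrary choice, and `utf8_len` is a hypothetical helper for illustration, not part of any library. Note that a precomposed Hangul syllable also takes 3 octets in UTF-8; the 9-octet figure applies to the decomposed (conjoining jamo) form.

```python
import unicodedata

def utf8_len(text: str) -> int:
    """Hypothetical helper: number of octets needed to encode text in UTF-8."""
    return len(text.encode("utf-8"))

# U+D55C HANGUL SYLLABLE HAN: initial, medial and final jamo (arbitrary example).
precomposed = "\uD55C"
# Canonical decomposition (NFD) yields three conjoining jamo code points.
decomposed = unicodedata.normalize("NFD", precomposed)
latin = "abc"

print(len(decomposed), utf8_len(decomposed))  # 3 jamo, 9 octets
print(utf8_len(precomposed))                  # 3 octets
print(utf8_len(latin))                        # 3 octets
```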
But one paragraph earlier, Soobok stated that each Hangul character is
roughly equivalent to (i.e. carries roughly as much information as) 2.2 to
2.7 Latin letters. So the 9 octets of UTF-8 actually encode the equivalent
of 6.6 to 8.1 Latin letters, which means Hangul encoding is 10% to 27% less
efficient than Latin encoding. Representing it as two-thirds (67%) less
efficient is obviously misleading. Such claims only distract attention
from any merit the reordering plan may have.
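The arithmetic in the paragraph above can be reproduced in a few lines of Python. This sketch assumes, as the paragraph does, that the 9 octets carry three Hangul characters, and that each ASCII Latin letter costs one octet in UTF-8; `efficiency_loss` is a hypothetical helper introduced only for illustration.

```python
def efficiency_loss(chars: int, letters_per_char: float, octets: int) -> float:
    """Hypothetical helper: fraction by which an encoding is less efficient than
    ASCII Latin, where each Latin letter costs exactly one octet in UTF-8."""
    latin_equivalent = chars * letters_per_char  # information carried, in Latin letters
    return 1 - latin_equivalent / octets         # shortfall versus 1 letter per octet

# Figures from the thread: 3 Hangul characters in 9 octets,
# each worth 2.2 to 2.7 Latin letters.
for per_char in (2.2, 2.7):
    print(f"{per_char} letters/char: {efficiency_loss(3, per_char, 9):.0%} less efficient")

# The quoted "3 times more space" claim treats the 9 octets as worth 3 letters.
print(f"claimed: {efficiency_loss(3, 1.0, 9):.0%} less efficient")
```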
> From what I get from reordering experiments, it became clear that
> a long sequence of Han/Hangul code points of length N can be represented
> by 2.0~2.2 * N Latin letters. Without reordering, it would be 3.0~3.1 * N.
> 33% improvement is possible! Why should we go without reordering,
> which merely requires simple mapping tables, when it has so many benefits?
James Seng has stated repeatedly that there is no need to reiterate, yet
again, the supposed benefits of reordering. Every proposal, including this
one, has both advantages and disadvantages which must be weighed against each
other.
-Doug Ewell
Fullerton, California