
[idn] Re: hi



Soobok Lee <lsb@postel.co.kr> wrote:

> I suggest you visit the next link which contains The Revelation of the
> Holy Bible in Chinese (GB) and English (Modern BBE and Old KJV).
>
>
> http://www.ccim.org/cgi-user/bible/ob?version=hgb&version=kjv&version=bbe&book=gen
>
> And then, come back to this WG with new estimations on information
> capacity of han letters, please.

In Genesis chapter 1, I counted the Han ideographs in the Chinese Union
version, and the Latin letters in the King James version.  In both cases
I excluded all other characters (punctuation, spaces, verse numbers,
etc).

Chinese ideographs:  778
English letters:    3168
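
For anyone who wants to repeat the count, here is a minimal sketch of the
method described above.  It is not the script used to produce these
numbers.  The function names are hypothetical, and it assumes "Han
ideograph" means a code point in the CJK Unified Ideographs block or
Extension A, and "Latin letter" means ASCII A-Z or a-z.

```python
import string

# Assumption: only these two Unicode ranges are treated as Han ideographs.
HAN_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
)


def count_han(text: str) -> int:
    """Count Han ideographs, skipping punctuation, spaces, and digits."""
    return sum(
        1
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in HAN_RANGES)
    )


def count_latin(text: str) -> int:
    """Count ASCII letters, skipping punctuation, spaces, and digits."""
    return sum(1 for ch in text if ch in string.ascii_letters)


if __name__ == "__main__":
    # Sample lines only; the real input would be the full chapter text.
    print(count_han("1 起初神创造天地。"))
    print(count_latin("1 In the beginning God created the heaven and the earth."))
```

Run over the whole chapter in each version, the two totals give the
letters-per-ideograph ratio directly.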

This suggests that each Chinese ideograph carries the information
content of slightly over four English letters (3168 / 778 is about
4.07).  Therefore a maximal Chinese domain label in AMC-ACE-Z (19
ideographs, using about 3 octets each plus 4 octets for the prefix)
holds about as much information as a 76-letter English string, which is
21% more information than a maximal English domain label (63 letters
using 1 octet each).

The situation is much worse for Korean.  I think each Hangul character
carries the information of only about 1.5 English letters, but still
takes about 2.9 octets in AMC-ACE-Z, which means a maximal Korean domain
label (20 Hangul) holds about as much information as a 30-letter English
string.  Of all the languages I've looked at, Korean is by far the least
dense when encoded using AMC-ACE-Z.
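
The arithmetic behind both estimates can be sketched as follows.  This
is only an illustration: the function names are hypothetical, the
octets-per-character figures are the rough averages quoted above rather
than exact encoder output, and the Chinese ratio is rounded down to 4
letters per ideograph.

```python
MAX_LABEL_OCTETS = 63  # DNS limit on the length of a single label
PREFIX_OCTETS = 4      # ACE prefix length assumed in this message


def max_chars(octets_per_char: float) -> int:
    """Approximate number of characters that fit after the prefix."""
    return int((MAX_LABEL_OCTETS - PREFIX_OCTETS) // octets_per_char)


def english_equivalent(n_chars: int, letters_per_char: float) -> float:
    """Information content expressed as a count of English letters."""
    return n_chars * letters_per_char


if __name__ == "__main__":
    han = max_chars(3)       # about 3 octets per ideograph
    hangul = max_chars(2.9)  # about 2.9 octets per Hangul character

    han_letters = english_equivalent(han, 4)          # about 4 letters each
    hangul_letters = english_equivalent(hangul, 1.5)  # about 1.5 letters each

    print(han, han_letters)        # Chinese label size and English equivalent
    print(hangul, hangul_letters)  # Korean label size and English equivalent
    print(han_letters / MAX_LABEL_OCTETS - 1)  # Chinese gain over 63 letters
```

Changing the per-character figures for another script gives the same
comparison against the 63-letter English maximum.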

AMC