Re: [idn] Re: ave length, best compression etc
There are approximately 2000 very commonly used Chinese characters, which
we learn in primary school. A typical Chinese reader can get by day to
day knowing about 4000 to 5000 Han ideographs. I believe the situation is
similar for Korean and Japanese. Not all ideographs are equal.
But my experience ends here. I am not sure whether this applies to other
scripts, or how it should. OTOH, there aren't many scripts that use
more than 255 code points the way CJK does.
-James Seng
----- Original Message -----
From: "Martin Duerst" <duerst@w3.org>
To: "Soobok Lee" <lsb@postel.co.kr>; "James Seng/Personal"
<James@seng.cc>; <idn@ops.ietf.org>
Sent: Thursday, July 12, 2001 6:07 PM
Subject: Re: [idn] Re: ave length, best compression etc
> At 10:31 01/07/12 +0900, Soobok Lee wrote:
>
> > The most frequent 256 Han characters have a cumulative frequency sum
> > of 58.2%, and for the top 512, 1024, 2048 and 4096 ones
> > it reaches 72.8, 85.9, 95.4 and 99.4%, respectively.
> > 4096 is roughly the size of the Simplified Chinese character sets.
>
> By the way, how do you weight simplified, traditional, Japanese,
> and Korean when they use different codepoints?
>
> Regards, Martin.
>
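Cumulative-coverage figures like the ones Soobok Lee quotes can be computed from any text corpus. The sketch below is a minimal illustration, not the method used to produce those numbers. It assumes "Han" means the basic CJK Unified Ideographs block (U+4E00..U+9FFF, ignoring the extension blocks), and it counts each distinct code point separately, so it does not answer Martin's question about weighting simplified, traditional, Japanese and Korean forms. The helper names `cumulative_coverage` and `bits_for_table` are hypothetical. `bits_for_table` shows why table size matters for compression: indexing a 256-entry table takes 8 bits, a 4096-entry table takes 12.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def is_han(ch: str) -> bool:
    # Simplification: only the basic CJK Unified Ideographs block is treated as Han.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def cumulative_coverage(text: Iterable[str], cutoffs: Sequence[int],
                        han_only: bool = True) -> Dict[int, float]:
    """Fraction of all character occurrences covered by the top-N most frequent characters."""
    counts = Counter(ch for ch in text if not han_only or is_han(ch))
    total = sum(counts.values())
    if total == 0:
        return {n: 0.0 for n in cutoffs}
    # Occurrence counts sorted from most to least frequent character.
    freqs = [c for _, c in counts.most_common()]
    return {n: sum(freqs[:n]) / total for n in cutoffs}


def bits_for_table(size: int) -> int:
    """Number of bits needed to index a table with `size` entries."""
    return max(1, (size - 1).bit_length())


if __name__ == "__main__":
    sample = "中文a中文中国!"  # non-Han characters are skipped when han_only=True
    print(cumulative_coverage(sample, [1, 2, 3]))
    print(bits_for_table(256), bits_for_table(4096))
```

On a real corpus you would pass cutoffs such as `[256, 512, 1024, 2048, 4096]`; the tiny sample here only shows the mechanics.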