
Re: [idn] summary of reordering discussion



Thanks for your interest and comments.

A Korean standards body compiled hangul character frequency statistics in order to choose
the 2350 most frequent hangul characters (freq > 0.001%) for the KSC5601 legacy hangul code system.
Reordering table v3.0 may contain either the full 2350 or the top 1024 KSC5601 hangul characters.
It is an ideal set, one that the Korean government approved on the basis of statistics.  :-)
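
As a rough illustration of the two selection rules mentioned above (a frequency threshold and a top-N cut), here is a minimal Python sketch. It is not the standards body's actual procedure; the function names and the toy counts are hypothetical and invented for illustration only.

```python
from collections import Counter

def relative_frequencies(counts):
    """Convert raw occurrence counts into relative frequencies (fractions of 1)."""
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def select_by_threshold(counts, min_freq):
    """Characters whose relative frequency is strictly above min_freq,
    most frequent first (ties broken by code point)."""
    freqs = relative_frequencies(counts)
    chosen = [ch for ch, f in freqs.items() if f > min_freq]
    return sorted(chosen, key=lambda ch: (-freqs[ch], ch))

def select_top_n(counts, n):
    """The n most frequent characters (ties broken by code point)."""
    return sorted(counts, key=lambda ch: (-counts[ch], ch))[:n]

# Toy counts, NOT real statistics.
toy_counts = Counter({"가": 500, "나": 300, "다": 199, "힣": 1})

by_threshold = select_by_threshold(toy_counts, 0.005)  # keep chars above 0.5%
top_two = select_top_n(toy_counts, 2)
```

With real data, a threshold of 0.001% would be written as `min_freq=0.00001`, and the top-1024 variant as `select_top_n(counts, 1024)`.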

I know that government standards bodies/committees in CN/TW/JP/KR have compiled or maintain
han character statistics for similar purposes. The legacy Chinese-character code systems of
CN/TW/KR/JP each contain 4000~13000 frequent han characters. It is very clear that the number of
TC characters common to CN/TW/KR/JP is less than 4800. For each TC character in the common set,
we add its SC/Kanji variants to the common set. We then get about 6000~7000 han
characters that the CN/TW/KR/JP governments have approved as frequent and for which some statistics exist.
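
The construction above (take the TC characters common to all four regions, then add each one's SC/Kanji variants) could be sketched as follows. This is only a sketch under assumptions: the function name, the tiny per-region sets, and the variant mapping are hypothetical toy data, not taken from any government table.

```python
def common_set_with_variants(frequent_sets, variants):
    """Intersect the per-region frequent TC sets, then add the
    SC/Kanji variants of every character in the intersection."""
    common = set.intersection(*map(set, frequent_sets.values()))
    expanded = set(common)
    for ch in common:
        expanded.update(variants.get(ch, ()))
    return common, expanded

# Toy per-region "frequent TC character" sets (illustrative only).
frequent_tc = {
    "CN": {"國", "學", "山", "龍"},
    "TW": {"國", "學", "山", "龜"},
    "KR": {"國", "學", "山"},
    "JP": {"國", "學", "山", "龍"},
}

# Toy TC -> SC/Kanji variant mapping (illustrative only).
toy_variants = {"國": {"国"}, "學": {"学"}, "龍": {"龙", "竜"}}

common, expanded = common_set_with_variants(frequent_tc, toy_variants)
```

Here "龍" is dropped because it is not frequent in every region, so its variants are not added either; only variants of characters in the common set enlarge the result.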

But even such authoritative statistics cannot be optimal. They will ALWAYS be sub-optimal;
they merely come from authoritative, maintained sources. Some additional mixing/tuning/training
on top of those statistics is needed. Any suggestions are welcome.

Regards,

Soobok Lee

----- Original Message ----- 
From: "xiaodong lee" <lee@whale.cnnic.net.cn>
To: "Soobok Lee" <lsb@postel.co.kr>; "James Seng/Personal" <jseng@pobox.org.sg>; "Bruce Thomson" <bthomson@fm-net.ne.jp>; <idn@ops.ietf.org>
Sent: Friday, November 09, 2001 8:29 PM
Subject: Re: [idn] summary of reordering discussion


> That is great.
> We need to do it and find some organization to support it, not simply reject it.
> If we use authoritative data to produce a result, it will be
> more useful for people.
> ----- Original Message ----- 
> From: "Soobok Lee" <lsb@postel.co.kr>
> To: "James Seng/Personal" <jseng@pobox.org.sg>; "Bruce Thomson" <bthomson@fm-net.ne.jp>; <idn@ops.ietf.org>
> Sent: November 9, 2001 12:41 PM
> Subject: Re: [idn] summary of reordering discussion
> 
> 
> 
> ----- Original Message ----- 
> From: "James Seng/Personal" <jseng@pobox.org.sg>
>  > 
> > This is the biggest problem I have with reordering, i.e., the lack of a
> > credible table to reference. And yes, unfortunately there is no table
> > to reference.
> 
> I have found some governmental authorities that have published
> their official script character frequency statistics.
> 
> 
> > 
> > -James Seng
> > 
> 
>