Re: [idn] summary of reordering discussion
Thanks for your interest and comments.
A Korean standards body compiled hangul character frequency statistics to choose
the 2350 most frequent hangul characters (freq > 0.001%) for the KSC5601 legacy hangul code system.
Reordering table v3.0 may contain either the full 2350 or the top 1024 KSC5601 hangul characters.
It's the perfect set: one that the Korean government approved on the basis of statistics. :-)
I know that the CN/TW/JP/KR government standards bodies/committees have compiled or maintain
han character statistics for similar purposes. CN/TW/KR/JP each have 4000~13000 frequent han characters in their legacy Chinese code systems. It is very clear that the number of TC characters common to CN/TW/KR/JP is less than 4800. For each TC character in the common set,
we add its SC/Kanji variants to the common set. Then we get about 6000~7000 han
characters which the CN/TW/KR/JP governments approved as frequent ones and have some statistics on.
But even such authoritative statistics cannot be optimal. They are ALWAYS sub-optimal;
they just have authoritative and maintained sources. Some additional mixing/tuning/training within those statistics is needed. Any suggestions are welcome.
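
To make the selection and merging steps described above concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine: the function names (frequent_chars, common_set), the shape of the inputs, and the idea that each region's frequent list has already been mapped to TC forms are hypothetical, and none of this is the actual procedure behind any reordering table. Real frequency counts and real TC-to-SC/Kanji variant mappings would have to come from the official sources.

```python
from collections import Counter

def frequent_chars(counts, min_share=None, top_n=None):
    """Rank characters by count, keeping those above a frequency share
    and/or only the top N (for example freq > 0.001%, or the top 1024)."""
    total = sum(counts.values())
    ranked = [ch for ch, _ in counts.most_common()]
    if min_share is not None:
        ranked = [ch for ch in ranked if counts[ch] / total > min_share]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked

def common_set(frequent_by_region, variants):
    """Intersect the per-region frequent TC lists, then add the SC/Kanji
    variants of every character that survives the intersection."""
    common = set.intersection(
        *(set(chars) for chars in frequent_by_region.values())
    )
    expanded = set(common)
    for ch in common:
        # 'variants' is a hypothetical mapping: TC character -> variant forms
        expanded.update(variants.get(ch, ()))
    return expanded

if __name__ == "__main__":
    # Toy counts only; these are not real statistics.
    toy_counts = Counter({"가": 50, "나": 30, "다": 19, "라": 1})
    print(frequent_chars(toy_counts, min_share=0.05))
    print(frequent_chars(toy_counts, top_n=2))

    toy_regions = {
        "CN": ["國", "學"],
        "TW": ["國", "學"],
        "KR": ["國", "學"],
        "JP": ["國"],
    }
    toy_variants = {"國": ["国"], "學": ["学"]}
    print(sorted(common_set(toy_regions, toy_variants)))
```

The intersection keeps only characters that every region marks as frequent, so the variant step adds forms only for those shared characters. Any mixing or tuning of the per-region statistics would happen before the ranking step.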
Regards,
Soobok Lee
----- Original Message -----
From: "xiaodong lee" <lee@whale.cnnic.net.cn>
To: "Soobok Lee" <lsb@postel.co.kr>; "James Seng/Personal" <jseng@pobox.org.sg>; "Bruce Thomson" <bthomson@fm-net.ne.jp>; <idn@ops.ietf.org>
Sent: Friday, November 09, 2001 8:29 PM
Subject: Re: [idn] summary of reordering discussion
> That is great.
> We need to do it and find some organization to support it, not simply reject it.
> If we use some authoritative data to produce a result, it will be
> more useful for people.
> ----- Original Message -----
> From: "Soobok Lee" <lsb@postel.co.kr>
> To: "James Seng/Personal" <jseng@pobox.org.sg>; "Bruce Thomson" <bthomson@fm-net.ne.jp>; <idn@ops.ietf.org>
> Sent: Friday, November 09, 2001 12:41 PM
> Subject: Re: [idn] summary of reordering discussion
>
>
>
> ----- Original Message -----
> From: "James Seng/Personal" <jseng@pobox.org.sg>
> >
> > This is the biggest problem I have with reordering, i.e., the lack of a
> > credible table to reference. And yes, there is no table to reference,
> > unfortunately.
>
> I have found some governmental authorities that have published
> their official script character frequency statistics.
>
>
> >
> > -James Seng
> >
>
>