
Re: [idn] call for comments for REORDERING



You don't get my point.

Reordering achieves shorter labels by putting the more frequently used
characters in one block and the others at the back. BUT this also means
less frequently used characters would result in a *LONGER* label than
usual. And who are we to say these less frequently used characters are
less important or, worse, let them become invalid (too long to fit)
because of this reordering?

-James Seng

----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "James Seng/Personal" <jseng@pobox.org.sg>; <idn@ops.ietf.org>;
"Martin Duerst" <duerst@w3.org>
Sent: Friday, October 19, 2001 12:11 PM
Subject: Re: [idn] call for comments for REORDERING


> Another answer for your concern.
>
> ----- Original Message -----
> From: "James Seng/Personal" <jseng@pobox.org.sg>
> >
> > The bigger concern I have with re-ordering remains the fact that
> > the table mappings prove efficient with existing IDN names in some
> > registries *BUT* that does not indicate what the performance would be
> > like in the future. We do not know what happens when the name space
> > gets saturated, or whether other names which would have been usable
> > without lsb would become unusable due to lsb.
> >
>
> 1) Saturation in TLD namespaces would require longer names, for which
>     REORDERING is designed to give greater benefits (a better
>     compression ratio).
>
> 2) future variations in character usage frequency in each script
>
>     2.0) The character frequency tables are constructed from
>          Verisign GRS' ML.com testbeds.
>          Even for the Chinese han script, the registrations came from
>          China/Taiwan/Japan/Korea and other non-Asian squatters.
>          Each of the 4 countries has its own han character usage
>          pattern. The reordering table for han, therefore,
>          cannot  for the worst case, the mutual difference in
>          improvement ratios did not exceed +-2% around 20%.
>
>     2.1) This issue is already answered by the latest REORDERING I-D
>          2.0; see the enclosed excerpts from it. The influence of these
>          frequency variations is marginal.
>
> 3)
>
> My REORDERING I-D contains experiments with various lengths of han
> frequency tables (1024, 2048, 3072, 4096), in order to look into
> the influence of INs and OUTs of han characters from the most frequent
> 2048 han characters.
>
> For the two cases of table size 2048 and 4096, there was merely a +-1%
> difference in achieved improvements. I also reversed the order of the
> table itself, but it produced nearly the same result.
> A partial change in the order of the reordering table does not make
> a big difference. Some will lose and some will win. The net effect is
> near zero.
>
> I believe most of the SJIS and KSC5601 han characters are included in
> the most frequent 4096 han character table, because their government
> bodies selected a subset of a few thousand of the most frequent han
> characters.
>
> I believe that frequency fluctuation of han characters over time is
> WITHIN the frequent set. INs and OUTs from the 4096 ones are rare and
> do not invalidate the most frequent 1024 and 2048 ones.
> Moreover, TC/SC/KC characters are put side-by-side to avoid
> country-specific biases in the han reordering table.
>
> Non-CJK scripts often have a small set of basic alphabets, and their
> character usage patterns are more stable than those for han/hangeul.
>
> REORDERING does not recommend reordering on shared latin scripts,
> because latin characters are already encoded as they are (in literal
> mode, the most efficient form). The latin script for Europeans (0.6
> billion) is the most favored one in ACE-Z. There should be some
> compensation for non-Europeans. Han script: 2 billion, Arabic: 0.7
> billion, Hindi: 0.5 billion.
>
> This new frequency-based reordering is always more efficient than
> the original lexicographical ordering in UCS,
> even with some fluctuation in future script usage patterns.
>
> We are not pursuing an elusive "perfect and optimal" solution.
> REORDERING tables cannot be modified once they are frozen as
> standards.
> Therefore, REORDERING is a sub-optimal solution by its nature but will
> remain a valid and effective solution for a long time.
>
>
> Soobok
>
> ---------------------------
>
> Unified Han and Hangeul
>
>     11172 Hangul syllables and 20912 CJK Unified Han ideographs occupy
>     roughly two thirds of the currently assigned Unicode code points.
>     Their lexicographical ordering makes various ACE compression
>     algorithms work poorly for them, because they are spread evenly
>     throughout those wide code blocks.
>
>     According to one set of usage frequency statistics on hangeul
>     syllables in general hangeul texts, the most frequent 256 Hangul
>     syllables have a cumulative frequency sum of 88.2%, and for the top
>     512 it reaches 99.9%. That means the maximum variation of
>     code point values (11172) can be shrunk to 512 in the reordered
>     hangeul block with a probability of 99.9%.
>
>     Likewise, the most frequent 256 Han letters have a cumulative
>     frequency sum of 58.2%, and for the top 512, 1024, 2048 and
>     4096 it reaches 72.8%, 85.9%, 95.4% and 99.4%, respectively.
>     That means the maximum variation of code point values (20912) can
>     be shrunk to 2048 with a probability of 95.4%.
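
As an illustration of the cumulative-frequency figures in the excerpt,
here is a minimal Python sketch of how such coverage could be computed
from a set of labels. The function name cumulative_coverage and the toy
sample are assumptions made for this example only; the percentages in
the draft come from a much larger corpus and are not reproduced here.

```python
from collections import Counter


def cumulative_coverage(labels, top_sizes):
    """Map each k in top_sizes to the percentage of all character
    occurrences that are covered by the k most frequent characters."""
    counts = Counter(ch for label in labels for ch in label)
    total = sum(counts.values())
    # Occurrence counts sorted from most to least frequent.
    ranked = [n for _, n in counts.most_common()]
    return {k: 100.0 * sum(ranked[:k]) / total for k in top_sizes}


# Hypothetical toy sample standing in for a corpus of registered labels.
sample = ["aab", "abc", "aad"]
coverage = cumulative_coverage(sample, [1, 2, 4])
for k, pct in coverage.items():
    print(f"top {k}: {pct:.1f}%")
```

The steeper this curve, the smaller the block that the frequent
characters can be packed into, which is the property the reordering
relies on.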
>
>     The han/hangul frequency mapping tables are constructed from
>     nameprepped ML.com domains from the VGRS MultiLingual testbeds.
>     The frequent characters in the tables are organized in
>     increasing frequency order to minimize the AMC-ACE-Z bootstring
>     delta values, which can be lowered when the bigger code distances
>     come from the lower positions of the sorted labels in AMC-ACE-Z
>     step 2.
>
>     In general, character frequency distributions in any script block
>     may undergo some shifts within the frequent set with the passage of
>     time, but the entry and exit of characters into and out of the
>     frequent set is very rare. So, the impact may be as marginal and
>     negligible as the following comparison of experiment results shows.
>
>     Reordering tables based on the most frequent 1024, 2048, 3072 and
>     4096 han and hangul letters, in increasing frequency order, produced
>     marginal differences in improvements:
>
>       N is the length of the sample labels, and the other
>       decimal values (in percent) are the improvement ratios for
>       all combinations of N and the 4 reordering tables.
>
>
>
>       |  N|  HAN-4096|  HAN-3072|  HAN-2048|  HAN-1024|
>       |  1|     7.07 |     5.49 |     3.58 |      1.64|
>       |  2|    13.61 |    13.22 |    11.57 |      8.06|
>       |  3|    16.26 |    16.05 |    15.10 |     12.26|
>       |  4|    20.80 |    20.71 |    20.19 |     18.11|
>       |  5|    22.17 |    22.03 |    21.47 |     19.41|
>       |  6|    24.85 |    24.77 |    24.41 |     22.48|
>       |  7|    25.52 |    25.40 |    24.99 |     23.17|
>       |  8|    26.47 |    26.36 |    26.00 |     24.15|
>       |  9|    26.54 |    26.46 |    26.04 |     24.26|
>       | 10|    27.47 |    27.40 |    27.01 |     25.09|
>       | 11|    27.30 |    27.26 |    26.85 |     25.12|
>       | 12|    27.74 |    27.64 |    27.41 |     25.60|
>       | 13|    27.27 |    27.17 |    26.78 |     25.28|
>       | 14|    27.48 |    27.35 |    27.08 |     24.94|
>       | 15|    28.60 |    28.43 |    28.56 |     26.54|
>       | 16|    27.70 |    27.84 |    27.70 |     25.51|
>       | 17|    25.68 |    25.68 |    25.43 |     23.70|
>       |ALL|    20.30 |    20.14 |    19.43 |     17.09|
>
>
>     Experiments with two reordering tables, in increasing and
>     decreasing order, for the most frequent 2048 and 4096 han letters
>     also produced marginal differences in improvements:
>
>      (4096D means: the ordering table is in decreasing frequency order)
>
>
>       |  N|  HAN-4096| HAN-4096D|  HAN-2048| HAN-2048D|
>       |  1|     7.07 |     7.01 |     3.58 |     3.51 |
>       |  2|    13.61 |    13.44 |    11.57 |    11.27 |
>       |  3|    16.26 |    16.35 |    15.10 |    14.93 |
>       |  4|    20.80 |    20.56 |    20.19 |    19.90 |
>       |  5|    22.17 |    21.80 |    21.47 |    21.12 |
>       |  6|    24.85 |    24.21 |    24.41 |    23.82 |
>       |  7|    25.52 |    24.59 |    24.99 |    24.14 |
>       |  8|    26.47 |    25.68 |    26.00 |    25.36 |
>       |  9|    26.54 |    25.55 |    26.04 |    25.18 |
>       | 10|    27.47 |    26.79 |    27.01 |    26.42 |
>       | 11|    27.30 |    26.82 |    26.85 |    26.36 |
>       | 12|    27.74 |    27.46 |    27.41 |    27.13 |
>       | 13|    27.27 |    26.97 |    26.78 |    26.59 |
>       | 14|    27.48 |    27.31 |    27.08 |    26.99 |
>       | 15|    28.60 |    28.60 |    28.56 |    28.56 |
>       | 16|    27.70 |    27.55 |    27.70 |    27.20 |
>       | 17|    25.68 |    25.93 |    25.43 |    25.68 |
>       |ALL|    20.30 |    20.00 |    19.43 |    19.07 |
>
>     These experiments show that the influence of some fluctuations in
>     character frequency distributions in the frequent set of a script
>     would not be so great as to invalidate or outdate this
>     reordering approach in the foreseeable future.
>
>     But, to be as neutral and fair as possible in dealing with the
>     different usage patterns in China, Japan, Korea and Taiwan, some
>     provisions are made for grouping country-specific variants
>     of certain han letters. In particular, a group of a simplified
>     Chinese letter (SC), a traditional Chinese letter (TC) and a
>     Kanji-specific letter (KC) is ranked by the sum of their frequencies
>     and placed side-by-side in the reordering table for the Unified Han
>     block. For example, the reordering table looks like:
>      (TC1) (TC2 SC2) (TC3 KC3) (TC4) (TC5 SC5 KC5) (TC6) .....
>     This grouping will serve to prevent the frequency order from being
>     skewed toward one of those country-specific usage patterns.
>     The experiment results 27 and 28 in [A3] show that this reordering
>     scheme improves SC and TC labels by 21.95% and 18.50%, respectively.
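
The variant grouping described in the excerpt can be sketched roughly
as follows. This is only an illustration under stated assumptions: the
function name build_grouped_order, the descending flag, the placeholder
names (TC1, SC2, ...) and all frequency counts are hypothetical and are
not taken from the I-D.

```python
def build_grouped_order(groups, freq, descending=True):
    """Rank variant groups by the sum of their members' frequencies
    and flatten them so that variants stay side-by-side.

    groups: list of tuples, each holding the variants of one letter
    freq:   dict mapping each variant to its (hypothetical) count
    """
    ranked = sorted(
        groups,
        key=lambda group: sum(freq.get(ch, 0) for ch in group),
        reverse=descending,
    )
    return [ch for group in ranked for ch in group]


# Placeholder variants with made-up counts.
groups = [("TC1",), ("TC2", "SC2"), ("TC3", "KC3")]
freq = {"TC1": 20, "TC2": 10, "SC2": 30, "TC3": 25, "KC3": 5}

order = build_grouped_order(groups, freq)
print(order)
```

Because each group is ranked by its combined count, a letter that is
common in only one country's usage still sits next to its variants
instead of being pushed far down the table.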
>
>     According to experiments with huge han/hangeul domain samples,
>     for han/hangeul domains of 15 or more letters, AMC-ACE-Z with
>     reordering produced the shortest ACE labels, whose lengths
>     approximate 2.0*n~2.2*n (n = number of han/hangul code points in a
>     label), 33.3% more efficient than bare AMC-ACE-Z without the
>     reordering. This efficiency is close to that of UCS-2 (2.0*n) and
>     much better than that of UTF8 (3.0*n).
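
The UCS-2 and UTF8 reference figures quoted in the excerpt can be
checked with a few lines of Python. The sample label is an arbitrary
choice of four han code points from the Basic Multilingual Plane, and
UTF-16 (big-endian, no byte order mark) is used as a stand-in for
UCS-2, which is equivalent for such code points. Note that these are
octets per code point, whereas the ACE figures count ASCII characters.

```python
# Four han code points (U+4E2D U+6587 U+57DF U+540D), chosen arbitrarily.
label = "\u4e2d\u6587\u57df\u540d"
n = len(label)

# Octets per code point in each encoding.
utf8_per_cp = len(label.encode("utf-8")) / n
ucs2_per_cp = len(label.encode("utf-16-be")) / n

print(f"UTF-8: {utf8_per_cp:.1f} * n octets")
print(f"UCS-2: {ucs2_per_cp:.1f} * n octets")
```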
>
>     The appendix [A3] also contains some tuning experiments on ACE-Z's
>     skew and damp parameters. With skew==48 and damp==75, +1.3% in
>     compression ratio was achieved for han domains with some marginal
>     loss of efficiency in non-CJK scripts.
>
> ----------------
> 19. unihan-1024
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     14711(3.32)| 1.64|
> |  2|   57418|    114836|    384468(3.35)|    353466(3.08)| 8.06|
> |  3|   41335|    124005|    401283(3.24)|    352095(2.84)|12.26|
> |  4|   89296|    357184|   1139404(3.19)|    933070(2.61)|18.11|
> |  5|   21091|    105455|    332420(3.15)|    267893(2.54)|19.41|
> |  6|   15128|     90768|    284134(3.13)|    220263(2.43)|22.48|
> |  7|    5181|     36267|    112576(3.10)|     86487(2.38)|23.17|
> |  8|    3082|     24656|     76272(3.09)|     57854(2.35)|24.15|
> |  9|    1417|     12753|     39319(3.08)|     29779(2.34)|24.26|
> | 10|    1203|     12030|     37136(3.09)|     27817(2.31)|25.09|
> | 11|     474|      5214|     16072(3.08)|     12035(2.31)|25.12|
> | 12|     398|      4776|     14714(3.08)|     10947(2.29)|25.60|
> | 13|     164|      2132|      6532(3.06)|      4881(2.29)|25.28|
> | 14|     122|      1708|      5232(3.06)|      3927(2.30)|24.94|
> | 15|      50|       750|      2283(3.04)|      1677(2.24)|26.54|
> | 16|      29|       464|      1419(3.06)|      1057(2.28)|25.51|
> | 17|       8|       136|       405(2.98)|       309(2.27)|23.70|
>
> |All|  240823|    897561|   2868626(3.20)|   2378268(2.65)|17.09|
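
For readers checking the tables: the COMP column appears to be the
relative saving 100 * (X - Y) / X, and the value in parentheses appears
to be the sum divided by N*FREQ, that is, the average number of ACE
characters per code point. This reading is inferred from the numbers
themselves rather than stated in the excerpt. A short Python sketch
with three rows copied from table 19 (function names are made up for
this example):

```python
def improvement_percent(x, y):
    """Relative saving of the reordered total y against the plain total x."""
    return 100.0 * (x - y) / x


def chars_per_code_point(total, code_points):
    """Average ACE characters spent per code point."""
    return total / code_points


# (N*FREQ, X, Y) copied from rows N=1, N=4 and All of table 19.
rows = {
    "1": (4427, 14957, 14711),
    "4": (357184, 1139404, 933070),
    "All": (897561, 2868626, 2378268),
}

for name, (code_points, x, y) in rows.items():
    print(
        name,
        round(chars_per_code_point(x, code_points), 2),
        round(chars_per_code_point(y, code_points), 2),
        round(improvement_percent(x, y), 2),
    )
```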
>
>
>
> 20. unihan-2048
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     14422(3.26)| 3.58|
> |  2|   57418|    114836|    384468(3.35)|    339996(2.96)|11.57|
> |  3|   41335|    124005|    401283(3.24)|    340675(2.75)|15.10|
> |  4|   89296|    357184|   1139404(3.19)|    909323(2.55)|20.19|
> |  5|   21091|    105455|    332420(3.15)|    261039(2.48)|21.47|
> |  6|   15128|     90768|    284134(3.13)|    214781(2.37)|24.41|
> |  7|    5181|     36267|    112576(3.10)|     84440(2.33)|24.99|
> |  8|    3082|     24656|     76272(3.09)|     56439(2.29)|26.00|
> |  9|    1417|     12753|     39319(3.08)|     29082(2.28)|26.04|
> | 10|    1203|     12030|     37136(3.09)|     27106(2.25)|27.01|
> | 11|     474|      5214|     16072(3.08)|     11756(2.25)|26.85|
> | 12|     398|      4776|     14714(3.08)|     10681(2.24)|27.41|
> | 13|     164|      2132|      6532(3.06)|      4783(2.24)|26.78|
> | 14|     122|      1708|      5232(3.06)|      3815(2.23)|27.08|
> | 15|      50|       750|      2283(3.04)|      1631(2.17)|28.56|
> | 16|      29|       464|      1419(3.06)|      1026(2.21)|27.70|
> | 17|       8|       136|       405(2.98)|       302(2.22)|25.43|
>
> |All|  240823|    897561|   2868626(3.20)|   2311297(2.58)|19.43|
>
>
>
> 21. unihan-2048-D ( the reordering in decreasing frequency order)
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     14432(3.26)| 3.51|
> |  2|   57418|    114836|    384468(3.35)|    341134(2.97)|11.27|
> |  3|   41335|    124005|    401283(3.24)|    341362(2.75)|14.93|
> |  4|   89296|    357184|   1139404(3.19)|    912694(2.56)|19.90|
> |  5|   21091|    105455|    332420(3.15)|    262224(2.49)|21.12|
> |  6|   15128|     90768|    284134(3.13)|    216465(2.38)|23.82|
> |  7|    5181|     36267|    112576(3.10)|     85401(2.35)|24.14|
> |  8|    3082|     24656|     76272(3.09)|     56931(2.31)|25.36|
> |  9|    1417|     12753|     39319(3.08)|     29420(2.31)|25.18|
> | 10|    1203|     12030|     37136(3.09)|     27324(2.27)|26.42|
> | 11|     474|      5214|     16072(3.08)|     11835(2.27)|26.36|
> | 12|     398|      4776|     14714(3.08)|     10722(2.24)|27.13|
> | 13|     164|      2132|      6532(3.06)|      4795(2.25)|26.59|
> | 14|     122|      1708|      5232(3.06)|      3820(2.24)|26.99|
> | 15|      50|       750|      2283(3.04)|      1631(2.17)|28.56|
> | 16|      29|       464|      1419(3.06)|      1033(2.23)|27.20|
> | 17|       8|       136|       405(2.98)|       301(2.21)|25.68|
>
> |All|  240823|    897561|   2868626(3.20)|   2321524(2.59)|19.07|
>
>
>
> 22. unihan-3072
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     14136(3.19)| 5.49|
> |  2|   57418|    114836|    384468(3.35)|    333660(2.91)|13.22|
> |  3|   41335|    124005|    401283(3.24)|    336865(2.72)|16.05|
> |  4|   89296|    357184|   1139404(3.19)|    903458(2.53)|20.71|
> |  5|   21091|    105455|    332420(3.15)|    259189(2.46)|22.03|
> |  6|   15128|     90768|    284134(3.13)|    213746(2.35)|24.77|
> |  7|    5181|     36267|    112576(3.10)|     83977(2.32)|25.40|
> |  8|    3082|     24656|     76272(3.09)|     56168(2.28)|26.36|
> |  9|    1417|     12753|     39319(3.08)|     28917(2.27)|26.46|
> | 10|    1203|     12030|     37136(3.09)|     26962(2.24)|27.40|
> | 11|     474|      5214|     16072(3.08)|     11690(2.24)|27.26|
> | 12|     398|      4776|     14714(3.08)|     10647(2.23)|27.64|
> | 13|     164|      2132|      6532(3.06)|      4757(2.23)|27.17|
> | 14|     122|      1708|      5232(3.06)|      3801(2.23)|27.35|
> | 15|      50|       750|      2283(3.04)|      1634(2.18)|28.43|
> | 16|      29|       464|      1419(3.06)|      1024(2.21)|27.84|
> | 17|       8|       136|       405(2.98)|       301(2.21)|25.68|
>
> |All|  240823|    897561|   2868626(3.20)|   2290932(2.55)|20.14|
>
>
>
> 23. unihan-4096
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     13899(3.14)| 7.07|
> |  2|   57418|    114836|    384468(3.35)|    332156(2.89)|13.61|
> |  3|   41335|    124005|    401283(3.24)|    336045(2.71)|16.26|
> |  4|   89296|    357184|   1139404(3.19)|    902406(2.53)|20.80|
> |  5|   21091|    105455|    332420(3.15)|    258709(2.45)|22.17|
> |  6|   15128|     90768|    284134(3.13)|    213522(2.35)|24.85|
> |  7|    5181|     36267|    112576(3.10)|     83844(2.31)|25.52|
> |  8|    3082|     24656|     76272(3.09)|     56083(2.27)|26.47|
> |  9|    1417|     12753|     39319(3.08)|     28883(2.26)|26.54|
> | 10|    1203|     12030|     37136(3.09)|     26935(2.24)|27.47|
> | 11|     474|      5214|     16072(3.08)|     11684(2.24)|27.30|
> | 12|     398|      4776|     14714(3.08)|     10632(2.23)|27.74|
> | 13|     164|      2132|      6532(3.06)|      4751(2.23)|27.27|
> | 14|     122|      1708|      5232(3.06)|      3794(2.22)|27.48|
> | 15|      50|       750|      2283(3.04)|      1630(2.17)|28.60|
> | 16|      29|       464|      1419(3.06)|      1026(2.21)|27.70|
> | 17|       8|       136|       405(2.98)|       301(2.21)|25.68|
>
> |All|  240823|    897561|   2868626(3.20)|   2286300(2.55)|20.30|
>
>
>
> 24. unihan-4096-D
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     13909(3.14)| 7.01|
> |  2|   57418|    114836|    384468(3.35)|    332799(2.90)|13.44|
> |  3|   41335|    124005|    401283(3.24)|    335682(2.71)|16.35|
> |  4|   89296|    357184|   1139404(3.19)|    905086(2.53)|20.56|
> |  5|   21091|    105455|    332420(3.15)|    259944(2.46)|21.80|
> |  6|   15128|     90768|    284134(3.13)|    215353(2.37)|24.21|
> |  7|    5181|     36267|    112576(3.10)|     84893(2.34)|24.59|
> |  8|    3082|     24656|     76272(3.09)|     56682(2.30)|25.68|
> |  9|    1417|     12753|     39319(3.08)|     29273(2.30)|25.55|
> | 10|    1203|     12030|     37136(3.09)|     27189(2.26)|26.79|
> | 11|     474|      5214|     16072(3.08)|     11762(2.26)|26.82|
> | 12|     398|      4776|     14714(3.08)|     10674(2.23)|27.46|
> | 13|     164|      2132|      6532(3.06)|      4770(2.24)|26.97|
> | 14|     122|      1708|      5232(3.06)|      3803(2.23)|27.31|
> | 15|      50|       750|      2283(3.04)|      1630(2.17)|28.60|
> | 16|      29|       464|      1419(3.06)|      1028(2.22)|27.55|
> | 17|       8|       136|       405(2.98)|       300(2.21)|25.93|
>
> |All|  240823|    897561|   2868626(3.20)|   2294777(2.56)|20.00|
>
>
>
> 25. unihan-4096-DAMP075-SKEW48
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    4427|      4427|     14957(3.38)|     13899(3.14)| 7.07|
> |  2|   57418|    114836|    375901(3.27)|    324587(2.83)|13.65|
> |  3|   41335|    124005|    394416(3.18)|    330550(2.67)|16.19|
> |  4|   89296|    357184|   1126357(3.15)|    890277(2.49)|20.96|
> |  5|   21091|    105455|    329783(3.13)|    255913(2.43)|22.40|
> |  6|   15128|     90768|    282751(3.12)|    211339(2.33)|25.26|
> |  7|    5181|     36267|    112181(3.09)|     83126(2.29)|25.90|
> |  8|    3082|     24656|     76111(3.09)|     55712(2.26)|26.80|
> |  9|    1417|     12753|     39285(3.08)|     28699(2.25)|26.95|
> | 10|    1203|     12030|     37150(3.09)|     26767(2.23)|27.95|
> | 11|     474|      5214|     16028(3.07)|     11603(2.23)|27.61|
> | 12|     398|      4776|     14712(3.08)|     10567(2.21)|28.17|
> | 13|     164|      2132|      6528(3.06)|      4735(2.22)|27.47|
> | 14|     122|      1708|      5248(3.07)|      3762(2.20)|28.32|
> | 15|      50|       750|      2281(3.04)|      1628(2.17)|28.63|
> | 16|      29|       464|      1425(3.07)|      1017(2.19)|28.63|
> | 17|       8|       136|       404(2.97)|       301(2.21)|25.50|
>
> |All|  240823|    897561|   2835518(3.16)|   2254482(2.51)|20.49|
>
>
>
> 26. unihan-4096-DUDE
>
> |  N|    FREQ|    N*FREQ|  SUM OF DUDE(X)| SUM OF LDUDE(Y)| COMP|
> |  1|    4427|      4427|     17708(4.00)|     17708(4.00)| 0.00|
> |  2|   57418|    114836|    443874(3.87)|    409657(3.57)| 7.71|
> |  3|   41335|    124005|    474117(3.82)|    408039(3.29)|13.94|
> |  4|   89296|    357184|   1361917(3.81)|   1074237(3.01)|21.12|
> |  5|   21091|    105455|    401146(3.80)|    308378(2.92)|23.13|
> |  6|   15128|     90768|    344208(3.79)|    250925(2.76)|27.10|
> |  7|    5181|     36267|    137275(3.79)|     99475(2.74)|27.54|
> |  8|    3082|     24656|     93013(3.77)|     65889(2.67)|29.16|
> |  9|    1417|     12753|     48000(3.76)|     34230(2.68)|28.69|
> | 10|    1203|     12030|     45427(3.78)|     31663(2.63)|30.30|
> | 11|     474|      5214|     19564(3.75)|     13708(2.63)|29.93|
> | 12|     398|      4776|     18013(3.77)|     12468(2.61)|30.78|
> | 13|     164|      2132|      7969(3.74)|      5590(2.62)|29.85|
> | 14|     122|      1708|      6377(3.73)|      4476(2.62)|29.81|
> | 15|      50|       750|      2811(3.75)|      1926(2.57)|31.48|
> | 16|      29|       464|      1749(3.77)|      1213(2.61)|30.65|
> | 17|       8|       136|       508(3.74)|       355(2.61)|30.12|
>
> |All|  240823|    897561|   3423676(3.81)|   2739937(3.05)|19.97|
>
>
>
> 27. unihan-SC-4096 ( SC only or SC+TC mixed )
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|     769|       769|      2717(3.53)|      2378(3.09)|12.48|
> |  2|   16065|     32130|    108598(3.38)|     92597(2.88)|14.73|
> |  3|   14315|     42945|    139693(3.25)|    116054(2.70)|16.92|
> |  4|   48871|    195484|    623650(3.19)|    491073(2.51)|21.26|
> |  5|   12135|     60675|    190928(3.15)|    147721(2.43)|22.63|
> |  6|   10463|     62778|    196038(3.12)|    146516(2.33)|25.26|
> |  7|    3594|     25158|     77931(3.10)|     57412(2.28)|26.33|
> |  8|    2373|     18984|     58686(3.09)|     42907(2.26)|26.89|
> |  9|    1078|      9702|     29875(3.08)|     21736(2.24)|27.24|
> | 10|     934|      9340|     28786(3.08)|     20855(2.23)|27.55|
> | 11|     392|      4312|     13279(3.08)|      9612(2.23)|27.62|
> | 12|     314|      3768|     11579(3.07)|      8376(2.22)|27.66|
> | 13|     144|      1872|      5724(3.06)|      4158(2.22)|27.36|
> | 14|     104|      1456|      4455(3.06)|      3226(2.22)|27.59|
> | 15|      41|       615|      1868(3.04)|      1348(2.19)|27.84|
> | 16|      25|       400|      1219(3.05)|       887(2.22)|27.24|
> | 17|       7|       119|       353(2.97)|       264(2.22)|25.21|
>
> |All|  111624|    470507|   1495379(3.18)|   1167120(2.48)|21.95|
>
>
>
> 28. unihan-TC-4096 ( TC only )
>
> |  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
> |  1|    3658|      3658|     12240(3.35)|     11521(3.15)| 5.87|
> |  2|   41353|     82706|    275870(3.34)|    239559(2.90)|13.16|
> |  3|   27020|     81060|    261590(3.23)|    219991(2.71)|15.90|
> |  4|   40425|    161700|    515754(3.19)|    411333(2.54)|20.25|
> |  5|    8956|     44780|    141492(3.16)|    110988(2.48)|21.56|
> |  6|    4665|     27990|     88096(3.15)|     67006(2.39)|23.94|
> |  7|    1587|     11109|     34645(3.12)|     26432(2.38)|23.71|
> |  8|     709|      5672|     17586(3.10)|     13176(2.32)|25.08|
> |  9|     339|      3051|      9444(3.10)|      7147(2.34)|24.32|
> | 10|     269|      2690|      8350(3.10)|      6080(2.26)|27.19|
> | 11|      82|       902|      2793(3.10)|      2072(2.30)|25.81|
> | 12|      84|      1008|      3135(3.11)|      2256(2.24)|28.04|
> | 13|      20|       260|       808(3.11)|       593(2.28)|26.61|
> | 14|      18|       252|       777(3.08)|       568(2.25)|26.90|
> | 15|       9|       135|       415(3.07)|       282(2.09)|32.05|
> | 16|       4|        64|       200(3.12)|       139(2.17)|30.50|
> | 17|       1|        17|        52(3.06)|        37(2.18)|28.85|
>
> |All|  129199|    427054|   1373247(3.22)|   1119180(2.62)|18.50|
>
>
>