
[idn] (new) experiments with *PURE* UniHan/Hangul ML.com with LAMCZ/LDUDE



I have run another experiment with *PURE* UniHan/Hangul ML.com
domains, i.e. domains containing no code points from other scripts
such as digits, Hiragana, or Katakana.
 
-------------------------------------------------------------------
LAMCZ with pure UniHan ML.com
 

N:            length of a domain label (# of code points)
FREQ:         number of domains of length N
N*FREQ:       total # of code points in domains of length N
SUM OF AMCZ:  sum of lengths of AMCZ labels
X:            SUM OF AMCZ / (N*FREQ), average AMCZ length per code point
SUM OF LAMCZ: sum of lengths of LAMCZ labels
Y:            SUM OF LAMCZ / (N*FREQ), average LAMCZ length per code point
COMP:         (SUM OF AMCZ - SUM OF LAMCZ) / SUM OF AMCZ * 100
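
As a quick check of these definitions, the sketch below recomputes X, Y
and COMP for one row of the table that follows. The function name
row_metrics and its parameter names are assumptions made for
illustration; they are not part of the original experiment. COMP is
taken as the reduction of the LAMCZ total relative to the AMCZ total,
which is the sign convention that reproduces the positive values in the
table.

```python
def row_metrics(n, freq, sum_base, sum_local):
    """Return (X, Y, COMP) for one table row.

    n          label length in code points (column N)
    freq       number of domains of that length (column FREQ)
    sum_base   total encoded length without the local variant (e.g. AMCZ)
    sum_local  total encoded length with the local variant (e.g. LAMCZ)
    """
    code_points = n * freq                 # column N*FREQ
    x = sum_base / code_points             # encoded length per code point
    y = sum_local / code_points
    comp = (sum_base - sum_local) / sum_base * 100   # percent saved
    return x, y, comp


# Row N=2 of the pure UniHan LAMCZ table.
x, y, comp = row_metrics(2, 42793, 281303, 249930)
print(f"{x:.2f} {y:.2f} {comp:.2f}")
```

The same function applies unchanged to the DUDE/LDUDE tables, since only
the encoding behind the two totals differs.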
 
|  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
 
|  1|    3735|      3735|     12177(3.26)|     11589(3.10)|4.83|
|  2|   42793|     85586|    281303(3.29)|    249930(2.92)|11.15|
|  3|   28033|     84099|    267446(3.18)|    230951(2.75)|13.65|
|  4|   54607|    218428|    685584(3.14)|    562349(2.57)|17.98|
|  5|   12591|     62955|    195596(3.11)|    157259(2.50)|19.60|
|  6|    7680|     46080|    141927(3.08)|    110465(2.40)|22.17|
|  7|    2761|     19327|     59231(3.06)|     44754(2.32)|24.44|
|  8|    1336|     10688|     32554(3.05)|     24120(2.26)|25.91|
|  9|     641|      5769|     17490(3.03)|     12833(2.22)|26.63|
| 10|     298|      2980|      8962(3.01)|      6570(2.20)|26.69|
| 11|     137|      1507|      4575(3.04)|      3226(2.14)|29.49|
| 12|      57|       684|      2057(3.01)|      1529(2.24)|25.67|
| 13|      25|       325|       983(3.02)|       727(2.24)|26.04|
| 14|       6|        84|       253(3.01)|       181(2.15)|28.46|
| 15|       6|        90|       266(2.96)|       195(2.17)|26.69|
| 17|       1|        17|        48(2.82)|        32(1.88)|33.33|
 
|   |  154707|    542354|   1710452(3.15)|   1416710(2.61)|17.17|
 
 
-------------------------------------------------------------------
LAMCZ with pure Hangul ML.com

N:            length of a domain label (# of code points)
FREQ:         number of domains of length N
N*FREQ:       total # of code points in domains of length N
SUM OF AMCZ:  sum of lengths of AMCZ labels
X:            SUM OF AMCZ / (N*FREQ), average AMCZ length per code point
SUM OF LAMCZ: sum of lengths of LAMCZ labels
Y:            SUM OF LAMCZ / (N*FREQ), average LAMCZ length per code point
COMP:         (SUM OF AMCZ - SUM OF LAMCZ) / SUM OF AMCZ * 100
 
|  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
 
|  1|    1940|      1940|      7760(4.00)|      7760(4.00)|0.00|
|  2|   16492|     32984|    119927(3.64)|    102659(3.11)|14.40|
|  3|   37406|    112218|    380305(3.39)|    310666(2.77)|18.31|
|  4|   57732|    230928|    756089(3.27)|    587684(2.54)|22.27|
|  5|   36661|    183305|    587547(3.21)|    440929(2.41)|24.95|
|  6|   22090|    132540|    418286(3.16)|    304984(2.30)|27.09|
|  7|   11503|     80521|    251226(3.12)|    180533(2.24)|28.14|
|  8|    4963|     39704|    122742(3.09)|     86642(2.18)|29.41|
|  9|    2104|     18936|     57964(3.06)|     40725(2.15)|29.74|
| 10|     833|      8330|     25332(3.04)|     17599(2.11)|30.53|
| 11|     358|      3938|     11919(3.03)|      8225(2.09)|30.99|
| 12|     123|      1476|      4422(3.00)|      3092(2.09)|30.08|
| 13|      71|       923|      2752(2.98)|      1901(2.06)|30.92|
| 14|      28|       392|      1160(2.96)|       805(2.05)|30.60|
| 15|      18|       270|       798(2.96)|       565(2.09)|29.20|
| 16|      10|       160|       460(2.88)|       342(2.14)|25.65|
| 17|       7|       119|       354(2.97)|       243(2.04)|31.36|
 
|   |  192339|    848684|   2749043(3.24)|   2095354(2.47)|23.78|
 
-------------------------------------------------------------------
LDUDE with pure UniHan ML.com
 

N:            length of a domain label (# of code points)
FREQ:         number of domains of length N
N*FREQ:       total # of code points in domains of length N
SUM OF DUDE:  sum of lengths of DUDE labels
X:            SUM OF DUDE / (N*FREQ), average DUDE length per code point
SUM OF LDUDE: sum of lengths of LDUDE labels
Y:            SUM OF LDUDE / (N*FREQ), average LDUDE length per code point
COMP:         (SUM OF DUDE - SUM OF LDUDE) / SUM OF DUDE * 100
 
|  N|    FREQ|    N*FREQ|  SUM OF DUDE(X)| SUM OF LDUDE(Y)| COMP|
 
|  1|    3735|      3735|     14940(4.00)|     14940(4.00)|0.00|
|  2|   42793|     85586|    328641(3.84)|    296679(3.47)|9.73|
|  3|   28033|     84099|    318916(3.79)|    268987(3.20)|15.66|
|  4|   54607|    218428|    826020(3.78)|    639417(2.93)|22.59|
|  5|   12591|     62955|    237862(3.78)|    178866(2.84)|24.80|
|  6|    7680|     46080|    173063(3.76)|    122010(2.65)|29.50|
|  7|    2761|     19327|     72750(3.76)|     49710(2.57)|31.67|
|  8|    1336|     10688|     40078(3.75)|     26187(2.45)|34.66|
|  9|     641|      5769|     21554(3.74)|     14009(2.43)|35.01|
| 10|     298|      2980|     11164(3.75)|      7095(2.38)|36.45|
| 11|     137|      1507|      5671(3.76)|      3482(2.31)|38.60|
| 12|      57|       684|      2528(3.70)|      1678(2.45)|33.62|
| 13|      25|       325|      1188(3.66)|       807(2.48)|32.07|
| 14|       6|        84|       309(3.68)|       202(2.40)|34.63|
| 15|       6|        90|       323(3.59)|       204(2.27)|36.84|
| 17|       1|        17|        55(3.24)|        39(2.29)|29.09|
 
|   |  154707|    542354|   2055062(3.79)|   1624312(2.99)|20.96|
 
-------------------------------------------------------------------
LDUDE with pure Hangul ML.com
 
N:            length of a domain label (# of code points)
FREQ:         number of domains of length N
N*FREQ:       total # of code points in domains of length N
SUM OF DUDE:  sum of lengths of DUDE labels
X:            SUM OF DUDE / (N*FREQ), average DUDE length per code point
SUM OF LDUDE: sum of lengths of LDUDE labels
Y:            SUM OF LDUDE / (N*FREQ), average LDUDE length per code point
COMP:         (SUM OF DUDE - SUM OF LDUDE) / SUM OF DUDE * 100
 
|  N|    FREQ|    N*FREQ|  SUM OF DUDE(X)| SUM OF LDUDE(Y)| COMP|
 
|  1|    1940|      1940|      7760(4.00)|      7760(4.00)|0.00|
|  2|   16492|     32984|    125398(3.80)|    107795(3.27)|14.04|
|  3|   37406|    112218|    420812(3.75)|    328202(2.92)|22.01|
|  4|   57732|    230928|    860712(3.73)|    610053(2.64)|29.12|
|  5|   36661|    183305|    682356(3.72)|    454656(2.48)|33.37|
|  6|   22090|    132540|    492239(3.71)|    313997(2.37)|36.21|
|  7|   11503|     80521|    298365(3.71)|    186251(2.31)|37.58|
|  8|    4963|     39704|    146595(3.69)|     89221(2.25)|39.14|
|  9|    2104|     18936|     69955(3.69)|     42413(2.24)|39.37|
| 10|     833|      8330|     30705(3.69)|     18333(2.20)|40.29|
| 11|     358|      3938|     14452(3.67)|      8628(2.19)|40.30|
| 12|     123|      1476|      5397(3.66)|      3314(2.25)|38.60|
| 13|      71|       923|      3387(3.67)|      2055(2.23)|39.33|
| 14|      28|       392|      1452(3.70)|       893(2.28)|38.50|
| 15|      18|       270|       998(3.70)|       632(2.34)|36.67|
| 16|      10|       160|       582(3.64)|       386(2.41)|33.68|
| 17|       7|       119|       438(3.68)|       272(2.29)|37.90|
 
|   |  192339|    848684|   3161603(3.73)|   2174861(2.56)|31.21|
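
For anyone who wants to re-check the rows mechanically, a small parser
is enough. The sketch below assumes the row layout used in this message
(pipe-separated fields, each total followed by a parenthesised average);
parse_row and the dictionary keys are hypothetical names chosen for
illustration.

```python
import re

# One data row: | N | FREQ | N*FREQ | SUM(X) | SUM(Y) | COMP |
# The N field may be empty, as in the totals row of each table.
ROW = re.compile(
    r"^\|\s*(\d*)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|"
    r"\s*(\d+)\(([\d.]+)\)\s*\|\s*(\d+)\(([\d.]+)\)\s*\|\s*([\d.]+)\s*\|$"
)


def parse_row(line):
    """Parse one table row into a dict, or return None for other lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None            # header, separator, or blank line
    n, freq, total, base, x, local, y, comp = m.groups()
    return {
        "n": int(n) if n else None,    # None for the totals row
        "freq": int(freq),
        "code_points": int(total),
        "sum_base": int(base),         # AMCZ or DUDE total
        "x": float(x),
        "sum_local": int(local),       # LAMCZ or LDUDE total
        "y": float(y),
        "comp": float(comp),
    }


row = parse_row(
    "|  4|   54607|    218428|    685584(3.14)|    562349(2.57)|17.98|"
)
# Recompute COMP as the percentage saved relative to the base encoding.
recomputed = (row["sum_base"] - row["sum_local"]) / row["sum_base"] * 100
print(row["n"], row["code_points"] == row["n"] * row["freq"],
      round(recomputed, 2))
```

Comparing the recomputed value with the parsed "comp" field for each row
gives a simple consistency check on the tables.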