
Re: [idn] call for comments for REORDERING



Thanks for your comment. Here is my answer.

----- Original Message ----- 
From: "Martin Duerst" <duerst@w3.org>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Sent: Wednesday, October 10, 2001 2:16 PM
Subject: Re: [idn] call for comments for REORDERING


> 
> The additional complexity introduced by reordering is a very
> serious problem. It is true that this complexity is somewhat
> similar to the complexity of e.g. nameprep or conversion from
> a legacy encoding. However, on many platforms, both conversion
> from a legacy encoding as well as many aspects of nameprep
> are available as libraries, and are used for other purposes.

Do current Windows 98, 2K, XP and Linux actually contain NFKC code?

> In particular on constrained devices (mobile phones,...),
> most of nameprep can be simplified a lot if one knows what
> kinds of characters can be input.

In that case, the reordering table can _also_ be cut down
to just the script blocks that can be input.

For Arabic, Hindi, Katakana, Hiragana, Tamil, Greek, and Hebrew,
it adds only 10 to 20 lines of simple character mapping array
per script.
Those additional lines of data are fewer than the number of comment
lines in the ACE source code.



> 
> On the other hand, the benefits for the users are actually
> very small. Nobody wants to input domain names with 15 or
> more Hanzi or Hangul. Nobody will be able to remember them.
> Writing them down on a napkin will take a long time.
> Every company or organization that has such a long label
> in their domain name, and no shorter alternative, will
> simply not get any contacts directly to their web site.
> If they have a short alternative, why do they need a
> long version? (please note that there is no danger of
> spoofing by somebody else getting the long version :-).

Let's think about the shorter ACE label produced for a native label of
average length.
The hangul table below says:
  For N=6, 3.16*6 - 2.28*6 = 19 - 14 = 5 characters are saved.
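
The same arithmetic can be checked directly against the table sums rather than the rounded averages. This is only a sketch: the row values are copied from the hangul-1024 table below, and it assumes that the X and Y columns are total ACE lengths without and with REORDERING over the FREQ labels of that row. The names HANGUL_ROWS and saved_per_label are made up for illustration.

```python
# Row values copied from the hangul-1024 table: N -> (FREQ, sum X, sum Y).
# Assumption: X and Y are total ACE lengths without / with REORDERING.
HANGUL_ROWS = {
    6: (23891, 452483, 326242),
    10: (895, 27223, 18764),
}


def saved_per_label(freq, sum_plain, sum_reordered):
    """Average number of ACE characters saved per label in one table row."""
    return (sum_plain - sum_reordered) / freq


for n, (freq, x, y) in HANGUL_ROWS.items():
    saved = saved_per_label(freq, x, y)
    print(f"N={n}: about {saved:.2f} characters saved per label")
```

For N=6 this gives a little over 5 characters per label, in line with the estimate above.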

The main benefit of REORDERING is not only for very long domains,
but also for average ones. It also helps administration, transcription,
and visual comparison of ACE labels.
  
Moreover, as the IDNA I-D recommends,
ACE labels should be rendered as is
when the decoded native label contains characters that
the rendering engine cannot display. For example,
if you have _no_ huge Han font set on your mobile phone (or PC),
the ACE labels for the Han/Hangul email addresses of your friends should
be displayed as is. Without REORDERING, they may often be too long to fit
the narrow LCD lines of a phone, which are 16 to 20 characters wide.


Moreover, if we get ACE-encoded i18n email addresses in the future,
their native form will look like XXXX@YYYYYY.com. For Han/Hangul script,
the combined length of the two non-Latin strings (XXXX, YYYYYY) may
very often exceed 10.
In such cases, reordering would save up to 8~9
characters per ACE-encoded email address. That is a big saving.


8. hangul-1024

|  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|    1953|      1953|      7812(4.00)|      7812(4.00)| 0.00|
|  2|   17149|     34298|    124782(3.64)|    106238(3.10)|14.86|
|  3|   39643|    118929|    403205(3.39)|    323801(2.72)|19.69|
|  4|   62285|    249140|    816093(3.28)|    622067(2.50)|23.77|
|  5|   39675|    198375|    636102(3.21)|    470174(2.37)|26.09|
|  6|   23891|    143346|    452483(3.16)|    326242(2.28)|27.90|
|  7|   12448|     87136|    271953(3.12)|    192139(2.21)|29.35|
|  8|    5441|     43528|    134600(3.09)|     94322(2.17)|29.92|
|  9|    2264|     20376|     62405(3.06)|     43266(2.12)|30.67|
| 10|     895|      8950|     27223(3.04)|     18764(2.10)|31.07|
| 11|     373|      4103|     12420(3.03)|      8511(2.07)|31.47|
| 12|     141|      1692|      5080(3.00)|      3505(2.07)|31.00|
| 13|      77|      1001|      2986(2.98)|      2039(2.04)|31.71|
| 14|      32|       448|      1331(2.97)|       911(2.03)|31.56|
| 15|      20|       300|       884(2.95)|       603(2.01)|31.79|
| 16|      10|       160|       460(2.88)|       337(2.11)|26.74|
| 17|       7|       119|       354(2.97)|       243(2.04)|31.36|

|All|  206304|    913854|   2960173(3.24)|   2220974(2.43)|24.97|


Non-CJK scripts often have long labels that exceed 15 characters.


1. arabic

|  N|    FREQ|    N*FREQ|  SUM OF AMCZ(X)| SUM OF LAMCZ(Y)| COMP|
|  1|      42|        42|       126(3.00)|       126(3.00)| 0.00|
|  2|      59|       118|       258(2.19)|       249(2.11)| 3.49|
|  3|     363|      1089|      2121(1.95)|      1992(1.83)| 6.08|
|  4|     888|      3552|      6359(1.79)|      5811(1.64)| 8.62|
|  5|    1122|      5610|      9550(1.70)|      8529(1.52)|10.69|
|  6|    1009|      6054|      9890(1.63)|      8620(1.42)|12.84|
|  7|     845|      5915|      9309(1.57)|      8134(1.38)|12.62|
|  8|     378|      3024|      4590(1.52)|      3992(1.32)|13.03|
|  9|     263|      2367|      3523(1.49)|      3063(1.29)|13.06|
| 10|     152|      1520|      2230(1.47)|      1941(1.28)|12.96|
| 11|     130|      1430|      2058(1.44)|      1787(1.25)|13.17|
| 12|     110|      1320|      1873(1.42)|      1614(1.22)|13.83|
| 13|      67|       871|      1230(1.41)|      1040(1.19)|15.45|
| 14|      61|       854|      1211(1.42)|      1015(1.19)|16.18|
| 15|      52|       780|      1085(1.39)|       924(1.18)|14.84|
| 16|      34|       544|       743(1.37)|       630(1.16)|15.21|
| 17|      11|       187|       256(1.37)|       218(1.17)|14.84|
| 18|      19|       342|       465(1.36)|       392(1.15)|15.70|
| 19|       8|       152|       201(1.32)|       175(1.15)|12.94|
| 20|      10|       200|       268(1.34)|       235(1.18)|12.31|
| 21|       3|        63|        85(1.35)|        75(1.19)|11.76|
| 22|       4|        88|       116(1.32)|        99(1.12)|14.66|
| 23|       3|        69|        89(1.29)|        76(1.10)|14.61|
| 24|       2|        48|        62(1.29)|        55(1.15)|11.29|
| 25|       5|       125|       165(1.32)|       143(1.14)|13.33|
| 26|       2|        52|        67(1.29)|        56(1.08)|16.42|
| 27|       2|        54|        73(1.35)|        61(1.13)|16.44|
| 33|       1|        33|        41(1.24)|        37(1.12)| 9.76|
| 34|       1|        34|        45(1.32)|        36(1.06)|20.00|

|All|    5646|     36537|     58089(1.59)|     51125(1.40)|11.99|
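
For anyone who wants to re-derive the figures, here is a small sketch of how the derived columns appear to be computed. It assumes that the value in parentheses is the column sum divided by N*FREQ (average ACE characters per native character) and that COMP is the percentage reduction from X to Y. Both assumptions reproduce the "All" rows of the two tables, but they are inferred from the numbers. The function names are made up for illustration.

```python
def avg_per_char(total_ace_len, total_native_chars):
    """Average ACE characters per native character (the value in parentheses)."""
    return total_ace_len / total_native_chars


def comp_percent(sum_plain, sum_reordered):
    """Percentage reduction in total ACE length (the COMP column)."""
    return 100.0 * (sum_plain - sum_reordered) / sum_plain


# "All" rows copied from the tables: (N*FREQ, sum X, sum Y).
ALL_ROWS = {
    "hangul-1024": (913854, 2960173, 2220974),
    "arabic": (36537, 58089, 51125),
}

for name, (chars, x, y) in ALL_ROWS.items():
    print(
        f"{name}: X avg {avg_per_char(x, chars):.2f}, "
        f"Y avg {avg_per_char(y, chars):.2f}, "
        f"COMP {comp_percent(x, y):.2f}%"
    )
```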

Regards,

Soobok Lee

> 
> So this is a solution in search of a real problem,
> not worth bothering the whole world with additional
> complexity.
> 
> 
> Regards,   Martin.
> 
> 
> At 13:11 01/10/10 +0900, Soobok Lee wrote:
> >Hi, all
> >
> >I am ready to receive any criticisms on REORDERING.
> >Any suggestions of improvements or downsides of REORDERING are all welcome.
> >
> >I expect many feedbacks from non-US and non-European participants and 
> >observers.
> >
> >Thanks.
> >
> >Soobok Lee
>