
Re: [idn] call for comments for REORDERING
<024301c149a0$c65b1ee0$ec1bd9d2@temp> <4.2.0.58.J.20011010135529.03d9e6a0@localhost>



At 15:43 01/10/10 +0900, Soobok Lee wrote:

>From: "Martin Duerst" <duerst@w3.org>

> > The additional complexity introduced by reordering is a very
> > serious problem. It is true that this complexity is somewhat
> > similar to the complexity of e.g. nameprep or conversion from
> > a legacy encoding. However, on many platforms, both conversion
> > from a legacy encoding as well as many aspects of nameprep
> > are available as libraries, and are used for other purposes.
>
>Do current Windows 98, 2K, XP and Linux contain NFKC code?

No. My personal preference would be for NFC anyway: most of
the additional mappings in NFKC are not really useful, because
they map from characters that are quite difficult to type in.
So these characters could just be forbidden rather than
mapped. The only exception I know of is full-width ASCII,
which could either be included in the nameprep mapping step,
or could be done as a front-end 'quality of implementation'
step (similar to how the ideographic fullstop is currently
treated).
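
To illustrate the two options for full-width ASCII, here is a minimal
Python sketch using the standard unicodedata module. The helper name
fold_fullwidth_ascii is hypothetical; it stands for a front-end step of
the kind described above and is not something defined by nameprep.

```python
import unicodedata

def fold_fullwidth_ascii(text):
    """Hypothetical front-end step: map full-width ASCII (U+FF01..U+FF5E)
    to plain ASCII (U+0021..U+007E) and leave every other character alone."""
    return "".join(
        chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )

fullwidth = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45"  # full-width "example"
ideographic_stop = "\u3002"                               # ideographic full stop

# NFC leaves full-width ASCII untouched; NFKC folds it to ASCII.
nfc_result = unicodedata.normalize("NFC", fullwidth)
nfkc_result = unicodedata.normalize("NFKC", fullwidth)

# The targeted front-end step gives the same result as NFKC for this input.
folded_result = fold_fullwidth_ascii(fullwidth)

# Neither normalization form turns the ideographic full stop into ".",
# so it needs separate treatment in any case.
stop_after_nfkc = unicodedata.normalize("NFKC", ideographic_stop)

print(nfc_result == fullwidth, nfkc_result, folded_result, stop_after_nfkc == ".")
```

With NFC plus a targeted step like this, the remaining NFKC-only
characters could simply be prohibited.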

Anyway, it's not so much a question of whether current versions
contain that code (there are libraries that do), but how much
other software will use it. There will definitely be other
software that uses NFC, and quite probably also NFKC.
This is not at all the case for reordering.


> > In particular on constrained devices (mobile phones,...),
> > most of nameprep can be simplified a lot if one knows what
> > kinds of characters can be input.
>
>In this case, the reordering table can _also_ be reduced
>to just those input script blocks.

Yes. But NFC or NFKC boils down to nothing for many subsets,
while reordering doesn't.
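
As a small illustration (not a proof), the following Python sketch checks
that NFC and NFKC are no-ops on a few sample strings of the kind a
constrained input method would typically produce. The sample strings and
the helper name normalization_is_noop are assumptions chosen for this
example only.

```python
import unicodedata

def normalization_is_noop(text):
    """Return True if both NFC and NFKC leave the string unchanged."""
    return (unicodedata.normalize("NFC", text) == text
            and unicodedata.normalize("NFKC", text) == text)

# Sample inputs, assumed to be what a simple input method emits:
# plain ASCII, precomposed Hiragana, precomposed Hangul syllables, Hanzi.
samples = {
    "ascii": "example",
    "hiragana": "\u306b\u307b\u3093\u3054",
    "hangul": "\ud55c\uad6d\uc5b4",
    "hanzi": "\u4e2d\u6587",
}

results = {name: normalization_is_noop(text) for name, text in samples.items()}
print(results)
```

A device that can only produce such characters could skip the
normalization tables entirely, whereas a reordering table would still be
needed for every script it supports.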


>As for Arabic/Hindi/Katakana/Hiragana/Tamil/Greek/Hebrew,
>it adds only 10~20 lines of simple character mapping array
>for each script.
>Those additional lines of data are fewer than the number of comment
>lines in the ACE source code.

The comment lines obviously don't get compiled and
put on a small device.


> > On the other hand, the benefits for the users are actually
> > very small. Nobody wants to input domain names with 15 or
> > more Hanzi or Hangul. Nobody will be able to remember them.
> > Writing them down on a napkin will take a long time.
> > Every company or organization that has such a long label
> > in their domain name, and no shorter alternative, will
> > simply not get any contacts directly to their web site.
> > If they have a short alternative, why do they need a
> > long version? (please note that there is no danger of
> > spoofing by somebody else getting the long version :-).
>
>Let's think about the shorter ACE label produced for a native label of
>_average_ length.
>The table for the Hangul script block says:
>   For N=6, 3.12*6 - 2.28*6 = 18.72 - 13.68 = 5.04, so about 5 characters
>   are saved.
>
>The main benefit of REORDERING is not only for very long domains,
>but for average ones. It also helps administration, transcription and
>eye-comparison of the ACE labels.

It doesn't really help. Transcription and eye-comparison of
ACE labels is a pain. If they are a bit shorter, it still
just hurts.
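
For reference, the arithmetic behind the quoted figure can be sketched in
Python. The per-character rates 3.12 and 2.28 are the ones quoted above
for Hangul at N=6; the function name ace_chars_saved is hypothetical, and
applying the same rates to other lengths is an assumption, since the
quoted table itself is not reproduced here.

```python
def ace_chars_saved(n, per_char_plain=3.12, per_char_reordered=2.28):
    """Estimated ACE characters saved for a native label of n characters,
    given average ACE characters per native character without and with
    reordering (defaults are the quoted Hangul figures for N=6)."""
    return n * (per_char_plain - per_char_reordered)

saved_for_6 = round(ace_chars_saved(6), 2)    # the quoted N=6 case
saved_for_10 = round(ace_chars_saved(10), 2)  # assumes the same rates hold at N=10

print(saved_for_6, saved_for_10)
```

Under these assumptions the saving is about 5 characters for a
6-character label, out of an ACE label that is still well over a dozen
characters long either way.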


>Moreover, as the IDNA I-D recommends,
>ACE labels should be rendered "as is"
>when the decoded native label contains characters which
>the rendering engine cannot display. For example,
>if you have _no_ huge Han script font set on your mobile phone (or PC),
>the ACE labels for Han/Hangul email addresses of your friends should
>be displayed "as is". But without REORDERING, they may often be too long
>to fit in your narrow phone LCD lines of width 16~20.

Displaying ACE when nothing else is available is acceptable
as a 'best-effort' solution. But it is very far from a good
solution, and designing the rest of IDN for this doesn't
make sense.

I have no problem with IDNA (if it stays at a SHOULD level),
but on a mobile phone, I would probably just use a few question
marks to make clear that there is something that cannot be
displayed.


>Moreover, if we get ACEed i18n email addresses in the future,
>their native form will look like XXXX@YYYYYY.com. In the case of
>Han/Hangul script, the sum of the lengths of the two non-Latin strings
>(XXXX, YYYYYY) may very often exceed 10.
>For such cases, reordering would help to save up to 8~9
>characters in ACEed email addresses. Big savings.

For what? For transcribing ACE? Who wants to do that?
Is that what we are designing IDN for? I very much hope not.


Regards,   Martin.