
Re: [idn] call for comments for REORDERING

To: "Soobok Lee" <lsb@postel.co.kr>,"James Seng/Personal" <jseng@pobox.org.sg>, <idn@ops.ietf.org>
Subject: Re: [idn] call for comments for REORDERING
From: Martin Duerst <duerst@w3.org>
Date: Fri, 19 Oct 2001 15:46:26 +0900

At 12:13 01/10/19 +0900, Soobok Lee wrote:

> > The existing draft does not explain how it is going to deal
> > with new code points in ISO 10646 in the future, nor does it explain the
> > process for implementing them. The criterion here is stability: if new code
> > points are added and the reordering tables expand, then the algorithm should
> > not invalidate existing names.

I think this is an extremely important concern.

>REORDERING's mapping occurs within each script block, NOT across script
>blocks. Therefore, additions of new script blocks won't invalidate or collide
>with existing names. Even additions of new, rarely used characters to an
>existing script block won't affect the performance of REORDERING.
>In this respect, REORDERING maintains stability over time.

For additions of rare characters, in most cases yes.
But let's assume that these are characters that are not
used in one language (A), but quite frequent in another (B).
People using language (B) might then come forward and ask
for a different reordering to fit their language better.
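
To make the stability problem concrete, here is a minimal sketch of what
happens to already registered names if such a request is granted. It is
not the algorithm from the REORDERING draft: the eight-letter "script
block", both tables, and the encode/decode helpers are hypothetical, and
"reordering" is reduced to replacing each character by its position in a
table.

```python
# Toy model: a reordering table is just an ordering of the characters in a block.
# Both tables below are invented for illustration only.
ORDER_V1 = "abcdefgh"  # hypothetical table tuned for language (A)
ORDER_V2 = "abefcdgh"  # hypothetical revision that promotes 'e' and 'f' for language (B)


def encode(name, order):
    """Replace each character by its position in the reordering table."""
    return [order.index(ch) for ch in name]


def decode(positions, order):
    """Map positions back to characters using the given reordering table."""
    return "".join(order[p] for p in positions)


# A name registered while the first table was in force.
stored = encode("cafe", ORDER_V1)

same_table = decode(stored, ORDER_V1)     # round-trips to the original name
revised_table = decode(stored, ORDER_V2)  # silently becomes a different name

print(same_table, revised_table)
```

In this toy model every stored label that uses a moved character changes
its meaning, which is why a table, once deployed, could not be revised
without some kind of versioning.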

In general, it's quite possible that the statistics you
currently have favor the major language that uses a
script, but disfavor another language. It could be, for
example, that the reordering for Arabic makes Arabic names
shorter, but makes Farsi or Urdu names longer. Even if there
are many more Arabic than Farsi or Urdu names, this would
be rather unfair. And we just don't have enough data
at the moment to be able to say this is not the case.
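
As a rough illustration of this effect (not real data, and not the
draft's algorithm), the sketch below builds a frequency-ranked table
from a sample of one language and compares name costs before and after.
The block "eafbgchd", the sample text, and the cost model (one unit for a
character in the first four slots, two units otherwise) are all
assumptions made up for the example.

```python
from collections import Counter

BLOCK = "eafbgchd"  # hypothetical script block in its original code point order


def build_reordering(sample_text, block):
    """Rank the characters of `block` by frequency in `sample_text`.

    Returns a dict mapping each character to its new position (0 = most
    frequent). Ties keep their original relative order within the block.
    """
    counts = Counter(ch for ch in sample_text if ch in block)
    ranked = sorted(block, key=lambda ch: (-counts[ch], block.index(ch)))
    return {ch: pos for pos, ch in enumerate(ranked)}


def cost(name, table, cheap_slots=4):
    """Toy cost: 1 unit for a character in a cheap slot, 2 units otherwise."""
    return sum(1 if table[ch] < cheap_slots else 2 for ch in name)


# No reordering: positions are simply the original block order.
identity = {ch: pos for pos, ch in enumerate(BLOCK)}

# Reordering trained only on a (made-up) sample of language A.
tuned_for_a = build_reordering("abcdabcdabce", BLOCK)

name_a, name_b = "abcd", "efgh"  # typical names in languages A and B
before = (cost(name_a, identity), cost(name_b, identity))
after = (cost(name_a, tuned_for_a), cost(name_b, tuned_for_a))

print("before reordering:", before)
print("after reordering: ", after)
```

Under these assumptions the name in language A gets cheaper while the
name in language B gets more expensive than with no reordering at all.
Whether real Arabic, Farsi, or Urdu data behave this way is exactly what
we cannot tell without more data.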

>Current IDNA/nameprep does not prohibit, but discourages, including
>unassigned code points in legal IDN labels, because new normalization/case
>mappings would be defined on them in the future. Some ACE labels that include
>an unassigned code block (Tagalog?) might prove invalid in the future.
>Nameprep/NFKC versioning tag schemes using a new ACE prefix will be needed
>in the future, I guess.

Yes. But for the majority of really useful characters, in old
and new scripts, it's rather obvious that they will be allowed.
On the other hand, it's totally unclear how to reorder them.

Also, an implementation mistake in Nameprep/NFKC will,
in most cases, just make a few names unusable, but
not affect the rest. For reordering, a bug will completely
confuse a whole script.

Also, we now have a testbed, and you simply assume that the testbed
is representative. But once IDN is running, running a testbed
for a new script will be difficult, because we need the testbed
data for the reordering statistics, but we need the reordering
for the testbed.

>Therefore, REORDERING follows the same IDNA/nameprep recommendations on
>the issues of new SCRIPT blocks/unassigned code points.
>
>These statements will be included in the final version of the REORDERING I-D.

Well, in theory, this works. But I think it's just too unstable
to work in practice.

Regards,    Martin.
