
[idn] REORDERING: stability issues and UTC solutions




----- Original Message ----- 
From: "James Seng/Personal" <jseng@pobox.org.sg>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>; "Martin Duerst" <duerst@w3.org>
Sent: Saturday, October 20, 2001 2:31 AM
Subject: Re: [idn] call for comments for REORDERING


> > maybe problematic and unsafe.
> > What if future NFC/NFKC maps them into other code points ?
> > There will be a mess, too.
> 
> Read the design principles of NFC and NFKC. Future additions of scripts and
> additional normalization will *NOT* cause existing normalized strings to
> change.

My question was this:
  1) newly approved TAGALOG characters X and Y have the NFC mapping X -> Y;
  2) the old Nameprep/ACE encodes X.com and Y.com as two distinct ACE labels,
      because it doesn't know about NFC X -> Y;
  3) the new Nameprep/ACE encodes X.com as the same ACE label as Y.com,
      because it applies NFC X -> Y.

  If X.com and Y.com are taken by two distinct registrants, there will
    be a mess. Of course, careful registries should have blocked that.
  Even if new ACE libraries are distributed, some old ACE tools will still
    try to send DNS queries for the old X.com label, which may fail.
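
The three steps above can be sketched as a toy model. Everything in it
is an assumption made for illustration: X and Y are private-use
placeholders (not real TAGALOG code points), the mapping tables are
hypothetical, and toy_ace() is a made-up hex encoding, not any ACE
proposal under discussion.

```python
# Toy model of the version-skew problem; all names here are hypothetical.
X, Y = "\uE000", "\uE001"   # private-use placeholders for two future characters

OLD_MAP = {}       # old nameprep: no knowledge of X or Y, leaves them untouched
NEW_MAP = {X: Y}   # new nameprep: assumed normalization X -> Y


def nameprep(label, mapping):
    """Apply a per-character mapping table; unknown characters are kept."""
    return "".join(mapping.get(ch, ch) for ch in label)


def toy_ace(label):
    """Made-up ACE: a prefix plus the hex code point of each character."""
    return "ace--" + "".join(f"{ord(ch):04x}" for ch in label)


def encode(label, mapping):
    return toy_ace(nameprep(label, mapping))


old_x, old_y = encode(X, OLD_MAP), encode(Y, OLD_MAP)
new_x, new_y = encode(X, NEW_MAP), encode(Y, NEW_MAP)

print(old_x != old_y)   # old tools: X.com and Y.com are distinct labels
print(new_x == new_y)   # new tools: both collapse to the same label
print(old_x != new_x)   # an old tool still queries a label new tools never produce
```

The last comparison is the stale-query case: the label an old tool
generates for X.com no longer corresponds to anything a new tool would
register or look up.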
 

> 
> Do you have the same stability principle for re-ordering, or can you
> ensure that it holds? Or one we can be sure of?

Yes. Just as the old nameprep leaves untouched the code points still
unassigned for the future TAGALOG script, REORDERING leaves them untouched, too.
There is no ACE improvement for Tagalog for now.
See below.

> 
> > The current REORDERING does nothing with TAGALOG and adds
> > no new problem to ACE. The problem is already inherent
> > in the ACE and nameprep version scheme, and is not due to
> > REORDERING.
> 
> But Tagalog may then come back to IDN to claim unfair treatment!
> Are you going to deny them that?

Are you saying this?
   Tagalog folks may come back with complaints about the old nameprep,
   which has no NFC X -> Y, saying "unfair treatment!"?

Until TAGALOG is added, the current nameprep/REORDERING bear
no responsibility for the inevitable lack of support for it.

Now I will propose two solutions.

1) last resort: somewhat tricky

The future TAGALOG script could provide two sets of the basic TAGALOG alphabet:
set A in the official lexicographical order and set B
in frequency order (a sub-optimal one is OKAY), with a 1:1 NFKC mapping defined
from A onto B.
Then all basic TAGALOG letters in set A will be "reordered by NFKC"
in nameprep, not in ACE. A and B share the same font.
Valid ACE labels in the TAGALOG script then contain only characters from B.
This may cause some problems in comparisons, which I have not fully analyzed.

2) UTC solution

If the UTC accepts REORDERING as an official normalization form, say
NF-REORDERING, then we need no tricks like the one above, and
TAGALOG support can be handled within that NF in the new
NAMEPREP steps: mapping/NFKC/PROHIBIT and then NF-REORDERING.
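
The step order named above could look roughly like the following
sketch. Only unicodedata.normalize("NFKC", ...) is a real library call.
The lowercase mapping, the PROHIBITED set and the REORDER table are
stand-ins assumed for illustration, and the two-letter swap is not a
real frequency table.

```python
import unicodedata

# Assumed stand-ins, not taken from any nameprep or REORDERING draft.
PROHIBITED = {" "}                 # toy prohibited-character set
REORDER = {"a": "e", "e": "a"}     # hypothetical 1:1 permutation of code points


def nameprep_with_reordering(label):
    """mapping -> NFKC -> PROHIBIT -> (hypothetical) NF-REORDERING."""
    mapped = label.lower()                               # toy mapping step
    normalized = unicodedata.normalize("NFKC", mapped)   # real NFKC
    bad = sorted(ch for ch in normalized if ch in PROHIBITED)
    if bad:
        raise ValueError(f"prohibited characters: {bad!r}")
    # The reordering table is a bijection, so distinct labels stay distinct.
    return "".join(REORDER.get(ch, ch) for ch in normalized)


print(nameprep_with_reordering("Tea"))      # -> tae
print(nameprep_with_reordering("\uff41"))   # fullwidth a -> a -> e
```

Because the reordering runs last and is 1:1, it changes which code
points reach the ACE encoder without merging or splitting any labels
that NFKC left distinct.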

Frequency tables are always sub-optimal by nature, and
marginal fluctuations in frequency will cause marginal
changes in efficiency, but I am sure that REORDERING will benefit
most TAGALOG labels, and that is why I am pushing REORDERING to the UTC.

In conclusion, 2) is my preference.
It is the best way to give REORDERING stability, authority, and
applicability.


> 
> > Prefix-based version scheme will solve those problems.
> 
> No no no. Let's not get into another scheme that involves playing with
> multiple prefixes or suffixes as a "version tag". Try not to make things more
> complex here.
> 
> The key word here is stability. You have not addressed it yet.

Answered. With the above solutions we need no version tag.


> 
> And no, there is no such thing as an "only compress" algorithm. It is
> mathematically impossible.

Already answered in the separate thread "[idn] REORDERING : makes labels shorter or longer  ?".


Soobok Lee

> 
> -James Seng
> 
>