[idn] REORDERING: stability issues and UTC solutions
----- Original Message -----
From: "James Seng/Personal" <jseng@pobox.org.sg>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>; "Martin Duerst" <duerst@w3.org>
Sent: Saturday, October 20, 2001 2:31 AM
Subject: Re: [idn] call for comments for REORDERING
> > maybe problematic and unsafe.
> > What if future NFC/NFKC maps them into other code points ?
> > There will be a mess, too.
>
> Read the design principles of NFC and NFKC. Future additions of scripts and
> additional normalizations will *NOT* cause existing normalized strings to
> change.
My question was this:
1) newly approved TAGALOG characters X and Y have the NFC mapping X -> Y;
2) old Nameprep/ACE encodes X.com and Y.com as two distinct ACE labels,
because it doesn't know about NFC X -> Y;
3) new Nameprep/ACE encodes X.com as the same ACE label as Y.com,
because it applies NFC X -> Y.
If X.com and Y.com are taken by two distinct registrants, there will
be a mess. Of course, careful registries should have blocked that.
Even if new ACE libraries are distributed, some old ACE tools will still
try to send X.com DNS queries, which may fail.
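
To make the scenario in 1) to 3) concrete, here is a toy sketch in Python.
It is only an illustration: "X" and "Y" are placeholder strings standing in
for two hypothetical newly assigned characters, and OLD_MAP / NEW_MAP are
made-up tables, not real nameprep or Unicode data.

```python
# Toy model: each nameprep version is a table of the mappings it knows.
# "X" and "Y" are placeholders for hypothetical new characters.
OLD_MAP = {}            # old nameprep: X and Y are unassigned, passed through
NEW_MAP = {"X": "Y"}    # new nameprep: knows the hypothetical NFC X -> Y


def prepare(label, mapping):
    """Apply one version's mapping to a label, character by character."""
    return "".join(mapping.get(ch, ch) for ch in label)


def collides(a, b, mapping):
    """True if two labels become the same label under this version."""
    return prepare(a, mapping) == prepare(b, mapping)


# Under the old table the two names stay distinct; under the new one they merge.
print(collides("X", "Y", OLD_MAP))  # False
print(collides("X", "Y", NEW_MAP))  # True
```

Two registrations that were distinct under the old table become one label
under the new table, which is the conflict described above.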
>
> Do you have the same stability principle for re-ordering, or can you
> ensure one that we can be sure of?
Yes. Just as old nameprep passes unassigned code points (such as those of a
future TAGALOG script) through untouched, REORDERING passes them through, too.
There is no ACE improvement for Tagalog now.
See below.
>
> > Current REORDERING does nothing with TAGALOG and adds
> > no new problem to ACE. The problem is already inherent
> > in the ACE and nameprep version scheme, and is not due to
> > REORDERING.
>
> But Tagalog may then come back to IDN to claim unfair treatment!
> Are you going to deny them that?
Are you saying that Tagalog folks may come back with complaints about
old nameprep, which has no NFC X -> Y, saying "unfair treatment!"?
Before TAGALOG is added, current nameprep/REORDERING have
no responsibility for the inevitable lack of support for it.
Now I will propose two solutions.
1) Last resort: somewhat tricky
A future TAGALOG encoding may provide two sets of the TAGALOG basic alphabet:
set A in official lexicographical order, and set B in frequency order
(a sub-optimal order is OKAY), with a 1:1 NFKC mapping defined
from A onto B.
Then all Tagalog basic letters in set A will be "reordered by NFKC"
in nameprep, not in ACE. A and B share the same font.
Valid ACE labels in the TAGALOG script then contain only characters from B.
This may cause some problems in comparisons, which I have not fully analyzed.
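
A rough sketch of the A-onto-B idea in Python follows. Everything in it is
an assumption made for illustration: the private-use code points chosen for
A_BASE and B_BASE, the four-letter alphabet, and the FREQ_ORDER ranking are
invented, and a plain lookup table stands in for the NFKC mapping.

```python
A_BASE = 0xE000  # hypothetical set A, in official lexicographical order
B_BASE = 0xE100  # hypothetical set B, in frequency order

# Hypothetical frequency ranking: indices into set A, most frequent first.
FREQ_ORDER = [2, 0, 3, 1]

# 1:1 mapping from A onto B: the letter with frequency rank r lands at B_BASE + r.
A_TO_B = {
    chr(A_BASE + a_index): chr(B_BASE + rank)
    for rank, a_index in enumerate(FREQ_ORDER)
}


def fold_a_to_b(label):
    """Replace every set-A letter by its set-B counterpart; leave the rest alone."""
    return "".join(A_TO_B.get(ch, ch) for ch in label)


def only_set_b(label):
    """True if the label contains nothing but set-B letters."""
    return all(B_BASE <= ord(ch) < B_BASE + len(FREQ_ORDER) for ch in label)


label_in_a = chr(A_BASE + 2) + chr(A_BASE + 1)
folded = fold_a_to_b(label_in_a)
print([hex(ord(ch)) for ch in folded])  # ['0xe100', '0xe103']
print(only_set_b(folded))               # True
```

In this sketch a label typed with set-A letters and the same label typed with
set-B letters fold to the same string, so comparison after folding treats
them as equal. Whether that is enough for all comparison cases is the open
question noted above.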
2) UTC solution
If the UTC accepts REORDERING as an official normalization form, say
NF-REORDERING, then we need no tricks like the above, and
TAGALOG support can be done within that NF in the new
NAMEPREP steps: mapping/NFKC/PROHIBIT and then NF-REORDERING.
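
The step order above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the nameprep specification:
lower() stands in for the mapping step, PROHIBITED holds a single placeholder
character, and the REORDER table (a toy swap of 'a' and 'e') stands in for a
frequency-based NF-REORDERING table that does not exist yet. Only
unicodedata.normalize is a real library call.

```python
import unicodedata

PROHIBITED = {" "}               # placeholder for the real prohibited set
REORDER = {"a": "e", "e": "a"}   # toy permutation standing in for NF-REORDERING


def nf_reordering(label):
    """Apply the toy permutation; characters not in the table pass through."""
    return "".join(REORDER.get(ch, ch) for ch in label)


def nameprep_sketch(label):
    """mapping -> NFKC -> PROHIBIT -> NF-REORDERING, in that order."""
    mapped = label.lower()                              # stand-in mapping step
    normalized = unicodedata.normalize("NFKC", mapped)  # real NFKC
    for ch in normalized:
        if ch in PROHIBITED:
            raise ValueError("prohibited character: %r" % ch)
    return nf_reordering(normalized)


print(nameprep_sketch("Example"))  # axempla
```

Characters missing from the REORDER table pass through unchanged, so under
this sketch a script that has no table yet is simply left as it is.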
Frequency tables are always sub-optimal by nature, and
marginal frequency fluctuations will cause marginal
changes in efficiency, but in most cases, I am sure, REORDERING will benefit
most TAGALOG labels, and that is why I am pushing REORDERING to the UTC.
In conclusion, 2) is my preference.
It is the best way for REORDERING to acquire stability, authority, and
applicability.
>
> > Prefix-based version scheme will solve those problems.
>
> No no no. Let's not get into another scheme which involves playing with
> multiple prefixes or suffixes as a "version tag". Try not to make things more
> complex here.
>
> The key word here is stability. You have not addressed it yet.
Answered. With the above solutions we need no version tag.
>
> And no, there is no such thing as an "only compress" algorithm. It is
> mathematically impossible.
Already answered in the separate thread "[idn] REORDERING : makes labels shorter or longer ?".
Soobok Lee
>
> -James Seng
>
>