
[idn] (self-comment)Re: REORDERING: stability issues and UTC solutions



Self-comments:
 
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
 
1) Last resort: somewhat tricky

A future TAGALOG block may provide two sets of the TAGALOG basic alphabet:
set A in the official lexicographical ordering, and set B in frequency
ordering (a sub-optimal one is OKAY), with a 1:1 NFKC mapping defined
from A onto B.
All TAGALOG basic letters in set A will then be "reordered by NFKC"
in nameprep, not in ACE. A and B share the same font.
Valid ACE labels in the TAGALOG script then contain only characters from B.
This may cause some problems in comparisons, which I have not fully analyzed.
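
A minimal sketch of the 1:1 mapping from set A onto set B described in 1),
in Python. Everything specific here is an assumption made for illustration:
the frequency ranking is invented, and set B is placed at private-use code
points only because no real set B has been assigned.

```python
# Set A: a few TAGALOG letters in code-point (lexicographic) order.
SET_A = ["\u1700", "\u1701", "\u1702", "\u1703"]

# Hypothetical frequency ranking of the same letters, most frequent first.
# The ranking is invented for this sketch, not measured from real labels.
FREQ_ORDER = ["\u1703", "\u1700", "\u1702", "\u1701"]

# Set B: placeholder private-use code points (an assumption), assigned so
# that the most frequent letter gets the lowest code point.
SET_B_BASE = 0xE000

A_TO_B = {a: chr(SET_B_BASE + rank) for rank, a in enumerate(FREQ_ORDER)}
B_TO_A = {b: a for a, b in A_TO_B.items()}


def reorder(label: str) -> str:
    """Map set-A letters onto set B; leave all other characters alone."""
    return "".join(A_TO_B.get(ch, ch) for ch in label)


def restore(label: str) -> str:
    """Inverse of reorder(): map set-B letters back to set A."""
    return "".join(B_TO_A.get(ch, ch) for ch in label)
```

Because the table is 1:1, restore(reorder(x)) gives back x for any label x
that contains no set-B characters to begin with.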

2) UTC solution

If UTC accepts REORDERING as an official normalization form, say
NF-REORDERING, then we need no tricks like the one above, and
TAGALOG support can be done within that NF in the new
NAMEPREP steps: mapping/NFKC/PROHIBIT and then NF-REORDERING.
NF-REORDERING should apply only to newly added script blocks like TAGALOG.
As in solution 1), NF-REORDERING requires separate code points, in the same
TAGALOG script block, for the reordered characters mapped from the basic
alphabet.
The only difference between solutions 1) and 2) is whether the REORDERING is
done by an NFKC trick or by an official new NF.
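
The step order proposed in 2) can be sketched as follows, in Python. Only
the NFKC call is a real library function (unicodedata.normalize); the
mapping and prohibit steps are simplified stand-ins for the nameprep tables,
and the reordering table and its private-use targets are hypothetical.

```python
import unicodedata

# Hypothetical NF-REORDERING table: basic letter -> reordered code point.
# The private-use targets are placeholders, not assigned characters.
REORDER_TABLE = {"\u1703": "\ue000", "\u1700": "\ue001"}


def map_step(label: str) -> str:
    # Stand-in for the nameprep mapping step (here: lower-casing only).
    return label.lower()


def prohibit_step(label: str) -> str:
    # Stand-in for the nameprep prohibit step (here: reject whitespace).
    if any(ch.isspace() for ch in label):
        raise ValueError("prohibited character in label")
    return label


def nf_reordering(label: str) -> str:
    # Proposed extra step: applied only after mapping/NFKC/PROHIBIT.
    return "".join(REORDER_TABLE.get(ch, ch) for ch in label)


def prepare(label: str) -> str:
    label = map_step(label)
    label = unicodedata.normalize("NFKC", label)
    label = prohibit_step(label)
    return nf_reordering(label)
```

Running the reordering last keeps the existing steps unchanged, so labels in
scripts without a reordering table pass through exactly as before.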
 
With these two sets of basic alphabets, old nameprep/ACE+REORDERING libraries
may encode the reordered character set and get the compression transparently,
without any code/data upgrades in old applications.
 
If an application uses the other, un-reordered character set, it will produce
a different, un-compressed ACE label, which is, of course, made equivalent to
the reordered ACE label by the zone master's multiple-registration provisions.
 
NF-REORDERING should be applied before ACE encoding and reversed after
ACE decoding, for rendering, for newly added scripts.
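
A rough sketch of that round trip, in Python. Python's built-in "punycode"
codec is used here only as a stand-in ACE, since no particular ACE is fixed
above; the reordering table and its private-use code points are
hypothetical.

```python
# Hypothetical 1:1 reordering table and its inverse.
A_TO_B = {"\u1703": "\ue000", "\u1700": "\ue001"}
B_TO_A = {b: a for a, b in A_TO_B.items()}


def to_ace(label: str) -> str:
    # Reorder first, then ACE-encode (stand-in ACE: the punycode codec).
    reordered = "".join(A_TO_B.get(ch, ch) for ch in label)
    return reordered.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    # ACE-decode first, then undo the reordering for rendering.
    decoded = ace.encode("ascii").decode("punycode")
    return "".join(B_TO_A.get(ch, ch) for ch in decoded)
```

The two functions are inverses for labels that contain no reordered
characters to begin with, so from_ace(to_ace(label)) returns the original
label.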
 
Existing scripts could still be reordered in ACE, as they are now.
 
Any suggestions are welcome.
 
 
Soobok Lee


Frequency tables are always sub-optimal by nature, and
marginal frequency fluctuations will cause marginal
efficiency changes, but I am sure that in most cases REORDERING will
benefit most TAGALOG labels, and that's why I am pushing REORDERING to UTC.

In conclusion, 2) is my preference.
It's the best way to give REORDERING stability, authority, and
applicability.