
Re: [idn] WG Update





On Thu, 4 Oct 2001 09:50:15 +0800 "James Seng/Personal"
<jseng@pobox.org.sg> writes:
> In the last London meeting, there are still some outstanding issues
> which we agreed to bring it back to the mailing list to discuss.
> 
> First, the issue of nameprep. nameprep have been renamed to
> stringprep and the new draft is available few days ago. Please
> review draft-ietf-idn-nameprep-06.txt.
> 
> And related to nameprep are the jpchar, hanguelchar and tsconv
> drafts. The authors should get together to consider an
> architecture document.
> We should set a deadline for this to be concluded, probably sometime
> before Salt Lake meeting.
> 
> This architecture document should take into consideration of:
> a. the recommendation from the Unicode Consortium to the WG dated
>    02Sept
> b. the source and stability of the referenced work/codepoints
> 

I disagree with the Unicode Consortium's recommendation to the WG
dated 02Sept.

Unicode has been very effective at collecting the glyphs of all
scripts, and it even provides a unified CJK character set, which
is essential for IDN implementation.  I call this the FIRST level,
or look-alike equivalence.  This equivalence includes a subset of
second-level equivalence, that is, semantics.  However, Unicode
cannot deal effectively with large sets of SECOND level, or
usage-alike, equivalence because of their sheer size in terms of
UCS space, although it is adequate for smaller scripts such as
Latin.

Nameprep is the place to take the Unicode work and build on it,
not simply repeat it.
 
Unicode can specify look-alike code points within a script, but
not across scripts; IDN needs to do that.
Unicode can specify usage-alike code points within Latin, as case
mapping, but not for CJK; IDN needs to do that.
Unicode can provide all the available glyphs, but it cannot give
effective access to a large character set such as CJK without
additional access support; IDN needs to provide that, and
nameprep is the place to start.
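
To illustrate the second point, here is a minimal sketch in Python.
It assumes that NFKC normalization followed by Unicode case folding
is a fair stand-in for the Unicode-defined part of the mapping; the
helper name fold and the sample character pair are only
illustrative and do not come from any draft.

```python
import unicodedata


def fold(label: str) -> str:
    """Apply NFKC normalization, then Unicode case folding."""
    return unicodedata.normalize("NFKC", label).casefold()


# Latin: upper and lower case are usage-alike, and Unicode folds them.
latin_upper, latin_lower = "EXAMPLE", "example"

# CJK: one traditional/simplified pair (U+570B and U+56FD), used here
# only as an illustration of a usage-alike pair.
traditional, simplified = "\u570b", "\u56fd"

latin_equal = fold(latin_upper) == fold(latin_lower)
cjk_equal = fold(traditional) == fold(simplified)

print(latin_equal)  # True: case mapping unifies the Latin pair
print(cjk_equal)    # False: nothing in the folding relates the CJK pair
```

The Latin pair compares equal after folding, while the TC/SC pair
stays distinct, so any such equivalence has to be supplied by
something beyond the Unicode-defined folding.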

> (Personally, I would strongly recommend that nameprep remains as it
> is and the rest of the "localization" to be deal at a different level,
> either at the input method or below at the zone file.)
> 

Obviously, I disagree with "nameprep remains as it is", since
nameprep deals only with Latin-style languages.  It even includes
code points from UCS Plane 1, resulting in 9-to-1 case mappings,
which goes against John's earlier recommendation to limit IDN
mapping to Plane 0.  Chinese TC/SC is mostly 1-to-1 mapping, with
a few 3-to-1 mappings, so why do they have to be treated
differently in nameprep?
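
As a rough illustration of many-to-one folding from Plane 1, the
sketch below assumes that the code points in question are the
mathematical alphanumeric letters, and it uses NFKC plus case
folding as a stand-in for the nameprep mapping table.  Both are my
assumptions for the example, and the three variants are only a
sample, not the full set.

```python
import unicodedata

# Three Plane 1 variants of the letter A (a sample only):
#   U+1D400 MATHEMATICAL BOLD CAPITAL A
#   U+1D434 MATHEMATICAL ITALIC CAPITAL A
#   U+1D5A0 MATHEMATICAL SANS-SERIF CAPITAL A
plane1_variants = ["\U0001D400", "\U0001D434", "\U0001D5A0"]


def fold(ch: str) -> str:
    """NFKC normalization followed by case folding (assumed stand-in)."""
    return unicodedata.normalize("NFKC", ch).casefold()


# Every sample code point lies in Plane 1 (bits 16 and up equal 1).
all_plane1 = all(ord(ch) >> 16 == 1 for ch in plane1_variants)

# All of them collapse to the same Plane 0 letter.
folded = {fold(ch) for ch in plane1_variants}

print(all_plane1)  # True
print(folded)      # {'a'}
```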

If nameprep remains as it is, languages other than Latin, Greek,
Cyrillic and Armenian have to go through another level of
filtering.  This creates not only a fair-treatment issue, but also
a problem of treating look-alike characters consistently.

Within the current nameprep spec, first take the Armenian letter
"n", which looks identical to Latin "n".  We have no way to treat
this cross-script look-alike in nameprep's flat mapping.

If we treat this Armenian case in two levels, how shall we do
it?  Will the user see any difference?
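
For comparison, here is a hypothetical sketch of what a single flat
table could look like if cross-script look-alikes and TC/SC pairs
sat next to case mapping.  FLAT_MAP and prepare are names I made
up, the two entries are illustrative only, and I am assuming the
Armenian letter meant above is U+0578 (ARMENIAN SMALL LETTER VO).
None of this is taken from the nameprep draft.

```python
# Hypothetical flat table: each source code point maps to one target.
FLAT_MAP = {
    "\u0578": "n",       # assumed: ARMENIAN SMALL LETTER VO -> Latin n
    "\u570b": "\u56fd",  # illustrative traditional -> simplified pair
}


def prepare(label: str) -> str:
    """Case-fold the label, then apply the flat table per code point."""
    folded = label.casefold()
    return "".join(FLAT_MAP.get(ch, ch) for ch in folded)


print(prepare("NAME"))             # name
print(prepare("\u0578ame"))        # name (Armenian look-alike folded)
print(prepare("\u4e2d\u570b") == "\u4e2d\u56fd")  # True
```

With one table, the user sees the same behaviour whether the input
differs by case, by script, or by TC/SC form.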


> Secondly, there is a reordering draft draft-ietf-idn-lsb-ace.
> We have already seen some results on reordering but we have not seen much
> discussion on the downside of reordering. In anycase, we would like
> to encourage further discussion on reordering draft as we would like
> to conduct a strawpoll on the draft soon.
> 

In my opinion, the role of draft-ietf-idn-lsb-ace cannot be
clearly understood until the IDN architecture is clearly defined.
What is the user locality of UCS usage?
1) A language-tagged mechanism covers strong-locality users,
 for example Chinese and Japanese.  There are no proposals yet
 reflecting how the Indic languages, Hebrew and Arabic shall be
 handled.  Are there any strong local standards installed, and
 data for reference?  It seems ISO-8859 users have voiced
 something, but no specifics.
2) Scattered script use can be handled by AMC, whatever
 the name is.
 
If we use option 2), there is a clear benefit for some users,
but is it worth the extra complexity for all, including Latin?

If we use option 1), do we divide the UCS into script groups
as mentioned in the idn-map proposal or not?  If we do, then
there is a place for a localized compression scheme which
will not affect users who do not need it.
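
A rough sketch of the script-group idea follows.  The ranges are
standard Unicode block boundaries, but the grouping itself, the
group names, and the rule in wants_localized_scheme are my own
assumptions for illustration; they are not taken from the idn-map
proposal.

```python
# (first, last, group) using Unicode block boundaries; grouping assumed.
SCRIPT_GROUPS = [
    (0x0000, 0x024F, "latin"),       # Basic Latin .. Latin Extended-B
    (0x0370, 0x03FF, "greek"),
    (0x0400, 0x04FF, "cyrillic"),
    (0x0530, 0x058F, "armenian"),
    (0x0590, 0x05FF, "hebrew"),
    (0x0600, 0x06FF, "arabic"),
    (0x0900, 0x097F, "devanagari"),
    (0x3040, 0x30FF, "kana"),        # Hiragana and Katakana
    (0x4E00, 0x9FFF, "cjk"),         # CJK Unified Ideographs
    (0xAC00, 0xD7AF, "hangul"),      # Hangul Syllables
]


def script_group(ch: str) -> str:
    """Return the group whose range contains ch, or 'other'."""
    cp = ord(ch)
    for first, last, group in SCRIPT_GROUPS:
        if first <= cp <= last:
            return group
    return "other"


def label_groups(label: str) -> set:
    """Return the set of script groups used in a label."""
    return {script_group(ch) for ch in label}


def wants_localized_scheme(label: str) -> bool:
    """Assumed rule: only single-group, non-Latin labels opt in."""
    groups = label_groups(label)
    return len(groups) == 1 and groups != {"latin"}


print(label_groups("example"))                 # {'latin'}
print(wants_localized_scheme("example"))       # False
print(wants_localized_scheme("\u4e2d\u570b"))  # True
```

Under this rule a pure Latin label is never touched by a localized
scheme, and mixed labels such as a Latin letter plus a CJK
character fall back to the general case.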

Liana