
Re: [idn] Which lanuages/scripts to reorder?




----- Original Message ----- 
From: "Erik Nordmark" <Erik.Nordmark@eng.sun.com>
To: "Soobok Lee" <lsb@postel.co.kr>
Cc: "Erik Nordmark" <Erik.Nordmark@eng.sun.com>; "Eric Brunner-Williams in Portland Maine" <brunner@nic-naa.net>; <idn@ops.ietf.org>
Sent: Tuesday, October 23, 2001 10:12 PM
Subject: Re: [idn] Which lanuages/scripts to reorder? 


> > Therefore, your concern about the long path of reordering revisions is NOT
> > a new problem, but rather just one instance of the never-terminating
> > nameprep/ACE revision problem.
> 
> I think they are very different.
> 
> nameprep/stringprep would probably need to be revised if there are Unicode code
> points added before the document becomes an RFC.
> 
> My point about reordering is that it might need to be revised
> each time somebody requests adding support for one of the 
> languages/scripts that are already supported in Unicode, even if Unicode
> doesn't add anything.
> I don't know how many languages/scripts
> Unicode currently supports, but the number is presumably a few hundred.

Hundreds of languages, but not so many scripts. AFAIK, extended Latin
scripts are shared by hundreds of languages as their writing systems.
For Latin labels, which are already the most favored by ACE, reordering
does not help (it saves only 1% of ACE length). My REORDERING I-D
includes reordering for the Latin script but recommends against enabling
it because the gain for Latin is marginal.

Each script has its own ACE-Z label length constant, which is listed in my
REORDERING I-D 2.0.
Many scripts have a relatively small set of basic letters, and their
ACE label length is less than 2*N for a label of N characters. The
Han/Hangul ACE label length is greater than 3*N without reordering.
With reordering, it falls to 2.2*N.
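
To make the arithmetic concrete, here is a rough sketch (not taken from the
I-D) of how a per-script constant bounds the number of characters that fit in
one label. It assumes the 63-octet limit on a DNS label and treats the
constant as an exact multiplier; the function name and the `prefix_len`
parameter are illustrative only.

```python
import math

DNS_LABEL_MAX = 63  # maximum octets in a single DNS label


def max_chars(length_constant, prefix_len=0):
    """Approximate number of native characters that fit in one label.

    length_constant: ACE octets per native character (e.g. 3.0 or 2.2).
    prefix_len: octets reserved for an ACE prefix (hypothetical parameter).
    """
    usable = DNS_LABEL_MAX - prefix_len
    return math.floor(usable / length_constant)


# Han/Hangul figures quoted above: about 3*N without reordering, 2.2*N with.
print(max_chars(3.0))  # 21
print(max_chars(2.2))  # 28
```

Since the unreordered constant is stated as greater than 3, the first figure
is an upper bound; the point is only that reordering buys several more
characters per label.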

My REORDERING I-D 2.0 suggests, as one option, supporting only Han/Hangul,
which are the scripts most disfavored by ACE. If we decide to support all
major scripts in the current Unicode, that will be done in my I-D 3.0.

If we set a threshold on the per-script label length constant to decide
whether reordering support is necessary, we could also limit the set of
newly added scripts for which we must prepare reorderings on a regular
basis. I propose a threshold somewhere between 1.7 and 2.2; with such a
threshold, only Han/Hangul/Ethiopic/Katakana/Hiragana would be
supported in reordering.
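
As a sketch of how such a threshold rule could be applied mechanically: the
constants below are placeholders made up for illustration, not the values
from the I-D, and the function name is hypothetical.

```python
def scripts_to_reorder(length_constants, threshold):
    """Return the scripts whose ACE length constant exceeds the threshold."""
    return sorted(
        script
        for script, constant in length_constants.items()
        if constant > threshold
    )


# Placeholder constants for illustration only; the real per-script
# values are the ones listed in the REORDERING I-D.
example_constants = {
    "latin": 1.0,
    "small_alphabet_script": 1.6,
    "han": 3.0,
    "hangul": 3.0,
}

print(scripts_to_reorder(example_constants, 2.2))  # ['han', 'hangul']
```

A newly encoded script would then need a reordering table only if its
measured constant falls above the agreed threshold.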

Also, many scripts are archaic or are used by very few native speakers or
scholars. A cost/benefit analysis based on the population using each script
can be performed and would help. That is already contained in my
REORDERING I-D 2.0.

I enclose a list of scripts proposed for Unicode. Most of them are from
minorities or un-industrialized countries, or are extinct/archaic.
I expect it will take a fair amount of time before those scripts are
actively used in IDN. That will determine when we need major upgrades of
reordering support and nameprep/ACE support for them.

Soobok Lee
----------

For the BMP 
  a.. Cham 
  b.. Tai (Dai) scripts 
  c.. Glagolitic 
  d.. Coptic 
  e.. Buginese 
  f.. Old Hungarian 
  g.. Phoenician 
  h.. Avestan 
  i.. Tifinagh (Berber) 
  j.. Javanese 
  k.. Lepcha (Rong) 
For Plane 1 (surrogates) 

  a.. Basic Egyptian Hieroglyphics 
  b.. Meroitic 
  c.. Old Persian Cuneiform 
  d.. Tengwar 
  e.. Cirth 
  f.. Brahmi 
  g.. Old Permic 
  h.. South Arabian 
  i.. Pollard 
  j.. Blissymbolics 
  k.. Soyombo 


Approved ones.

Philippine Scripts 

  a.. Tagalog 
  b.. Hanunóo 
  c.. Buhid 
  d.. Tagbanwa 

Pending..
  a.. Linear B (see also the overview for Aegean scripts) 
  b.. Ugaritic Cuneiform 
  c.. Shavian 
  d.. Osmanya 
  e.. Cypriot syllabary (see also the overview for Aegean scripts) 
For the BMP

  a.. Limbu 


> 
>   Erik
>