Re: [idn] opting out of SC/TC equivalence
Hi, Scott,
I think you have raised a good question. However, there is
no such thing as an "end result [that] should be applicable to
_any_ name space." Upper-to-lower case folding only applies to
alphabetic systems, SC/TC folding only applies to character-based
systems, and the consonant systems comprise many small scripts,
each within 128 codepoints. Following your assertion, we would
have to take Latin case folding out of [nameprep] too.
From what I have been studying over the past several weeks,
mixed use of a script is a common situation. English mixes with
Greek, French and German in the States; the Arabic script is used
by several different languages, each a subset of the Unicode
Arabic block; Devanagari transcribes Tamil symbols; and the
current discussion on TC/SC is a well-known case.
We cannot afford to have each spoken language be a processing
entity in [nameprep], since each is normally a subset of a
particular script.
We can instead let a script-based entity be the processing unit,
that is, CJK, Latin, Cyrillic, Arabic and the Indian scripts as a
few script pools. How many there would be is up to political
debate, as it is done in the United Nations. Lately I came across
an article in the "San Jose Mercury News" which says "Azerbaijan
mandates use of Latin alphabet". From the technical side, we can
say how many pools are technically sensible, and let other
languages, such as Tibetan, Vietnamese and Lao, decide which pool
they want to be in, depending on their users as well as on how
their scripts can best be handled technically.
My initial thought on the pools is:
Character based: Chinese, Japanese, Yi
Alphabetic: Latin, Greek, Cyrillic, any IPA-based languages
Consonant: Indian scripts, Arabic, Tibetan
The reason for such pools is simple, non-semantic handling in
[nameprep]:
Character-based scripts do have large folding tables; SC/TC is an
example. I am certain that there will be Kanji/SC folding and
possibly Yi/SC folding too. Will there be Hangul/TC folding or
Vietnamese/TC folding? I have not heard of that, but there is a
historical possibility. SC/TC folding involves about 200
one-to-one foldings (some 3-to-1 too), and 146 block-to-block
foldings affecting over 2000 characters.
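To make the one-to-one part concrete, here is a minimal Python
sketch; the two mappings below are real TC/SC pairs but only stand
in for the full table of roughly 200 entries (and say nothing about
the block-to-block foldings):

# Minimal sketch of a TC -> SC folding step. A real table would carry
# roughly 200 one-to-one entries (plus a few many-to-one cases) and the
# block-to-block foldings discussed above.
TC_TO_SC = {
    "\u570B": "\u56FD",   # TC 'guo' (country) -> SC form
    "\u9F8D": "\u9F99",   # TC 'long' (dragon) -> SC form
}

def fold_tc_to_sc(label):
    """Replace every traditional form in the label by its simplified form."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)

assert fold_tc_to_sc("\u4E2D\u570B") == "\u4E2D\u56FD"   # zhong-guo, TC -> SC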
Alphabetic languages all have upper-to-lower case foldings,
including IPA, which is used for newly created scripts in Africa.
They also borrow diacritics from other languages, and letters from
Greek; the best example is American English. This case folding
covers the Unicode range 0020-04FF, four sets of scripts (Latin,
IPA, Greek and Cyrillic) with over 100 uppercase letters.
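A minimal sketch of that kind of pool-restricted case folding,
assuming for illustration that the alphabetic pool is exactly
U+0020..U+04FF:

# Sketch: fold case only for labels that stay inside the alphabetic pool,
# taken here (as an assumption) to be U+0020..U+04FF, which covers the
# Latin, IPA Extensions, Greek and Cyrillic blocks.
ALPHABETIC_POOL = (0x0020, 0x04FF)

def alphabetic_casefold(label):
    lo, hi = ALPHABETIC_POOL
    if all(lo <= ord(ch) <= hi for ch in label):
        return label.lower()      # upper-to-lower folding for this pool
    return label                  # outside the pool: left alone in this sketch

print(alphabetic_casefold("Caf\u00C9"))   # prints the lowercased form, accent kept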
Consonant languages do not have case foldings, but they borrow
symbols from each other. They transcribe, but usually do not
include Latin or CJK in their writing. This group needs codepoint
differentiation to reduce script confusion among Armenian, Lao,
Thai, Georgian, and among a dozen Indian scripts. A codepoint
verification against its legal block (normally within 128
codepoints each) shall be sufficient. (With a political
implication: do they want one pure script or more than one?)
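A minimal sketch of that verification step; the block ranges below
come from the Unicode block charts and are only a partial,
illustrative list:

# Sketch: verify that every codepoint of a label falls inside one declared
# script block (illustrative subset; ranges per the Unicode block charts).
SCRIPT_BLOCKS = {
    "Armenian":   (0x0530, 0x058F),
    "Devanagari": (0x0900, 0x097F),
    "Tamil":      (0x0B80, 0x0BFF),
    "Thai":       (0x0E00, 0x0E7F),
    "Lao":        (0x0E80, 0x0EFF),
    "Georgian":   (0x10A0, 0x10FF),
}

def single_script(label):
    """Return the block name if the whole label sits in one block, else None."""
    for name, (lo, hi) in SCRIPT_BLOCKS.items():
        if all(lo <= ord(ch) <= hi for ch in label):
            return name
    return None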
I would say we should divide the languages into several language
pools, and "the end result should be applicable to _any_ name
space" within each pool.
Now, I am back to SC/TC folding. Depending on the character
encoding used, the folding has four cases:
case 1: transliteration encoding: no folding;
case 2: GB -to- ACE': no folding;
case 3: Big5 -to- ACE": no folding;
case 4: Unicode: block-to-block and one-to-one folding.
The first three are really IDNA output; the last one is the case
we have to consider here.
Suppose TC folds to SC (SC being a smaller set than TC) in the
current IDNA > [nameprep] > ACE scheme; then:
GB > Unicode > [nameprep] > ACE;
Big5 > Unicode > fold to SC > [nameprep] > ACE;
Mixed Unicode > fold to SC > [nameprep] > ACE.
So,
  IDNA?                    [nameprep]?                   ACE?
  GB > Unicode SC
  Big5 > Unicode > SC      Unicode Latin case folding    ACE encoding
  Mixed Unicode > SC
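To make that convergence concrete, here is a minimal Python sketch
using the GB2312 and Big5 codecs and a one-entry TC -> SC table,
purely for illustration:

# Sketch: the GB, Big5 and mixed-Unicode inputs all reduce to the same
# SC Unicode string before [nameprep] ever sees them.
TC_TO_SC = {"\u570B": "\u56FD"}                 # one-entry TC -> SC table (illustration only)

def fold_to_sc(label):
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)

gb_input    = "\u4E2D\u56FD".encode("gb2312")   # an SC label in its GB encoding
big5_input  = "\u4E2D\u570B".encode("big5")     # the TC spelling in its Big5 encoding
mixed_input = "\u4E2D\u570B"                    # the TC spelling already in Unicode

forms = {
    gb_input.decode("gb2312"),                  # GB   > Unicode
    fold_to_sc(big5_input.decode("big5")),      # Big5 > Unicode > fold to SC
    fold_to_sc(mixed_input),                    # Mixed Unicode  > fold to SC
}
assert forms == {"\u4E2D\u56FD"}   # one SC Unicode form is left for [nameprep] and ACE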
If this is the case, then what is IDNA doing? What about the other
three cases IDNA is supposed to handle? What is the purpose of
having SC Unicode go through [nameprep] at all; why not go directly
to ACE?
I do not think the above is a reasonable processing model.
Instead, all of the above should be in [nameprep]. [nameprep]
should be divided into three pool cases (up for debate), each
treated with a somewhat uniform procedure:
alphabetic: case folding;
character based: case folding, allowing GB, Big5, JIS and KSC to
Unicode mapping too;
consonant system: allowing ISCII to Unicode mapping and tag
identification for script-specific processing into ACE.
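A minimal sketch of how such a pool split inside [nameprep] might
look; the pool ranges and the tiny folding table are illustrative
assumptions, not the actual tables:

# Sketch: dispatch a label to one of three pool-specific preparations.
# The ranges and the one-entry folding table are assumptions for
# illustration only.
TC_TO_SC = {"\u570B": "\u56FD"}

def pool_of(label):
    cp = ord(label[0])
    if 0x4E00 <= cp <= 0x9FFF or 0xA000 <= cp <= 0xA4CF:
        return "character"                 # CJK ideographs, Yi syllables/radicals
    if cp <= 0x04FF:
        return "alphabetic"                # Latin, IPA, Greek, Cyrillic
    return "consonant"                     # everything else in this sketch

def pool_prep(label):
    pool = pool_of(label)
    if pool == "alphabetic":
        return label.lower()               # case folding only
    if pool == "character":
        # GB, Big5, JIS and KSC input is assumed to be decoded to Unicode
        # before this point; here the SC/TC-style folding is applied.
        return "".join(TC_TO_SC.get(ch, ch) for ch in label)
    # consonant pool: no folding; block verification and a script tag
    # would be attached here before the label goes on to the ACE step.
    return label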
Liana
On Wed, 15 Aug 2001 06:54:35 -0400 "Hollenbeck, Scott"
<shollenbeck@verisign.com> writes:
> >-----Original Message-----
> >From: tsenglm@cc.ncu.edu.tw [mailto:tsenglm@cc.ncu.edu.tw]
> >Sent: Wednesday, August 15, 2001 2:11 AM
> >To: ben; Adam M. Costello; idn@ops.ietf.org
> >Subject: Re: [idn] opting out of SC/TC equivalence
> >
> >
> >   In Hong Kong and Taiwan, users use the BIG5 code. This code set
> > has no simplified Chinese scripts. In China, the GB code set has no
> > traditional Chinese scripts. So there is no mixed type of GB and
> > BIG5. But you know VeriSign/NSI announced ML.com where any UNICODE
> > can be mixed. That is the key problem.
> >   Any suggestions must consider what to do for .COM in this WG.
>
> If composing labels of "mixed" Unicode code points is believed to be a
> key problem, that's an issue with current draft documents being
> developed and discussed in this WG -- which clearly permit this method
> of composition. If such a composition method doesn't make sense in a
> particular local context, it likely won't be widely used to create
> labels in that local context -- but that doesn't imply that the
> conventions of the local context should be applied everywhere else.
>
> This WG shouldn't attempt to produce solutions particular to a
> specific name space. The end result should be applicable to _any_
> name space.
>
> <Scott/>
>