
Re: [idn] SC/TC equivalence



All the counter-arguments are valid.

1. SC/TC equivalence is similar to case folding for Latin, only
much larger.  A table that large inevitably contains a variety of
equivalence cases.  All the examples mentioned exist for CJK,
but they do not change the primary characteristic of a case folding.

2. From a user's viewpoint, the requirements are many.
SC is a simpler set of CJK characters than TC.  But users in China
want TC in their names for artistic and special appeal, and for the
specific semantics carried by TC characters.  Users outside China
want SC for marketing to China, and want simpler names that they
can master themselves (especially the younger generation).  Hong
Kong is a good place to observe both of these users' wish lists.

3. To handle the above requirements reasonably, it makes sense
to treat SC/TC as case-insensitive identifiers, consistent with
DNS tradition, to have this standards body provide users with only
simple and portable communication functionality, and to let
applications deal with secondary equivalence issues.

4. Unicode is large, so [nameprep] is large too.  However,
Unicode is a database, and it is up to us to use it efficiently.
SC/TC case folding is large too, but it is just a large table
sitting inside or next to Unicode for lookup.  If we think of
[nameprep] in terms of database lookup, then we can exclude
many possible ACE leaks.  For example, suppose an arbitrary
bit stream comes in.  [nameprep] tries to interpret it as, say,
16-bit units, and limits the range of the search to, say, the
Hindi (Devanagari) script.  Any input outside the Hindi script
range is illegal and is kicked back as bad input.  Everything
inside the Hindi script range is passed on for case folding, etc.
With this exclusion method, we are unlikely to have any confusion
between look-alike characters from different scripts.
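The exclusion method in point 4 can be sketched as a range check followed by a mapping step.  This is only an illustration under stated assumptions: the input is taken to be already decoded (Python strings hold code points rather than 16-bit units), "Hindi script" is taken to mean the Unicode Devanagari block U+0900..U+097F, and the names `prep_devanagari` and `BadInput` are hypothetical, not part of any [nameprep] draft.

```python
# Devanagari block: U+0900..U+097F (Python's range end is exclusive).
DEVANAGARI = range(0x0900, 0x0980)


class BadInput(ValueError):
    """Hypothetical error: a code point falls outside the allowed script range."""


def prep_devanagari(label: str) -> str:
    """Hypothetical single-script prep step: reject out-of-range input, then map."""
    for ch in label:
        cp = ord(ch)
        if cp not in DEVANAGARI:
            # Kicked back as bad input instead of being passed on to ACE encoding.
            raise BadInput(f"U+{cp:04X} is outside the Devanagari block")
    # Devanagari has no letter case, so casefold() returns these characters
    # unchanged; it stands in for whatever per-script mapping table would apply.
    return label.casefold()
```

A Cyrillic or Latin look-alike never reaches the mapping step, because any code point outside the chosen range is rejected first.  A real profile would also need to allow characters such as digits and the hyphen, which this sketch leaves out.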

5.  If the above input is a CJK string, then we have a huge
range to worry about, and it calls for separate C[nameprep],
J[nameprep] and K[nameprep].  CNNIC has plenty of experts in
handling this individually, and very efficiently too.  It is just an
AI-oriented code table search.

6. Although [nameprep] becomes at least four [nameprep]s,
which sounds a lot more complex, by treating [nameprep] as a
database and [nameprep] processing as a switch box, a
divide-and-conquer scheme can handle the problem, and each
section of the problem becomes minimal.
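One way to read the "switch box" in point 6 is a dispatch table: each profile owns its own allowed ranges and mapping step, and a label is checked only against the table it was routed to.  This is a hypothetical sketch.  The profile names, the `PROFILES` table and `nameprep_switch` are illustrative, the ranges are simplified Unicode blocks, and the choice of profile is assumed to come from outside the label (for example the zone or registry), since Han characters alone cannot tell Chinese from Japanese.

```python
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code point range


def _identity(s: str) -> str:
    return s


# Hypothetical profile table: each entry is (allowed ranges, mapping step).
PROFILES: Dict[str, Tuple[List[Range], Callable[[str], str]]] = {
    "latin": ([(0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A), (0x2D, 0x2D)], str.lower),
    "hindi": ([(0x0900, 0x097F)], _identity),
    "zh": ([(0x4E00, 0x9FFF)], _identity),  # an SC/TC folding step would plug in here
    "ja": ([(0x3040, 0x30FF), (0x4E00, 0x9FFF)], _identity),  # kana + Han
    "ko": ([(0xAC00, 0xD7A3)], _identity),  # precomposed Hangul syllables
}


def nameprep_switch(label: str, profile: str) -> str:
    """Route a label to one profile's table and apply only that table."""
    if profile not in PROFILES:
        raise ValueError(f"no prep table for profile {profile!r}")
    ranges, mapping = PROFILES[profile]
    for ch in label:
        cp = ord(ch)
        if not any(first <= cp <= last for first, last in ranges):
            raise ValueError(f"U+{cp:04X} is not allowed in profile {profile!r}")
    return mapping(label)
```

For example, `nameprep_switch("Example-1", "latin")` gives `"example-1"`, while `nameprep_switch("中文", "ko")` is rejected.  Each table stays small because it only has to cover its own section of the problem.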

7. As to SC/TC case folding, I propose folding to a mnemonic
ACE.  The ACE can be recovered as either SC or TC at the
discretion of the application or the presentation layer.  Servers
on the Mainland will likely display everything in SC, and servers
in Taiwan everything in TC, regardless of where the original
e-mail comes from.  I would be happy to see either, regardless
of where the e-mail comes from.

If someone wants both versions, then he can ask the application
for a feature to display either.  If he wants two different
hostnames, he can 1) do what is available now, registering one in
China and another in Taiwan, or 2) use the second step of StepCode
to specify particular parts of the characters, to get a mnemonic ACE
different from the first registered hostname, which will force the
application either to interpret the ACE differently or to get the
original registered copy.

If the application does a good job of keeping its users happy, then
there is no need to ask for the pre-folded case.  If someone
demands the original copy, then the application should request one
from the original registrar (not from [nameprep]) and keep that
copy in its cache.
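The fold-then-render idea in point 7 can be sketched with a small lookup table.  The post proposes folding to a mnemonic ACE produced by StepCode, which is not specified here, so this sketch substitutes the SC form as the canonical folded key purely for illustration.  The six-entry table is a tiny excerpt chosen for the example, and `fold`, `render`, `TC_TO_SC` and `SC_TO_TC` are hypothetical names.

```python
from typing import Dict, List

# Tiny hypothetical excerpt of a TC -> SC folding table; a real table
# would hold thousands of entries.
TC_TO_SC: Dict[str, str] = {
    "國": "国",
    "學": "学",
    "東": "东",
    "門": "门",
    "發": "发",  # "to issue, to emit"
    "髮": "发",  # "hair": a second TC character folding to the same SC one
}

# Reverse index: one SC character can have several TC candidates.
SC_TO_TC: Dict[str, List[str]] = {}
for tc, sc in TC_TO_SC.items():
    SC_TO_TC.setdefault(sc, []).append(tc)


def fold(label: str) -> str:
    """Fold TC to the canonical (here: SC) form; unmapped characters pass through."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def render(folded: str, want_tc: bool) -> str:
    """Presentation-layer choice: show the folded form as SC or as TC."""
    if not want_tc:
        return folded
    # Takes the first TC candidate. Where there are several (发 -> 發 or 髮),
    # the application needs context or the originally registered copy.
    return "".join(SC_TO_TC.get(ch, [ch])[0] for ch in folded)
```

Because 發 and 髮 both fold to 发, `render` cannot always restore the registered form on its own.  That is the case in which the post suggests fetching the original copy from the registrar and caching it.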

Liana 


On Sat, 11 Aug 2001 19:13:55 +0000 "Adam M. Costello"
<amc@cs.berkeley.edu> writes:
> liana.ydisg@juno.com wrote:
> 
> > [nameprep] is the place for case folding for Latin, then it should be
> > the place for other script folding as well.
> 
> That's a valid argument, but there are also some counter-arguments:
> 
> Existing domain names are already case-insensitive, so IDNs ought to
> be case-insensitive for consistency with the current standard.  This
> consistency constraint does not apply to SC/TC equivalence.
> 
> With very few exceptions, case mapping is one-to-one.  SC/TC mapping is
> more complex more often.
> 
> The case mapping rules are already defined in the Unicode standard.
> This is not true for SC/TC mapping.
> 
> Some people have expressed a desire to be able to register the
> simplified and traditional versions of a name as separate domain names.
> No one has expressed an analogous desire to be able to register the
> all-uppercase and all-lowercase versions of a name as separate domain
> names.
> 
> AMC
>