
Re: [idn] opting out of SC/TC equivalence



-----BEGIN PGP SIGNED MESSAGE-----

liana.ydisg@juno.com wrote:
> I think you have raised a good question.  However, there is
> no such thing as an "end result should be applicable to _any_ name
> space."  Upper-to-lower case folding only applies to alphabetic
> systems, SC/TC folding only applies to character-based
> systems, and consonant systems have many small scripts all within
> 128 codepoints.  Following your assertion, we should take Latin
> case folding out of [nameprep] too.

Case folding for domain names in Latin scripts is a historical accident.
If it weren't there, users of those scripts would not miss it; they
would simply always type lowercase. However, because case folding is
already defined for ASCII, it effectively has to be extended to other
Latin characters: it would be inconsistent to have a folding for, say,
E -> e, but not E-acute -> e-acute. And given that decision for Latin
characters, the obvious approach is to use the standard Unicode
case-folding algorithm, which also applies to other cased alphabetic
scripts.
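
To make the consistency point concrete, here is a minimal sketch in
Python. It assumes that Python's built-in str.casefold() is an acceptable
stand-in for the Unicode case-folding algorithm; the helper name fold()
is mine and is not taken from [nameprep].

```python
# Illustrative sketch only: str.casefold() applies Unicode full case
# folding. fold() is a hypothetical helper, not part of any IDN draft.
def fold(label: str) -> str:
    return label.casefold()

# The ASCII case that already exists for domain names.
ascii_match = fold("E") == fold("e")

# Extending the same rule to other Latin characters (E-acute vs e-acute).
latin_match = fold("\u00C9") == fold("\u00E9")

# The same algorithm also covers other cased alphabetic scripts.
greek_match = fold("\u03A3") == fold("\u03C3")      # Greek sigma
cyrillic_match = fold("\u0414") == fold("\u0434")   # Cyrillic de

print(ascii_match, latin_match, greek_match, cyrillic_match)
```

All four comparisons come out equal, which is the point: once E -> e is
accepted, one algorithm handles the rest of the cased scripts.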

Note that there are plenty of identifier standards that are case-sensitive
and work fine (e.g. most parts of a URI). Most end-users probably don't
even notice the difference between case-sensitive and case-insensitive
identifiers.

So, use of alphabetic case folding does not imply anything about whether
TC and SC should be folded; that's a separate decision. There's a
"slippery slope" argument here: if TC/SC are folded, why not Latin <->
Cyrillic transliteration (used for Serbo-Croatian or in Azerbaijan, for
example), or any of the other cases where a language can be written in
more than one script? I see that later in your post you suggest Kanji/SC
and other foldings.
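
As a rough illustration that case folding settles nothing about TC/SC or
cross-script equivalence, the sketch below approximates a nameprep-like
step as NFKC normalization followed by case folding. That approximation,
the helper name normalize(), and the sample characters are my
assumptions; this is not the actual [nameprep] profile.

```python
import unicodedata

# Hypothetical approximation of a nameprep-like step: NFKC + case folding.
def normalize(label: str) -> str:
    return unicodedata.normalize("NFKC", label).casefold()

sc = "\u56FD"       # simplified form of "country"
tc = "\u570B"       # traditional form of the same word
latin_a = "a"       # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A, visually similar

# Neither pair is unified by normalization plus case folding; any such
# equivalence would have to be defined by a separate mapping.
sc_tc_equal = normalize(sc) == normalize(tc)
script_equal = normalize(latin_a) == normalize(cyrillic_a)

print(sc_tc_equal, script_equal)
```

Both comparisons are unequal, so TC/SC folding, like Latin <-> Cyrillic
folding, would be an extra table layered on top, not something that
follows from the case-folding decision.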

I think that way lies pointless complication and unnecessary delay
in finalizing the standard, for a feature that is only of minor benefit
to users. When guessing a web site name, a user who can't find the name
immediately will just use a search engine, and the search engine will
work regardless of what script is used. In other cases,
domain-name-based identifiers such as email addresses and URIs can't
realistically be guessed, so they will come from one of:

 - another electronic document; this presents no problems
 - a written source; again this presents no problems as long as the user
   has an appropriate input method for the script
 - a spoken source; in this case the script has to be specified to
   avoid ambiguity in general, but the context will often make it
   clear anyway.

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO3tPnDkCAxeYt5gVAQG6JQgAwwu/t1PqSlE1gGIfujUNVyIFqDg/PXYG
giKRDVAgy18EpcLdmz6rnFT2jXaiGtqggK/znjx2HxcmZa5vLV8JeehEVTnZaJl1
qWSRgHWpZg2B2YNO1ZrtrpmwWMPJ9yyn+0GOXI8EUqKmSbAV3W5v9Vvdn8Cx0Yp+
/0zcKTAh6jtGBAIKVmVCRLy5q5yxPRnotzenyWxFxy3rYU35+CvMOMj3FGPk4OO0
17XyhFfJUA0kYILIYhmbWoQmreo8sjMqHw9vtQPG6chjPop+fxyMCX68Oqt4J9sj
ZkRcQY3og/RpB9iGXA5eYDZDdyb11I8CuAmPL8v2Fg5NIiUNclHeeQ==
=d9kT
-----END PGP SIGNATURE-----