Re: [idn] SC/TC equivalence
All the counter-arguments are valid.
1. SC/TC equivalence is similar to case folding for Latin, only
it is much larger. Because it is large, it comes with a variety
of equivalence cases. All the examples mentioned do exist for CJK,
but they do not change the primary characteristics of case folding.
2. From a user's viewpoint, the requirements are manifold.
SC is a simpler set of CJK than TC. But users in China want
TC in their names for artistic reasons, special appeal, and the
specific semantics of TC. Users outside of China want SC for
marketing to China, and simpler names (especially the
younger generations) for themselves to master. Hong Kong
is a good place to observe both of these users' wish lists.
3. To handle the above requirements reasonably, it is
sensible to treat SC/TC as case-insensitive identifiers,
consistent with DNS tradition, to provide only simple and
portable communication functionality to users from this
standards body, and to let applications deal with secondary
equivalence issues.
4. Unicode is large, so [nameprep] is large too. However,
Unicode is a database. It is up to us to use it efficiently.
SC/TC case folding is large too, but it is just a large table
sitting inside or next to Unicode for a lookup search. If we
think of [nameprep] in terms of database lookup, then
we can exclude many possible ACE leaks. For example,
a string of some arbitrary bit stream comes in. [nameprep]
tries to interpret it as, say, 16-bit units, and limits the
range of the search to, say, Hindi script. If anything falls
outside this Hindi script range, the input is illegal and is
kicked back as bad input. Everything inside the Hindi script
range is passed on for case folding and so on. It is unlikely
that we will have any cross-script look-alike confusion with
this exclusion method.
5. If the above input is a CJK string, then we have a huge
range to worry about: it needs another C[nameprep], J[nameprep]
and K[nameprep]. CNNIC has plenty of experts who can
handle these individually, and very efficiently too. It is just an
AI-oriented code table search.
6. Though [nameprep] becomes at least four [nameprep]s,
which sounds a lot more complex, by treating
[nameprep] as a database and treating [nameprep]
processing as a switch box, a divide-and-conquer scheme
can handle the problem, and each section of the problem
becomes minimal.
7. As to SC/TC case folding, I am proposing to fold to a mnemonic
ACE. The ACE can be recovered to either SC or TC at the
discretion of the application or the presentation layer. It is likely
that servers in the Mainland will display everything in SC, and
servers in Taiwan will display everything in TC, regardless of where
the original e-mail comes from. I would be happy to see either,
regardless of where the e-mail comes from.
If someone wants both versions, then he can ask the application
for a feature to display either. If he wants two different
hostnames, he can 1) do what is available now, by registering one
in China and another one in Taiwan, or 2) use the second step of
StepCode to specify particular parts of a character, to get a
different mnemonic ACE from the first registered hostname, which
shall force the application either to interpret the ACE differently
or to get the original registered copy.
If the application does a good job of making its user happy, then
there is no need to ask for the pre-folded case. If someone
demands the original copy, then the application shall request one
from the original registrar (not from [nameprep]) and keep that
copy in its cache.
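
As a rough illustration of points 4 through 7, the switch-box lookup and
the SC/TC folding could be sketched as below. This is only a sketch under
stated assumptions, not part of [nameprep] or StepCode: the names
SWITCH_BOX, prep and display are hypothetical, the script is chosen from
the first character only, the code point ranges are the Unicode block
ranges for Basic Latin, Devanagari (the script used to write Hindi) and
CJK Unified Ideographs, and the two-entry SC/TC table is a toy that
pretends the mapping is one-to-one, which it often is not. The folded
form here is simply the TC string, standing in for the mnemonic ACE.

```python
# Hypothetical sketch: none of these names or tables come from [nameprep].
from typing import Callable, List, Tuple

Range = Tuple[int, int]

# Toy SC -> TC folding table (assumed one-to-one for illustration only).
TO_TC = {
    "\u56fd": "\u570b",  # simplified "country" -> traditional form
    "\u95e8": "\u9580",  # simplified "gate" -> traditional form
}
TO_SC = {tc: sc for sc, tc in TO_TC.items()}


def in_range(ch: str, rng: Range) -> bool:
    lo, hi = rng
    return lo <= ord(ch) <= hi


def fold_latin(label: str) -> str:
    return label.lower()


def fold_devanagari(label: str) -> str:
    return label  # no case distinction, so nothing to fold in this sketch


def fold_cjk(label: str) -> str:
    # Fold every simplified character to its traditional counterpart.
    return "".join(TO_TC.get(ch, ch) for ch in label)


# The "switch box": one (name, range, folding function) entry per script.
SWITCH_BOX: List[Tuple[str, Range, Callable[[str], str]]] = [
    ("latin", (0x0021, 0x007E), fold_latin),
    ("devanagari", (0x0900, 0x097F), fold_devanagari),
    ("cjk", (0x4E00, 0x9FFF), fold_cjk),
]


def prep(label: str) -> str:
    """Pick a script from the first character, reject anything outside
    that script's range, then apply that script's folding."""
    if not label:
        raise ValueError("empty label")
    for name, rng, fold in SWITCH_BOX:
        if in_range(label[0], rng):
            if not all(in_range(ch, rng) for ch in label):
                raise ValueError("character outside the " + name + " range")
            return fold(label)
    raise ValueError("unsupported script")


def display(folded: str, prefer: str) -> str:
    """Recover SC or TC from the folded form, as the application prefers."""
    if prefer == "SC":
        return "".join(TO_SC.get(ch, ch) for ch in folded)
    return folded
```

With this sketch, the SC and TC spellings of a name fold to the same
string, a mixed-script label is kicked back with ValueError, and the
application chooses which form to show by calling display with "SC"
or "TC".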
Liana
On Sat, 11 Aug 2001 19:13:55 +0000 "Adam M. Costello"
<amc@cs.berkeley.edu> writes:
> liana.ydisg@juno.com wrote:
>
> > [nameprep] is the place for case folding for Latin, then it should be
> > the place for other script folding as well.
>
> That's a valid argument, but there are also some counter-arguments:
>
> Existing domain names are already case-insensitive, so IDNs ought to
> be case-insensitive for consistency with the current standard. This
> consistency constraint does not apply to SC/TC equivalence.
>
> With very few exceptions, case mapping is one-to-one. SC/TC mapping is
> more complex more often.
>
> The case mapping rules are already defined in the Unicode standard.
> This is not true for SC/TC mapping.
>
> Some people have expressed a desire to be able to register the
> simplified and traditional versions of a name as separate domain names.
> No one has expressed an analogous desire to be able to register the
> all-uppercase and all-lowercase versions of a name as separate domain
> names.
>
> AMC
>