
Re: [idn] opting out of SC/TC equivalence





On Fri, 31 Aug 2001 04:28:55 +0100 David Hopwood
<david.hopwood@zetnet.co.uk> writes:
> -----BEGIN PGP SIGNED MESSAGE-----
> 
> liana.ydisg@juno.com wrote:
> > The first step of our solution is to open [nameprep]
> > to let local standards in for easy code exchange.  This
> > is an equal access issue.
> 
> What is the problem with converting local standards to Unicode
> as a first step? There's nothing difficult in that; in fact it's
> considerably simpler than what you seem to be proposing.

Conversion tables already exist.  What I am proposing is
that GB be exchanged directly with Unicode inside
[nameprep], so that there is a one-step lookup for users
who do not want the Unicode set as well.  These local
display codes already carry script information.  When a
name is taken from the screen it has already been
[nameprep]ed in a local sense, though not in Unicode.  If you
convert it to Unicode and then [nameprep] it again, that is
double processing.
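
For reference, the two-step route under discussion (convert the
local encoding to Unicode, then apply nameprep) can be sketched as
follows.  This is only an illustration under assumptions: it uses the
nameprep implementation in Python's standard library, which follows
the later RFC 3491 rather than the draft being discussed here, and
the helper name gb_to_prepared and the choice of the "gb2312" codec
are hypothetical.

```python
from encodings.idna import nameprep  # stdlib nameprep (RFC 3491 profile)


def gb_to_prepared(raw: bytes, charset: str = "gb2312") -> str:
    """Decode a locally encoded label to Unicode, then apply nameprep.

    Hypothetical helper: 'charset' is whatever local standard the
    label arrived in; gb2312 is assumed here for illustration.
    """
    text = raw.decode(charset)  # step 1: local standard -> Unicode
    return nameprep(text)       # step 2: nameprep on the Unicode form


if __name__ == "__main__":
    label = b"\xd6\xd0\xce\xc4"  # two Han characters encoded in GB2312
    print(gb_to_prepared(label))
```

The sketch shows both steps as separate calls; whether the first one
counts as "double processing" is the point in dispute.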

> 
> > The best place to demonstrate
> > such a feature is the "cut and paste" of an IDN URL
> > e-mail entry.  It shall be a universal treatment at the end.
> > 
> > The second step is to define the ranges of different
> > user groups who need their primary script tag.  No
> > mix scripts is allowed at this stage.  But we can take
> > wish list.  For example, English may wish to include
> > Greek, Korean may want Hanja on its wish list.
> 
> Why should there be any restrictions on mixing scripts (other
> than mixing left-to-right and right-to-left scripts, possibly)?
> What benefit does this give?

This way, we can keep people who jump ahead from grabbing
names in other scripts and causing trouble later when
[nameprep] is upgraded.  When there is enough testing data and
it shows a stable policy, then we can enable a second or third
script.  This is common practice in bringing up a
new system.  And I would be more conservative and let only
a few scripts come online at a time.

> 
> >  That way, we can identify conflicts of codepoints and
> > possible confusion of codepoints, and start with a
> > careful first step.  A survey on the existing registered
> > names for possible conflict when Unicode is
> > used for limited scripts is helpful to start a clean first
> > deployment of [nameprep].
> > 
> > Someone wants to lunch an unmature version of
> > [nameprep] are very smart.  They probally have put
> > a lot of money behind this working group already , where
> > we are just a decoration voices to lead people believing
> > that this is an open discussion, democratic process.
> ...
> > My feeling on this, there are more politics and money
> > then technical issues hidden behind us.
> 
> Enough with the paranoid conspiracy theories. I'm certainly not
> making any money out of contributing to this group, and I doubt
> the other contributors are either. There are serious technical
> problems with some of the things you're suggesting - that's what
> people are concerned about.
> 

You are right, most of us care about this problem and
are contributing to this group for free.  We are discussing
this sincerely.  However, I have a clear feeling that
something else is going on.  I hope I am wrong, and that we
can see a real solution come out of this group.

> > So your first step TC/SC simple mapping in [nameprep]
> > which should be on the same footing with Latin case
> > folding being pushed away by all kind of nonsense
> > arguments.
> 
> TC/SC equivalence is not analogous to case folding, period. It's
> closer to Cyrillic <-> Latin equivalence for Serbo-Croatian and
> Azeri (except that it is more complicated in the general case).
> If it was exactly the same as case folding, then we shouldn't
> include it, because case folding is only in nameprep for consistency
> with the folding of a-z and A-Z in RFC 1034/1035 (which is in turn
> only there for compatibility with the original HOSTS.TXT 
> specification).
> If hostnames had originally been case-sensitive, that would not
> cause any problem for users, who would in practice always type
> lowercase.
> 
> If there is an argument for doing some subset of TC/SC folding, it
> will have to stand on it own merits, not by analogy.
> 
David, that is the misconception I referred to when blaming
what Unicode has done to the Chinese language.  TC/SC
is the same script and the same language.  It is used in a way
similar to upper/lower case in Latin.  Some people want
to use uppercase or print lettering all the time, but most use
mixed case.  TC/SC is a larger set, so it is natural for it to
have a greater variety of changes.  But the majority is treated
like Latin case.  They are not mixed scripts.  Japanese is a
mixed script.  Korean is a mixed script depending on whose
viewpoint you subscribe to.
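
To make the comparison with case folding concrete, here is a minimal
sketch of a TC -> SC fold done as a per-character table lookup, the
same shape as lowercasing.  The four-entry table and the function
names are hypothetical and chosen only for illustration; a real table
would be far larger.  The last two entries show a case where two
traditional characters share one simplified form, so the reverse
direction is not a simple one-to-one mapping.

```python
# Hypothetical, deliberately tiny TC -> SC table (illustration only).
TC_TO_SC = {
    "\u570b": "\u56fd",  # "country"
    "\u5b78": "\u5b66",  # "study"
    "\u767c": "\u53d1",  # "to emit"  \  two TC characters that
    "\u9aee": "\u53d1",  # "hair"     /  share one SC form
}


def fold_tc_to_sc(label: str) -> str:
    """Fold each traditional character to its simplified form, if listed."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in label)


def sc_candidates(sc_char: str) -> list:
    """List the traditional characters in the table that fold to sc_char."""
    return sorted(tc for tc, sc in TC_TO_SC.items() if sc == sc_char)


if __name__ == "__main__":
    print(fold_tc_to_sc("\u570b\u5b78"))   # folded label
    print(len(sc_candidates("\u53d1")))    # more than one TC source
    print("ABC".lower())                   # Latin case folding, for comparison
```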

TC/SC is in dictionaries for kids in China.  But they are not
in Unicode.  This surprised me, since I know their charter
goal stated in version 1 says they will deal with it.  Now
it is version 3 already.  I read just a few days ago that it is
considered too difficult to put them in.  Some people use
this as an excuse to block TC/SC from being in [nameprep];
technically, yes.  Some people have the experience that
Chinese translation always takes two versions, and conclude
that TC/SC must be two different languages; that is wrong too.
The reason for the two versions: incompatible software
takes 90% of the blame, and 10% is difference in expression
for the same concept, which is normal for anything produced
by different people in the same language.

The second point about [nameprep] for Latin is that it takes
care of diacritics for Latin too.  Just because Latin is the
easiest script to deal with, it gets the high-speed
treatment in [nameprep].  What about Arabic?  If we had
started with Arabic, would Arabic IDN then be the only one
to get the high-speed treatment?  IDN is supposed to
be universal; to my mind, that means all scripts should
complete in the same number of processing cycles,
at least from a grand view.
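
As an illustration of what handling diacritics amounts to in practice,
here is a minimal sketch using Unicode NFKC normalization from Python's
standard library.  It is an assumption that NFKC is the relevant step
(it is the normalization form the later RFC 3491 nameprep profile
uses), and the helper name normalize_label is hypothetical.  The same
call is applied to a Latin and to an Arabic input.

```python
import unicodedata


def normalize_label(label: str) -> str:
    """Apply NFKC normalization, the same call for every script."""
    return unicodedata.normalize("NFKC", label)


if __name__ == "__main__":
    latin = "e\u0301"   # 'e' followed by a combining acute accent
    arabic = "\ufe8d"   # Arabic letter alef, isolated presentation form
    for sample in (latin, arabic):
        out = normalize_label(sample)
        # Print code points so the result is readable in any terminal.
        print([hex(ord(ch)) for ch in sample], "->", [hex(ord(ch)) for ch in out])
```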

Your Cyrillic <-> Latin equivalence involves different scripts,
and I call it transliteration.  I would call SC <-> Pinyin an
equivalence in that sense too.  So are Kanji <-> Romaji
and Hangul <-> Hanja.

Liana

> - -- 
> David Hopwood <david.hopwood@zetnet.co.uk>
> 
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
> 
> 
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
> 
> iQEVAwUBO48D6zkCAxeYt5gVAQHQjQgAyFhzfEuCiqezWFyJNadIQ8RAjEjEVChM
> 3HRaC72TGQllSjjhgyYjCWheitDCI5jWNePjXBN7LDBgeZyTVPjONd4HzzAKBXOy
> GRWv/lIxn3ZSRbRxEQi+KaKlUYhFJG8S1dzUS73ShX2fXgF1Hfpb/J4KGd2+mbJ/
> IT2LxerZ8D4hB2BcCiGtc7zA3stj7TthccL+mUTkixXXhqNWGjrymDVNNaEZ/USu
> MXCx6iCIbuWEvc6zp24r5zVgmx09y7dJXeuVLrNoogdzUw4IWlo9BE3sryDqDo36
> CjVwWstSEqi3PVmu6G5l/n0WC35HqNfIRCQTyxuTQzcoSxASU35c1g==
> =8Iol
> -----END PGP SIGNATURE-----
>