Re: Layer 2 and "idn identities" (was: Re: [idn] what are the IDN identifiers?)
It doesn't matter for input processing,
because you know the user's language. It is
defined by the user when they select the
language symbol set.
But when you turn those symbols into code points
and strip the language context, and then compare
them in the languageless context of IDN, the problem
arises. That is the reason I have to put Latin
together with Armenian: to make the CJK problem
a little easier for people who do not know CJK. There are
6 lower case Armenian letters similar to Latin [a-z].
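
A minimal Python sketch of that languageless comparison. The pair of
letters is picked here only as an illustration (Latin "o" and Armenian
"oh"), and the helper names describe and same_identifier are
hypothetical, not part of any IDN draft. It assumes a comparison that
sees nothing but case-folded code points:

```python
import unicodedata

# Two visually similar letters from different scripts (illustrative pick).
latin_o = "\u006F"      # LATIN SMALL LETTER O
armenian_oh = "\u0585"  # ARMENIAN SMALL LETTER OH


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical matcher: compare labels only as case-folded code points.

    No language or script context is available here, which is the
    situation described above.
    """
    return a.casefold() == b.casefold()


print(describe(latin_o))
print(describe(armenian_oh))
print(same_identifier("o", "O"))               # case folding is a 1-1 match
print(same_identifier(latin_o, armenian_oh))   # look-alikes stay distinct
```

Case folding matches "o" with "O", but the two look-alike letters
remain different identifiers, because the only information left after
the language context is stripped is the code point itself.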
Liana
On Wed, 5 Dec 2001 12:09:00 -0800 "Michel Suignard"
<michelsu@microsoft.com> writes:
> Liana, stop associating Armenian with Latin in your explanation. All
> writing systems based on Latin and the single writing system using
> Armenian share nothing except maybe some punctuation. It doesn't make
> sense to draw a parallel between
> (Latin + Armenian + Cyrillic + Hebrew) and (CJK), because in the first
> group no writing system shares characters between the subsets (although
> I would even object to creating such a 'logical' collection of largely
> unrelated scripts), while in CJK all writing systems use characters in
> the same CJK blocks.
>
> It doesn't help to explain the issue of TC/SC, which is a valid concern
> for CJK users, by using a flawed analogy to a non-existent model.
>
> I myself see a need to help Chinese users deal with TC/SC, but I don't
> see it as belonging in the scope currently covered by IDN.
>
> Michel
>
> -----Original Message-----
> From: liana Ye [mailto:liana.ydisg@juno.com]
> Sent: Wednesday, December 05, 2001 8:14 AM
> To: DougEwell2@cs.com
> Cc: idn@ops.ietf.org; maynard@pobox.org.sg; bthomson@fm-net.ne.jp
> Subject: Re: Layer 2 and "idn identities" (was: Re: [idn] what are the
> IDN identifiers?)
>
>
>
> On Wed, 5 Dec 2001 01:17:36 EST DougEwell2@cs.com writes:
> > In a message dated 2001-12-04 20:11:19 Pacific Standard Time,
> > maynard@pobox.org.sg writes:
> >
> > >> SC/TC equivalence itself is far simpler than the "four winds, two
> > >> eggs" equivalences, and has quite a bit of merit. I won't express any
> > >> real opinion on it until I study it further.
> > >
> > > It is not so simple as to be able to be done _accurately_ by a
> > > code-based 1-1 bit-string matching process. There are semantic,
> > > syntactic and contextual considerations that require at the very least
> > > a morphological analysis process in order for TC/SC to be done with a
> > > reasonable amount of accuracy (i.e. orthographically).
> >
> > Thanks for saying with some authority what I have apparently been
> > unable to communicate effectively, namely that TC/SC is not merely a
> > 1-1 operation comparable to Latin case folding.
> >
> > -Doug Ewell
> > Fullerton, California
> >
>
> Excuse me for jumping in. I have been keeping silent on this
> view, and I'd like to comment on this issue now.
>
> TC/SC is not merely a 1-1 operation, if you only compare it
> with Latin case folding in what the names imply:
>
> TC/SC is a subset of Han, and Han is a subset of C,J,K.
> Latin is a superset of English, French,....
>
> Can you see the flaw in such a comparison?
>
> So when you look at Latin in the context of UCS code points,
> since UCS is the set we are hoping to use as a blanket in IDN, then
> Latin is a subset of (Latin + Armenian + Cyrillic + Hebrew), since I
> think this is the area where Latin is most likely to be used too.
>
> So this means that if you compare only the set of 1-1 cases of TC/SC,
> then Latin is 1-1.
>
> If you compare TC/SC with its 1-n, n-1 and 1-1 cases, that is, within
> Chinese, then Latin should be put into UCS Planes 0, 1 and 2 too.
> So this Latin is n-1 and 1-n too.
>
> If you compare TC/SC in the sense of the C,J,K blocks,
> then Latin + Armenian is the minimum case to think about.
>
> Cheers.
>
> Liana
>