
RE: [idn] An ignorant question about TC<-> SC



Hi John,

The code re-ordering transfers codes from one code area to the
same area in a different order, so the situation you point out
is not going to happen.

Kenny Huang

> -----Original Message-----
> From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org]
> On Behalf Of John C Klensin
> Sent: Tuesday, October 23, 2001 8:59 PM
> To: idn@ops.ietf.org
> Subject: [idn] An ignorant question about TC<-> SC
> 
> 
> While reading David's NFC versus NFKC note, I had an odd thought.
> I've been dissatisfied, as have many others, with the notion that
> TC <-> SC mapping is analogous to case mapping in Roman-derived
> alphabets.   Arguments about whether that analogy applies have
> helped to make the discussion of what is, to me, a very difficult
> topic even more obscure.
> 
> To quote the Unicode standard, "Serbo-Croatian is a single
> language with paired alphabets".  This is a definition with which
> native speakers of the language agree (although, when tensions in
> the Balkans are high, I assume some of them are not completely
> happy about it).  Would it be constructive to think about Chinese
> as "one language, two alphabets"?  If it is, then nameprep or a
> related process ought to be able to map back and forth between
> the Roman-based characters usually used in Croatian contexts and
> the Cyrillic characters usually used in Serbian ones (people do
> this all the time, and certainly expect the two to match).
> 
> Of course, the analogy is not exact (these things never are):
> perhaps partially because there are just fewer characters to deal
> with, there are no cases in which there are potential ambiguities
> in the mappings.  On the other hand, one problem is more severe
> than in the Chinese case: in the general case, a Serbo-Croatian
> string written in Cyrillic cannot be distinguished, on a
> character string basis, from uses of Cyrillic for other languages
> (e.g., Russian), which should not be mapped and, similarly, a
> string written in Roman-based characters cannot be distinguished,
> on a character string basis, from the Roman-based characters of
> another language (English?) which, again, cannot be mapped.
> 
> In either case, the mapping becomes readily plausible if the
> language, in addition to the content of the character string, is
> known.  If it is not, it is hard to think about the mapping
> without causing side effects in other languages.
> 
> Is that helpful?
>      john
>
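
The point in the quoted message, that a script mapping is only safe
when the language of the string is known, can be sketched roughly as
follows. This is an illustration only, not part of nameprep or of any
proposal on this list: the function name to_latin, the table name, and
the "sr" language tag are all assumptions made for the example, and only
the Cyrillic-to-Latin direction is shown.

```python
# Hypothetical sketch: map Serbian Cyrillic to Latin (Gaj's alphabet),
# but only when the caller states that the language is Serbian.
# All names here are illustrative assumptions, not an existing API.

_SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Add uppercase entries; digraphs become title case (e.g. "Љ" -> "Lj").
_SR_CYR_TO_LAT.update(
    {cyr.upper(): lat.capitalize() for cyr, lat in list(_SR_CYR_TO_LAT.items())}
)


def to_latin(label, lang):
    """Return label with Serbian Cyrillic letters mapped to Latin.

    The string is returned unchanged unless lang is "sr", because the
    characters alone do not say whether the text is Serbian or, say,
    Russian.
    """
    if lang != "sr":
        return label
    # Characters outside the table (digits, hyphens, ...) pass through.
    return "".join(_SR_CYR_TO_LAT.get(ch, ch) for ch in label)
```

For example, to_latin("Београд", "sr") yields "Beograd", while
to_latin("Москва", "ru") leaves the string untouched. Without the
language argument the function would have no basis for telling the two
cases apart, which is the side-effect problem raised above.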