
Re: [idn] SC/TC equivalence



Soobok Lee <lsb@postel.co.kr> wrote:

>    1) compatibility decomposition
>    2) canonical composition
>
> What is the difference between these equivalences and SC/TC
> equivalence?

That's a good question.  I consulted the Unicode standard for the
answer.  Canonically equivalent sequences are defined to be fully
equivalent.  Conformant processes are required to interpret canonically
equivalent sequences as the same (section 3.1 Conformance Requirements
--> Interpretation --> C9).  Compatibility characters are characters
that have compatibility decompositions.  The only reason they are
included in Unicode is for compatibility with other standards (to
allow lossless round-trip conversions).  Performing compatibility
decomposition removes only formatting information (which shouldn't
really be present in plain text in the first place).  (See section 3.6
Decomposition, and section 2.2 Unicode Design Principles --> Unification
--> Compatibility Characters.)
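The two steps quoted at the top, compatibility decomposition followed by canonical composition, together define Unicode normalization form NFKC. Here is a minimal Python 3 sketch using only the standard `unicodedata` module. The é pair and the helper names are illustrative choices, not anything from the thread. The precomposed and combining forms are different code point sequences, yet they are canonically equivalent, and NFKC maps both to the same string:

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence: identical after canonical decomposition (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def nfkc(s: str) -> str:
    # NFKC = compatibility decomposition followed by canonical composition.
    return unicodedata.normalize("NFKC", s)

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # "e" + COMBINING ACUTE ACCENT

print(precomposed == combining)                        # False: different code points
print(canonically_equivalent(precomposed, combining))  # True
print(nfkc(combining) == precomposed)                  # True
```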

Apparently the Unicode folks judged that the difference between
simplified Chinese characters and traditional Chinese characters is
more significant than just formatting information: they did not make
them compatibility equivalents.
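This can be checked with the standard `unicodedata` module. Below is a small Python 3 sketch. The two character pairs are illustrative picks that were not taken from the thread, and the helper name is hypothetical. Neither member of either pair has a decomposition mapping, so NFKC leaves each one unchanged and the pairs never compare equal:

```python
import unicodedata

# Simplified/traditional pairs chosen for illustration (SC first, TC second).
SC_TC_PAIRS = [
    ("\u6c49", "\u6f22"),  # 汉 / 漢
    ("\u56fd", "\u570b"),  # 国 / 國
]

def nfkc_equal(a: str, b: str) -> bool:
    # True only if NFKC normalization maps both strings to the same form.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

for sc, tc in SC_TC_PAIRS:
    # unicodedata.decomposition() returns "" when a character has no
    # decomposition mapping, which is the case for these ideographs.
    print(sc, tc, nfkc_equal(sc, tc), repr(unicodedata.decomposition(tc)))
```

Any SC/TC folding would therefore have to come from a separate mapping table, not from Unicode normalization.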

I think users understand that a simplified character and a traditional
character are two distinguishable things, even though some users might
wish for them to be interchangeable.  If they're not interchangeable,
some users might be a little annoyed, but at least they won't be
mystified.

Compatibility equivalents, on the other hand, are things like ligature
ﬁ (U+FB01) versus separate fi.  Most people have never noticed that f
and i sometimes get joined.  People would be completely mystified if
ligature ﬁ were not treated the same as separate fi.  Other compatibility
equivalents are roman numeral Ⅳ (U+2163) versus letters IV, and
fullwidth Ａ (U+FF21) versus halfwidth A.  These are not merely
characters with the same meaning; they are really the same characters,
just formatted differently.
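The three pairs above can be checked the same way. Here is a short Python 3 sketch with the standard `unicodedata` module. The example list and the helper name are illustrative. Each compatibility character differs from its plain-text counterpart under NFC but matches it under NFKC. `unicodedata.decomposition()` shows the tagged compatibility mapping that accounts for the difference:

```python
import unicodedata

# Each compatibility character paired with the plain text it stands for.
EXAMPLES = [
    ("\ufb01", "fi"),  # LATIN SMALL LIGATURE FI
    ("\u2163", "IV"),  # ROMAN NUMERAL FOUR
    ("\uff21", "A"),   # FULLWIDTH LATIN CAPITAL LETTER A
]

def equivalence_report(compat_char: str, plain: str):
    # Returns (canonically equal?, compatibility equal?) against the plain text.
    canonical = unicodedata.normalize("NFC", compat_char) == plain
    compatibility = unicodedata.normalize("NFKC", compat_char) == plain
    return canonical, compatibility

for compat_char, plain in EXAMPLES:
    canonical, compatibility = equivalence_report(compat_char, plain)
    # decomposition() prints the tagged mapping, e.g. "<compat> ..." or "<wide> ...".
    print(f"U+{ord(compat_char):04X} {plain!r}: canonical={canonical} "
          f"compatibility={compatibility} {unicodedata.decomposition(compat_char)}")
```

Every line should report `canonical=False` and `compatibility=True`. That is exactly the gap between these characters and the SC/TC pairs, which fail both tests.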

> If SC/TC equivalence is like the equivalence between "humor" and
> "humour",
>
> But, if SC/TC equivalence is like the equivalence between "H" and "h",

It's not exactly like either of those, but to me it looks closer to
humor/humour.

AMC