
Re: [idn] new I-D: Safely Encoding of likeness information into ACE label version 0.2



Let's do a test. This email is written in UTF-8.

U+529B in UTF-8 is 力
U+30AB in UTF-8 is カ

Do they look similar? Well, to a non-native reader, maybe. To me, given
that I know some kana and some Chinese (hopefully), they are not the same.
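
For anyone who wants to check this on their own machine, here is a minimal
sketch in Python using only the standard library. The helper names
(describe, same_after_normalization) are made up for illustration and are
not part of the I-D. It prints the code point, Unicode name and UTF-8 bytes
of each character, and checks whether standard Unicode normalization maps
the two to the same string. It says nothing about how the glyphs look,
which depends on your fonts.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), utf8_hex)


def same_after_normalization(a, b, form="NFKC"):
    """True if the two strings become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


power = "\u529b"  # the Chinese ideograph 'power'
ka = "\u30ab"     # Katakana 'ka'

for ch in (power, ka):
    print(describe(ch))

# Standard normalization treats these as unrelated characters.
print(same_after_normalization(power, ka))
```

So whatever likeness there is between the two lives entirely in the glyphs,
not in anything the existing normalization forms know about.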

Perhaps some may say they "look" totally different, depending on what
fonts you have. Not to mention that there are different written forms of
Chinese ideographs, which end up as different glyphs in different fonts.
(Those interested should read the z-variant section of the Unicode book.)

Now, to make it more interesting, let's take these Chinese ideographs:

U+65E5 in UTF-8 日
U+66F0 in UTF-8 曰

Same? Well, they actually have very different meanings and are used in
different ways. (Notice the first one is 'thinner'? Yes, that is how they
are written: one thinner and one fatter.)

Or how about U+6046 and U+6052? Different? Well, they are used the same
way, at least when we refer to Hang Seng Bank of HK. (Depending on what
IME you use, it produces U+6046 or U+6052.)

And this is not even a complete list. Let's not forget about other
languages, too.

Bottom line: This is not an easy task. We need to ask ourselves in the IDN
WG whether we have the right expertise to do this. Honestly, if you are
saying we should normalize U+529B and U+30AB, neither the Chinese nor the
Japanese would be very happy about it.

-James Seng

>  For example, Katakana 'ka' (U+30AB) and Chinese letter 'power' (U+529B)
>  look the same. We can assign likenessindex 0 and 1 to 'ka' and 'power',
>  respectively.