
Re: [idn] An ignorant question about TC<-> SC

On Thu, 25 Oct 2001 16:27:10 +0900 Martin Duerst <duerst@w3.org> writes:
> At 13:02 01/10/23 -0700, liana Ye wrote:
> 
> >  From a screen-display point of view, TC/SC are different glyph
> >  sets (who defines the sets? How are they used by 1/5 of the world's
> >  population? Is the Unicode group the only authoritative one? In
> >  China there are over 600 recorded views on this).
> 
> The Han ideographs in Unicode/ISO 10646 are defined by the
> IRG (Ideographic Rapporteur Group). This group reports to
> ISO/IEC SC2 WG2, the ISO WG responsible for ISO 10646.
> It is composed of representatives from all the countries
> or similar entities interested in Han ideographs. That
> includes China, Japan, Korea (both South and North), Taiwan,
> Hong Kong, Singapore, and the US (I hope I didn't forget
> anybody, and please excuse any politically incorrect
> shorthand). The US is the only country represented without
> a tradition of using Han ideographs, but usually only
> sends a small delegation and mainly helps with wording.
> Many other countries may send rather large delegations
> (given the number of characters, which means a lot of
> work, this is no surprise). The Unicode consortium
> participates in the IRG with an observer status only.
> 
> The IRG has published guidelines for deciding when to
> unify two occurrences and when not to. Because of the huge
> number of characters, the line between unifying and not
> unifying is in some cases very thin, and in these cases
> the IRG simply has to make a decision.
> 
> Overall, the guidelines are somewhat difficult to understand
> at first, but they are designed mainly with 'least surprise
> to the average user' in mind, and I think they have achieved
> this goal very well. The guidelines are based on earlier
> ones used for the Japanese standard.
> 
> The core of the guidelines says that if two characters look
> significantly different, they are encoded as two code points,
> even if they are, e.g., one-to-one SC/TC equivalents. This
> avoids suddenly changing the appearance of characters for a
> user who may not be familiar with the significantly different
> shape. On the other hand, cases where there is only a small
> difference in shape are unified (i.e. given only one code
> point), unless that small difference in shape makes an
> actual difference in meaning.
> 
> As a result, if you present a text in which the glyph shapes
> are varied within the range that is unified, people who have
> had a basic education but don't know about the different
> shapes (e.g. people in Taiwan or Hong Kong who only know TC,
> people in China or Singapore who only know SC, or people in
> Japan who only know the forms used in Japan) will read over
> these changes without problems. At some points they might
> say 'this looks a bit strange', but they will still identify
> the character.
> 
> There are some exceptions to these rules, related to backward-
> compatible round-tripping (the source separation rule).
> 
> 
> Hope this helps,      Martin.
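As a small illustration of the core unification rule quoted above, here is a minimal Python 3 sketch. The three SC/TC pairs are chosen here purely for illustration (they are not taken from the IRG guidelines); because their shapes differ significantly, each form has its own code point.

```python
# Illustrative SC/TC one-to-one pairs as (simplified, traditional).
# These specific pairs are an assumption chosen for the example.
PAIRS = [("国", "國"), ("门", "門"), ("书", "書")]


def code_point(ch: str) -> str:
    """Format a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


def pair_report(pairs):
    """Return (sc, sc_code, tc, tc_code) rows for each SC/TC pair."""
    return [(sc, code_point(sc), tc, code_point(tc)) for sc, tc in pairs]


for sc, sc_cp, tc, tc_cp in pair_report(PAIRS):
    # Each row shows two distinct code points for one SC/TC pair.
    print(f"SC {sc} {sc_cp}   TC {tc} {tc_cp}")
```

The first line printed is `SC 国 U+56FD   TC 國 U+570B`: the two forms are separate characters, not font variants of one code point.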

In other words, the fundamental work of the UCS is to tabulate
glyphs based on their visual distinctions. Then why does this group
get into guideline work on the normalization of characters? Why do
they give advice such as "do not put TC/SC into" another table
for a different purpose?

Liana
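The distinction behind this question can be sketched in Python 3, assuming the standard `unicodedata` module (which implements a much later Unicode version than existed in 2001). None of the four Unicode normalization forms maps a traditional character onto its simplified counterpart, so any TC/SC equivalence has to come from a separate mapping table. The `TC_TO_SC` table and the `fold_tc_to_sc` helper are hypothetical and cover only three pairs. A real table might be derived from the Unihan `kSimplifiedVariant`/`kTraditionalVariant` fields, and one-to-many mappings (e.g. 发 corresponding to both 發 and 髮) make that harder than a simple dictionary.

```python
import unicodedata

# Illustrative (simplified, traditional) pairs; an assumption for the example.
PAIRS = [("国", "國"), ("门", "門"), ("书", "書")]


def normalization_folds(sc: str, tc: str) -> bool:
    """True if any normalization form makes the SC and TC strings equal."""
    return any(
        unicodedata.normalize(form, sc) == unicodedata.normalize(form, tc)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


# Hypothetical TC -> SC table, built here from the three pairs above.
# It is a separate mapping, not something normalization provides.
TC_TO_SC = {tc: sc for sc, tc in PAIRS}


def fold_tc_to_sc(text: str) -> str:
    """Replace each listed TC character with its SC counterpart."""
    return "".join(TC_TO_SC.get(ch, ch) for ch in text)


for sc, tc in PAIRS:
    print(sc, tc, "folded by normalization:", normalization_folds(sc, tc))
print(fold_tc_to_sc("中國書"))
```

Running it prints `False` for each pair and then `中国书`: the folding comes entirely from the hand-built table, not from normalization.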