Re: [idn] An ignorant question about TC<-> SC

To: liana Ye <liana.ydisg@juno.com>
Subject: Re: [idn] An ignorant question about TC<-> SC
From: Martin Duerst <duerst@w3.org>
Date: Fri, 26 Oct 2001 16:31:31 +0900
Cc: klensin@jck.com, idn@ops.ietf.org
At 02:37 01/10/25 -0700, liana Ye wrote:

>In other words, the fundamental work of UCS is to tabulate
>glyphs based on their visual distinction.

That is simply not true. The fundamental work is to make
sure that the result is best usable by average users for
their average purpose (writing electronic texts).

There have been a lot of requests to use finer distinctions,
e.g. from researchers or from people who publish dictionaries.
They were rejected, because they would confuse the average
user and the average use more than they would help.

There have also been requests for more unification, e.g. to
simplify searching/lookup in some situations, but they
have again been rejected because they would confuse the
average user and the average use more than they would help.
[In addition, such unifications would have to deal with the
many-to-one and many-to-many mappings, which just wouldn't
work.]


>Then why does this group
>get into the guideline work on normalization of characters?

Because they are experts on characters, they use normalization
in their implementations, and so on.


>Why
>do they give advice such as "not put TC/SC into" another table
>for a different purpose?

Because many of them have been studying Han characters extensively,
and many others have worked with people who have done so, for
actual implementations and products, and they all have learned,
in one way or another, that TC/SC conversion is a very hard
problem that cannot be solved by a simple mapping table.
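
To make the point concrete, here is a minimal sketch in Python of what
happens when one tries a per-character SC-to-TC table. The table below
is a tiny hand-picked sample chosen for illustration, not a real
conversion table, and the names SC_TO_TC and tc_candidates are
hypothetical. It only shows that several SC characters have more than
one TC counterpart, so a table can list candidates but cannot choose
among them without context.

```python
# Illustrative sample only: a few well-known one-to-many cases.
# This is an assumption-laden sketch, not a complete or authoritative table.
SC_TO_TC = {
    "发": ["發", "髮"],        # "to send/emit" vs. "hair"
    "后": ["後", "后"],        # "after/behind" vs. "empress"
    "干": ["乾", "幹", "干"],  # "dry" vs. "to do/trunk" vs. "shield"
    "国": ["國"],              # an unambiguous one-to-one case
}


def tc_candidates(sc_label):
    """Return every TC string the per-character table allows for sc_label.

    Characters missing from the table are passed through unchanged.
    """
    results = [""]
    for ch in sc_label:
        options = SC_TO_TC.get(ch, [ch])
        # Extend every partial result with every option for this character.
        results = [prefix + opt for prefix in results for opt in options]
    return results


if __name__ == "__main__":
    for label in ("国", "发", "发干"):
        print(label, "->", tc_candidates(label))
```

A label with two ambiguous characters already yields several candidate
strings, and nothing in the table says which one the registrant meant.
In running text the surrounding words often settle the question; a
domain name label usually offers no such context.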

Do you really think that people from companies such as Microsoft,
IBM, Apple, and many others (see the list of members of the
Unicode Consortium at
http://www.unicode.org/unicode/consortium/memblogo.html),
and in particular the specialists for internationalization
in these companies, would want to tell you that TC/SC
mapping is difficult if they knew how to do it? Do you
think that any of these companies would not want to
sell products that did the conversion if it were easy
enough to make such a product? How much time do you think
engineers in each of these companies have spent studying
the problem and trying to come up with a simple but
satisfactory solution, without a satisfactory result?

I very much think that all of the people at the Unicode
Consortium would be very glad if there were a simple
mapping solution to the TC/SC problem, and would not
at all object to it being added to IDN, because they
understand that the TC/SC problem is an important one.
But they know from years, if not decades, of experience
that it is anything but an easy job: it depends a lot on
context (as James has said), which is not available in IDN,
and even with context a machine never gets it perfect.


Hope this helps.    Regards,    Martin.



>Liana
>
>
>On Thu, 25 Oct 2001 16:27:10 +0900 Martin Duerst <duerst@w3.org> writes:
> > At 13:02 01/10/23 -0700, liana Ye wrote:
> >
> > >  From a screen display point of view, TC/SC are different glyph
> > >  sets (who defines the sets? How are they used by 1/5 of the world
> > >population? Is the Unicode group the only authoritative one? In
> > >China there are over 600 recorded views on this).
> >
> > The Han ideographs in Unicode/ISO 10646 are defined by the
> > IRG (Ideographic Rapporteur Group). This group reports to
> > ISO/IEC SC2 WG2, the ISO WG responsible for ISO 10646.
> > It is composed of representatives from all the countries
> > or similar entities interested in Han ideographs. That
> > includes China, Japan, Korea (both South and North), Taiwan,
> > Hong Kong, Singapore, and the US (I hope I didn't forget
> > anybody, and please excuse the maybe politically incorrect
> > shortcuts). The US is the only country represented without
> > a tradition of using Han ideographs, but usually only
> > sends a small delegation and mainly helps with wording.
> > Many other countries may send rather large delegations
> > (given the number of characters, which means a lot of
> > work, this is no surprise). The Unicode consortium
> > participates in the IRG with an observer status only.
> >
> > The IRG has published guidelines for deciding when to
> > unify two occurrences and when not. Because of the very
> > large number of characters, there is in some cases indeed
> > a thin line as to whether something should be unified or
> > not. And in these cases, the IRG just has to make a decision.
> >
> > Overall, the guidelines are somewhat difficult to understand
> > at first, but they are designed mainly with a 'least surprise
> > to the average user' in mind, and I think they have achieved
> > this goal very well. The guidelines are based on earlier
> > ones used for the Japanese standard.
> >
> > The core of the guidelines says that if two characters look
> > significantly different, then they are encoded with two codes
> > even if they e.g. are one-to-one SC/TC equivalents. This is
> > to avoid suddenly changing the appearance of letters for a
> > user who may not be familiar with the significantly differently
> > looking shape. On the other hand, cases where there is only
> > a small difference in shape are unified (i.e. only one code)
> > unless this small difference in shape makes an actual
> > difference in meaning.
> >
> > Overall, the result is that if you present a text
> > where you change the glyph shapes within the range that
> > is unified, people who have done basic education but don't
> > know about different shapes (e.g. people in Taiwan or
> > Hong Kong who only know about TC, or people in China or
> > Singapore who only know about SC, or people in Japan who
> > only know about the forms used in Japan) will read over
> > these changes without problems, and might at some points
> > say 'this looks a bit strange', but will still identify
> > the character.
> >
> > There are some exceptions to this rule related to
> > backwards-compatible round-tripping (the source separation rule).
> >
> >
> > Hope this helps,      Martin.