
Re: [idn] Re: peanut gallery



Prohibiting Simplified Chinese code points is a bad idea, even from a
purely technical point of view.
It is complicated: one would have to be very careful not to exclude
characters that appear to be simplified but are actually used either
in other languages or in Chinese itself as traditional characters.
Trying to do it algorithmically, by just comparing the SC/TC standards
(e.g. GB 2312 to GB 12345) through mapping tables, will give the wrong
answer.
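
A minimal sketch of why the table-driven approach misfires, using only
the code points cited in the messages quoted later in this mail. The
table below is a hypothetical hand-built excerpt, not data taken from
GB 2312 or GB 12345, and the function names are made up for
illustration:

```python
# Hypothetical excerpt of a simplified -> traditional mapping table,
# built by hand from the examples quoted later in this message.
SIMPLIFIED_TO_TRADITIONAL = {
    0x53F0: [0x6AAF, 0x81FA, 0x98B1],  # 台 -> 檯, 臺, 颱
    0x8721: [0x881F],                  # 蜡 -> 蠟
}

# Code points the quoted messages describe as traditional characters
# in their own right (again, an illustrative excerpt only).
ALSO_TRADITIONAL = {0x53F0, 0x8721}


def naive_prohibited(table):
    """Naive rule: treat every key of the SC->TC table as 'simplified'."""
    return set(table)


def wrongly_prohibited(table, also_traditional):
    """Code points the naive rule bans although they are also traditional."""
    return naive_prohibited(table) & also_traditional


if __name__ == "__main__":
    for cp in sorted(wrongly_prohibited(SIMPLIFIED_TO_TRADITIONAL,
                                        ALSO_TRADITIONAL)):
        print(f"U+{cp:04X} ({chr(cp)}) would be banned, but is also traditional")
```

Even in this tiny excerpt, every code point the naive rule would
prohibit is one that must not be prohibited, which is why a
character-by-character expert review would be needed instead.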

If it were to be done, it would require a very detailed document
listing all of the code points, plus a very thorough review of each of
those code points by experts from all the CJK countries. I suspect the
only existing body with the expertise to do so is the IRG, which
includes official representatives of China, Hong Kong (SAR), Macao
(SAR), Singapore, Japan, South Korea, North Korea, Taiwan and Vietnam,
plus a representative from the Unicode Consortium. For more
information, see
http://www.info.gov.hk/digital21/eng/structure/intro_irg.html. I
doubt, however, that they would consider it worth the considerable
time and effort involved.

I will quote snippets of some other people's messages on the subject:

======
...
Thomas, do you have a reference for U+9EBC (麼) and U+9EBD (麽) being
different?  The only dictionary I have which contains both is the
(traditional) CiHai, and it claims they're variants of each other.

Meanwhile, both Sanseido and KangXi say that U+5C1B (尛) is a member of
the family.  (KangXi says that anciently U+9EBC (麼) was written U+5C1B
(尛).)
Mathews and Sanseido also remind us that U+5E85 (庅) is another variant,
and Sanseido *also* lists U+5692 (嚒).

So, Doug, you see that U+4E48 (么) could conceivably be a traditional
character in its own right *or* the simplified form for no fewer than
six (!) other ideographs.
...
======
...
Meanwhile, it is true that there are simplified characters which
correspond to more than one traditional form.  In the case of U+8721 (蜡),
it is *both* a traditional character in its own right *and* the simplified
form for another character, U+881F (蠟).

Characters which are simplifications for more than one traditional form
are quite common.  Just to do a quick survey, I pulled one dictionary off
my shelf.  It has in the back a table of simplifications.  The first page
has 99 simplified characters, five of which are simplifications for more
than one traditional form.  Perhaps that many again are also traditional
characters in their own right.  This is also missing out on some of the
more spectacular instances, such as U+53F0 (台), which is a traditional
character itself *and* the simplified form for three others, U+6AAF (檯),
U+81FA (臺), and U+98B1 (颱).  There's at least one other character which is
the simplified form for four traditional ones, but off-hand I can't
remember what it is.
...
=====
...
First of all, we have "wax", Mandarin la4

  Traditional form: U+881F
  Simplified form:  U+8721

Then we have "maggot", Mandarin qu1, of which Cihai claims that
Shuowen says:

  Vulgar form:  U+86C6
  Correct form: U+43E3
  Archaic form: U+8721

  My Taiwan and PRC dictionaries both claim U+86C6 for "maggot", so
  that would now be considered both the Traditional and Simplified form,
  with U+8721 being an obsolete, archaic variant for it.

Then we have "year-end ceremony (of Zhou dynasty)", Mandarin zha4

  Traditional form: U+8721

  Not listed in my contemporary PRC dictionary.

And yes, this is the kind of mess that has discouraged anybody from
doing a systematic survey of simplifications...
=====

—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "ben" <ben@cc-www.com>
To: <DougEwell2@cs.com>; <idn@ops.ietf.org>
Cc: <tsenglm@cc.ncu.edu.tw>; <kenw@sybase.com>; <paf@cisco.com>
Sent: Monday, January 28, 2002 06:56
Subject: [idn] Re: peanut gallery


> Hi Doug,
>
> Rest assured that no one is trying to make the IDN model fail
> completely.
>
> The only possibility/compromise that I have read so far in these
> postings is to "prohibit Simplified Chinese code points"... and even
> with that idea, the WG is waiting for an Internet Draft before there
> could be further discussions/considerations.
>
> If I can't use IDNs in a couple of months, then I suggest changing the
> name of the IETF to BPG (big peanut gallery).
>
> Thanks,
> Ben
>
>
> ----- Original Message -----
> From: <DougEwell2@cs.com>
> To: <idn@ops.ietf.org>
> Cc: <tsenglm@cc.ncu.edu.tw>; <kenw@sybase.com>; <paf@cisco.com>
> Sent: Monday, January 28, 2002 2:47 AM
> Subject: Re: [idn] Mapping and Prohibit of code points
>
>
> > I know the chairs consider this thread off-topic, so I will keep this
> > brief.
> >
> > The only result that can come from excluding Han characters from the IDN
> > model "until" a solution to the claimed CJK problem is developed, is to
> > ensure the failure of the currently planned IDN model.
> >
> > Opponents would be able to claim truthfully that IDN completely fails to
> > support CJK, which of course would be true if Han characters were
> > prohibited.
> >
> > They would then mount a new campaign to replace the discredited IDN model
> > with a new CJK-oriented model, with local variations instead of global
> > uniformity.
> >
> > I am certain that at least some of those who are calling for CJK
> > prohibition understand this.
> >
> > -Doug Ewell
> >  Fullerton, California
> >
> >
>
>
>