
Re: Hangul and IDN (was Re: [idn] reordering strawpoll)



On Tue, 13 Nov 2001, Kent Karlsson wrote:

> >> The precomposed Hangul syllable U+AE4C (GGA) is canonically
> >> equivalent with <U+1101, U+1161> (GG, A), through algorithmic
> >> decomposition. That is fine so far. But <U+1101, U+1161> is in
> >> turn equivalent to <U+1100, U+1100, U+1161> (G, G, A),

> >> but this equivalence is neither a canonical equivalence, as it
> >> should have been, nor a compatibility equivalence. Still, the
> >> latter letter sequence represents EXACTLY the same syllable as the
> >> two earlier character sequences, and a proper rendering engine (of
> >> which there are already some, I'm told) would correctly render the
> >> three sequences in the same way.
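
A minimal Python sketch of the point quoted above, for readers who want
to check it. It assumes the standard Hangul syllable arithmetic from the
Unicode standard (syllable base U+AC00, 21 vowels, 28 trailing slots)
and uses Python's unicodedata module; the helper names are illustrative
only and do not come from this thread.

```python
import unicodedata

# Constants of the standard Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT, S_COUNT = 21, 28, 11172


def decompose_syllable(ch):
    """Algorithmically decompose one precomposed Hangul syllable."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = L_BASE + index // (V_COUNT * T_COUNT)
    vowel = V_BASE + (index % (V_COUNT * T_COUNT)) // T_COUNT
    trail = T_BASE + index % T_COUNT
    out = chr(lead) + chr(vowel)
    if trail != T_BASE:  # trailing index 0 means "no final consonant"
        out += chr(trail)
    return out


def codepoints(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join("U+%04X" % ord(c) for c in text)


gga = "\uAE4C"                   # precomposed GGA
g_g_a = "\u1100\u1100\u1161"     # G, G, A as three separate jamos

print(codepoints(decompose_syllable(gga)))               # GG, A
print(codepoints(unicodedata.normalize("NFD", g_g_a)))   # unchanged
print(codepoints(unicodedata.normalize("NFC", g_g_a)))   # G + precomposed GA
```

Neither NFD nor NFKD maps the cluster jamo U+1101 to <U+1100, U+1100>,
so the two spellings stay distinct under every normalization form, and
NFC turns the G, G, A spelling into a leading jamo followed by a
precomposed syllable.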

> >Are you raising this possibility:    U+1101 <---> U+1100 U+1000
> > (GG <-> G G) ?

> >Design of conjoining (cluster) jamos treat two choseong sequence

> (You mean <U+1100, U+1100>.) No, that is not a syllable break
> condition.
> See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).

> (B.t.w. apart from algorithmic decomposition (per se) of Hangul
> syllable characters, the cluster Jamos are not needed, and should
> ideally not be used. That is not spelled out in TUS 3.0, though...)

  Ideally, I agree that they should not, but in practice
we have to allow them to be used.


> > That is described somewhere in Unicode 3, chapter 3, section 11.
> > That will help you.
>
> Please reread that section carefully! (Link below.) In particular page
> 53.
>
> >if some rendering engine display the two as the same syllable,
> >I suspect the product is buggy or beyond the standard. :-)

  I don't see any reason to classify such a rendering engine
as buggy.  Such a rendering engine faithfully follows the underlying
principle of Hangul syllable formation as manifested in HunMinJongUm
Haerye (훈민정음 해례), which is well represented by section 3.11
of TUS 3.0.

> No, it's not beyond the Unicode standard at all.  See page 53 of TUS
> 3.0, http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my
> emphasis):
> "A standard syllable block is composed of a *sequence* of choseong
> followed by a *sequence* of jungseong and optionally a *sequence* of
> jongseong."

  I missed the 'a sequence of' part in my first reading of the section
and thought it should be revised to be much more general than what I
thought the Unicode standard stipulated.  It turns out that TUS 3.0,
as it stands, is general and flexible enough to encode
Hangul syllables made up of consonant clusters and vowel clusters
not assigned their own code points.

>....
>         Hangul-syllable-NFD  ::=   C+  V+  F*  T*
>....

> This is needed to be able to spell historic (and future) Hangul texts
> that may use consonant or vowel clusters that are not given a
> character of their own.

  Yes, this is absolutely necessary. MS Office XP and Windows XP support
a lot more Hangul Jamos, and accordingly syllables, than can be obtained
using only the Jamos currently given a codepoint of their own and
restricting the Hangul syllable forming rule to 'C V F?'.

> Taking into account what NFC may cause, the full Hangul syllable
> syntax is:
>
> Hangul-syllable  ::=
>                  C*  CVsyllable  V*  F*  T*
>             |    C*  CVFsyllable  F*  T*
>             |    C+  V+  F*  T*
>
> where CVsyllable is a consonants-vowels syllable character, and
> CVFsyllable is a consonants-vowels-consonants syllable character.
>
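
The syntax quoted above can be sketched as a small recognizer in Python.
This is only an illustration under stated assumptions, not anything from
the thread: the jamo ranges are the whole Hangul Jamo block (a superset
of the jamos assigned in Unicode 3.0), T is assumed to mean the Hangul
tone marks U+302E and U+302F since the quoted definitions are elided,
and the function names are made up for this example.

```python
import re

S_BASE, S_COUNT, T_COUNT = 0xAC00, 11172, 28


def classify(ch):
    """Map one character to a one-letter class used by the grammar."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:       # choseong (assumed range, incl. filler)
        return "C"
    if 0x1160 <= cp <= 0x11A7:       # jungseong (assumed range, incl. filler)
        return "V"
    if 0x11A8 <= cp <= 0x11FF:       # jongseong (assumed range)
        return "F"
    if cp in (0x302E, 0x302F):       # tone marks (assumed meaning of T)
        return "T"
    if S_BASE <= cp < S_BASE + S_COUNT:
        # "S" = CVsyllable (no final), "X" = CVFsyllable (has a final)
        return "S" if (cp - S_BASE) % T_COUNT == 0 else "X"
    return "?"                       # anything else never matches


# Hangul-syllable-NFD ::= C+ V+ F* T*
NFD_SYLLABLE = re.compile(r"C+V+F*T*")
# Full syntax, allowing for what NFC may produce.
SYLLABLE = re.compile(r"C*SV*F*T*|C*XF*T*|C+V+F*T*")


def is_hangul_syllable(text, pattern=SYLLABLE):
    """True if the whole string is exactly one syllable under pattern."""
    classes = "".join(classify(ch) for ch in text)
    return pattern.fullmatch(classes) is not None


print(is_hangul_syllable("\u1100\u1100\u1161"))   # G, G, A as jamos
print(is_hangul_syllable("\u1100\uAC00"))         # same text after NFC
print(is_hangul_syllable("\uAC01\u1161"))         # vowel after a CVF syllable
```

Classifying first and matching on class letters keeps the pattern
readable and lets the CVsyllable/CVFsyllable split be computed with the
modulo-28 test, which a character class alone cannot express.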
> >Would you tell me the version of the rendering engine ?
>
> I'm told(!) this is (properly!) implemented at least in Windows XP and
> IE 6 (maybe also IE 5.5).  Perhaps also elsewhere (I'm not keeping
> track).

  It's a version of Uniscribe (MS's rendering engine for
complex scripts like Indic/Thai/Burmese scripts, Hangul, Arabic and
Hebrew) and OpenType fonts with glyph substitution rules for Middle
Korean that come with the Korean version of MS Windows XP, MS Office XP
and MS IE 6. (MS IE 5.5 and MS Office 2000 use the private use area to
support a limited set of pre-composed Middle Korean syllables.)

  However, I'm not sure whether this rendering engine renders 'U+1100
U+1100' as U+1101. What I was told by Seuk Soo SUNG
<seuksoos@microsoft.com> is that it implemented 'C+  V+  F*' (he
didn't use the notation 'C+ V+ F*' in his message) for Middle Korean
but NOT for modern Korean. I'm copying this message to him for further
clarification.

   Regards,

   Jungshik Shin