Re: Hangul and IDN (was Re: [idn] reordering strawpoll)
The UTC recently agreed to clarify that the syllable structure is more
general, along the lines that Kent is describing. I'll add more info
when I have the time.
Mark
—————
Ὀλίγοι ἔμφρονες πολλῶν ἀφρόνων
φοβερώτεροι — Πλάτωνος
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Cc: "Jungshik Shin" <jshin@mailaps.org>
Sent: Tuesday, November 13, 2001 05:04
Subject: Re: Hangul and IDN (was Re: [idn] reordering strawpoll)
> Hi!
>
> >> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> >> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That
> >> is fine so far. But <U+1101, U+1161> is in turn equivalent to
> >> <U+1100, U+1100, U+1161> (G, G, A),
>
> >> but this equivalence is neither a canonical equivalence, as it
> >> should have been, nor a compatibility equivalence. Still, the latter
> >> letter sequence represents EXACTLY the same syllable as the two earlier
> >> character sequences, and a proper rendering engine (of which there are
> >> already some, I'm told) would correctly render the three sequences in
> >> the same way.
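
The equivalences in the paragraph above can be checked with a short Python
sketch. It assumes the Hangul decomposition arithmetic from TUS 3.0 section
3.11; the helper name decompose_syllable and the constant names are only
illustrative labels, not anything taken from the standard or from this thread.

```python
import unicodedata

# Constants for the arithmetic Hangul decomposition (names are illustrative).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose_syllable(ch):
    """Decompose one precomposed Hangul syllable into conjoining jamos."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead = chr(L_BASE + s_index // (V_COUNT * T_COUNT))
    vowel = chr(V_BASE + (s_index % (V_COUNT * T_COUNT)) // T_COUNT)
    t_index = s_index % T_COUNT
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


gga = "\uAE4C"                # precomposed GGA
gg_a = "\u1101\u1161"         # <GG, A>
g_g_a = "\u1100\u1100\u1161"  # <G, G, A>

# The precomposed syllable and <GG, A> are canonically equivalent.
print(decompose_syllable(gga) == gg_a)
print(unicodedata.normalize("NFD", gga) == gg_a)

# <G, G, A> is not folded into GGA by canonical or compatibility normalization.
print(unicodedata.normalize("NFC", g_g_a) == gga)
print(unicodedata.normalize("NFKC", g_g_a) == gga)
```

The first two comparisons hold and the last two do not, which is the gap
Kent is pointing at: normalization composes only the second G with A and
leaves the first G standing alone.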
>
> >Are you raising this possibility: U+1101 <---> U+1100 U+1000
> >(GG <-> G G)?
> >The design of the conjoining (cluster) jamos treats the two-choseong
> >sequence U+1100 U+1000 as an illegal sequence (syllable break condition).
>
> (You mean <U+1100, U+1100>.) No, that is not a syllable break condition.
> See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).
>
> (B.t.w. apart from algorithmic decomposition (per se) of Hangul syllable
> characters, the cluster Jamos are not needed, and should ideally not be
> used. That is not spelled out in TUS 3.0, though...)
>
> > That is described somewhere in Unicode 3, chapter 3, section 11. That
> > will help you.
>
> Please reread that section carefully! (Link below.) In particular page 53.
>
> >If some rendering engine displays the two as the same syllable,
> >I suspect the product is buggy or beyond the standard. :-)
>
> No, it's not beyond the Unicode standard at all. See page 53 of TUS 3.0,
> http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my emphasis):
> "A standard syllable block is composed of a *sequence* of choseong
> followed by a *sequence* of jungseong and optionally a *sequence* of
> jongseong."
>
> That that description is only about NFD form is not spelled out, nor
> is the fact that combining characters, in particular a Hangul tone mark,
> may follow (logically they apply to the entire syllable!). Whether to
> consider the combining characters as part of the syllable or not, I think
> is a matter of taste. If we also take combining characters into account,
> but still NFD, the syntax for a Hangul syllable is:
>
> Hangul-syllable-NFD ::= C+ V+ F* T*
>
> where C is a choseong, V is a jungseong (vowel), F is a jongseong, and T
> is a combining character (like a Hangul tone mark). (I'll ignore the
> FILLER issues for the moment, including their automatic insertion.)
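
As a rough illustration, the NFD grammar above maps directly onto a regular
expression. The sketch below is in Python and rests on several assumptions
that are mine, not Kent's: the jamo ranges are the Unicode 3.0 conjoining
jamo blocks with the fillers left out, T is narrowed to the two Hangul tone
marks U+302E and U+302F, and the names NFD_SYLLABLE and is_nfd_syllable are
hypothetical.

```python
import re

# Assumed character classes (Unicode 3.0 ranges, fillers excluded).
C = "[\u1100-\u1159]"   # choseong (leading consonants)
V = "[\u1161-\u11A2]"   # jungseong (vowels)
F = "[\u11A8-\u11F9]"   # jongseong (trailing consonants)
T = "[\u302E\u302F]"    # Hangul tone marks only, a simplification of "T"

# Hangul-syllable-NFD ::= C+ V+ F* T*
NFD_SYLLABLE = re.compile(f"{C}+{V}+{F}*{T}*")


def is_nfd_syllable(text):
    """Return True if the whole string is one syllable under the NFD grammar."""
    return NFD_SYLLABLE.fullmatch(text) is not None


print(is_nfd_syllable("\u1100\u1100\u1161"))        # <G, G, A>
print(is_nfd_syllable("\u1101\u1161\u11A8\u302E"))  # <GG, A, G, tone mark>
print(is_nfd_syllable("\u1161"))                    # vowel with no choseong
```

Under these assumptions both <G, G, A> and <GG, A> followed by a jongseong
and a tone mark are accepted as single syllables, while a bare vowel is not.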
>
> This is needed to be able to spell historic (and future) Hangul texts that
> may use consonant or vowel clusters that are not given a character of
> their own.
>
> Taking into account what NFC may cause, the full Hangul syllable syntax is:
>
> Hangul-syllable ::=
> C* CVsyllable V* F* T*
> | C* CVFsyllable F* T*
> | C+ V+ F* T*
>
> where CVsyllable is a consonants-vowels syllable character, and
> CVFsyllable is a consonants-vowels-consonants syllable character.
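
The full grammar can be sketched the same way by first mapping each
character to a class letter and then matching the class string. This is
only a Python sketch under assumptions of my own: the jamo ranges are the
Unicode 3.0 ones without fillers, T covers only U+302E and U+302F, the
letters P and Q stand for CVsyllable and CVFsyllable, and char_class and
is_hangul_syllable are hypothetical names.

```python
import re


def char_class(ch):
    """Map one character to a grammar class letter ('?' if none applies)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x1159:
        return "C"  # choseong
    if 0x1161 <= cp <= 0x11A2:
        return "V"  # jungseong
    if 0x11A8 <= cp <= 0x11F9:
        return "F"  # jongseong
    if cp in (0x302E, 0x302F):
        return "T"  # Hangul tone marks (simplified stand-in for T)
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "P" if (cp - 0xAC00) % 28 == 0 else "Q"
    return "?"


# C* CVsyllable V* F* T*  |  C* CVFsyllable F* T*  |  C+ V+ F* T*
FULL_SYLLABLE = re.compile(r"C*PV*F*T*|C*QF*T*|C+V+F*T*")


def is_hangul_syllable(text):
    """Return True if the whole string is one syllable under the full grammar."""
    classes = "".join(char_class(ch) for ch in text)
    return FULL_SYLLABLE.fullmatch(classes) is not None


print(is_hangul_syllable("\uAE4C"))        # precomposed GGA
print(is_hangul_syllable("\u1100\uAC00"))  # G followed by precomposed GA
print(is_hangul_syllable("\uAC01\u1161"))  # CVF syllable followed by a vowel
```

The second case is what NFC produces from <G, G, A>, and the first
alternative of the grammar accepts it; the third case is rejected because
no vowel may follow a CVFsyllable.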
>
> >Would you tell me the version of the rendering engine ?
>
> I'm told(!) this is (properly!) implemented at least in Windows XP and
> IE 6 (maybe also IE 5.5). Perhaps also elsewhere (I'm not keeping track).
>
> Kind regards
> /kent k