[idn] new hangul nameprep issue
Hi, Kent and Shin
Thanks for your valuable contributions. I promise to investigate this thoroughly.
First, please test the link below in IE 5.0, 5.5, and 6.0 on Win 2K and Win XP.
The page contains some JavaScript code that generates HTML on the fly
containing the 3 sequences you gave me.
In my rendering with Win2K and IE 6.0 (English version),
a syllable break is assumed within the choseong sequence.
I will test this link on other platforms.
Regards, Soobok Lee
http://164.124.123.207/etc/f8.html
%u1100%u1100%u1161: NFKC may try a partial composition, %u1100%u1161 --> %uAC00 (below)
ᄀ가
%u1100%uAC00
ᄀ가
%u1101%u1161: NFKC may try to unify this into %uAE4C (below)
까
%uAE4C
까
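The NFKC behaviour described above can be reproduced outside the browser with Python's standard `unicodedata` module. This is a minimal sketch; `code_points` and `SEQUENCES` are illustrative names and are not part of the test page.

```python
import unicodedata


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The four sequences shown on the test page above
SEQUENCES = ["\u1100\u1100\u1161", "\u1100\uAC00", "\u1101\u1161", "\uAE4C"]

for seq in SEQUENCES:
    nfkc = unicodedata.normalize("NFKC", seq)
    print(code_points(seq), "->", code_points(nfkc))
```

The first two sequences should both come out as U+1100 U+AC00 (only the second choseong combines with the vowel), and the last two as U+AE4C.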
---------------------------------
This is the RACE conversion using NFKC:
1100 1100 1161
bq--3aiqblaa
1100 AC00
bq--3aiqblaa
##### The above two sequences are unified.
1101 1161
bq--vzga
AE4C
bq--vzga
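The RACE values above can be reproduced with a simplified encoder. This sketch assumes only the rules visible in that output: NFKC first, then either one-row compression (the shared upper octet followed by each lower octet) or, when two non-zero upper octets occur, the full 16-bit form introduced by 0xD8; the bytes are then base32-encoded with the alphabet a-z, 2-7 and prefixed with "bq--". The RACE draft's mixed row-0x00 mode and its 0xFF escape are deliberately left unimplemented, and all function names are hypothetical.

```python
import unicodedata

RACE_PREFIX = "bq--"                          # ACE prefix seen in the output above
BASE32 = "abcdefghijklmnopqrstuvwxyz234567"   # a=0 ... z=25, 2=26 ... 7=31


def base32_race(data: bytes) -> str:
    """Base32 with the RACE alphabet; the last group is zero-padded, no '=' padding."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 5)
    return "".join(BASE32[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def race_compress(label: str) -> bytes:
    """Simplified RACE compression for BMP-only labels (hypothetical helper).

    Implements only the one-row mode and the full 16-bit (0xD8) mode.
    """
    units = [ord(ch) for ch in label]
    if not units or any(u > 0xFFFF for u in units):
        raise ValueError("sketch handles non-empty, BMP-only labels")
    rows = {u >> 8 for u in units}
    if len(rows) == 1:
        (row,) = rows
        lows = [u & 0xFF for u in units]
        if 0xFF in lows:
            raise NotImplementedError("0xFF escape is not handled in this sketch")
        return bytes([row] + lows)
    if len(rows) == 2 and 0x00 in rows:
        raise NotImplementedError("row-0x00 mixing is not handled in this sketch")
    out = [0xD8]                              # full 16-bit mode marker
    for u in units:
        out += [u >> 8, u & 0xFF]
    return bytes(out)


def to_race(label: str) -> str:
    """NFKC-normalize, compress, then base32-encode with the RACE prefix."""
    normalized = unicodedata.normalize("NFKC", label)
    return RACE_PREFIX + base32_race(race_compress(normalized))
```

With this sketch, 1100 1100 1161 and 1100 AC00 both become the bytes D8 11 00 AC 00 and encode to bq--3aiqblaa, while 1101 1161 and AE4C both become AE 4C and encode to bq--vzga, matching the table above.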
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Cc: "Jungshik Shin" <jshin@mailaps.org>
Sent: Tuesday, November 13, 2001 10:04 PM
Subject: Re: Hangul and IDN (was Re: [idn] reordering strawpoll)
> Hi!
>
> >> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> >> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> >> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> >> (G, G, A),
>
> >> but this equivalence is neither a canonical equivalence, as it
> >> should have been, nor a compatibility equivalence. Still, the latter letter
> >> sequence represents EXACTLY the same syllable as the two earlier character
> >> sequences, and a proper rendering engine (of which there are already some,
> >> I'm told) would correctly render the three sequences in the same way.
>
> >Are you raising this possibility: U+1101 <---> U+1100 U+1000 (GG <-> G G)?
> >The design of conjoining (cluster) jamos treats the two-choseong sequence
> >U+1100 U+1000 as an illegal sequence (a syllable break condition).
>
> (You mean <U+1100, U+1100>.) No, that is not a syllable break condition.
> See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).
>
> (B.t.w. apart from algorithmic decomposition (per se) of Hangul syllable characters,
> the cluster Jamos are not needed, and should ideally not be used. That is not
> spelled out in TUS 3.0, though...)
>
> > That is described somewhere in Unicode 3, chapter 3, section 11. That will help you.
>
> Please reread that section carefully! (Link below.) In particular page 53.
>
> >if some rendering engine displays the two as the same syllable,
> >I suspect the product is buggy or beyond the standard. :-)
>
> No, it's not beyond the Unicode standard at all. See page 53 of TUS 3.0,
> http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my emphasis):
> "A standard syllable block is composed of a *sequence* of choseong followed
> by a *sequence* of jungseong and optionally a *sequence* of jongseong."
>
> That that description is only about NFD form is not spelled out, nor
> is the fact that combining characters, in particular a Hangul tone mark,
> may follow (logically they apply to the entire syllable!). Whether to consider
> the combining characters as part of the syllable or not, I think is a matter
> of taste. If we also take combining characters into account, but still NFD,
> the syntax for a Hangul syllable is:
>
> Hangul-syllable-NFD ::= C+ V+ F* T*
>
> where C is a choseong, a V is a jungseong (vowel), F is a jongseong, and T is a
> combining character (like a Hangul tone mark). (I'll ignore the FILLER issues
> for the moment, including their automatic insertion.)
>
> This is needed to be able to spell historic (and future) Hangul texts that may use
> consonant or vowel clusters that are not given a character of their own.
>
> Taking into account what NFC may cause, the full Hangul syllable syntax is:
>
> Hangul-syllable ::=
> C* CVsyllable V* F* T*
> | C* CVFsyllable F* T*
> | C+ V+ F* T*
>
> where CVsyllable is a consonants-vowels syllable character, and CVFsyllable is a
> consonants-vowels-consonants syllable character.
>
> >Would you tell me the version of the rendering engine?
>
> I'm told(!) this is (properly!) implemented at least in Windows XP and
> IE 6 (maybe also IE 5.5). Perhaps also elsewhere (I'm not keeping track).
>
> Kind regards
> /kent k
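Kent's point that U+AE4C is canonically equivalent to <U+1101, U+1161> through algorithmic decomposition, while <U+1100, U+1100, U+1161> is not, can be checked directly. This minimal sketch implements the Hangul syllable decomposition arithmetic from chapter 3 of TUS (the SBase/LBase/VBase/TBase constants) and compares it with `unicodedata.normalize("NFD", ...)`; `decompose_hangul` is a hypothetical helper name.

```python
import unicodedata

# Constants of the algorithmic Hangul syllable decomposition (TUS, chapter 3)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_hangul(ch: str) -> str:
    """Decompose one precomposed Hangul syllable into conjoining jamo (L V or L V T)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed syllable: leave it unchanged
    l_jamo = chr(L_BASE + s_index // N_COUNT)
    v_jamo = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    return l_jamo + v_jamo + (chr(T_BASE + t_index) if t_index else "")


# U+AE4C decomposes to <U+1101, U+1161>, exactly as NFD does ...
assert decompose_hangul("\uAE4C") == unicodedata.normalize("NFD", "\uAE4C") == "\u1101\u1161"
# ... but NFD leaves <U+1100, U+1100, U+1161> alone, so it is not canonically equivalent
assert unicodedata.normalize("NFD", "\u1100\u1100\u1161") != "\u1101\u1161"
```

Because U+1101 has no decomposition of its own, no normalization form links GG to G G; only a renderer that follows the syllable rules can treat the two spellings alike.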
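Kent's NFD grammar, Hangul-syllable-NFD ::= C+ V+ F* T*, translates directly into a regular expression. This sketch assumes the character classes of the Hangul Jamo block (choseong U+1100..U+115F, jungseong U+1160..U+11A7, jongseong U+11A8..U+11FF) and takes T to be only the Hangul tone marks U+302E and U+302F; later jamo extension blocks and the filler issues are ignored. Normalizing to NFD first reduces the three-branch grammar with CVsyllable and CVFsyllable to this single rule, since decomposing a precomposed syllable yields just C V or C V F. `is_hangul_syllable` is a hypothetical name.

```python
import re
import unicodedata

# Assumed classes (Hangul Jamo block only):
#   C = choseong, V = jungseong, F = jongseong, T = Hangul tone marks
SYLLABLE_NFD = re.compile(
    "[\u1100-\u115F]+"   # C+
    "[\u1160-\u11A7]+"   # V+
    "[\u11A8-\u11FF]*"   # F*
    "[\u302E\u302F]*"    # T*
)


def is_hangul_syllable(text: str) -> bool:
    """True if text forms exactly one syllable block under C+ V+ F* T* (after NFD)."""
    return SYLLABLE_NFD.fullmatch(unicodedata.normalize("NFD", text)) is not None
```

Under this sketch <U+1100, U+1100, U+1161>, <U+1100, U+AC00> and U+AE4C each form a single syllable block, so a choseong sequence such as <U+1100, U+1100> is not by itself a syllable break, whereas <U+AC00, U+AC00> is two blocks and is rejected.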