
[idn] new hangul nameprep issue



Hi, Kent and Shin

Thanks for your valuable contributions. I promise to investigate this thoroughly.
First, please test the link below in IE 5.0, 5.5, and 6.0 on Win 2K and Win XP.
It contains some JavaScript code that generates HTML on the fly containing
the 3 sequences you gave me.
In my rendering on Win 2K with IE 6.0 (English version),
syllable breaks are assumed in the choseong sequence.
I will test this link on other platforms.

Regards, Soobok Lee

http://164.124.123.207/etc/f8.html


%u1100%u1100%u1161          NFKC may attempt a partial composition: %u1100%u1161 --> %uAC00 (below)
ᄀ가 

%u1100%uAC00
ᄀ가 

%u1101%u1161                NFKC may try to unify this into: %uAE4C (below)
까 

%uAE4C 
까 
---------------------------------

This is RACE conversion using NFKC:

1100 1100 1161
bq--3aiqblaa

1100 AC00
bq--3aiqblaa

#####  The above two sequences are unified.


1101 1161
bq--vzga

AE4C
bq--vzga
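
The pairing above can be checked without RACE by applying NFKC directly.
The following is only a minimal sketch using Python's standard unicodedata
module; the helper names nfkc_hex and compose_lv are made up for this
illustration, and the constants in compose_lv are the ones from the Hangul
composition arithmetic in the Unicode standard.

```python
import unicodedata

def nfkc_hex(code_points):
    """Return the NFKC form of a code point sequence as uppercase hex strings."""
    text = "".join(chr(cp) for cp in code_points)
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize("NFKC", text)]

def compose_lv(lead, vowel):
    """Compose a leading consonant and a vowel jamo into an LV syllable.

    Uses the arithmetic S = SBase + ((L - LBase) * VCount + (V - VBase)) * TCount.
    """
    s_base, l_base, v_base = 0xAC00, 0x1100, 0x1161
    v_count, t_count = 21, 28
    return s_base + ((lead - l_base) * v_count + (vowel - v_base)) * t_count

# The four sequences tested in this message.
for seq in ([0x1100, 0x1100, 0x1161], [0x1100, 0xAC00],
            [0x1101, 0x1161], [0xAE4C]):
    print(" ".join(f"{cp:04X}" for cp in seq), "->", " ".join(nfkc_hex(seq)))

print(f"{compose_lv(0x1100, 0x1161):04X}", f"{compose_lv(0x1101, 0x1161):04X}")
```

If the normalizer follows the standard, the first two sequences should both
come out as 1100 AC00 and the last two as AE4C, so <1100 1100 1161> is never
unified with <1101 1161>.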






----- Original Message ----- 
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: "Soobok Lee" <lsb@postel.co.kr>; <idn@ops.ietf.org>
Cc: "Jungshik Shin" <jshin@mailaps.org>
Sent: Tuesday, November 13, 2001 10:04 PM
Subject: Re: Hangul and IDN (was Re: [idn] reordering strawpoll)


> Hi!
> 
> >> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> >> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> >> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> >> (G, G, A), 
> 
> >> but this equivalence is neither a canonical equivalence, as it
> >> should have been, nor a compatibility equivalence. Still, the latter letter
> >> sequence represents EXACTLY the same syllable as the two earlier character
> >> sequences, and a proper rendering engine (of which there are already some,
> >> I'm told) would correctly render the three sequences in the same way.
> 
> >Are you raising this possibility:    U+1101 <---> U+1100 U+1000  (GG <-> G G) ?
> >The design of conjoining (cluster) jamos treats the two-choseong sequence 
> >U+1100 U+1000  as an illegal sequence (syllable break condition).
> 
> (You mean <U+1100, U+1100>.) No, that is not a syllable break condition. 
> See tables 3-4 and 3-5 on page 53 in TUS 3.0 (link below).
> 
> (B.t.w. apart from algorithmic decomposition (per se) of Hangul syllable characters,
> the cluster Jamos are not needed, and should ideally not be used. That is not
> spelled out in TUS 3.0, though...)
> 
> >That is described somewhere in Unicode 3, chapter 3, section 11. That will help you.
> 
> Please reread that section carefully! (Link below.) In particular page 53.
> 
> >if some rendering engine display the two as the same syllable,
> >I suspect the product is buggy or beyond the standard. :-)
> 
> No, it's not beyond the Unicode standard at all.  See page 53 of TUS 3.0,
> http://www.unicode.org/unicode/uni2book/ch03.pdf which says (my emphasis):
> "A standard syllable block is composed of a *sequence* of choseong followed
> by a *sequence* of jungseong and optionally a *sequence* of jongseong."
> 
> That that description is only about NFD form is not spelled out, nor
> is the fact that combining characters, in particular a Hangul tone mark,
> may follow (logically they apply to the entire syllable!).  Whether to consider
> the combining characters as part of the syllable or not, I think is a matter
> of taste. If we also take combining characters into account, but still NFD,
> the syntax for a Hangul syllable is:
> 
>         Hangul-syllable-NFD  ::=   C+  V+  F*  T*
> 
> where C is a choseong, V is a jungseong (vowel), F is a jongseong, and T is a
> combining character (like a Hangul tone mark).  (I'll ignore the FILLER issues
> for the moment, including their automatic insertion.)
> 
> This is needed to be able to spell historic (and future) Hangul texts that may use
> consonant or vowel clusters that are not given a character of their own.
> 
> Taking into account what NFC may cause, the full Hangul syllable syntax is:
> 
> Hangul-syllable  ::= 
>                  C*  CVsyllable  V*  F*  T* 
>             |    C*  CVFsyllable  F*  T*  
>             |    C+  V+  F*  T*
> 
> where CVsyllable is a consonants-vowels syllable character, and CVFsyllable is a
> consonants-vowels-consonants syllable character.
> 
> >Would you tell me the version of the rendering engine ?
> 
> I'm told(!) this is (properly!) implemented at least in Windows XP and
> IE 6 (maybe also IE 5.5).  Perhaps also elsewhere (I'm not keeping track).
> 
>         Kind regards
>         /kent k
> 
> 
>
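
The full syllable syntax quoted above can be tried out mechanically. The
sketch below is only one possible reading of it: each character is mapped to
a class letter and the resulting string is matched against the three
alternatives. The code point ranges are assumptions taken from the Unicode
3.0 jamo block (choseong U+1100..U+1159, jungseong U+1161..U+11A2, jongseong
U+11A8..U+11F9), T is limited to the two Hangul tone marks U+302E and U+302F,
and the fillers are ignored, as in the quoted message. The names classify and
is_single_syllable, and the class letters X and Y, are made up for this
illustration.

```python
import re

def classify(ch):
    """Map one character to a class letter of the quoted grammar."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x1159:
        return "C"  # choseong
    if 0x1161 <= cp <= 0x11A2:
        return "V"  # jungseong
    if 0x11A8 <= cp <= 0x11F9:
        return "F"  # jongseong
    if cp in (0x302E, 0x302F):
        return "T"  # Hangul tone marks (assumed to be the only T here)
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllable: trailing-consonant index 0 means CVsyllable.
        return "X" if (cp - 0xAC00) % 28 == 0 else "Y"
    return "?"      # anything else cannot be part of a syllable

# X stands for CVsyllable, Y for CVFsyllable.
SYLLABLE = re.compile(r"C*XV*F*T*|C*YF*T*|C+V+F*T*")

def is_single_syllable(text):
    """Return True if the whole string matches the quoted syllable syntax."""
    classes = "".join(classify(ch) for ch in text)
    return SYLLABLE.fullmatch(classes) is not None

for sample in ("\u1100\u1100\u1161", "\u1100\uAC00", "\u1101\u1161", "\uAE4C"):
    print(" ".join(f"{ord(ch):04X}" for ch in sample), is_single_syllable(sample))
```

Under these assumptions all four test sequences from this thread are accepted
as one syllable, while a precomposed syllable followed by a choseong is not.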