
Re: Hangul and IDN (was Re: [idn] reordering strawpoll)



Hi, Kent
Welcome to the Hangul problems.
You may have missed the "hangulchar" I-D in the WG pool.
Your analysis is thorough, but it seems to be already addressed by that I-D.
There have also been further offline discussions about other Hangul nameprep
issues. Those will be included in a new hangulchar 2.0 I-D soon.

Regards,

Soobok Lee

----- Original Message ----- 
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: <idn@ops.ietf.org>
Sent: Tuesday, November 13, 2001 12:01 AM
Subject: Hangul and IDN (was Re: [idn] reordering strawpoll)


> 
> Regardless of reordering, there is an actual problem for Hangul, which
> I don't think has been addressed. I have lately, and with the help of a
> Korean colleague, been looking fairly deeply into the problem of
> collating (ordering) Hangul strings properly.  So even though I cannot
> understand Korean, and only begin to be able to read the letters (I look
> more at code point numbers than glyphs), I've looked quite a lot into this.
> See also page 53 of The Unicode Standard 3.0, which deals with
> Hangul syllables. Let me just pick an example. The number of instances
> is in the thousands, but the basic problem is the same.
> 
> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> (G, G, A), but this equivalence is neither a canonical equivalence, as it
> should have been, nor a compatibility equivalence. Still, the latter letter
> sequence represents EXACTLY the same syllable as the two earlier character
> sequences, and a proper rendering engine (of which there are already some,
> I'm told) would correctly render the three sequences in the same way.
> But for historical reasons, there is now neither a canonical, nor a compatibility
> equivalence there.  Just an equivalence, in the same script, in syllabic meaning
> and (when properly implemented) in display. (Yes, G and GG are pronounced
> differently, but this is about spelling.)
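
A minimal Python sketch of the example above, assuming the standard
unicodedata module as the NFKC implementation (the variable names and the
nfkc helper are only illustrative). It checks that NFKC unifies the first
two spellings of the syllable but not the third:

```python
import unicodedata

# The three spellings of the same syllable discussed above.
precomposed = "\uAE4C"            # precomposed syllable GGA
cluster = "\u1101\u1161"          # cluster jamo GG + vowel A
basic = "\u1100\u1100\u1161"      # basic jamo G + G + vowel A


def nfkc(s):
    """Return the NFKC form of s (illustrative helper)."""
    return unicodedata.normalize("NFKC", s)


# Canonical equivalence: both of these normalize to U+AE4C.
print(nfkc(precomposed) == nfkc(cluster))   # True

# No canonical or compatibility equivalence: NFKC composes only the
# second G with A, giving <U+1100, U+AC00> instead of U+AE4C.
print(nfkc(basic) == nfkc(precomposed))     # False
print([hex(ord(c)) for c in nfkc(basic)])   # ['0x1100', '0xac00']
```
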
> 
> This is something that 'nameprep' should handle, since it is unfortunately not
> handled by NFKC.  The logical steps would be to 1) algorithmically decompose
> Hangul syllables, 2) map cluster Jamos to the basic letter sequences each
> represent. Then either (design decision) invoke NFKC, or NFKC augmented
> to also compose "modern" cluster Jamos before the part of NFKC formation
> that does algorithmic composition of Hangul syllables (the historic cluster
> Jamos can (design decision) stay decomposed).  Or, indeed, do the
> decomposition into basic (i.e. non-cluster) Hangul Jamo letters, after
> conversion to NFKC form, leaving Hangul "subnames" as sequences of
> letter characters, just like for other alphabetic scripts (I don't know how this
> would affect the length of ACE-encoded IDN names).  (Some thought
> needs to go into how (Halfwidth) Compatibility Hangul Letters are to be
> handled. The compatibility mappings are, ahem, not fully appropriate...
> The Hangul "filler" characters are also a problem, which needs to be
> considered.)
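
Steps 1 and 2 above could be sketched roughly as follows. This is only an
illustration, not nameprep or the hangulchar I-D: the function names are
hypothetical, and the CLUSTER_TO_BASIC table is a deliberately partial
assumption covering just the five doubled leading consonants. A real table
would also need vowel and trailing-consonant clusters, and the later NFKC
step, the compatibility letters and the fillers are not handled at all.

```python
# Constants of the algorithmic Hangul syllable decomposition (Unicode).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT           # 11172 precomposed syllables

# Partial, illustrative table: doubled leading consonants only.
CLUSTER_TO_BASIC = {
    "\u1101": "\u1100\u1100",    # GG -> G G
    "\u1104": "\u1103\u1103",    # DD -> D D
    "\u1108": "\u1107\u1107",    # BB -> B B
    "\u110A": "\u1109\u1109",    # SS -> S S
    "\u110D": "\u110C\u110C",    # JJ -> J J
}


def decompose_syllable(ch):
    """Step 1: split a precomposed syllable into its conjoining jamos."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return ch                # not a precomposed syllable: unchanged
    lead = chr(L_BASE + index // N_COUNT)
    vowel = chr(V_BASE + (index % N_COUNT) // T_COUNT)
    t_index = index % T_COUNT    # 0 means "no trailing consonant"
    trail = chr(T_BASE + t_index) if t_index else ""
    return lead + vowel + trail


def to_basic_jamo(s):
    """Steps 1 and 2: decompose, then map cluster jamos to basic letters."""
    out = []
    for ch in s:
        for jamo in decompose_syllable(ch):
            out.append(CLUSTER_TO_BASIC.get(jamo, jamo))
    return "".join(out)


# All three spellings from the example now map to the same sequence.
spellings = ["\uAE4C", "\u1101\u1161", "\u1100\u1100\u1161"]
print(len({to_basic_jamo(s) for s in spellings}))   # 1
```
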
> 
>           Kind regards
>           /kent k
> 
> 
>