Re: Hangul and IDN (was Re: [idn] reordering strawpoll)
Hi, Kent
Welcome to the Hangul problems.
You may have missed the "hangulchar" I-D in the WG pool.
Your analysis is thorough, but the issue seems to be addressed already by that I-D.
There have also been further offline discussions about other Hangul nameprep
issues. Those will be included in the new hangulchar 2.0 I-D soon.
Regards,
Soobok Lee
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: <idn@ops.ietf.org>
Sent: Tuesday, November 13, 2001 12:01 AM
Subject: Hangul and IDN (was Re: [idn] reordering strawpoll)
>
> Regardless of reordering, there is an actual problem for Hangul, which
> I don't think has been addressed. I have lately, and with the help of a
> Korean colleague, been looking fairly deeply into the problem of
> collating (ordering) Hangul strings properly. So even though I cannot
> understand Korean, and only begin to be able to read the letters (I look
> more at code point numbers than glyphs), I've looked quite a lot into this.
> See also page 53 of The Unicode Standard 3.0, which deals with
> Hangul syllables. Let me just pick an example. There are thousands of
> such instances, but the basic problem is the same.
>
> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> with <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is fine
> so far. But <U+1101, U+1161> is in turn equivalent to <U+1100, U+1100, U+1161>
> (G, G, A), but this equivalence is neither a canonical equivalence, as it
> should have been, nor a compatibility equivalence. Still, the latter letter
> sequence represents EXACTLY the same syllable as the two earlier character
> sequences, and a proper rendering engine (of which there are already some,
> I'm told) would correctly render the three sequences in the same way.
> But for historical reasons, there is now neither a canonical, nor a compatibility
> equivalence there. Just an equivalence, in the same script, in syllabic meaning
> and (when properly implemented) in display. (Yes, G and GG are pronounced
> differently, but this is about spelling.)
>
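The three spellings in the example above can be compared with a short
Python sketch. It assumes only the standard unicodedata module; the
variable names and the code_points helper are made up for illustration.
It shows that NFD/NFKC relate U+AE4C to <U+1101, U+1161>, but do not
relate either of them to <U+1100, U+1100, U+1161>.

```python
import unicodedata

# The three spellings of the syllable GGA discussed above.
precomposed = "\uAE4C"             # precomposed Hangul syllable
cluster = "\u1101\u1161"           # cluster Jamo GG + vowel A
basic = "\u1100\u1100\u1161"       # basic Jamo G + G + vowel A


def code_points(s):
    """Format a string as a list of U+XXXX code points (illustrative helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Algorithmic decomposition: the precomposed syllable becomes <GG, A>.
print("NFD :", code_points(unicodedata.normalize("NFD", precomposed)))

# NFKC recomposes <GG, A>, but leaves a stray leading G in <G, G, A>.
for name, s in (("cluster", cluster), ("basic", basic)):
    print("NFKC", name, ":", code_points(unicodedata.normalize("NFKC", s)))

# True when the normalizer treats the two spellings as the same string.
same_after_nfkc = (
    unicodedata.normalize("NFKC", basic)
    == unicodedata.normalize("NFKC", precomposed)
)
print("basic == precomposed after NFKC:", same_after_nfkc)
```

The last comparison comes out false, which is the gap described above.
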
> This is something that 'nameprep' should handle, since it is unfortunately not
> handled by NFKC. The logical steps would be to 1) algorithmically decompose
> Hangul syllables, 2) map cluster Jamos to the basic letter sequences each
> represent. Then either (design decision) invoke NFKC or NFKC augmented
> to also compose "modern" cluster Jamos before the part of NFKC formation
> that does algorithmic composition of Hangul syllables (the historic cluster
> Jamos can (design decision) stay decomposed). Or, indeed, do the
> decomposition into basic (i.e. non-cluster) Hangul Jamo letters, after
> conversion to NFKC form, leaving Hangul "subnames" as sequences of
> letter characters, just like for other alphabetic scripts (I don't know how this
> would affect the length of ACE-encoded IDN names). (Some thought
> needs to go into how ((Halfwidth)) Compatibility Hangul Letters are to be
> handled. The compatibility mapping are, ahem, not fully appropriate...
> The Hangul "filler" characters are also a problem, which needs to be
> considered.)
>
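Steps 1) and 2) above might be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the function
names are hypothetical, the mapping table is deliberately partial (just
the five modern doubled leading consonants) and is not taken from
nameprep or from the hangulchar I-D, and compatibility letters and
fillers are not handled at all. The syllable arithmetic is the standard
Unicode Hangul decomposition.

```python
# Constants of the Unicode algorithmic Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
T_COUNT = 28
N_COUNT = 21 * T_COUNT      # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT      # 11172 precomposed syllables

# ASSUMPTION: partial, illustrative table only. A real table would also
# need cluster vowels, cluster trailing consonants and historic Jamos.
CLUSTER_TO_BASIC = {
    "\u1101": "\u1100\u1100",   # GG -> G G
    "\u1104": "\u1103\u1103",   # DD -> D D
    "\u1108": "\u1107\u1107",   # BB -> B B
    "\u110A": "\u1109\u1109",   # SS -> S S
    "\u110D": "\u110C\u110C",   # JJ -> J J
}


def decompose_syllable(ch):
    """Step 1: split one precomposed Hangul syllable into L, V (and T) Jamos."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch                       # not a Hangul syllable: leave as-is
    out = chr(L_BASE + s_index // N_COUNT)
    out += chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    if t_index:                         # index 0 means "no trailing consonant"
        out += chr(T_BASE + t_index)
    return out


def to_basic_jamo(s):
    """Steps 1 and 2: decompose syllables, then expand cluster Jamos."""
    jamos = "".join(decompose_syllable(ch) for ch in s)
    return "".join(CLUSTER_TO_BASIC.get(ch, ch) for ch in jamos)


# All three spellings of GGA end up as the same basic letter sequence.
spellings = ["\uAE4C", "\u1101\u1161", "\u1100\u1100\u1161"]
results = {to_basic_jamo(s) for s in spellings}
print(len(results))
```

With this mapping the three spellings collapse to a single string;
whether to recompose afterwards is the design decision mentioned above.
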
> Kind regards
> /kent k
>
>
>