[idn] Re: [idn-nameprep] nameprep and others: hangeulchar



--On Tuesday, 28 August, 2001 12:48 +0900 Martin Duerst
<duerst@w3.org> wrote:

>...
> Overall, I think you are arguing on the base that fillers will
> only be used together with Hangul. But what if people find out
> that they can use it as a kind of space in other scripts?
>...

Just to reinforce Martin's question a bit, _this_ is one of the
key questions of which I fear we keep losing sight as we look at
languages (or even scripts) one at a time.  (Full IDN list
included, since I think there is a general principle here that
everyone should think about.)

Unless we write rules that I think would be unbelievably
complicated and ultimately unsuccessful, the implication of using

	* [most of] Unicode and
	
	* no language tagging on either queries or registrations

is that, if
    L is a character from Latin-1
    C is a character from a Cyrillic script
    H is a character from a Han script
    A is a character from an Arabic script
    K is a katakana character
each of them chosen without restriction from the characters of
those scripts that we permit,

nothing is going to prevent labels such as LCHAK, or any of its
permutations.  And, in particular, as Martin indirectly
suggests, if we permit Jamo fillers ("f" below), or Arabic
breaks (non-joiners, "n"), we will almost certainly see
    LLLfLLLL  and
    CCCnCCCC
as people figure out that things they can think about as
embedded spaces are "better" than funny case rules for
catenating multiple-word phrases.
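
To make the point concrete, here is a minimal sketch in Python of the
kind of check a registry would have to invent for itself, since nothing
in the rules discussed here supplies one. Everything in it is an
assumption made for illustration: the FILLERS set (three Hangul fillers
plus the zero width non-joiner), the helper names, and above all the
trick of guessing a script from the first word of the Unicode character
name, which is a rough stand-in for real script data and not something
any draft specifies.

```python
import unicodedata

# Assumed, illustrative set of "invisible separator" code points:
# Hangul choseong/jungseong fillers, Hangul filler, zero width non-joiner.
FILLERS = {"\u115F", "\u1160", "\u3164", "\u200C"}


def rough_script(ch):
    """Guess a script from the first word of the Unicode character name.

    This is only an approximation (e.g. Han characters come out as "CJK").
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(label):
    """Return the set of rough script names used in a label, ignoring fillers."""
    return {rough_script(ch) for ch in label if ch not in FILLERS}


def has_embedded_filler(label):
    """True if a filler or non-joiner appears strictly inside the label."""
    return any(ch in FILLERS for ch in label[1:-1])


# One character each of L, C, H, A, K, written as escapes to avoid
# encoding trouble: a, CYRILLIC BE, a CJK ideograph, ARABIC BEH, KATAKANA KA.
mixed = "a\u0431\u4E2D\u0628\u30AB"

# "LLLfLLLL" and "CCCnCCCC" from the text above.
latin_with_filler = "abc\u3164defg"
cyrillic_with_zwnj = "\u043A\u043A\u043A\u200C\u043A\u043A\u043A\u043A"
```

With these definitions, scripts_in(mixed) reports five different
scripts, and has_embedded_filler() is true for both of the
"embedded space" labels. Absent a rule of roughly this shape, and
language information to drive it, such labels simply go through.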

I also just realized (most of you are probably ahead of me)
that I have no idea how LLLAL (or HHHAH) would be rendered if
the character chosen for "A" has different glyphs depending on
its joining position.  But, fortunately, that one is not our
problem.

Arggh.
   john