Re: [idn] Re: [idn-nameprep] nameprep and others: hangeulchar
Well, ECHAK.com (with the substituted characters) wouldn't make much sense.
But neither do the following names, all of which are permissible today
(generated randomly):
9XGD5RY4PT0ZAVD7BD3WFBZV4BPJMCRKVWNJBWUPW5SK5BC71E34SG3TF3GZY5Q.com
QWHU884UNCUFTAZLP5P4ZSLKZSYRU5AGS5QLHLRZWVKV3EZXXXE3IHEJCQ0K5AJ.com
H6L8JGU2WKV1RDHYSXBD64Q421T1CRSWPR79RRB0EHATZ23AOIMMB8KKK4NBOA1.com
AZ733L21J15Q2JE2V4XYK5Y4AQJ3I5K2E5OOMFXFQ7TOVBLF4GCJYY3F7PFS5EY.com
ON1HB4ZZEP9XCKNQJ3D2BQ0FM79OU1NEMIYB5FR6R84OBCJJHVAH1781JBKMY18.com
A96FH1ASKQ0W1V8MOQD8PDNWBARY2I5JU4A0IZWGYT2YS5A1308T49WTN9P2L4R.com
4N6YFZNQP7W9ZZYWTNIZKJRFJTFP44F60GW84M5SNGJ7QGIH986GH7SG1JMZPJ1.com
GI5XHTM3Q2SMHGXKGTNIJQBHJG1EJDSQDX6TCH5WKH53V4ZV9YAB5TUH685YNX6.com
3PS2XQFCELAB75GHOXLQZ9SAQWCIDOQU2U62Y4S7LB8EORL8QNFKRIR0BBY1Y2A.com
I3HFH8772PW3CE4NY7MVPITUNOS7Q3I4LBOZNLKM2JSMU8VS9ACWQKBYW10BJ5J.com
The questions we need to focus on are:
- matching. Do two strings represent the same underlying abstract
characters?
- prohibition. Which characters are clearly unsuitable for inclusion in
IDNs?
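
To make the matching question concrete, here is a minimal sketch in Python. It assumes that "same underlying abstract characters" can be approximated by compatibility normalization (NFKC) plus case folding. The function names are hypothetical, and the standard library's normalization data stands in for whatever tables nameprep ends up specifying.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Hypothetical matching key: NFKC, then case folding, then NFKC again.

    The second normalization pass covers the rare cases where case folding
    produces a string that is no longer normalized.
    """
    normalized = unicodedata.normalize("NFKC", label)
    return unicodedata.normalize("NFKC", normalized.casefold())


def same_label(a: str, b: str) -> bool:
    """Return True when two labels compare equal under the assumed folding."""
    return fold_label(a) == fold_label(b)
```

Under these assumptions a fullwidth "ＥＣＨＡＫ" matches "echak", and a precomposed "é" matches "e" followed by a combining acute accent. The check neither knows nor cares which scripts or languages are involved.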
I believe, as I have said before, that it is too difficult to separate out
the legitimate mixes of scripts and symbols from the 'questionable' ones for
us to do anything. And adding language tags would simply make it more
complicated, not less. (How much good does it do me to know that the second
of the above is French, not German?)
There are tricky edge cases that we need to discuss or get help with. But we
can't let the edge cases prevent us from delivering the 99.9999% solution.
For fillers and joiners, there are reasonable solutions on the floor. One
may be optimal, but many will work just fine.
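
One of the simpler options, flat prohibition, could be sketched as follows. This is only an illustration: the set of code points and the function name are my assumptions, not a list the group has agreed on, and the other solutions on the floor (mapping the characters away, for instance) would look different.

```python
import unicodedata

# Illustrative set only, not an agreed prohibition list.
FILLERS_AND_JOINERS = {
    "\u115F",  # HANGUL CHOSEONG FILLER
    "\u1160",  # HANGUL JUNGSEONG FILLER
    "\u3164",  # HANGUL FILLER
    "\uFFA0",  # HALFWIDTH HANGUL FILLER
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
}

# Space separators and control characters, checked by general category.
PROHIBITED_CATEGORIES = {"Zs", "Cc"}


def find_prohibited(label: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each character the sketch would reject."""
    hits = []
    for index, ch in enumerate(label):
        if ch in FILLERS_AND_JOINERS or unicodedata.category(ch) in PROHIBITED_CATEGORIES:
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits
```

A label is acceptable under this sketch when `find_prohibited` returns an empty list. A label of the LLLfLLLL shape is rejected no matter which script the surrounding letters come from.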
Mark
—————
Γνῶθι σαυτόν — Θαλῆς
[http://www.macchiato.com]
----- Original Message -----
From: "John C Klensin" <klensin@jck.com>
To: "Martin Duerst" <duerst@w3.org>
Cc: <idn-nameprep@viagenie.qc.ca>; <idn@ops.ietf.org>
Sent: Tuesday, August 28, 2001 05:40
Subject: [idn] Re: [idn-nameprep] nameprep and others: hangeulchar
> --On Tuesday, 28 August, 2001 12:48 +0900 Martin Duerst
> <duerst@w3.org> wrote:
>
> >...
> > Overall, I think you are arguing on the base that fillers will
> > only be used together with Hangul. But what if people find out
> > that they can use it as a kind of space in other scripts?
> >...
>
> Just to reinforce Martin's question a bit, _this_ is one of the
> key questions of which I fear we keep losing sight as we look at
> languages (or even scripts) one at a time. (Full IDN list
> included, since I think there is a general principle here that
> everyone should think about.)
>
> Unless we write rules that I think would be unbelievably
> complicated and ultimately unsuccessful, the implication of using
>
> * [most of] Unicode and
>
> * no language tagging on either queries or registrations
>
> is that, if
> L is a character from Latin-1
> C is a character from a Cyrillic script
> H is a character from a Han script
> A is a character from an Arabic script
> K is a katakana character
> each of them chosen without restriction from the characters of
> those scripts that we permit,
>
> nothing is going to prevent labels of ECHAK, or any of its
> permutations. And, in particular, as Martin indirectly
> suggests, if we permit Jamo fillers ("f" below), or Arabic
> breaks (non-joiners, "n"), we will almost certainly see
> LLLfLLLL and
> CCCnCCCC
> as people figure out that things they can think about as
> embedded spaces are "better" than funny case rules for
> catenating multiple-word phrases.
>
> I also just realized (most of you are probably ahead of me),
> that I have no idea how LLLAL (or HHHAH) would be rendered if
> the character chosen for "A" has different glyphs depending on
> its joining position. But, fortunately, that one is not our
> problem.
>
> Arggh.
> john