
RE: [idn] NFC vs NFKC



At 00:16 01/10/25 -0700, Yves Arrouye wrote:

> > > Yes. IDN is a new system, with the goal of being able to
> > > represent labels in a wide range of scripts. There is no
> > > need to allow artefacts of legacy character encodings,
> > > because IDN isn't designed to map existing names back
> > > and forth from other naming systems.
> >
> > So you do not think it is realistic, for example, to have a
> > document encoded in KS C 5601, containing domain names where
> > the labels would be written using non conjoining Jamo.

Hangul domain names in KS C 5601 would normally be encoded as
precombined syllables. KS C 5601 provides about 2300 of them,
the ones most often used. Users who want to use other Hangul
syllables won't use non-conjoining Jamo (compatibility Jamo),
because those are displayed one Jamo at a time.
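A quick way to see why this matters for NFC vs NFKC, assuming Python's standard `unicodedata` module (which ships a much newer Unicode version than the one discussed here, though normalization results for these characters are stable): NFC leaves compatibility Jamo untouched, while NFKC maps them to conjoining Jamo and then composes them into the precombined syllable.

```python
import unicodedata

# Compatibility (non-conjoining) Jamo: U+3131 HANGUL LETTER KIYEOK + U+314F HANGUL LETTER A
compat = "\u3131\u314f"
# Precombined syllable U+AC00 HANGUL SYLLABLE GA, as KS C 5601 would normally encode it
syllable = "\uac00"

nfc = unicodedata.normalize("NFC", compat)    # no canonical mapping: unchanged
nfkc = unicodedata.normalize("NFKC", compat)  # compat -> U+1100 U+1161 -> composed U+AC00

print([f"U+{ord(c):04X}" for c in nfc])   # ['U+3131', 'U+314F']
print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+AC00']
print(nfkc == syllable)                   # True
```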

In the nameprep design team, we discussed some cases where users
might want to use sequences of independent consonant Jamo. There
are a few web pages with some examples of these, but not too many.
We are not sure whether they should be allowed in domain names or
not, and if they are, how they should be represented.
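To illustrate the representation question for consonant-only sequences, here is a minimal check, again assuming Python's `unicodedata`; the example string is an illustrative choice, not one taken from the pages mentioned above. NFKC turns each compatibility consonant into a leading (choseong) conjoining Jamo, but leading Jamo do not compose with each other, so the result is a run of conjoining characters meant to combine with a following vowel rather than the independent letters that were typed. This shows the problem; it does not decide it.

```python
import unicodedata

# Two independent compatibility consonants: U+314B HANGUL LETTER KHIEUKH, twice
compat = "\u314b\u314b"

nfc = unicodedata.normalize("NFC", compat)    # unchanged
nfkc = unicodedata.normalize("NFKC", compat)  # each maps to U+110F, no composition possible

print([f"U+{ord(c):04X}" for c in nfc])   # ['U+314B', 'U+314B']
print([f"U+{ord(c):04X}" for c in nfkc])  # ['U+110F', 'U+110F']
```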


>[Edit] Is that because nobody uses these anymore, and so you would not
>expect a modern invention like IDN to appear in an anachronic way? Is that
>the case with all LEGACY characters?

I don't want IDN to carry unnecessary legacy compatibility
baggage. If it turns out that something really helps the user
(e.g. the full-width Latin letters), I have nothing against
mapping it. But I don't think it's a good idea to map some
3000 characters wholesale, most of them rarely used and very
difficult to type, just to get a few dozen mapped the right way.
We can easily do the latter without having to do the former.
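The targeted alternative described above can be sketched as a small explicit mapping applied before NFC, instead of full NFKC. This assumes Python's `unicodedata`; the function name `fold_fullwidth_latin` is hypothetical, and mapping only full-width ASCII letters and digits is an illustrative assumption, not a nameprep decision. The last part counts the code points NFKC would remap through compatibility decompositions, in whatever Unicode version the running Python ships (so the figure will differ from the 2001 data).

```python
import unicodedata

def fold_fullwidth_latin(label: str) -> str:
    """Hypothetical targeted mapping: fold only full-width digits (U+FF10..U+FF19)
    and Latin letters (U+FF21..U+FF3A, U+FF41..U+FF5A) to ASCII, then apply NFC.
    All other compatibility characters are left as they are."""
    out = []
    for ch in label:
        cp = ord(ch)
        if 0xFF10 <= cp <= 0xFF19 or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
            out.append(chr(cp - 0xFEE0))  # the full-width forms sit 0xFEE0 above ASCII
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))

# Compatibility decompositions carry a "<tag>" prefix such as "<wide>" or "<compat>";
# canonical decompositions do not. Count the code points NFKC would remap that way.
compat_count = sum(
    1 for cp in range(0x110000)
    if unicodedata.decomposition(chr(cp)).startswith("<")
)

print(fold_fullwidth_latin("\uff21\uff22\uff23"))  # ABC
print(compat_count)
```

With this approach the compatibility Jamo from the earlier example would pass through unchanged, while NFKC would rewrite them along with every other character in `compat_count`.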


Regards,   Martin.