
[idn] nameprep inconsistency (was turkish i)



Soobok Lee <lsb@postel.co.kr> wrote:

> Even though 0049 0307 === 0130 (modulo NFC), two have different
> output labels.

Oh dear, that is just wrong.  It violates the Unicode principle that
canonically equivalent strings should always be treated the same.
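
The equivalence in the quoted example can be checked directly.  Below is
a minimal sketch, assuming Python's standard unicodedata module as a
stand-in for the Unicode normalization algorithm; the helper name
canonically_equivalent is hypothetical and is not part of nameprep.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+0049 LATIN CAPITAL LETTER I followed by U+0307 COMBINING DOT ABOVE
decomposed = "\u0049\u0307"
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
precomposed = "\u0130"

print(canonically_equivalent(decomposed, precomposed))
```

Any preparation step that honors the principle should map both spellings
to the same output label.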

I think this points to a larger problem:  The Unicode Consortium has
provided a normalization algorithm that squashes equivalent variations,
and a folding algorithm that squashes case differences, but they haven't
provided an algorithm that squashes both equivalent variations and case
differences.  So the IETF has tried to build one, and we've gotten it
wrong.

I suggest that the Unicode Consortium should define two new algorithms:
one that is like NFC but also squashes case, and one that is like NFKC
but also squashes case.  Then nameprep can simply refer to one of those
(and also specify a set of prohibited characters).
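
To illustrate what such combined algorithms might look like, here is a
rough sketch, not a definition.  It assumes Python's unicodedata.normalize
and str.casefold as stand-ins for Unicode normalization and case folding;
the names fold_nfc and fold_nfkc and the ordering of the steps are
assumptions made for illustration only.

```python
import unicodedata


def fold_nfc(s: str) -> str:
    """Hypothetical 'NFC plus case squashing'.

    Decompose first so that canonically equivalent inputs look the same
    before folding, then fold case, then recompose.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFC", decomposed.casefold())


def fold_nfkc(s: str) -> str:
    """Hypothetical 'NFKC plus case squashing'.

    Folding is repeated after compatibility decomposition because that
    decomposition can expose characters that still carry case.
    """
    folded = unicodedata.normalize("NFD", s).casefold()
    folded = unicodedata.normalize("NFKD", folded).casefold()
    return unicodedata.normalize("NFKC", folded)


# The two spellings from the quoted example should now agree.
print(fold_nfc("\u0049\u0307") == fold_nfc("\u0130"))
```

With functions of this shape, nameprep would only need to name one of
them and list its prohibited characters.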

AMC