[idn] nameprep



Paul Hoffman / IMC <phoffman@imc.org> wrote:

> need to finish nameprep, which at this point is finished other than it
> does not address the new characters added in Unicode 3.1.

Nameprep currently includes the following steps:

    KC-normalize
    map
    KC-normalize
    prohibit

KC-normalize is defined (by Unicode Technical Report #15, as
Normalization Form KC) as compatibility decomposition followed by
canonical composition.  So the steps are really:

    compatible-decompose
    canonical-compose
    map
    compatible-decompose
    canonical-compose
    prohibit
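
A rough sketch of that equivalence, assuming Python's standard
unicodedata module as the normalization library. The helper name and
the sample strings are made up for illustration, and "NFC" stands in
for the bare canonical-compose step (it is safe here because its own
decomposition pass does nothing to text that is already fully
decomposed):

```python
import unicodedata


def kc_normalize_in_two_steps(text: str) -> str:
    """Compatibility-decompose, then canonically compose."""
    decomposed = unicodedata.normalize("NFKD", text)
    # NFC first re-runs canonical decomposition, which leaves NFKD output
    # unchanged, so only the composition pass has any effect here.
    return unicodedata.normalize("NFC", decomposed)


# Illustrative inputs: a ligature, a base letter plus combining accent,
# a circled digit, a fullwidth letter, and the Angstrom sign.
SAMPLES = ["\ufb01le", "e\u0301", "\u2460", "\uff21bc", "\u212b"]

# True when the two-step version agrees with the library's one-step NFKC.
ALL_MATCH = all(
    kc_normalize_in_two_steps(s) == unicodedata.normalize("NFKC", s)
    for s in SAMPLES
)
```

Agreement on a handful of samples is only a sanity check, not a proof.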

Do libraries that provide a KC-normalize function also provide the
separate decompose and compose functions?  If so, consider simplifying
the procedure to:

    compatible-decompose
    map
    canonical-compose
    prohibit
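
To make the four-step version concrete, here is a minimal sketch, again
assuming Python's unicodedata module. TOY_MAP and TOY_PROHIBITED are
hypothetical stand-ins, not the real nameprep tables, and
simplified_prep is an invented name:

```python
import unicodedata

# Hypothetical mapping table (NOT the nameprep table). Keys are single
# code points as they appear after compatibility decomposition; values
# are themselves compatibly decomposed.
TOY_MAP = {
    "A": "a",
    "B": "b",
    "\u00df": "ss",  # sharp s mapped to "ss"
    "\u00ad": "",    # soft hyphen mapped to nothing
}

# Hypothetical prohibited set (NOT the nameprep list).
TOY_PROHIBITED = {" "}


def simplified_prep(label: str) -> str:
    # Step 1: compatible-decompose.
    decomposed = unicodedata.normalize("NFKD", label)
    # Step 2: map, one code point at a time.
    mapped = "".join(TOY_MAP.get(ch, ch) for ch in decomposed)
    # Step 3: canonical-compose (NFC stands in for the bare compose step).
    composed = unicodedata.normalize("NFC", mapped)
    # Step 4: prohibit.
    for ch in composed:
        if ch in TOY_PROHIBITED:
            raise ValueError("prohibited code point U+%04X" % ord(ch))
    return composed
```

With these toy tables, a fullwidth "A" followed by "B" and a sharp s
comes out as "abss", and a label containing a space is rejected.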

Nameprep defines the mapping step itself, so it could take care to
define it in such a way that it preserves the compatibly-decomposed
property (easy: just make sure the output of each mapping entry is
compatibly decomposed), which ensures that the final output is in
normalization form KC.  The mapping table would be smaller this way,
because it would be operating on a smaller repertoire: characters that
have compatibility decompositions never reach the mapping step, so the
table needs no entries for them.
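
The per-entry condition could be checked mechanically. A small sketch,
assuming Python's unicodedata module; the function names and the toy
table are hypothetical, and the check covers only the per-entry
condition described above:

```python
import unicodedata
from typing import Dict, List


def is_compatibly_decomposed(text: str) -> bool:
    """True if compatibility decomposition leaves the text unchanged."""
    return unicodedata.normalize("NFKD", text) == text


def bad_entries(table: Dict[str, str]) -> List[str]:
    """Return the keys whose mapped output is not compatibly decomposed."""
    return [
        src for src, dst in table.items()
        if not is_compatibly_decomposed(dst)
    ]


# Toy table for illustration only: the second entry maps to a precomposed
# e-acute, which is not compatibly decomposed, so it gets reported.
TOY_TABLE = {"A": "a", "X": "\u00e9"}
OFFENDERS = bad_entries(TOY_TABLE)
```

Running such a check over the table whenever it is regenerated for a new
Unicode version would catch entries that break the property.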

AMC