
[idn] nameprep

To: idn@ops.ietf.org
Subject: [idn] nameprep
From: "Adam M. Costello" <amc@cs.berkeley.edu>
Date: Fri, 1 Jun 2001 03:07:54 +0000
Delivery-date: Thu, 31 May 2001 20:10:49 -0700
Envelope-to: idn-data@psg.com
User-Agent: Mutt/1.3.17i

Paul Hoffman / IMC <phoffman@imc.org> wrote:

> need to finish nameprep, which at this point is finished other than it
> does not address the new characters added in Unicode 3.1.

Nameprep currently includes the following steps:

    KC-normalize
    map
    KC-normalize
    prohibit

KC-normalize is defined (by the relevant Unicode technical report) as
compatibility decomposition followed by canonical composition.  So the
steps are really:

    compatible-decompose
    canonical-compose
    map
    compatible-decompose
    canonical-compose
    prohibit
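
As a rough check of that definition, here is a minimal sketch using
Python's standard unicodedata module.  The helper names are invented for
illustration.  Python exposes no compose-only function, so the sketch
assumes that applying NFC to text that is already compatibility
decomposed amounts to the composition step alone.

```python
import unicodedata


def compatible_decompose(s: str) -> str:
    """Compatibility decomposition (normalization form KD)."""
    return unicodedata.normalize("NFKD", s)


def canonical_compose(s: str) -> str:
    """Canonical composition, approximated with NFC.

    NFC first canonically decomposes its input; on text that is already
    in form KD that step changes nothing, so only composition remains.
    """
    return unicodedata.normalize("NFC", s)


def kc_normalize(s: str) -> str:
    """KC-normalize spelled out as decompose-then-compose."""
    return canonical_compose(compatible_decompose(s))


# Ligature fi, e + combining acute, circled digit one, angstrom sign.
SAMPLES = ["\ufb01", "e\u0301", "\u2460", "\u212b"]

for sample in SAMPLES:
    # The two-step version should agree with the library's one-step NFKC.
    assert kc_normalize(sample) == unicodedata.normalize("NFKC", sample)
```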

Do libraries that provide a KC-normalize function also provide the
separate decompose and compose functions?  If so, consider simplifying
the procedure to:

    compatible-decompose
    map
    canonical-compose
    prohibit

Nameprep defines the mapping step itself, so it could take care to
define it in such a way that it preserves the compatibly-decomposed
property (easy--just make sure the output of each mapping entry is
compatibly decomposed), which ensures that the final output is in
normalization form KC.  The mapping table would be smaller this way,
because it would be operating on a smaller repertoire.
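
To make the comparison concrete, here is a sketch of both procedures in
Python.  The tables LONG_TABLE, SHORT_TABLE, and PROHIBITED are toy
stand-ins invented for this example, not the actual nameprep tables, and
NFC stands in for a compose-only function on the assumption that its
input is already compatibility decomposed.

```python
import unicodedata

# Toy mapping for the four-step procedure: it sees composed text, so it
# needs an entry for the precomposed letter as well as the base letter.
LONG_TABLE = {"A": "a", "\u00c5": "\u00e5"}

# Toy mapping for the shortened procedure: it sees only decomposed text,
# so the base letter alone is enough.  Every output is itself
# compatibility decomposed.
SHORT_TABLE = {"A": "a"}

# Hypothetical prohibited set, for illustration only.
PROHIBITED = {" "}


def map_chars(s: str, table: dict) -> str:
    """Replace each character that has an entry in the table."""
    return "".join(table.get(ch, ch) for ch in s)


def prohibit(s: str) -> str:
    """Reject the string if it contains any prohibited character."""
    for ch in s:
        if ch in PROHIBITED:
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return s


def long_nameprep(s: str) -> str:
    """KC-normalize, map, KC-normalize, prohibit."""
    s = unicodedata.normalize("NFKC", s)
    s = map_chars(s, LONG_TABLE)
    s = unicodedata.normalize("NFKC", s)
    return prohibit(s)


def short_nameprep(s: str) -> str:
    """compatible-decompose, map, canonical-compose, prohibit."""
    s = unicodedata.normalize("NFKD", s)
    s = map_chars(s, SHORT_TABLE)
    s = unicodedata.normalize("NFC", s)  # stands in for compose-only
    return prohibit(s)


# Plain A, fullwidth A, A-ring, angstrom sign, A + combining ring.
for sample in ["A", "\uff21", "\u00c5", "\u212b", "A\u030a"]:
    assert long_nameprep(sample) == short_nameprep(sample)
```

On these inputs the two procedures agree, and the shortened one gets by
with a single table entry where the longer one needs two.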

AMC
