[idn] nameprep
- To: idn@ops.ietf.org
- Subject: [idn] nameprep
- From: "Adam M. Costello" <amc@cs.berkeley.edu>
- Date: Fri, 1 Jun 2001 03:07:54 +0000
- Delivery-date: Thu, 31 May 2001 20:10:49 -0700
- Envelope-to: idn-data@psg.com
- User-Agent: Mutt/1.3.17i
Paul Hoffman / IMC <phoffman@imc.org> wrote:
> need to finish nameprep, which at this point is finished other than it
> does not address the new characters added in Unicode 3.1.
Nameprep currently includes the following steps:
    KC-normalize
    map
    KC-normalize
    prohibit
KC-normalize is defined (by the relevant Unicode technical report) as
compatibility decomposition followed by canonical composition. So the
steps are really:
    compatible-decompose
    canonical-compose
    map
    compatible-decompose
    canonical-compose
    prohibit
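To illustrate the expansion above, here is a minimal Python sketch that
builds KC-normalization out of the two sub-steps and compares it with
the library's one-shot form. It assumes Python's standard unicodedata
module; the helper name kc_normalize is made up for this example. The
module does not expose a bare compose function, so form "NFC" stands in
for canonical-compose, which is equivalent on input that is already
decomposed.

```python
import unicodedata

def kc_normalize(text):
    """KC-normalize as two explicit steps (illustrative helper)."""
    # Step 1: compatibility decomposition.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 2: canonical composition. NFC on already-decomposed text
    # only has composing left to do.
    return unicodedata.normalize("NFC", decomposed)

# Sample inputs: fi ligature, circled digit one, A + combining ring,
# fullwidth A.
samples = ["\ufb01", "\u2460", "A\u030a", "\uff21"]
for s in samples:
    assert kc_normalize(s) == unicodedata.normalize("NFKC", s)
```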
Do libraries that provide a KC-normalize function also provide the
separate decompose and compose functions? If so, consider simplifying
the procedure to:
    compatible-decompose
    map
    canonical-compose
    prohibit
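The simplified procedure could be sketched roughly as follows. This is
only an illustration under stated assumptions: MAPPING and PROHIBITED
are tiny made-up tables, not the real nameprep tables, and
nameprep_simplified is a hypothetical name. As before, Python's
unicodedata.normalize with "NFKD" and "NFC" stands in for the separate
decompose and compose functions.

```python
import unicodedata

# Hypothetical tables for illustration only; NOT the real nameprep data.
# Every output is already compatibly decomposed (plain ASCII or empty).
MAPPING = {chr(c): chr(c + 32) for c in range(ord("A"), ord("Z") + 1)}
MAPPING["\u00ad"] = ""          # map soft hyphen to nothing
PROHIBITED = {" ", "/"}         # made-up prohibited set

def nameprep_simplified(label):
    # 1. compatible-decompose
    decomposed = unicodedata.normalize("NFKD", label)
    # 2. map, one character at a time over the decomposed text
    mapped = "".join(MAPPING.get(ch, ch) for ch in decomposed)
    # 3. canonical-compose (NFC also fixes combining-mark order)
    composed = unicodedata.normalize("NFC", mapped)
    # 4. prohibit
    bad = sorted(ch for ch in set(composed) if ch in PROHIBITED)
    if bad:
        raise ValueError("prohibited characters: %r" % bad)
    return composed
```

Note that the table has no entry for U+00C4 (A with diaeresis), yet
nameprep_simplified("\u00c4") still comes out lowercased, because the
map step only ever sees "A" followed by the combining diaeresis.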
Nameprep defines the mapping step itself, so it could take care to
define it in such a way that it preserves the compatibly-decomposed
property (easy--just make sure the output of each mapping entry is
compatibly decomposed), which ensures that the final output is in
normalization form KC. The mapping table would be smaller this way,
because it would only need entries for characters that survive
compatibility decomposition, which is a smaller repertoire.
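The "make sure the output of each mapping entry is compatibly
decomposed" check is easy to automate. A minimal sketch, again assuming
Python's unicodedata module; the function name and the sample table are
hypothetical. It checks each entry in isolation and says nothing about
how outputs combine with neighboring characters.

```python
import unicodedata

def non_decomposed_entries(mapping):
    """Return the entries whose output is not compatibly decomposed."""
    return {
        src: out
        for src, out in mapping.items()
        # A string is compatibly decomposed if NFKD leaves it unchanged.
        if unicodedata.normalize("NFKD", out) != out
    }

# Made-up table: the second entry outputs precomposed e-acute, which
# NFKD would split into "e" plus a combining acute accent.
table = {"A": "a", "X": "\u00e9"}
offenders = non_decomposed_entries(table)
assert offenders == {"X": "\u00e9"}
```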
AMC