[idn] First report from IDN nameprep design team
- To: <idn@ops.ietf.org>
- Subject: [idn] First report from IDN nameprep design team
- From: "James Seng/Personal" <James@Seng.cc>
- Date: Thu, 7 Dec 2000 12:32:40 +0800
- Delivery-date: Wed, 06 Dec 2000 20:35:53 -0800
- Envelope-to: idn-data@psg.com
To the IDN WG:
The IDN nameprep design team has been studying the nameprep document,
and we propose the following changes. We are not finished with our
work, but we want to report our progress and hear input from the WG.
Of course, this will be discussed heavily in San Diego next week, and
a new version of the nameprep draft covering the points on which there
is general agreement can be ready before the end of December.
1) It is difficult and probably not useful to try to prohibit
characters that might cause confusion because they look like other
characters or because they might be accidentally entered by users.
Therefore, the list of prohibited characters in the next draft will be
significantly smaller. For example, compatibility characters (which
are common in Arabic and Asian scripts) would be allowed on input.
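Allowing compatibility characters on input works because compatibility normalization turns them into ordinary characters. The sketch below assumes Unicode normalization form NFKC (via Python's standard `unicodedata` module) as the compatibility normalization; the helper name `normalize_compat` and the sample characters are illustrative choices, not taken from the draft.

```python
import unicodedata

def normalize_compat(text: str) -> str:
    """Apply Unicode compatibility normalization (assumed here to be NFKC)."""
    return unicodedata.normalize("NFKC", text)

# Compatibility characters a user's input method might produce.
examples = {
    "fullwidth Latin ABC": "\uFF21\uFF22\uFF23",
    "Arabic lam-alef ligature, isolated form": "\uFEFB",
}

for name, text in examples.items():
    result = normalize_compat(text)
    # Show the code points before and after normalization.
    print(name, [f"U+{ord(c):04X}" for c in text], "->",
          [f"U+{ord(c):04X}" for c in result])
```

The fullwidth letters come out as plain U+0041..U+0043, and the ligature U+FEFB comes out as the two letters U+0644 U+0627, so neither needs to be prohibited.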
2) The order of the steps for nameprep will be changed from
prohibit -> fold -> normalize
to
map -> normalize -> prohibit
This new order has many advantages. It allows many more characters to
be input to the nameprep process without returning errors because
those characters will get converted by the normalization step into
allowed characters. It also allows the mapping step to fix edge-case
problems before they get to the normalization step, as described in
the next point.
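The new order can be sketched as three composed functions. This is a minimal illustration under stated assumptions: Python's `str.casefold()` stands in for the mapping table, NFKC stands in for the normalization step, and `PROHIBITED` is a hypothetical one-entry table, not the design team's list.

```python
import unicodedata

# Hypothetical prohibition table for illustration only.
# Only the *normalized* form of a space needs to be listed.
PROHIBITED = {"\u0020"}

def map_step(label: str) -> str:
    # Stand-in for the mapping table: Python's full case folding.
    return label.casefold()

def normalize_step(label: str) -> str:
    # Assumes NFKC as the compatibility normalization form.
    return unicodedata.normalize("NFKC", label)

def prohibit_step(label: str) -> str:
    # Runs last, so it only ever sees mapped, normalized text.
    for ch in label:
        if ch in PROHIBITED:
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
    return label

def nameprep_sketch(label: str) -> str:
    # New order: map -> normalize -> prohibit.
    return prohibit_step(normalize_step(map_step(label)))
```

With this order, fullwidth uppercase input such as "\uFF21\uFF22\uFF23" comes out as "abc" instead of producing an error, while "a\u3000b" is still rejected: U+3000 IDEOGRAPHIC SPACE normalizes to U+0020, so the prohibition table does not need a separate entry for it.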
3) So far, the mapping step in nameprep only maps uppercase
characters to lowercase. The compatibility normalization step does
the work of converting compatibility characters into their normal
forms, but the input mechanisms on users' systems might also enter
other sets of characters that can be mapped to other characters.
For example, there are many different hyphen characters
(such as U+00AD, soft hyphen) that do not get normalized but can all
be mapped into the single hyphen character that is already allowed by
STD 13. Also, with the new order suggested above, there are some
special cases for case-mapping that need to be added so that all
characters case-map as expected. Some characters might be mapped to
nothing, meaning that they will simply be ignored on input; for
example, some of the non-displaying characters that are currently
prohibited might be removed from the input stream rather than
causing an error. The mapping step will be specified as a single
table of mappings so that implementors don't have to create the table
themselves from disparate sources.
4) Case-folding based only on the Unicode data table (UnicodeData.txt)
does not handle all cases of folding, because that table lists only
one-to-one mappings. The mechanism for mapping to lowercase will
instead be derived from the CaseFolding.txt file. (See UTR 21 from
the Unicode Consortium for more details.)
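Deriving the mapping from CaseFolding.txt means reading lines of the form "code; status; mapping; # name". The sketch below assumes the current file format, where status C is common, F is full (possibly one-to-many), S is simple, and T is Turkic-specific; it keeps C and F for full folding. The sample lines are embedded so the example runs without the file, and the names `load_full_folding` and `fold` are hypothetical.

```python
# A few lines in CaseFolding.txt format: code; status; mapping; # name
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""

def load_full_folding(text: str) -> dict[int, str]:
    table: dict[int, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        code, status, mapping = (f.strip() for f in line.split(";")[:3])
        # C = common, F = full; S (simple) and T (Turkic) are skipped.
        if status in ("C", "F"):
            table[int(code, 16)] = "".join(
                chr(int(cp, 16)) for cp in mapping.split())
    return table

FOLDING = load_full_folding(SAMPLE)

def fold(label: str) -> str:
    return label.translate(FOLDING)
```

The full mapping for U+00DF (sharp s) to "ss" is exactly what a one-to-one lowercase table cannot express; Python's own `str.casefold()` applies the same full folding.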
5) Non-character codepoints will be listed as prohibited characters.
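A prohibition check for noncharacters needs no lookup table. The sketch below uses the present-day Unicode definition of noncharacters (U+FDD0..U+FDEF plus the last two code points of every plane); the function names are hypothetical.

```python
def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF in every plane:
    # masking off the lowest bit tests for both FFFE and FFFF at once.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def check_noncharacters(label: str) -> str:
    for ch in label:
        if is_noncharacter(ord(ch)):
            raise ValueError(f"noncharacter U+{ord(ch):04X} is prohibited")
    return label
```

Over the whole code space this identifies 66 code points: 32 in the U+FDD0 block and 2 in each of the 17 planes.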
6) The question of where to do name preparation will be removed from
this document, but must be addressed in the eventual IDN protocol
document.
7) Change the word "canonicalize" to "normalize".