RE: [idn] First report from IDN nameprep design team
- To: James Seng/Personal <James@Seng.cc>, idn@ops.ietf.org
- Subject: RE: [idn] First report from IDN nameprep design team
- From: Jonathan Rosenne <rosenne@qsm.co.il>
- Date: Thu, 07 Dec 2000 08:10:51 +0200
- Delivery-date: Wed, 06 Dec 2000 22:40:31 -0800
- Envelope-to: idn-data@psg.com
I repeat my request to ignore certain Hebrew characters, namely points and
accents, i.e. remove them during nameprep.
I think some Arabic experts have indicated that the same should apply to
Arabic.
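
To make the request concrete, here is a minimal sketch of the removal. It is
only an illustration, not text from the nameprep draft. It assumes that
"points and accents" means the nonspacing marks in the range U+0591..U+05C7;
the function name strip_hebrew_marks is hypothetical. Hebrew punctuation in
the same range, such as maqaf (U+05BE), is not a nonspacing mark and is kept.

```python
import unicodedata


def strip_hebrew_marks(label: str) -> str:
    """Drop Hebrew points and accents from a label.

    Assumption: the characters to ignore are exactly the nonspacing
    marks (general category Mn) in U+0591..U+05C7.
    """
    return "".join(
        ch
        for ch in label
        if not (0x0591 <= ord(ch) <= 0x05C7
                and unicodedata.category(ch) == "Mn")
    )


# Pointed spelling of "shalom": shin + shin dot + qamats, lamed,
# vav + holam, final mem.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
unpointed = strip_hebrew_marks(pointed)
```

With this rule the pointed and unpointed spellings of a name end up as the
same sequence of letters.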
Jony
> -----Original Message-----
> From: owner-idn@ops.ietf.org [mailto:owner-idn@ops.ietf.org]On Behalf
> Of James Seng/Personal
> Sent: Thursday, December 07, 2000 6:33 AM
> To: idn@ops.ietf.org
> Subject: [idn] First report from IDN nameprep design team
>
>
> To the IDN WG:
>
> The IDN nameprep design team has been studying the nameprep document,
> and we propose the following changes. We are not finished with our
> work, but want to report our progress and hear input from the WG. Of
> course, this will be discussed heavily in San Diego next week, and a
> new version of the nameprep draft can be made ready before the end of
> December on the points for which there is general agreement.
>
> 1) It is difficult and probably not useful to try to prohibit
> characters that might cause confusion because they look like other
> characters or because they might be accidentally entered by users.
> Therefore, the next list of prohibited characters will be
> significantly smaller. For example, compatibility characters (which
> are common for Arabic and Asian scripts) would be allowed on input.
>
> 2) The order of the steps for nameprep will be changed from
> prohibit -> fold -> normalize
> to
> map -> normalize -> prohibit
>
> This new order has many advantages. It allows many more characters to
> be input to the nameprep process without returning errors because
> those characters will get converted by the normalization step into
> allowed characters. It also allows the mapping step to fix edge-case
> problems before they get to the normalization step, as described in
> the next point.
>
> 3) So far, the mapping step in nameprep only maps uppercase
> characters to lowercase. The compatibility normalization step does
> the work of converting compatibility characters into their normal
> forms, but there are other sets of characters that the input
> mechanisms on users' systems might enter that can be mapped to other
> characters. For example, there are many different hyphen characters
> (such as U+00AD, soft hyphen) that do not get normalized but can all
> be mapped into the single hyphen character that is already allowed by
> STD 13. Also, with the new order suggested above, there are some
> special cases for case-mapping that need to be added so that all
> characters case-map as expected. Some characters might be mapped to
> nothing, meaning that they will simply be ignored on input; for
> example, some of the non-displaying characters that are currently
> prohibited might be mapped out of the input stream instead of
> causing an error. The mapping step will be specified as a single
> table of mappings so that implementors don't have to create the table
> themselves from disparate sources.
>
> 4) Doing case-folding from the Unicode data table does not handle all
> cases of folding. The mechanism for mapping to lowercase will
> instead be derived from the CaseFolding.txt file. (See UTR 21 from
> the Unicode Consortium for more details.)
>
> 5) Non-character codepoints will be listed as prohibited characters.
>
> 6) The question of where to do name preparation will be removed from
> this document, but must be addressed in the eventual IDN protocol
> document.
>
> 7) Change the word "canonicalize" to "normalize".
>
>
>
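
The proposed order map -> normalize -> prohibit from the report above can be
sketched roughly as follows. This is an illustration under stated
assumptions, not the design team's specification: MAP_TABLE is a tiny
hypothetical stand-in for the single mapping table the report promises,
str.casefold() stands in for folding derived from CaseFolding.txt, NFKC
stands in for the compatibility normalization step, and only non-character
code points are prohibited. The names nameprep_sketch and is_noncharacter
are made up for this example.

```python
import unicodedata

# Hypothetical mapping table, for illustration only.
MAP_TABLE = {
    "\u00AD": "-",  # soft hyphen -> the hyphen allowed by STD 13
    "\u2010": "-",  # hyphen -> the hyphen allowed by STD 13
    "\u200B": "",   # zero width space -> mapped to nothing (assumed example)
}


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def nameprep_sketch(label: str) -> str:
    # Step 1: map (table lookup, then case folding).
    mapped = "".join(MAP_TABLE.get(ch, ch) for ch in label).casefold()
    # Step 2: normalize (compatibility normalization).
    normalized = unicodedata.normalize("NFKC", mapped)
    # Step 3: prohibit (reject anything still not allowed).
    for ch in normalized:
        if is_noncharacter(ord(ch)):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized
```

Because prohibition runs last, input such as fullwidth letters or a soft
hyphen is converted by the earlier steps instead of being rejected, which is
the advantage point 2 describes.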