Re: [idn] Thoughts on nameprep
- To: idn@ops.ietf.org
- Subject: Re: [idn] Thoughts on nameprep
- From: "Adam M. Costello" <amc@cs.berkeley.edu>
- Date: Mon, 12 Mar 2001 02:23:12 +0000
- Delivery-date: Sun, 11 Mar 2001 18:25:32 -0800
- Envelope-to: idn-data@psg.com
- User-Agent: Mutt/1.3.15i
"D. J. Bernstein" <djb@cr.yp.to> wrote:
> > If that transcoder is not required to output normalization form KC,
> > then it could very easily produce a bad name.
>
> Do you have any real examples of this happening?
As far as I know, lv is the only Unicode transcoder installed on my
machine. In JIS X 0208 (the main Japanese character encoding) there is
only one character that looks like a small letter mu, and lv sends it to
U+03BC (Greek small letter mu). In Latin-1 there is only one character
that looks like a small mu, and lv sends it to U+00B5 (micro sign).
Normalization form KC collapses these two characters into one, but lv
doesn't.
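
The collision can be reproduced with a short sketch. Python and its
standard unicodedata module are assumptions here (the message names no
language), and the two code points are typed in directly rather than
produced by running lv:

```python
import unicodedata

# The two code points lv is described as producing.
MICRO_SIGN = "\u00b5"  # from Latin-1 input
GREEK_MU = "\u03bc"    # from JIS X 0208 input

# Latin-1 byte 0xB5 decodes to the micro sign, matching lv's choice.
assert bytes([0xB5]).decode("latin-1") == MICRO_SIGN

def nfkc(text):
    """Return text in Unicode normalization form KC."""
    return unicodedata.normalize("NFKC", text)

# Unnormalized, the two look-alike characters are different strings.
print(MICRO_SIGN == GREEK_MU)              # False

# Form KC maps the micro sign to Greek small letter mu.
print(nfkc(MICRO_SIGN) == nfkc(GREEK_MU))  # True
```
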
Come to think of it, I seriously doubt that any transcoder would
output normalization form KC by default. If it outputs any normalized
form at all, it will probably use form C, which preserves more
information and is therefore a better choice for most purposes (though
not for domain names).
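
To illustrate what "preserves more information" means, here is a small
sketch in Python (an assumed language; the sample characters are chosen
for illustration and do not come from the message). Form C leaves
compatibility characters alone, while form KC replaces them:

```python
import unicodedata

# Illustrative compatibility characters (chosen for this example).
SAMPLES = {
    "micro sign": "\u00b5",
    "fi ligature": "\ufb01",
    "fullwidth A": "\uff21",
}

def forms(ch):
    """Return the (form C, form KC) normalizations of ch."""
    return (unicodedata.normalize("NFC", ch),
            unicodedata.normalize("NFKC", ch))

for name, ch in SAMPLES.items():
    c, kc = forms(ch)
    # Form C keeps the original character; form KC substitutes its
    # compatibility equivalent, so the distinction is lost.
    print(name, "| C unchanged:", c == ch, "| KC unchanged:", kc == ch)
```

For each of these samples form C returns the input unchanged, and form
KC does not.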
There may be tools that output decomposed diacritics. I don't know
any examples, but the combining diacritics are there in the Unicode
character set, and not deprecated, so I don't see how we could be
justified in assuming that no programs will ever use decomposed forms in
documents that might happen to contain host names.
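
A minimal sketch of the decomposed-form problem, again assuming Python's
unicodedata module (the label "café" is an invented example, not from
the message):

```python
import unicodedata

PRECOMPOSED = "caf\u00e9"   # e-acute as a single code point
DECOMPOSED = "cafe\u0301"   # 'e' followed by combining acute accent

def compose(text):
    """Return text in normalization form C (composed)."""
    return unicodedata.normalize("NFC", text)

# The two spellings render the same but differ code point by code point.
print(len(PRECOMPOSED), len(DECOMPOSED))   # 4 5
print(PRECOMPOSED == DECOMPOSED)           # False

# After normalization they compare equal.
print(compose(PRECOMPOSED) == compose(DECOMPOSED))  # True
```
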
Here's my new argument: By Murphy's law, bad names will pop up from
time to time. Even if applications were not required to try to map
bad names to good names, some of them would, to be helpful. We could
try to forbid applications from being helpful, but they'd do it anyway
rather than annoy their users. If not all applications perform mapping,
or they don't all do it the same way, then bad names that work in some
situations will not work in other situations, or will map to different
good names in different situations, and users will be mystified and
frustrated. Much better that a given bad name yields the same result
(be it success or failure) in all situations. That requires a standard
mapping algorithm.
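
As a rough illustration of the point, the sketch below shows one shared
mapping function that every application would call. It is a toy
stand-in, not the nameprep algorithm: the name map_name, the choice of
case folding plus form KC, and the example.com names are all
assumptions made for this example.

```python
import unicodedata

def map_name(name):
    """Toy stand-in for a standard mapping (NOT nameprep itself).

    Normalizes to form KC, case-folds, then normalizes again in case
    folding produced a non-normalized sequence.
    """
    step = unicodedata.normalize("NFKC", name)
    step = step.casefold()
    return unicodedata.normalize("NFKC", step)

# Different "bad" spellings of the same intended name.
variants = [
    "caf\u00e9.example.com",    # precomposed e-acute
    "cafe\u0301.example.com",   # decomposed e-acute
    "CAF\u00c9.EXAMPLE.COM",    # upper case
]

# Every application that uses the same function gets the same result.
print(len({map_name(v) for v in variants}))  # 1
```

If each application instead chose its own mapping, the three spellings
could resolve to different names or fail in some programs and not
others.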
AMC