Re: [idn] The layers and character handling in DNS
- To: Paul Hoffman / IMC <phoffman@imc.org>
- Subject: Re: [idn] The layers and character handling in DNS
- From: John C Klensin <klensin@jck.com>
- Date: Sun, 18 Feb 2001 09:09:44 -0500
- Cc: idn@ops.ietf.org
- Delivery-date: Sun, 18 Feb 2001 06:11:44 -0800
- Envelope-to: idn-data@psg.com
Paul,
I think this leads to results that violate the principle of least
astonishment.
Suppose I'm using local character code X to represent script Z
and so is Keith. My local mapping is Unicode<-F1(X,Z) and
Keith's is Unicode<-F2(X,Z), and they are different. Let's
assume that each is consistently reversible (although that is
another problem, both with this issue and, I suspect, with
Nameprep).
So I find an IDN domain on the network, and, by the time its name
emerges from my application, it is in X, having been passed
through F1'(Unicode,Z). I pass X to Keith out of band (e.g., in
an email message with "text/plain; charset=X"). Now Keith looks
it up, applying F2(X,Z) before Nameprep. And, behold, he either
gets 'nodomain' or, worse, the wrong stuff.
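The failure mode can be sketched in a few lines. This is a toy
illustration, not real code from any IDN implementation: F1 and F2
are hypothetical decoding tables for the same local charset X that
disagree on one code point (the real-world analogue is byte 0x5C in
some Japanese encodings, mapped to U+005C by some vendors and
U+00A5 by others).

```python
# Two hypothetical vendor mappings from local charset X to Unicode.
# They agree on most bytes but disagree on 0x5C.
F1 = {0x41: "A", 0x5C: "\u005c"}  # REVERSE SOLIDUS
F2 = {0x41: "A", 0x5C: "\u00a5"}  # YEN SIGN

def decode(data, table):
    """Decode bytes in charset X to a Unicode string via one table."""
    return "".join(table[b] for b in data)

# The same byte string travels out of band (e.g. in an email body).
label_in_X = bytes([0x41, 0x5C])

u1 = decode(label_in_X, F1)  # what my software looks up
u2 = decode(label_in_X, F2)  # what Keith's software looks up

# Same bytes on the wire, different Unicode strings -> different
# (post-Nameprep) DNS lookups, hence 'nodomain' or the wrong zone.
assert u1 != u2
```

No amount of processing after the decode step can undo this, because
by then the two parties are already holding different strings.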
I think consistent behavior is important. And I don't see how
one gets that without either native implementations of 10646
(i.e., no local character sets) or standardized, unambiguous
mappings between each local set and 10646.
Of course, we could try to make Nameprep complicated enough to
smooth over all such details, such that Nameprep(F1(X,Z)) =
Nameprep(F2(X,Z)) for all Fn. Seems unlikely somehow.
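To see why that equality can't hold for arbitrary Fn, consider a toy
stand-in for Nameprep that only case-folds (the real Nameprep also
applies other mappings and normalization, but the same limit
applies): it can reconcile mappings that differ only in attributes
it folds away, and nothing else.

```python
def toy_nameprep(s):
    """Toy stand-in for Nameprep: case-folding only."""
    return s.lower()

# Mappings that differ only in case ARE reconciled by folding:
assert toy_nameprep("\u00c4") == toy_nameprep("\u00e4")  # Ä vs ä

# But mappings that send the same local byte to genuinely different
# characters are NOT -- no per-string function can merge them without
# also merging labels that should stay distinct:
assert toy_nameprep("\u00e9") != toy_nameprep("e")  # é vs e
```

Any folding aggressive enough to equate é with e for one pair of
mappings would conflate legitimately distinct labels for everyone
else.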
john
--On Thursday, 15 February, 2001 12:33 -0800 "Paul Hoffman / IMC"
<phoffman@imc.org> wrote:
> At 3:08 PM -0500 2/15/01, Keith Moore wrote:
>> whether you can effectively *input* IDNs without platform
>> support for Unicode is a different question.
>
> Correct. Most "platforms" have support for some character sets,
> each of which is a subset of the Unicode (ISO/IEC 10646)
> character repertoire. Because of this, the input mechanism can
> be used with IDN. The "platform" decides the mapping between
> the character sets and Unicode.
>
> The difference here and where this thread was going is that the
> mapping is done by the application (probably through "the
> platform"), *not* by the protocol itself.
>
> --Paul Hoffman, Director
> --Internet Mail Consortium
>