Re: [idn] The layers and character handling in DNS
- To: John C Klensin <klensin@jck.com>, Paul Hoffman / IMC <phoffman@imc.org>
- Subject: Re: [idn] The layers and character handling in DNS
- From: Patrik Fältström <paf@cisco.com>
- Date: Sun, 18 Feb 2001 06:59:49 -0800
- Cc: idn@ops.ietf.org
- Delivery-date: Sun, 18 Feb 2001 07:57:02 -0800
- Envelope-to: idn-data@psg.com
At 09.09 -0500 01-02-18, John C Klensin wrote:
>Suppose I'm using local character code X to represent script Z
>and so is Keith. My local mapping is Unicode<-F1(X,Z) and
>Keith's is Unicode<-F2(X,Z), and they are different. Let's
>assume that each is consistently reversible (although that is
>another problem, both with this issue and, I suspect, with
>Nameprep).
Ok.
>So I find an IDN domain on the network, and, by the time its name
>emerges from my application, it is in X, having been passed
>through F1'(Unicode,Z). I pass X to Keith out of band (e.g., in
>an email message with "text/plain; charset=X").
I think you send "X" as data, where the charset is "Z".
>Now Keith looks
>it up, applying F2(X,Z) before Nameprep. And, behold, he either
>gets 'nodomain' or, worse, the wrong stuff.
This might happen.
>I think consistent behavior is important. And I don't see how
>one gets that without either native implementations of 10646
>(i.e., no local character sets) or with standardized/ unambiguous
>mappings between each local set and 10646.
>
>Of course, we could try to make Nameprep complicated enough to
>smooth over all such details, s.t., Nameprep(F1(X,Z)))=
>Nameprep(F2(X,Z)) for all Fn. Seems unlikely somehow.
The whole goal of the normalization which UTC has done is to make the
number of X and Fn for which Nameprep(F1(X,Z)) != Nameprep(Fn(X,Z))
as small as possible. We can argue until we die about how many X and
n exist which break the pattern, and because of this, I think
this discussion is mostly academic.
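A minimal sketch of John's scenario in Python, under stated assumptions:
the local code X is the two bytes 0x81 0x60, F1 and F2 are Python's
"shift_jis" and "cp932" codecs (two real mappings of the same bytes to
Unicode), and nameprep_like() is a hypothetical stand-in that only does
NFKC normalization plus lowercasing. It is not the actual Nameprep
profile, which also has mapping and prohibition tables.

```python
import unicodedata


def nameprep_like(s):
    # Hypothetical stand-in for Nameprep: NFKC plus lowercasing only.
    return unicodedata.normalize("NFKC", s).lower()


# The same local bytes X, mapped to Unicode by two different functions.
x = b"\x81\x60"
f1 = x.decode("shift_jis")  # F1(X,Z): U+301C WAVE DASH
f2 = x.decode("cp932")      # F2(X,Z): U+FF5E FULLWIDTH TILDE

# Here normalization does NOT bring the two results together.
same_after_prep = nameprep_like(f1) == nameprep_like(f2)

# A benign case: two code points that normalization does unify.
angstrom_sign = "\u212b"    # ANGSTROM SIGN
a_with_ring = "\u00c5"      # LATIN CAPITAL LETTER A WITH RING ABOVE
benign = nameprep_like(angstrom_sign) == nameprep_like(a_with_ring)

print(unicodedata.name(f1), "|", unicodedata.name(f2))
print("F1 and F2 agree after normalization:", same_after_prep)
print("Angstrom sign and A-with-ring agree:", benign)
```

The first pair is one of the (X, n) combinations that break the
pattern; the second shows the kind of difference the normalization
is designed to absorb.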
BUT: I think the key message John has, which the IDN group has to be
aware of, is that the IDN wg seems to have made a choice that Unicode,
with the normalization rules which are defined by UTC (not IETF), is
the best set of functions that can be found. Further, the
problems brought up _will_ happen, and we in the IETF can only rely
on UTC being smart enough to develop normalization tables and do good
marketing to see that the number of Fn is relatively small for each X.
Now, the big question which John doesn't ask explicitly is whether
these mappings are good enough, and whether the IDN wg is aware of the
fact that it should have been able to say "no, too many Fn exist for
too many X, so we don't believe these mappings are good enough".
Personally, I have been thinking about this a lot over the last 6-8 months,
and my conclusion is:
(1) The IETF is NOT a forum where we have the knowledge to decide
whether a mapping function for characters is good enough or not.
The only thing the IETF can do is to choose someone who
works in this area and trust them to do as good a job as possible.
UTC is doing a good job, but they have "bugs" and problems with their
mapping tables (as anyone would have) and we in the IETF will inherit
them -- for good and for bad.
(2) Because of the issues John lists, or even more importantly the fact
that we (amateurs) in the IETF already know that there exist pairs of
code points in Unicode which are not normalized to the same value but
still look the same, the only way of solving the problem is to (a) have
special normalization rules (not the UTC ones) in the IETF or (b)
regardless of IDN try to push all applications to be aware of this
issue so they start looking at dictionary approaches in places of the
user interface where misunderstanding can happen. And (a) is NOT a
path to take due to the argument in (1) above.
So, consistent behaviour is something we will get only if F1 and Fn
for the same X map to code points (possibly different ones) which are
normalized to the same value in the nameprep phase -- and we in the
IETF _have_ to rely on UTC for this. Any misunderstandings that can
happen (or rather, WILL happen) because no normalization takes place
-- for example between Latin, Greek and Cyrillic -- will not be solved
by nameprep.
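To illustrate the look-alike case, here is a small Python sketch. It
assumes NFKC (via the standard unicodedata module) as a rough proxy for
the normalization step of nameprep; the helper name and the choice of
sample pairs are only for illustration.

```python
import unicodedata


def still_distinct(a, b):
    # True if the two strings remain different after NFKC normalization.
    return unicodedata.normalize("NFKC", a) != unicodedata.normalize("NFKC", b)


# Pairs that usually render identically but belong to different scripts.
lookalikes = [
    ("a", "\u0430"),  # Latin a vs. Cyrillic a
    ("o", "\u03bf"),  # Latin o vs. Greek omicron
    ("p", "\u0440"),  # Latin p vs. Cyrillic er
]

results = [still_distinct(latin, other) for latin, other in lookalikes]

for (latin, other), distinct in zip(lookalikes, results):
    print(unicodedata.name(latin), "/", unicodedata.name(other),
          "-> distinct after normalization:", distinct)
```

Every pair stays distinct, so a name spelled with the Cyrillic letter
is a different name from the one spelled with the Latin letter, even
though a user cannot see the difference.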
Or else we should not do this at all.
My personal opinion is that we SHOULD do this, but know about the
limitations, and regardless of the IDN solution with nameprep
understand that we need a dictionary _as well_ because of the
limitations that exist.
paf
>
> john
>
>
>
>--On Thursday, 15 February, 2001 12:33 -0800 "Paul Hoffman / IMC"
><phoffman@imc.org> wrote:
>
>> At 3:08 PM -0500 2/15/01, Keith Moore wrote:
>>> whether you can effectively *input* IDNs without platform
>>> support for Unicode is a different question.
>>
>> Correct. Most "platforms" have support for some character sets,
>> each of which is a subset of the Unicode (ISO/IEC 10646)
>> character repertoire. Because of this, the input mechanism can
>> be used with IDN. The "platform" decides the mapping between
>> the character sets and Unicode.
>>
>> The difference here and where this thread was going is that the
>> mapping is done by the application (probably through "the
>> platform"), *not* by the protocol itself.
>>
>> --Paul Hoffman, Director
>> --Internet Mail Consortium
>>
--
Patrik Fältström <paf@cisco.com> Cisco Systems
Consulting Engineer Office of the CSO
Phone: (Stockholm) +46-8-4494212 (San Jose) +1-408-525-0940
PGP: 2DFC AAF6 16F0 F276 7843 2DC1 BC79 51D9 7D25 B8DC