
Re: [idn] The layers and character handling in DNS

To: John C Klensin <klensin@jck.com>, Paul Hoffman / IMC <phoffman@imc.org>
Subject: Re: [idn] The layers and character handling in DNS
From: Patrik Fältström <paf@cisco.com>
Date: Sun, 18 Feb 2001 06:59:49 -0800
Cc: idn@ops.ietf.org
Delivery-date: Sun, 18 Feb 2001 07:57:02 -0800
Envelope-to: idn-data@psg.com

At 09.09 -0500 01-02-18, John C Klensin wrote:
>Suppose I'm using local character code X to represent script Z
>and so is Keith.  My local mapping is Unicode<-F1(X,Z) and
>Keith's is Unicode<-F2(X,Z), and they are different.  Let's
>assume that each is consistently reversible (although that is
>another problem, both with this issue and, I suspect, with
>Nameprep).

Ok.

>So I find an IDN domain on the network, and, by the time its name
>emerges from my application, it is in X, having been passed
>through F1'(Unicode,Z).  I pass X to Keith out of band (e.g., in
>an email message with "text/plain; charset=X").

I think you send "X" as data, where the charset is "Z".

>Now Keith looks
>it up, applying F2(X,Z) before Nameprep.  And, behold, he either
>gets 'nodomain' or, worse, the wrong stuff.

This might happen.

>I think consistent behavior is important.  And I don't see how
>one gets that without either native implementations of 10646
>(i.e., no local character sets) or with standardized/ unambiguous
>mappings between each local set and 10646.
>
>Of course, we could try to make Nameprep complicated enough to
>smooth over all such details, s.t., Nameprep(F1(X,Z)) =
>Nameprep(F2(X,Z)) for all Fn.   Seems unlikely somehow.

The whole goal of the normalization which UTC has done is that the
number of X and Fn for which Nameprep(F1(X,Z)) != Nameprep(Fn(X,Z))
is as small as possible. We can argue until we die about how many X
and n exist which break the pattern, and because of this, I think
this discussion is mostly academic.
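
To make the F1/Fn question concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
two mapping tables are hypothetical (the second entry is modelled on
the wave dash / fullwidth tilde divergence between Shift_JIS mapping
tables), and normalize() is only a stand-in for Nameprep that applies
NFKC and case folding, not the full Nameprep profile.

```python
import unicodedata

# Hypothetical local-to-Unicode mapping tables for the same local codes X.
# F1 and F2 agree on the repertoire but pick different code points.
F1 = {0xC5: "\u00C5",    # LATIN CAPITAL LETTER A WITH RING ABOVE
      0x8160: "\u301C"}  # WAVE DASH
F2 = {0xC5: "\u212B",    # ANGSTROM SIGN
      0x8160: "\uFF5E"}  # FULLWIDTH TILDE


def normalize(text):
    """Stand-in for Nameprep: NFKC followed by case folding only."""
    return unicodedata.normalize("NFKC", text).casefold()


def converges(code):
    """True if both mappings of `code` normalize to the same string."""
    return normalize(F1[code]) == normalize(F2[code])


for x in sorted(F1):
    print(hex(x), "converges" if converges(x) else "diverges")
```

Under these assumptions the first code converges, because both code
points normalize to the same letter, and the second one does not, so
a lookup of the second would differ depending on which table was used.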

BUT: I think the key message John has, which the IDN wg has to be
aware of, is that the wg seems to have decided that Unicode, with the
normalization rules defined by UTC (not the IETF), gives the best
functions that can be found. Further, the problems brought up _will_
happen, and we in the IETF can only rely on UTC being smart enough to
develop normalization tables, and to do good marketing, so that the
number of Fn is relatively small for each X.

Now, the big question which John doesn't ask explicitly is whether
these mappings are good enough, and whether the IDN wg is aware that
it should have been able to say "no, too many Fn exist for too many
X, so we don't believe these mappings are good enough".

Personally, I have been thinking about this a lot over the last 6-8
months, and my conclusions are:

(1) The IETF is NOT a forum where we have the knowledge to decide
whether a character mapping function is good enough or not. The only
thing the IETF can do is to choose someone who works in this area and
trust them to do as good a job as possible. UTC is doing a good job,
but they have "bugs" and problems in their mapping tables (as anyone
would have) and we in the IETF will inherit them -- for good and for
bad.

(2) Because of the issues John lists, or even more importantly the
fact that we (amateurs) in the IETF already know that there exist
pairs of Unicode code points which are not normalized to the same
value but still look the same, the only ways of solving the problem
are to (a) have special normalization rules (not the UTC ones) in the
IETF or (b) regardless of IDN, try to push all applications to be
aware of this issue, so they start looking at dictionary approaches
in those places in the user interface where misunderstanding can
happen. And (a) is NOT a path to take, due to the argument in (1)
above.

So, consistent behaviour is something we will get only if F1 and Fn,
for the same X, map to code points (possibly different ones) which
are normalized to the same value in the nameprep phase -- and we in
the IETF _have_ to rely on UTC for this. Any misunderstandings that
can happen (or rather, WILL happen) because look-alike characters are
not normalized to each other -- for example between Latin, Greek and
Cyrillic -- will not be solved by nameprep.
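
The Latin/Greek/Cyrillic point can be checked with a short Python
sketch. It is only an illustration under stated assumptions: the
look-alike pairs are my own hand-picked examples, NFKC stands in for
the normalization step of nameprep, and is_mixed() is a hypothetical,
crude check that an application could run in its user interface; it
is not necessarily the dictionary approach mentioned above.

```python
import unicodedata

# Hand-picked pairs that look alike in many fonts (illustrative only).
LOOKALIKES = [
    ("a", "\u0430"),  # Latin a vs Cyrillic a
    ("o", "\u03BF"),  # Latin o vs Greek omicron
    ("p", "\u0440"),  # Latin p vs Cyrillic er
]


def nfkc(text):
    """NFKC normalization, standing in for the nameprep normalization step."""
    return unicodedata.normalize("NFKC", text)


def scripts(label):
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed(label):
    """Hypothetical UI-level check: does the label mix several scripts?"""
    return len(scripts(label)) > 1


# Normalization leaves every pair distinct.
unified = [nfkc(a) == nfkc(b) for a, b in LOOKALIKES]
print(unified)

# "com" written with a Cyrillic o in the middle.
print(is_mixed("c\u043Em"), is_mixed("com"))
```

None of the pairs is unified by the normalization, so any protection
against such names has to come from somewhere other than nameprep.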

Or else we should not do this at all.

My personal opinion is that we SHOULD do this, but know about the
limitations, and, regardless of the IDN solution with nameprep,
understand that we need a dictionary _as well_ because of the
limitations that exist.

    paf

>
>      john
>
>
>
>--On Thursday, 15 February, 2001 12:33 -0800 "Paul Hoffman / IMC"
><phoffman@imc.org> wrote:
>
>>  At 3:08 PM -0500 2/15/01, Keith Moore wrote:
>>>  whether you can effectively *input* IDNs without platform
>>>  support for Unicode is a different question.
>>
>>  Correct. Most "platforms" have support for some character sets,
>>  each of which is a subset of the Unicode (ISO/IEC 10646)
>>  character repertoire. Because of this, the input mechanism can
>>  be used with IDN. The "platform" decides the mapping between
>>  the character sets and Unicode.
>>
>>  The difference here and where this thread was going is that the
>>  mapping is done by the application (probably through "the
>>  platform"), *not* by the protocol itself.
>>
>>  --Paul Hoffman, Director
>>  --Internet Mail Consortium
>>

-- 
Patrik Fältström <paf@cisco.com>                         Cisco Systems
Consulting Engineer                                  Office of the CSO
Phone: (Stockholm) +46-8-4494212            (San Jose) +1-408-525-0940
        PGP: 2DFC AAF6 16F0 F276 7843  2DC1 BC79 51D9 7D25 B8DC
