
Re: [idn] Interactions between short-term and long-term



At 10.52 +0100 01-01-28, Dan wrote:
>Patrik wrote:
>
>  >Consider another example where a user is using an existing software
>  >which doesn't do nameprep, and tries to send email to a user in a
>  >domain which include a character which is changed because of
>  >nameprep, what then happens?
>
>It works without problems. Both DNS-servers and e-mail MTAs understand how
>to compare domain names.

Read again what I wrote:

One user uses existing software which doesn't do nameprep, and enters 
a domainname which includes a character which should have been 
changed according to the nameprep algorithm. It doesn't get changed 
because the software doesn't do the required change, and therefore 
non-nameprepped characters end up inside configuration files or in 
the SMTP protocol.

You said yourself that you didn't want non-nameprepped domainnames 
used as domainnames in configuration files etc.

If we don't require software to do the right thing (which includes 
nameprep) then comparison will not be possible, or we are back to 
what you didn't like, which is running the nameprep algorithm on the 
server side and not on the client side (as close to the keyboard as 
possible).
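
To make the failure concrete, here is a minimal sketch in Python. It 
assumes the nameprep() helper in Python's encodings.idna module (which 
implements the later RFC 3491 profile, not necessarily the draft under 
discussion) is a fair stand-in for the algorithm. The configuration 
table, the legacy_lookup() function and the host name are hypothetical.

```python
from encodings.idna import nameprep

# Hypothetical configuration table. The admin tool did nameprep, so the
# key is the prepared label, stored as raw UTF-8 bytes.
configured = {nameprep("b\u00fccher").encode("utf-8"): "mx1.example"}


def legacy_lookup(label: str):
    """Stand-in for existing software: exact byte match, no nameprep."""
    return configured.get(label.encode("utf-8"))


# What the user typed: upper case, including U+00DC (capital U umlaut).
typed = "B\u00dcCHER"

# A client without nameprep passes the label through unchanged: no match.
print(legacy_lookup(typed))            # None

# A client that does nameprep first produces the stored form: match.
print(legacy_lookup(nameprep(typed)))  # mx1.example
```

The only difference between the two calls is whether the client 
prepared the label before handing it on.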

>Even existing software of today does not do this correctly. A domain name
>must be matched case-insensitively, but a lot of programs (Unix has a lot
>of them) do not.
>
>But everybody do not have to understand "nameprep"!!

Not everyone, that is correct. According to the IDNA spec only the 
software which accepts input from the keyboard (etc.) has to do 
nameprep. Existing DNS software, resolver libraries, SMTP software 
etc. do not have to understand it at all. Only client applications 
(including tools for changing the configuration of DNS software, etc.) 
have to understand nameprep.

See IDNA spec.

>The basics for interoperability of text data is to agree on character
>to code point mapping and normalisation. And normalisation means to
>use ONE representation of a character instead of like Unicode allow
>many variations. A good base would be UCS normalised using Unicode
>form C (or maybe KC).
>All programs should use this, otherwise interoperability will fail.

Please read the nameprep spec. It explains in detail what is needed 
for domainnames.

>No, the basic text input can do normalisation
>but not full "nameprep".

Correct. But as a user I would not like software doing normalization either.

>And when I edit a zone file, it will not have
>"namepreped" names.

It all depends on what program you use when creating the zonefile.

>It will be in normal text data format. It is the
>responsibility of the DNS server to reject loading of names that
>have illegal characters (though I do not know how it is going to
>know what names are host names) and to apply "domain name matching rules"
>when comparing domain names.

This is already discussed in the nameprep draft.

>A DNS server should not mangle the names
>by doing forced conversion to lower case.

Do not talk about other things than nameprep. That's the only thing 
that is needed.

>If we really want the world to get an easy start on interoperability
>on text data, we should at protocol level use something like
>UCS normalised using form C and encoded using UTF-8 (or UCS-2 or UCS-4).

Might be a good generic goal for "protocol elements", and partly 
already specified (the need for a goal) in the IAB Charset report 
created several years ago.

>And the for DNS use, define a standard library that can be used in both
>DNS servers and in clients that can take a name in normalised text form
>and compare them. Doing like "nameprep" wants, by converting a name
>into a representation that can be used compare domain names by
>binary matching, is totally wrong for me.

The keyword which you miss in your discussion is "compatibility" with 
existing software which does not do nameprep. If we follow your 
scenario and require nameprep functionality in all software which 
compares domainnames then we will not have IDN implemented for many 
more years.

Instead, if we create a format where binary comparison (or the case 
insensitive comparison which is used today) is possible, then existing 
deployed servers and clients can be used.
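
As a rough illustration of that point, the sketch below compares three 
spellings of one label with a comparator that knows only ASCII case, 
first on the raw input and then after nameprep. It again assumes 
Python's encodings.idna.nameprep() (RFC 3491) as a stand-in for the 
draft algorithm; legacy_equal() is a hypothetical model of today's 
comparison, not code from any real server.

```python
from encodings.idna import nameprep


def legacy_equal(a: bytes, b: bytes) -> bool:
    """Model of existing comparison: bytes match, ignoring ASCII case only."""
    # bytes.lower() changes only the ASCII letters A-Z.
    return a.lower() == b.lower()


# Three ways a user might enter the same label:
# precomposed, upper case, and "u" followed by a combining diaeresis.
variants = ["B\u00fccher", "B\u00dcCHER", "Bu\u0308cher"]

raw = [v.encode("utf-8") for v in variants]
prepped = [nameprep(v).encode("utf-8") for v in variants]

raw_all_equal = all(legacy_equal(raw[0], r) for r in raw[1:])
prepped_all_equal = all(legacy_equal(prepped[0], p) for p in prepped[1:])

print(raw_all_equal, prepped_all_equal)  # False True
```

The comparator itself is untouched between the two runs; only the 
input has been prepared by the client.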

That gives much faster deployment of IDN, and that is very important.

>In a way it is like
>the ACE versus UTF-8 debate, ACE and "nameprep" introduces a special
>form on text intermixed with normal formatted text.

Yes, and that is a good thing.

Saying that we _have_to_ change all software on the planet is just a 
non-starter, and because of that we need an encoding form of 
domainnames which makes them work with existing software.

This is already pointed out in the documents from the design teams, 
and it is something we discussed on this mailing list more than one 
year ago, so why do this again?

   paf