
Re: [idn] Let's go forward with IDNA and UTF-8



--On 01-05-27 12.00 +0200 Dan <Dan.Oscarsson@trab.se> wrote:

> Patrik said:
> 
>> I see two differences between UTF-8 and ACE which was what I used as
>> arguments when I made up my mind:
>> 
>> (1) Risk of loss of information
>> 
>> ACE approach is NOT destroying any information when data is sent from
>> sender to receiver in any of the protocols we have today. It is 100%
>> backward compatible.
> Probably 100%, but some software may compare an ACE version of the name
> with the decoded version and fail.
> 
> But ACE+nameprep IS destroying information.

Whether we should do nameprep or not is a separate question from encoding;
it is orthogonal to whether UTF-8 or ACE encoding of the domain names
should be used.

I really want those issues to be separated.
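
To illustrate the separation, here is a minimal Python sketch. It assumes
Punycode (the ACE later standardized in RFC 3492, not yet chosen at the time
of this discussion) as a stand-in for "ACE", and it uses the nameprep
implementation in Python's standard encodings.idna module. The label is a
made-up example.

```python
from encodings.idna import nameprep

label = "B\u00fccher"  # "Bücher", a hypothetical label with an upper-case letter

# Encoding step alone: both encodings round-trip without loss.
utf8_bytes = label.encode("utf-8")
ace_bytes = label.encode("punycode")  # Punycode standing in for ACE (assumption)
utf8_roundtrip = utf8_bytes.decode("utf-8")
ace_roundtrip = ace_bytes.decode("punycode")

# Nameprep step: case folding and normalization are applied to the label
# itself, so the original capitalization cannot be recovered afterwards,
# whichever encoding is applied later.
prepped = nameprep(label)

print(utf8_roundtrip == label, ace_roundtrip == label)
print(prepped == label)
```

Any loss comes from the nameprep step, not from the choice between UTF-8
and ACE.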

> Also ACE can only be used for host names, it may not fit all types of
> DNS names and it cannot be used for other textual data in DNS.
> So by using ACE you always need to implement handling of UTF-8 (or what
> is used) for all other textual data in DNS.
> When we internationalise DNS so IDNs can be handled it must also
> include internationalisation of all text data handled in DNS.

Sure, but internationalization of the DNS protocol is a no-brainer (I agree
with you here) because the protocol is specified to be able to handle 8bit
octets of any value.
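
As a rough illustration of why the protocol itself is not the obstacle: on
the wire, a name is a sequence of length-prefixed labels, and the label
octets may carry any value. The sketch below is an example under stated
assumptions, not any particular server's code; the function name
encode_wire_name and the sample name are hypothetical, and UTF-8 is just one
possible choice of octets.

```python
def encode_wire_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels, using UTF-8 octets."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        octets = label.encode("utf-8")
        # A label is limited to 63 octets; empty labels are not allowed here.
        if not 1 <= len(octets) <= 63:
            raise ValueError("label must be 1..63 octets")
        out.append(len(octets))  # one length octet, then the label octets
        out += octets
    out.append(0)  # the zero-length root label terminates the name
    return bytes(out)

# Hypothetical sample name containing one non-ASCII character.
wire = encode_wire_name("b\u00fccher.example")
print(wire)
```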

The problem is not DNS, but when the data in DNS is used in other protocols.

> There is one major difference between MIME and ACE/UTF-8 in DNS.
> In MIME the most important element for delivery of e-mail, the
> e-mail address, was not changed and forced to remain in ASCII.
> This means that if an address is copied from an application understanding
> MIME to one that does not, it will still work.

Correct, and I call this "the clipboard problem" (even though it can happen
automatically as well).

> With DNS it is the address itself that is changed.
> If you have an application that understands ACE and decodes it
> into the real address, I am sure people will copy the decoded form
> into applications that do not know about ACE, which will therefore result
> in UTF-8 or other encoding being sent into DNS.

Yes, absolutely, as John Klensin has also said several times. This is,
though, nothing specific to ACE or UTF-8 or whatever, but a result of every
operating system in its local version using a different default script.

> You will get this problem with the UTF-8 solution also.

Yes.

> Some of these problems can be handled if you use DNS servers
> like those I have in UDNS which can handle both ACE and UTF-8, as
> well as some local character sets.
> So I am sure several of the problems you see with UTF-8 will
> also exist with the ACE-only solution, though with ACE only,
> decoded ACE names will never work.

I don't agree with your conclusion, because if you have an application
which already takes care of broken "clients", it can take care of decoded
ACE as well as UTF-8 as well as ISO-2022-JP or whatever.

I.e. I think we all agree about one thing: we will get a complete mess all
over the place, and the only thing that can stop that is to NOT do IDN
at this layer in the protocol stack at all, and instead just do a
dictionary solution.

> One thing everybody also may think about. Even with UTF-8 as
> only encoding, there must be a way to display names that
> cannot be displayed using the characters available in a client.

I would even say that one of the errors we will get in the "clipboard
problem" is if you copy and paste between two applications which both know
ACE, but the internal script used in either of the applications (or the
clipboard itself) cannot handle the characters which are encoded (in ACE
or UTF-8). We will lose there as well, unless the applications are very
clever and, for example, encode the domain name before storing it in a
clipboard.
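
A minimal sketch of that last idea, assuming the application keeps the
ASCII-compatible form on the clipboard and only decodes it for display. It
uses Python's built-in "idna" codec, which implements the later IDNA
specification (RFC 3490) with its "xn--" prefix, so the exact ACE form shown
is an assumption relative to this discussion. The function names and the
sample name are hypothetical.

```python
def to_clipboard(name: str) -> str:
    """Return an ASCII-only form that any clipboard or charset can carry."""
    return name.encode("idna").decode("ascii")

def from_clipboard(text: str) -> str:
    """Decode the ASCII-compatible form back to Unicode for display."""
    return text.encode("ascii").decode("idna")

stored = to_clipboard("b\u00fccher.example")  # what is placed on the clipboard
shown = from_clipboard(stored)                # what an ACE-aware application shows

print(stored)
print(shown == "b\u00fccher.example")
```

An application that does not know ACE still receives a plain ASCII name it
can pass on unchanged, which is the property the MIME comparison above
relies on.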

   paf