
Re: [idn] Some comments



At 21.37 +0000 01-01-13, D. J. Bernstein wrote:
>What's your position, Patrik? Do you want to look around in a future
>UTF-8 world and see that the Internet is using some clumsy ACE? Do you
>think that Sendmail will never have to be fixed? Do you object to the
>IDN WG warning implementors that programs should be 8-bit-clean?

My position is that:

   - The IDN problem has to be solved _now_, and we don't have
     time to update all protocols to use anything other than
     US-ASCII before that point in time
   - This means that the immediate solution must be
     ACE-based
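
To make "ACE-based" concrete, here is a minimal sketch in Python. It
uses the standard library's "idna" codec, which implements the ACE that
was standardised after this thread (Punycode with the "xn--" prefix);
it stands in for whichever ACE the WG picks, and the sample label is
purely illustrative.

```python
# Sketch only: Python's built-in "idna" codec is used as a stand-in ACE.
# The label below is an illustrative example, not taken from the thread.
label = "b\u00fccher"          # "bücher", contains one non-ASCII character

ace = label.encode("idna")     # ASCII-compatible encoding of the label

# Every byte fits in US-ASCII, so existing 7-bit protocols can carry it.
is_ascii_only = all(b < 0x80 for b in ace)

# An ACE-aware application can recover the original label.
round_trip = ace.decode("idna")

print(ace, is_ascii_only, round_trip)
```

The point of the sketch is only that the protocol on the wire stays
US-ASCII; all the work happens in the applications at the two ends.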

In the longer term a more efficient encoding of Unicode can of course
be used, but that is much further ahead.

I would say that the warning should not be "be 8-bit-clean", but
that protocols (and therefore applications) should be prepared for
changes not only in the version of Unicode but also in the preferred
encoding mechanism. I don't want 8-bit-clean protocols and UTF-8. I
want protocols which can handle UCS-2, UCS-4 or UTF-16 (or whatever
non-encoded version of Unicode). Why have an encoding at all if we
are taking the step of changing the protocols?

What I don't like in your arguments is the focus on "applications" 
instead of looking at what is specified in the protocols.

>  > Non-decoded data is not as problematic, because then information has
>>  not been lost during encoding and transport.
>
>That's what Keith said for years about Quoted-Printable. He ignored the
>screams of the users who, in fact, DID NOT HAVE THE INFORMATION.

I don't understand what you are saying. Note that I listed several
different problems. The "non-decoded data" problem was the case where
the recipient did get all the information, in the form of a charset
argument to the text/plain content type; the other problems were
cases where that information was missing.

My point was that when we have non-decoded data, the recipient can,
by changing the application, see all the information without any loss,
but he cannot if there was some loss somewhere (8th bit removed,
or charset argument missing for 8-bit-clean data).
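
The distinction can be sketched in a few lines of Python, using the
standard "quopri" module. The sample word, the two charsets and the
variable names are illustrative assumptions, not anything from this
thread.

```python
import quopri

# Illustrative sample: a Swedish word, sent as ISO 8859-1 bytes.
text = "Bl\u00e5b\u00e4r"                  # "Blåbär"
raw = text.encode("iso-8859-1")

# Case 1: quoted-printable. The wire form is pure ASCII, and a
# recipient with the right tool gets the original bytes back.
qp = quopri.encodestring(raw)
qp_is_ascii = all(b < 0x80 for b in qp)
recovered = quopri.decodestring(qp)

# Case 2: a 7-bit transport clears the 8th bit. Two different
# characters now collide with plain ASCII letters, so no tool can
# tell afterwards what the original bytes were.
stripped = bytes(b & 0x7F for b in raw)

# Case 3: 8-bit data arrives intact but without a charset argument.
# The same bytes read as different text under different guesses.
guess_latin1 = raw.decode("iso-8859-1")
guess_cyrillic = raw.decode("iso-8859-5")

print(qp, recovered == raw)
print(stripped)
print(guess_latin1, guess_cyrillic)
```

Only the first case is recoverable by switching to a better
application; the other two are the ones where information is gone.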

What you seem to be talking about is how bad it was when the user had
an application which could not decode the quoted-printable, even
though the encoding and the charset parameter were both correct?

If that is the case, then you don't understand what I am talking about.

>(Keith's favorite airline: ``Yes, sir, we know we shredded your luggage,
>but it's all here! Take some time and sew it back together. By the way,
>we don't shred English luggage.'')

You talk about quoted-printable encoding and the charset parameter as
if they were something which destroys the content, as shredding
luggage would.

Sorry, that doesn't work.

Shredding the luggage is when the user cannot reassemble the luggage
even with the correct tools: things like getting 8-bit data without
a charset parameter, or getting 8-bit data where the 8th bit has been
set to zero.

>The goal is not merely to ensure that, in theory, all the information
>has been preserved. The goal is to actually _provide_ the information to
>the user.

My argument was that IF the whole encoded message, as with
quoted-printable and a correct charset parameter, reaches the
recipient of the email, then he can himself, by changing application,
get the information without noticing that the data has been encoded
at all.

>  > Also, for the n:time, for UTF-8 to work we need to change a large
>>  number of protocols, intermediaries/middleware boxes, firewalls etc
>>  because the change to the protocols is quite large.
>
>What matters is the software. The protocols can change for free if the
>software already supports the new protocols.
>
>UTF-8, with nameprep handled close to the user as I described, works
>with many existing 8-bit-clean pieces of software. ACE-now-and-forever
>and ACE-now-UTF-8-later would impose much larger software upgrade costs.

But software doesn't handle 8-bit data correctly today, even though
we in the IETF have been talking about it for a very long time.

Even worse, software doesn't handle the charset parameter correctly 
either (mapping between transport charset and local charset) even 
though MIME has now existed for quite some number of years.

Software is not at all as good as you seem to believe.

A couple of questions:

  - Do you use an operating system which doesn't use ISO 8859-1 as
    its native charset for input and output? What charset?
  - What is the minimal set of characters you need
    in your native language, and what is that language?
  - What percentage of the email you receive and send each day is not
    in English, i.e. needs characters which don't exist in the A-Z set?

Note that I am talking about your own personal experience and not 
what your customers/users of your software report.

    paf