Re: [idn] Some comments
- To: idn@ops.ietf.org
- Subject: Re: [idn] Some comments
- From: Patrik Fältström <paf@cisco.com>
- Date: Sat, 13 Jan 2001 23:19:58 +0100
- Delivery-date: Sat, 13 Jan 2001 14:21:03 -0800
- Envelope-to: idn-data@psg.com
At 21.37 +0000 01-01-13, D. J. Bernstein wrote:
>What's your position, Patrik? Do you want to look around in a future
>UTF-8 world and see that the Internet is using some clumsy ACE? Do you
>think that Sendmail will never have to be fixed? Do you object to the
>IDN WG warning implementors that programs should be 8-bit-clean?
My position is that:
- The IDN problem has to be solved _now_,
and we don't have time to update all protocols to use
anything other than US-ASCII before that point in time
- This means that the immediate solution must be ACE
based
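As a rough illustration of what "ACE based" means in practice, the
Python sketch below uses the standard library's "idna" codec. Note the
assumptions: that codec implements the ACE which was standardized
later (Punycode with the "xn--" prefix), not any of the candidates
under discussion at the time of this message, and the name
"bücher.example" is a made-up example.

```python
# Sketch only: Python's built-in "idna" codec implements the ACE that was
# standardized later (nameprep + Punycode); it is used purely to illustrate.
name = "b\u00fccher.example"   # hypothetical internationalized name

ace = name.encode("idna")      # ASCII-compatible form
utf8 = name.encode("utf-8")    # the same name as raw UTF-8 octets

print(ace)    # b'xn--bcher-kva.example'
print(utf8)   # b'b\xc3\xbccher.example'

# The ACE form stays inside US-ASCII, so 7-bit-only protocols can carry it;
# the UTF-8 form needs every hop to be 8-bit-clean.
ace_is_ascii = all(octet < 0x80 for octet in ace)
utf8_is_ascii = all(octet < 0x80 for octet in utf8)
print(ace_is_ascii, utf8_is_ascii)   # True False

# An upgraded application can turn the ACE back into the original name.
print(ace.decode("idna") == name)    # True
```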
In the longer term, of course, a more efficient encoding of Unicode
can be used, but that is much further ahead.
I would say that the warning message should not be to be 8-bit-clean,
but that protocols (and therefore applications) should be prepared
for a change not only in the version of Unicode but also in the
preferred encoding mechanism. I don't want 8-bit-clean protocols and
UTF-8. I want protocols which can handle UCS-2, UCS-4 or UTF-16 (or
whatever non-encoded version of Unicode). Why have an encoding at all
if we are taking the step of changing the protocols?
What I don't like in your arguments is the focus on "applications"
instead of looking at what is specified in the protocols.
> > Non-decoded data is not as problematic, because then information has
>> not been lost during encoding and transport.
>
>That's what Keith said for years about Quoted-Printable. He ignored the
>screams of the users who, in fact, DID NOT HAVE THE INFORMATION.
I don't understand what you are saying. Note that I listed several
different problems. The "non-decoded data" problem was the case where
the recipient did get all the information, in the form of a charset
argument to the text/plain content type, while the other problems
were the cases where you didn't have this information.
My point was that when we have non-decoded data, the recipient can,
by changing application, see all the information without any loss,
but he cannot if there was some loss somewhere (8th bit removed,
or charset argument missing from 8-bit-clean data).
What you seem to be talking about is how bad it was when the user had
an application which could not decode the quoted-printable even though
the encoding and the charset parameter were both correct?
If that is the case, then you don't understand what I am talking about.
>(Keith's favorite airline: ``Yes, sir, we know we shredded your luggage,
>but it's all here! Take some time and sew it back together. By the way,
>we don't shred English luggage.'')
You talk about quoted-printable encoding and the charset parameter as
if they were something which destroys the content the way shredding
of luggage would.
Sorry, that doesn't work.
Shredding the luggage is when the user cannot reassemble the luggage
even with the correct tools. That means things like getting 8-bit
data without a charset parameter, or getting 8-bit data where the 8th
bit has been set to zero.
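The distinction can be sketched in a few lines of Python. This is
only an illustration under assumptions: the sample text and the
choice of ISO 8859-1 as charset are invented for the example, and
clearing the 8th bit by hand stands in for a hop that is not
8-bit-clean.

```python
import quopri

# Sketch only: sample text and charset are illustrative assumptions.
original = "F\u00e4ltstr\u00f6m"
raw = original.encode("iso-8859-1")

# Quoted-printable: 7-bit safe on the wire and fully reversible,
# provided the charset is known to the recipient.
qp = quopri.encodestring(raw)
restored = quopri.decodestring(qp).decode("iso-8859-1")
print(qp)                     # b'F=E4ltstr=F6m'
print(restored == original)   # True

# 8th bit cleared somewhere in transit: the information is gone and
# no tool on the receiving side can bring it back.
stripped = bytes(octet & 0x7F for octet in raw)
damaged = stripped.decode("ascii")
print(damaged)                # Fdltstrvm
```

In the first case the recipient only needs a better application; in
the second there is nothing left to decode.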
>The goal is not merely to ensure that, in theory, all the information
>has been preserved. The goal is to actually _provide_ the information to
>the user.
My argument was that IF the whole encoded message, as with
quoted-printable and a correct charset parameter, reaches the
recipient of the email, then he can himself, by changing application,
get the information without noticing that the data has been encoded
at all.
> > Also, for the n:time, for UTF-8 to work we need to change a large
>> number of protocols, intermediaries/middleware boxes, firewalls etc
>> because the change to the protocols is quite large.
>
>What matters is the software. The protocols can change for free if the
>software already supports the new protocols.
>
>UTF-8, with nameprep handled close to the user as I described, works
>with many existing 8-bit-clean pieces of software. ACE-now-and-forever
>and ACE-now-UTF-8-later would impose much larger software upgrade costs.
But software doesn't handle 8-bit data correctly today, even though
we in the IETF have been talking about it for a very long time.
Even worse, software doesn't handle the charset parameter correctly
either (mapping between transport charset and local charset), even
though MIME has now existed for quite a number of years.
Software is not at all as good as you seem to believe.
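The mapping between transport charset and local charset mentioned
above can be sketched as follows. The text, the two charsets and the
KOI8-R "wrong guess" are assumptions picked for the illustration, not
taken from any particular piece of software.

```python
# Sketch only: the text and all charsets are illustrative assumptions.
text = "F\u00e4ltstr\u00f6m"
transport_charset = "iso-8859-1"   # what the charset parameter declares
wire = text.encode(transport_charset)

# Correct handling: decode with the declared transport charset, then
# re-encode into whatever the local charset happens to be.
decoded = wire.decode(transport_charset)
local = decoded.encode("utf-8")    # hypothetical local charset

# Broken handling: ignore the parameter and assume the local default.
misread = wire.decode("koi8_r")

print(decoded == text)   # True
print(misread == text)   # False
```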
A couple of questions:
- Do you use an operating system which doesn't use ISO 8859-1 as
its native charset for input and output? What charset?
- What is the minimal set of characters you need
in your native language, and what is that language?
- What percentage of the email you receive and send each day is not
in English, i.e. needs characters which don't exist in the A-Z set?
Note that I am talking about your own personal experience and not
what your customers/users of your software report.
paf