Re: [idn] What's wrong with skwan-utf8?

To: duerst@w3.org
Subject: Re: [idn] What's wrong with skwan-utf8?
From: Dan <Dan.Oscarsson@trab.se>
Date: Wed, 3 Jan 2001 16:38:44 +0100 (MET)
Cc: idn@ops.ietf.org, paf@cisco.com, wessorh@ar.com, djb@cr.yp.to
Delivery-date: Wed, 03 Jan 2001 07:42:03 -0800
Envelope-to: idn-data@psg.com
Reply-To: Dan <Dan.Oscarsson@trab.se>

>>They can be updated of course (and I think that is your point). The 
>>Applications Area have created a small task-force(!) which is to review 
>>the protocols in question to see what can be done with them to see that 
>>they are internationalized. I.e. not only the domainname is to be 
>>internationalized but most certainly other protocol elements which are 
>>displayed to the user as well. One example is the localpart in email addresses.
>
>That's a very good point. But I strongly believe that's a strong
>point for using UTF-8. Here's why: If we use an ACE solution, each
>protocol will have to rehash many similar problems: Whether to use
>UTF-8 or ACE or some other ACE different from the DNS ACE or something
>completely different altogether, also in the case of ACE, how to
>distinguish between original ASCII and ACE, and so on. We'll end
>up with something extremely fragmentary and patched. Even for those
>protocols where it would be rather easy and beneficial to use UTF-8,
>there will be discussions about what to do.

We should also remember that e-mail already introduced something like ACE:
it is called quoted-printable (and BASE64). It was introduced for the
same reason: to keep e-mail 7-bit so that existing applications
would not break. Now look at the result:
- yes, it probably did not break some applications.
- yes, it works in the old SMTP protocol.
- but it also introduced a lot of problems:
  - even after so many years of MIME, many applications still do not
    handle it or do not display characters in a user-friendly way (using
    local native characters).
  - quite often it partly fails in some applications, so that
    we see some of the raw quoted-printable text or the characters
    get messed up.
  - and what a mess for software developers: identifying which parts
    of a text are quoted-printable, identifying the encoded character set,
    and decoding it. And doing the reverse when sending.
    If all text (both headers and text bodies) in e-mail had been in
    UTF-8, the world would have been so very much easier.
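
To make the developer burden above concrete, here is a rough sketch in
Python of the two decoding paths. It assumes the standard library's
email.header.decode_header for the encoded-word case; the function names
decode_mime_header and decode_utf8_header and the sample header are
made up for illustration, and raw UTF-8 headers are the hypothetical
alternative argued for here, not what SMTP of this era carries.

```python
from email.header import decode_header


def decode_mime_header(raw: str) -> str:
    """Decode a header that may mix plain ASCII with MIME encoded words.

    Each chunk can arrive as bytes with its own declared character set,
    so the caller has to track the charset chunk by chunk.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Unlabelled byte chunks are treated as ASCII here (an assumption).
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def decode_utf8_header(raw: bytes) -> str:
    """Hypothetical alternative: the whole header is simply UTF-8."""
    return raw.decode("utf-8")


# Illustrative header, not taken from any real message.
encoded = "=?iso-8859-1?Q?Bj=F6rn?= <bjorn@example.com>"
plain = "Björn <bjorn@example.com>".encode("utf-8")

print(decode_mime_header(encoded))
print(decode_utf8_header(plain))
```

The first function has to find the encoded parts, look up a character
set for each one and then join the pieces; the second is a single call.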
    
With an ACE in DNS, we will have e-mail with ACE-encoded domain names and
of course we want to use non-ASCII in the user name part too. So
an e-mail address could look like:
ax--ergh45d6.ax--fddgf@yf--sdff.hello.yf--sdfh.com
and it could be in a header with comments in quoted-printable.
What a mess to decode. How many applications will break on that?
How many will fail to get it right? How many will display the mess
to the user?

If we used one single character encoding form (like UTF-8), I am sure
many more characters would end up correct and be displayed
correctly than with an ACE/quoted-printable scheme. If all protocols
use the same encoding, only one encode/decode step between the local
character set and the standard encoding needs to be written, and it can
be used by all software.
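
As a minimal sketch of that single shared conversion, assuming UTF-8 is
the one standard encoding and ISO 8859-1 is the local character set (the
names to_wire and from_wire are invented for this example):

```python
def to_wire(local_bytes: bytes, local_charset: str) -> bytes:
    """Convert text in the local character set to the shared UTF-8 form."""
    return local_bytes.decode(local_charset).encode("utf-8")


def from_wire(wire_bytes: bytes, local_charset: str) -> bytes:
    """Convert UTF-8 from the network back to the local character set.

    Characters the local set cannot represent are replaced with '?'.
    """
    return wire_bytes.decode("utf-8").encode(local_charset, errors="replace")


# Round trip for a name typed on an ISO 8859-1 system.
local_name = "Björn".encode("iso-8859-1")
wire_name = to_wire(local_name, "iso-8859-1")
assert from_wire(wire_name, "iso-8859-1") == local_name
```

Every protocol that carries the same encoding could reuse this one pair
instead of each adding its own ACE or quoted-printable handling.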

So I fully agree with Martin that using an ACE might not be the best
solution - it might break more applications than using only UTF-8
might do.

Regards,
    Dan
