Re: [idn] Do you want ASCII forever?
Dan Oscarsson <Dan.Oscarsson@trab.se> wrote:
> I would like to know if most of you want to
> have ASCII forever, or if you want to move to UCS as the
> character encoding to be used for interoperability?
>
> I have since the beginning of the IDN working group wanted UCS
> to be the only character encoding used between systems.
> But from the discussions on the list I have felt there is a big
> group not wanting to leave ASCII.
Speaking as a relative outsider to the IDN working group and its
charter, I see this as a loaded question that completely misrepresents
the whole debate between ACE and UTF-8.
Neither ACE nor UTF-8 "is" the UCS to the exclusion of the other. UTF-8
is an official Character Encoding Form for the UCS (Unicode), while the
ACE concept in general and Punycode in particular were designed
specifically for the IDN project. But that has nothing to do with ACE
proponents "not wanting to leave ASCII." Anyone who is involved in IDN
is, almost by definition, involved because he or she wants to see a way
to use non-ASCII characters in the DNS.
Both ACE and UTF-8 are designed to represent all the characters of the
UCS, while at the same time maintaining compatibility with some legacy
character set. In the case of UTF-8, the set with which compatibility
is maintained is the entire 7-bit ASCII code space; in the case of ACE,
it is the subset of ASCII used in the existing DNS.
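To make the two design points concrete, here is the same label in both
forms, using Python's built-in codecs ("bücher" is just a convenient
sample label):

    label = "b\u00fccher"            # "bücher"
    # UTF-8 is ASCII-transparent but produces non-ASCII bytes:
    print(label.encode("utf-8"))     # b'b\xc3\xbccher'
    # Punycode ACE stays within letters, digits, and hyphens:
    print(label.encode("punycode"))  # b'bcher-kva'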
Remember that the entire purpose for inventing UTF-8 in the first place,
ten years ago, was to allow Unicode to be used in Unix and other
ASCII-based operating systems, given their assumptions and
restrictions -- not a very different situation from inventing ACE to
allow Unicode to be used in the DNS, given *its* assumptions and
restrictions.
Remember, too, that IDNA requires more than just ACE (or UTF-8). It
also requires nameprep, which is several times more complex than ACE
and, in any event, refutes the argument that existing UTF-8-aware
browsers and e-mail clients could handle a UTF-8-based IDNA with no
change.
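To see why nameprep is the bigger piece, compare raw Punycode with the
full IDNA mapping. As a rough sketch, Python's built-in "idna" codec
happens to implement the whole ToASCII pipeline (nameprep, then
Punycode, then the ACE prefix):

    # Raw Punycode does no case folding or normalization:
    print("B\u00fccher".encode("punycode"))  # b'Bcher-kva'
    # The full IDNA mapping runs nameprep first:
    print("B\u00fccher".encode("idna"))      # b'xn--bcher-kva'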
> IDNA will result in increased complexity. I have written software
> handling the decoding of MIME and URLs. I very much dislike the
> mess where every small part of a text line has to first be parsed
> into parts, then each part has to be decoded using a different
> method: ACE, %-encoding, quoted-printable,...
Of course IDNA will result in increased complexity. Did anyone honestly
believe it would be possible to add support for an almost limitless
repertoire of characters without some added complexity?
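Still, the mess Dan describes is real enough. A short Python sketch
makes the point; the three sample strings below all spell the same
word, yet each escape mechanism demands its own decoder:

    import quopri
    from urllib.parse import unquote

    # Three syntaxes, three decoders; all print "bücher".
    print(unquote("b%C3%BCcher"))               # URL %-encoding
    qp = quopri.decodestring(b"b=C3=BCcher")    # quoted-printable
    print(qp.decode("utf-8"))
    print(b"xn--bcher-kva".decode("idna"))      # IDNA ACE label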
> The world would have been so much simpler if everybody
> had used UCS. Why not at least make that the goal and try
> to get rid of the "encode on top of ASCII"?
I agree that life would be easier if the rest of the world had
cooperated in advance. But they didn't (and still haven't, at least
not completely), and IDN is intended to work in the world that
actually exists, not the one we wish we lived in.
> Looking at some of the more important areas: DNS, SMTP, HTTP
> and HTML, they could be fixed fairly easily.
>
> - SMTP could add a negotiation at startup switching the default
> character mode to UCS NFC in UTF-8 (requiring all headers to be
> UTF-8, and making UTF-8 the default for all text).
>
> - HTTP version 1.2 could require all URLs and headers to
> be UCS NFC in UTF-8.
>
> - HTML 4.02 could require all URLs to use the same character
> set as the rest of the document or use %-encoded UTF-8.
>
> - DNS can use something built on UDNS, though because of IDNA
> it will incur overhead, as it will have to query the server
> about whether it can handle UCS (this would not have been
> needed if we had started with UDNS).
Look at all the changes you are proposing. Then look at all the
difficulty e-mail clients and Web browsers are having just keeping up
with the standards that already exist.
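To be clear about the scale of even the first item: the SMTP
negotiation would presumably be modeled on the existing 8BITMIME
EHLO extension, something like the exchange below. This is purely
hypothetical -- the "UTF8TEXT" keyword is invented here for
illustration:

    C: EHLO client.example.org
    S: 250-mail.example.net Hello
    S: 250 UTF8TEXT                    (hypothetical keyword)
    C: MAIL FROM:<anna@example.org> UTF8TEXT
    S: 250 OK; headers and text now default to UTF-8

And every server and client along the path would need that upgrade
before the negotiation ever succeeded -- which is exactly the point.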
Timeliness does matter in this project. While we are sitting here,
debating (again) whether IDNA should use the "perfect world" approach
that requires all these other upgrades, companies are already selling
"internationalized" Web addresses based on proprietary technologies.
Developing a solution that is philosophically cleaner but requires the
rest of the world to make changes will only slow things down. It will
not speed them up, and it will not result in a justifiably simpler
solution for end users.
> What does IETF want?
According to the charter, IETF wants "internationalized access to domain
names." There is no mention of Unicode or UCS to achieve this goal, and
although it would be a bit silly to try to achieve the goal *without*
Unicode, that does not imply in any way that UTF-8 or another standard
Unicode CEF must be used to do so.
The charter also says, "A fundamental requirement in this work is to not
disturb the current use and operation of the domain name system, and for
the DNS to continue to allow any system anywhere to resolve any domain
name." That pretty much rules out changing the rest of the world to be
UTF-8 conformant.
I, too, look forward to the day when the world uses Unicode as
universally as it uses ASCII today (i.e. when the set of systems that
"do not use Unicode" can be described as simply as we currently say "IBM
mainframes" to describe the systems that "do not use ASCII"). But we
aren't there yet, and IDN cannot be the tool to force everyone else to
get there.
Besides, haven't we already had this debate?
-Doug Ewell
Fullerton, California