
[idn] Let's go forward with IDNA and UTF-8




From all the discussion on the list I can see two major directions
for non-ASCII names in DNS: use ACE (in applications) or use UTF-8.

The people arguing for ACE/IDNA say it is the quickest approach and the
least disruptive to non-IDN-aware software. Those wanting UTF-8 think this
format is best for internationalisation and fear that starting with IDNA
will result in ACE forever.

I have wanted UTF-8 for a long time, but I realise that we probably
must use ACE for backward compatibility.
Before I go on to what I think we should do, I will give some comments
on UTF-8.

Today we have many character sets in use. In protocols we also have many,
and we have ACE-like encodings; the best example is quoted-printable
in MIME. I have used these encodings and character sets as a user,
a system administrator and a software developer. It is a terrible
mess! As a user I quite often see junk because of things
like quoted-printable in e-mail, and as a programmer I
need very complex software to decode all the different encodings in use
and to handle all the character sets. Decoding an e-mail is difficult:
character sets, encodings and normal text are mixed in it, forcing me
into complex parsing. If everything had been in one character set, the
world would have been much simpler, and I would be much more willing to
support languages other than my own. If normalised UCS had been used
everywhere it would be wonderful.
While I know that my system, which has everything encoded in ISO 8859-1,
cannot change to UCS overnight, it would work much better if all
interoperability protocols used one encoding for all character data.
While UTF-8 is far from perfect, it is the best choice available.
Today I can see no better format for interoperability than UCS normalised
using Unicode form C and then encoded as UTF-8.
That is my reason for wanting UTF-8/UCS/normalisation form C.
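
To make the "normalise to form C, then encode as UTF-8" idea concrete, here
is a minimal sketch in Python. The helper name to_wire is only an
illustration and does not come from any draft; it uses the standard
unicodedata module to show that two different code point sequences for the
same letter end up as the same octets.

```python
import unicodedata

def to_wire(name: str) -> bytes:
    """Normalise to Unicode form C (NFC), then encode as UTF-8.

    Hypothetical helper for illustration only; not taken from any draft.
    """
    return unicodedata.normalize("NFC", name).encode("utf-8")

composed = "\u00e5"      # LATIN SMALL LETTER A WITH RING ABOVE, one code point
decomposed = "a\u030a"   # "a" followed by COMBINING RING ABOVE

# The two strings differ as code point sequences...
assert composed != decomposed
# ...but give the same octets after normalisation and encoding.
assert to_wire(composed) == to_wire(decomposed) == b"\xc3\xa5"
```

Without the normalisation step the two spellings would produce different
octet strings and so could not be compared byte by byte.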

Back to business.

I can see we have three things to decide:
1) Internationalisation of the DNS protocol and handling
2) Selecting ACE
3) How to match names (nameprep).


1) Internationalisation of the DNS protocol and handling
Here people want:
   - IDNA for quick deployment
   - ACE for backward compatibility
   - UCS/UTF-8 as the standard encoding, instead of one more encoding in
     ASCII format.

Could we not then do this?
   The DNS protocol/handling selected is:
   - ACE for backward compatibility
   - IDNA for quick deployment
   - UCS/UTF-8 as standard encoding combined with EDNS for longer labels.
From what I can see this can be done. Combining the IDNA draft with the
UDNS draft will give most of it. And both IDNA and UCS/UTF-8 can be
used at the same time.

Would not this satisfy most people?
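
As a rough illustration of why longer labels come up together with UTF-8:
a DNS label is limited to 63 octets (RFC 1035), and a non-ASCII character
takes more than one octet in UTF-8. The sketch below only checks that
limit; the function name is an assumption made for the example, and it
says nothing about how EDNS or the UDNS draft would lift the limit.

```python
MAX_LABEL_OCTETS = 63  # label length limit from RFC 1035

def utf8_label_fits(label: str) -> bool:
    """Return True if the UTF-8 form of the label fits in a classic DNS label.

    Hypothetical helper for illustration only.
    """
    return len(label.encode("utf-8")) <= MAX_LABEL_OCTETS

# 63 ASCII characters are 63 octets and fit.
assert utf8_label_fits("a" * 63)
# 31 two-octet characters are 62 octets and fit.
assert utf8_label_fits("\u00e5" * 31)
# 32 two-octet characters are 64 octets and do not fit.
assert not utf8_label_fits("\u00e5" * 32)
```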

2) Selecting ACE.
This needs to be done quickly, especially as Verisign plans to use RACE
even though it may not be the best. We cannot accept that one company
selects the ACE to use. The ACE must be selected after comparing
all ACE formats. We should state the main criteria for selecting it.
I suggest that the ACE working group inform the rest of us what they
are thinking and what they have done so far.


3) How to match names (nameprep).
We also need to define how names are to be matched.
Today we have one draft for that (nameprep), but it is
written to fit IDNA and may have to be changed to fit general
matching rules.
The most important question to take up here, which has been pointed
out by others (like Dan Bernstein), is: shall two characters
that look identical to typical users match as equal?
They do not in the current version of nameprep.
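
A small sketch of the look-alike problem, in Python. The function
simple_key is a stand-in of my own (form C normalisation plus case
folding); it is NOT the nameprep algorithm, only an illustration that
normalisation and case folding alone do not make visually identical
characters from different scripts compare as equal.

```python
import unicodedata

def simple_key(label: str) -> str:
    """Stand-in matching key: NFC normalisation plus case folding.

    Illustration only; this is not nameprep.
    """
    return unicodedata.normalize("NFC", label).casefold()

latin_a = "a"            # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"    # U+0430 CYRILLIC SMALL LETTER A, looks the same

# Case differences are folded away...
assert simple_key("A") == simple_key(latin_a)
# ...but the two look-alike letters remain distinct.
assert simple_key(latin_a) != simple_key(cyrillic_a)
```

Making such pairs match would need an explicit table of look-alike
characters on top of normalisation.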



Could we not go forward with something like this?
Or do those favoring IDNA think UTF-8 is not acceptable at all?
Or do those favoring UTF-8 not accept ACE being used for backward
compatibility?

If we compromise and allow both preferred formats,
IDN can be working fairly soon. Otherwise I suspect IDN handling
in DNS (and other protocols) will be a mess for a long time.

   Dan