
Re: [idn] UTF-8 as the long-term IDN solution



I can see from all the discussion that we do not agree on
some of the background requirements, and that not everyone has read
all the drafts, as some answers are in the drafts; in other places
we discuss a solution with different background requirements in mind.


For example: what is a good name?

For me, when entering a name through my user interface and
when receiving one from DNS (for example in a PTR query), a
good name is one that allows me to use the form that looks best.
That means that I will enter it using mixed case, if that looks
good, and I expect DNS to return mixed case in a PTR query if
that is how the name was stored. It is not acceptable for DNS
to destroy information by converting my names to all lower case.
I am sure there are forms other than case that other people
use to make their names look better and prefer to keep.

For me, I expect DNS to match names ignoring differences in
the form of characters (that is, A and a match as equal). It has
always been so, and I want it to continue to be so for all
letters (and by letters I do not mean the letters of the English
alphabet; I mean the letters in UCS).
It is totally wrong to allow some letters to have mixed case
and some not (some want ASCII to be compared case insensitively
but no other letters). A non-IT person will never understand that
limitation.
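
As a small illustration (my own sketch, not part of any draft; the
function name is made up for this example), Python's full Unicode
case folding gives this kind of matching for all UCS letters, not
just the ASCII ones:

    import unicodedata

    def names_match(a: str, b: str) -> bool:
        """Compare two names ignoring differences in character form.

        Normalise both to form C, then apply full Unicode case
        folding, so 'A' matches 'a' for every letter in UCS,
        not only for the ASCII letters.
        """
        def fold(s: str) -> str:
            return unicodedata.normalize("NFC", s).casefold()
        return fold(a) == fold(b)

    # The same rule applies to ASCII and non-ASCII letters alike.
    assert names_match("Example.COM", "example.com")
    assert names_match("MÜNCHEN.example", "münchen.example")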


Format of ACE and UTF-8:
Many who discuss ACE think of ACE in the form needed to support
IDNA. In that case you must first mangle the name into the form needed
to compare names and then encode it with ACE.
If you had the DNS server understand how to compare non-ASCII, the name
encoded in ACE could keep its original form, using mixed case and other
forms.
And when some discuss UTF-8, they think of UTF-8 handled like IDNA, meaning
that the DNS server need only compare the names in binary form.
If you let the DNS server know how to compare names, the only
requirement on UTF-8 is that it must be normalised. You could allow
it to be unnormalised, but that is a waste of resources; the best
place to normalise it is in the client.
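
As a sketch of what I mean (my own example, assuming the client does
the normalisation and the server only compares names), this is all
the client would have to do before putting a name on the wire:

    import unicodedata

    def prepare_name(name: str) -> bytes:
        """Normalise a name to Unicode form C and encode it as UTF-8.

        The server then only needs to compare names; it never has
        to normalise them itself.
        """
        return unicodedata.normalize("NFC", name).encode("utf-8")

    # A decomposed e + combining acute and a precomposed é give the
    # same bytes once the client has normalised to form C.
    assert prepare_name("cafe\u0301.example") == prepare_name("caf\u00e9.example")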



If you do the work to implement ACE and it works, why do UTF-8?
For me, I support the need to exchange non-ASCII data between systems.
My system does not use UTF-8, but to support interoperability, I am
willing to convert my internal format into ONE standard format
when sending over an external protocol and convert from ONE
standard format into my internal format when receiving data. I will not
support many different formats, as the cost of that is too high.
Of all the formats I have seen, normalised UCS using form C is the best.
UTF-8 or UCS-4 is fine for encoding because they are so simple
to handle.
As the world is not just DNS, and many other protocols are going
for UTF-8, my programs will support UTF-8. This means that
if we use UTF-8 in DNS, the cost will be very low, as UTF-8 handling
is already needed for other protocols. But ACE will always
add an extra cost, and that is why I think we should not work
on making the selected ACE as compact as possible, but instead
as simple as possible and as directly encodable from
UTF-8 as possible.
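
To make that conversion concrete, here is a sketch (mine; ISO 8859-1
is only a hypothetical internal encoding, chosen for illustration) of
converting to and from the one standard wire format, UTF-8 in form C:

    import unicodedata

    INTERNAL = "iso-8859-1"   # hypothetical internal encoding

    def to_wire(internal: bytes) -> bytes:
        """Internal format -> the ONE standard format:
        UCS normalised to form C, encoded as UTF-8."""
        text = internal.decode(INTERNAL)
        return unicodedata.normalize("NFC", text).encode("utf-8")

    def from_wire(wire: bytes) -> bytes:
        """The ONE standard format -> internal format."""
        return wire.decode("utf-8").encode(INTERNAL)

    # Round trip for "Växjö" stored in the internal encoding.
    assert from_wire(to_wire(b"V\xe4xj\xf6")) == b"V\xe4xj\xf6"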


Is UTF-8 the future?
I would expect that for most uses UCS is the future character set for
interoperability. I can see no existing alternative.
UTF-8 is stream oriented and does not have the problem of what
byte order to use, so it is probably the best way to encode UCS.
Otherwise UCS-4 is best, due to its full code-unit width and no
complicating things like surrogates.
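
A tiny illustration of the byte order point (my own example):

    s = "\u00e5"                   # the letter å
    print(s.encode("utf-8"))       # b'\xc3\xa5' -- the same on every machine
    print(s.encode("utf-32-be"))   # b'\x00\x00\x00\xe5'
    print(s.encode("utf-32-le"))   # b'\xe5\x00\x00\x00'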



-
What I want to say with the above is for you to think about more
than just DNS. Think of where the world is going. Think of what
character handling will be used in other places than DNS, and of
how the normal, non-IT person will see characters.

   Dan