
Re: [idn] What's wrong with skwan-utf8?



> That's a very good point. But I strongly believe that's a strong
> point for using UTF-8. Here's why: If we use an ACE solution, each
> protocol will have to rehash many similar problems: Whether to use
> UTF-8 or ACE or some other ACE different from the DNS ACE or something
> completely different altogether, also in the case of ACE, how to
> distinguish between original ASCII and ACE, and so on. We'll end
> up with something extremely fragmentary and patched. Even for those
> protocols where it would be rather easy and beneficial to use UTF-8,
> there will be discussions about what to do.
> 
> That will be different if we use UTF-8 for DNS. It will be much
> easier for application protocols to use the same solution. 

no, you have it exactly backwards.  if we use ACE for IDNs
within application protocols, then we can use the same 
representation for IDNs in each of those protocols, and we need
only worry about the presentation of IDNs to the user.
if we don't use some ACE for IDNs, then each application protocol
will end up with a different solution, and we will have to convert
between representations at the boundaries between applications - often
in places where there are presently no hooks for such a conversion.
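a minimal Python sketch of "one representation inside protocols, conversion
only at presentation", assuming the ACE is Punycode with the `xn--` prefix as
in IDNA 2003 (RFC 3490/3492), which Python's built-in "idna" codec implements.
the post itself does not name a particular ACE, and the helper names
`to_wire` and `to_display` are hypothetical.

```python
# Assumption: the ACE is IDNA 2003 / Punycode, via Python's built-in "idna" codec.
# to_wire and to_display are hypothetical helper names, for illustration only.

def to_wire(name: str) -> str:
    """Convert a user-entered name to the ASCII (ACE) form carried in protocols."""
    return name.encode("idna").decode("ascii")

def to_display(ace_name: str) -> str:
    """Convert an ACE name back to Unicode, only when presenting it to the user."""
    return ace_name.encode("ascii").decode("idna")

if __name__ == "__main__":
    wire = to_wire("bücher.example")
    print(wire)              # the same ASCII-only form in every protocol
    print(to_display(wire))  # the Unicode form, at the user interface only
```

every protocol carries the same ASCII string; only the user interface needs
to know how to turn it back into Unicode.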

we can't simply declare that existing app protocols will use UTF-8 - 
or any other non-ACE - because each app has its own, unique, entrenched 
assumptions about the legality of non-ASCII characters or the meanings 
of the octets used to encode non-ASCII characters if they do appear, 
and each one has its own way of framing domain names.  even if we were
to set a long-term goal of encoding IDNs "natively" within applications
(for whatever minimal benefit this might bring) we would need ACE
as a transition strategy - we're not going to be able to upgrade all
apps that assume ASCII DNS names within a short time, and we're not going
to be able to wait until they're all upgraded before we start using IDNs.

furthermore, if we use ACE within applications then non-upgraded
applications will still behave predictably when presented with
encoded IDNs.  if we don't use ACE then those applications will
behave unpredictably.  (and no, it's neither feasible nor useful
to try to predict the behavior of all apps when presented with
native IDNs - there are too many apps to consider, some of them 
quite obscure but nevertheless important - and you can't effectively
analyze an immense and constantly changing distributed system 
for bugs by presenting it with a small number of test cases.)
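to illustrate why encoded IDNs behave predictably in non-upgraded apps, here
is a sketch of one check a legacy app might plausibly apply: the RFC 1123
letters-digits-hyphen (LDH) host name rule. the function `legacy_accepts` is
hypothetical, Punycode via Python's "idna" codec is again assumed as the ACE,
and rejection is only one of the many things a real legacy app might do with
raw UTF-8, which is exactly the unpredictability described above.

```python
import re

# Hypothetical stand-in for a legacy app's host name check (RFC 1123 LDH rule):
# letters, digits and hyphens, no leading/trailing hyphen, at most 63 octets.
LDH_LABEL = re.compile(rb"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def legacy_accepts(name: bytes) -> bool:
    """Return True if every dot-separated label passes the LDH check."""
    return all(LDH_LABEL.match(label) for label in name.split(b"."))

if __name__ == "__main__":
    ace = "bücher.example".encode("idna")    # assumed ACE form (Punycode)
    utf8 = "bücher.example".encode("utf-8")  # "native" UTF-8 form
    print(legacy_accepts(ace))
    print(legacy_accepts(utf8))
```

the ACE form is just another legal host name to the legacy check, while the
UTF-8 octets fall outside the rules the app was written against.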

the need for an ACE is dictated by the needs of existing applications
and by the need to exchange IDNs between such applications.
the encoding used by DNS "on the wire" for IDNs is almost irrelevant.  
but if it uses ACE also, then existing applications will
"do the right thing" when called upon to look up an encoded IDN,
and inverse lookups will also work.  also, if ACE is used on-the-wire,
then having the on-the-wire form of IDNs "leak" to legacy applications
does not cause any major problems; whereas if some other form is 
used on-the-wire, then such leakage causes unpredictable behavior in
those applications.

so, to summarize:

- ACE is necessary, at least in the near term, for use within applications
- if in the near term the on-the-wire format also uses ACE, leakage to
  legacy apps is less likely to cause unpredictable behavior.
  
Keith