Re: [idn] What's wrong with skwan-utf8?
- To: "John C Klensin" <klensin@jck.com>
- Subject: Re: [idn] What's wrong with skwan-utf8?
- From: "Edmon" <edmon@neteka.com>
- Date: Wed, 3 Jan 2001 10:13:29 -0500
- Cc: <idn@ops.ietf.org>
- Delivery-date: Wed, 03 Jan 2001 07:11:02 -0800
- Envelope-to: idn-data@psg.com
Hi John,
My biggest worry is what message we send to the end user and what
preparation we should do during the transition period. If we know that some
systems will break and crash, should we try to at least identify some of
these problems specifically so that operators around the world can start
preparing for it?
What is your suggestion for preparation?
Do we just say that your application is non-compliant for IDN and must be
upgraded? Even for this simple message, should we prepare some kind of
automatic response or not, so that the end user knows what the problem is?
If we do so, however, it does mean that we will be making some changes at
the server end. If we must do something at the server end to inform the
user, then we might as well try to determine a "better long term"
solution now, so that while the server end prepares for the immediate
transition, it also incorporates the longer term solution.
Edmon
> --On Friday, 29 December, 2000 12:02 -0500 Edmon
> <edmon@neteka.com> wrote:
>
> > Does anyone have a good list of "what happens" if a full
> > 8-bit name is received by the different applications?
> > The reason we should strive to compile this list is that once
> > IDN is deployed, there will be people trying to enter
> > multilingual domain names from non-compliant applications.
> > There is simply no way of avoiding this, even if we choose to
> > go with ACE.
> >
> > If something is bound to break or go down, we should know
> > about it beforehand so that we can prepare for it.
>
> Edmon,
>
> Murphy's Law is really the operant principle here. If something
> significant on which other things depend is changed, some of
> those things will respond in disruptive ways. Indeed, even if
> the original software and systems were clearly broken, things
> will have been written to adapt to the broken behavior and some
> of them will behave poorly when the problem is corrected.
> Making lists of software is almost pointless unless one is
> willing to say "ok, this will cover 75% of the cases, and it is
> just tough luck for the other 25%" or "anything that is broken
> will provide incentives to its authors for fixing it quickly and
> that might even be a good thing" (I've heard arguments close to
> both of those on this list, although I find it hard to
> sympathize with them in the case of the DNS).
>
> In the specific case of "8 bit" names, experience indicates that
> one of several things will happen:
>
> (i) Everything will, as Dan has pointed out, work fine.
>
> (ii) The application will discard the high-order bits,
> resulting in a mess.
>
> (iii) The application will convert all "unknown" or
> "invalid" octets (i.e., those with the high bit on or
> those others not permitted by the application in domain
> names) to a single "noise" character.
>
> (iv) The application will produce an "invalid name"
> error message.
>
> (v) The application will treat the octets as arising
> from one of the ISO 8859-NN character sets (which one
> will depend on the application, although 8859-1 is most
> common), rather than as a UTF-8 encoding of ISO 10646/
> Unicode.
>
> (vi) The "invalid" characters will cause the application
> to go down some insufficiently-tested or
> insufficiently-robust code path, resulting in its
> blowing up or fatally crashing.
>
> Note that several of the above can lead, in especially unlucky
> cases, to finding a name other than the one the user intended
> and hence returning of results that represent a different host
> or resource than the one desired. That type of error, which is
> hard to detect, may, from a user or resource-provider
> standpoint, be worse than any of the other erroneous outcomes.
>
> Of course, most of these same comments would apply to direct
> (not further encoded) use of UCS-4.
>
> We presume, but cannot prove, that seven-bit-only encodings will
> not have these problems because they will not trigger behavior
> based on the high order bit being unexpectedly turned on. There
> are still possibilities, some based on incorrect code and others
> deriving from actions taken to avoid displaying characters that
> are not within the capabilities of local devices, of some of the
> same issues (especially variants on iii, iv, or vi) occurring
> with ACE-like encodings.
>
> john
>
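
Outcomes (ii), (iii) and (v) in John's list above can be reproduced in a
few lines. The following is only a rough sketch in Python, not taken from
any real application: the function names and the sample name
"bücher.example" are made up for illustration, and actual software may
mangle the octets in other ways.

```python
# Hypothetical helpers imitating three of the failure modes listed above.
# None of these names come from a real resolver or application.

def strip_high_bits(octets: bytes) -> bytes:
    """Outcome (ii): discard the high-order bit of every octet."""
    return bytes(b & 0x7F for b in octets)


def replace_with_noise(octets: bytes, noise: bytes = b"?") -> bytes:
    """Outcome (iii): turn every octet with the high bit on into one
    'noise' character (the choice of '?' is an assumption)."""
    out = bytearray()
    for b in octets:
        out += bytes([b]) if b < 0x80 else noise
    return bytes(out)


def misread_as_latin1(octets: bytes) -> str:
    """Outcome (v): treat the octets as ISO 8859-1 instead of UTF-8."""
    return octets.decode("iso-8859-1")


# A made-up name containing one non-ASCII character, sent as UTF-8.
wire = "bücher.example".encode("utf-8")

stripped = strip_high_bits(wire)       # b"bC<cher.example"
noisy = replace_with_noise(wire)       # b"b??cher.example"
misread = misread_as_latin1(wire)      # "bÃ¼cher.example"

# A seven-bit-only name passes through the first two helpers unchanged.
ascii_only = b"ace-like-label.example"
unchanged = (strip_high_bits(ascii_only) == ascii_only
             and replace_with_noise(ascii_only) == ascii_only)
```

In each case the application ends up with a name other than the one the
user typed, and in case (ii) the result may still look like a plausible
name to a lookup routine, which is the hard-to-detect kind of error John
describes. The last check only illustrates why seven-bit encodings are
presumed not to trigger these particular paths; it says nothing about
the other problems he mentions for ACE-like encodings.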