Re: [idn] idn-uri



Bruce wrote:

>Thanks for clarifying the overall process. This is beginning to get manageable
>now. I have a question, though. Why do you say that it would be preferable
>for the generating systems to do the conversion from (1) to (2) if possible,
>if we won't be able to assume at later steps that the work has already been
>done? I would think that the user would prefer to see the unmodified
>http://<iso-2022-jp> in his text editor (the rest of the page would also be
>in iso-2022-jp), which would enable him to easily edit his links
>without returning to the original generating software.

I can see three important stages/places:
1) The local system/user interface.
   Here the user could use ISO-2022-JP, and it would then be expected that
   URLs, domain names and everything else are in ISO-2022-JP. No ACE, no
   %-encoding. This is the simplest for the user. It is also best for
   programmers, as no special character handling is needed.

2) Element comparison and handling.
   When URLs, domain names, or something else need to be compared or
   worked on, they must all be in the same format. You cannot compare
   a %-encoded URL with an ISO-2022-JP URL. Also, some elements have
   special rules, like host names, which may only contain a restricted
   set of characters. The format used here is what I think Eric is
   calling the canonical format. Typically this would be UCS-4, but it
   could be %-encoded names, though using UCS would in most cases be
   easiest to process.

3) Transmission using a protocol between places.
   Here you need one standard format (or a few) to send all data with,
   so that both ends share the same "language". To keep it simple, only
   one format should be used for all data. You should not have ACE-encoded
   host names, %-encoded URLs and quoted-printable encoded text at the
   same time; this only makes things difficult. As the world looks today,
   it would be best if normalised UTF-8 were used everywhere.
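The three stages can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: Python's `str` stands in for UCS-4 as the canonical form of stage 2, NFC is assumed as the normalisation form (the post only says "normalised UTF-8"), the whole URL is treated uniformly (no ACE handling or host-name character rules), and the function names `from_local`, `to_canonical` and `for_transmission` are hypothetical.

```python
import unicodedata
from urllib.parse import unquote

def from_local(raw: bytes, local_charset: str = "iso2022_jp") -> str:
    """Stage 1 -> 2: decode text typed in the local charset into Unicode.
    NFC is an assumed choice of normalisation form."""
    return unicodedata.normalize("NFC", raw.decode(local_charset))

def to_canonical(url: str) -> str:
    """Stage 2: turn a %-encoded URL into the same Unicode form,
    assuming the %-escapes encode UTF-8 octets."""
    return unicodedata.normalize("NFC", unquote(url, encoding="utf-8"))

def for_transmission(text: str) -> bytes:
    """Stage 3: one wire format for everything, normalised UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")

if __name__ == "__main__":
    # The same (made-up) URL as an ISO-2022-JP user would type it ...
    local = "http://例え.jp/パス".encode("iso2022_jp")
    # ... and as a %-encoded URL received from elsewhere.
    pct = "http://%E4%BE%8B%E3%81%88.jp/%E3%83%91%E3%82%B9"
    # Neither raw form can be compared directly; the canonical forms can.
    print(from_local(local) == to_canonical(pct))  # True
    # Decomposed and precomposed input give identical bytes on the wire.
    print(for_transmission("\u30cf\u309a") == for_transmission("\u30d1"))  # True
```

The point of the sketch is that comparison only happens after both inputs have been brought into one canonical form, and that only one encoding leaves the machine.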

    Dan