Re: [idn] idn-uri
Bruce wrote:
>Thanks for clarifying the overall process. This is beginning to get
>manageable now. I have a question, though. Why do you say that it would
>be preferable for the generating systems to do the conversion from (1)
>to (2) if possible, if we won't be able to assume at later steps that
>the work has already been done? I would think that the user would
>prefer to see the unmodified http://<iso-2022-jp> in his text editor
>(the rest of the page would also be in iso-2022-jp), which would enable
>him to easily edit his links without returning to the original
>generating software.
I can see three important stages/places:
1) The local system/user interface.
Here the user could use ISO-2022-jp, and it would then be expected that
URLs, domain names and everything else are in ISO-2022-jp. No ACE, no
%-encoding.
This is the simplest for the user. It is also best for programmers, as
no special character handling is needed.
2) Element comparing and handling.
When URLs, domain names, or something else need to be compared or
worked on, they must all be in the same format. You cannot compare
a %-encoded URL with an ISO-2022-jp URL. Also, some elements, such as
host names, have special rules and may only contain a restricted set
of characters.
The format used here is what I think Eric is calling the canonical
format.
Typically this would be UCS-4, but it could be %-encoded names, though
using UCS would in most cases be the easiest to process.
3) Transmission using a protocol between places.
Here you need one (or a few) standard formats in which to send all data,
so that both ends share the same "language". To make it simple, only one
format should be used for all data. You should not have ACE-encoded host
names, %-encoded URLs and quoted-printable encoded text at the same
time. This only makes things difficult.
As the world looks today, it would be best if normalised UTF-8 was used
everywhere.
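
To make the three stages concrete, here is a rough Python sketch of how
a local ISO-2022-jp string and a %-encoded string could both be brought
to one canonical form for comparison (stage 2) and then sent as
normalised UTF-8 (stage 3). The function names and the example path are
made up for illustration, and the choice of NFC as the normalisation
form is my assumption; nothing above settles which form should be used.

```python
import unicodedata
from urllib.parse import quote, unquote_to_bytes


def canonical_from_local(raw: bytes, charset: str) -> str:
    """Stage 1 -> 2: decode local-charset bytes to Unicode, then normalise.

    NFC is an assumed normalisation form, chosen only for this sketch.
    """
    return unicodedata.normalize("NFC", raw.decode(charset))


def canonical_from_percent(encoded: str, charset: str = "utf-8") -> str:
    """Stage 2: undo %-encoding, decode the resulting bytes, normalise."""
    return unicodedata.normalize("NFC", unquote_to_bytes(encoded).decode(charset))


def to_wire(canonical: str) -> bytes:
    """Stage 2 -> 3: a single transmission format, normalised UTF-8."""
    return unicodedata.normalize("NFC", canonical).encode("utf-8")


# Hypothetical example: the same path written in two different forms.
local = "/日本語".encode("iso2022_jp")   # as it sits in an ISO-2022-jp page
escaped = quote("/日本語", safe="/")      # %-encoded UTF-8 form

# The raw forms differ, so comparing them directly would fail.
assert local != escaped.encode("ascii")

# Once both are in the canonical form, they compare equal.
assert canonical_from_local(local, "iso2022_jp") == canonical_from_percent(escaped)

# One format on the wire, whatever the local form was.
wire = to_wire(canonical_from_local(local, "iso2022_jp"))
```

The point of keeping the three functions separate is that the local
form stays untouched for the user, and conversion happens only where
comparison or transmission needs it.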
Dan