
[idn] Re: converter page?



Martin Duerst <duerst@w3.org> writes:

> Hello Simon,
>
> Very nice to put up such a script.

I believe I have fixed the problems you mention, thanks for taking the
time to point them out.

> It would be great if the default page was served as UTF-8.
> That way, on any recent browser, any user can just copy/paste
> or type in their idn and submit the query, without having to
> worry about encoding issues.

The page is served in the charset you select.  Choose UTF-8 if you want
UTF-8.  Supporting only UTF-8 would restrict the page's usefulness.
Standards-compliant browsers handle charset conversions in copy/paste.

> Using various different encodings the way you do is exposing
> your system internals in a way the Web was designed (and is
> implemented) to abstract from.
>
> The 'force charset to' drop-down menu is particularly dangerous,
> because it does not force the browser to send the characters
> that the user has pasted or input to the server in that encoding,
> it just forces the server to MISinterpret the octets that the
> browser sent.
>
> At the top of the page, you write:
>     Report problems to bug-libidn@gnu.org, but first please make sure your
>     browser really is encoding the data you type in the charset you select.
>     If not, incorrect output or an error is the proper response.
>
> This is heavily backwards. The browser will do the right thing if
> you just allow it to do so, and don't allow the user to mess
> around with it.

I have tried to make the intended behaviour clearer.  You must type
characters in the charset the page uses.  If you want to use another
charset, it is a two-step process: first change the charset, then
enter new data.
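
To illustrate why the charset of the typed data and the charset the
server assumes have to match, here is a minimal Python sketch.  It is
not the code behind the page; the variable names are made up for the
example, and it only assumes that the browser submits UTF-8 octets
while the server decodes them either as UTF-8 or as ISO-8859-1.

```python
# Hypothetical illustration, not the converter page's actual code:
# the same typed text, decoded by the server under two assumptions.
text = "D\u00fcrst.josefsson.org"      # "Dürst.josefsson.org" as typed

sent = text.encode("utf-8")            # octets the browser submits
right = sent.decode("utf-8")           # server assumes the same charset
wrong = sent.decode("iso-8859-1")      # server assumes a different charset

print(right == text)   # the round trip preserves the name
print(wrong == text)   # the mismatch turns the u-umlaut into two characters
print(len(text), len(wrong))
```

With a mismatch the two UTF-8 octets of the u-umlaut are read as two
separate ISO-8859-1 characters, so the name that reaches the converter
is not the name that was typed.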

> Also, some browsers tend to send named or numeric character references
> when characters in a text field are outside of the encoding of the
> page. That as such is non-standard, and you don't necessarily
> have to deal with it. However, you should make sure that the
> output you send back is properly escaped. For example not
>
> $ echo 'D&uuml;rst.josefsson.org' | /usr/local/bin/idn --idna-to-ascii 2>&1
>
> but
>
> $ echo 'D&amp;uuml;rst.josefsson.org' | /usr/local/bin/idn --idna-to-ascii 2>&amp;1

Since it is non-standard, I'll deal with it using the garbage-in,
garbage-out philosophy.  Someone might even find the current behaviour
useful.
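
For reference, the output escaping suggested in the quoted text could
be sketched in Python roughly as follows.  This is only an
illustration using the standard library's html.escape; the page itself
may be written in another language, and the variable names are
invented for the example.

```python
import html

# The command line as it would be echoed back, taken from the quoted example.
command = ("echo 'D&uuml;rst.josefsson.org' | "
           "/usr/local/bin/idn --idna-to-ascii 2>&1")

# Escape &, < and > before placing the text inside an HTML page.
# quote=False leaves the single quotes alone, which is fine outside attributes.
escaped = html.escape(command, quote=False)
print(escaped)
```

Note that html.escape also turns ">" into "&gt;", which the quoted
example leaves as is; both forms display the same in a browser.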

> I tested this with several browsers. IE had difficulty interpreting
> the encoding of your page correctly in the first place.
> My current guess is that this is due to the fact that you use additional
> double quotes in
> <meta http-equiv='Content-Type' content='text/html; charset="ISO-8859-1"' />,
> instead of simply
> <meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1' />
> I might be wrong, but other than that, I can't see any reason at the moment.

I don't see anything wrong with the code, and I don't have access to
IE to test this.  If you, or someone else, want to investigate
further, it would be appreciated.
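
As a rough sanity check of the two meta variants, a lenient
Content-Type parser treats the quoted and unquoted charset values the
same.  The sketch below uses Python's standard email.message module as
a stand-in parser; the helper name charset_of is made up, and the
result says nothing about how IE itself parses the attribute.

```python
from email.message import Message

def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

quoted = charset_of('text/html; charset="ISO-8859-1"')
unquoted = charset_of('text/html; charset=ISO-8859-1')
print(quoted, unquoted)
```

A stricter parser that does not strip the double quotes would see a
charset name it does not recognise, which would be consistent with the
guess in the quoted text.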