
[idn] Re: converter page?



Martin Duerst <duerst@w3.org> writes:

>>The page is served in the charset you select.
>
> No. When I went to http://josefsson.org/idn.php, I didn't
> select iso-8859-1. In fact, my main browser (Netscape 7) sends you
> Accept-Charset: UTF-8, *
> in its HTTP headers. For the others I used, Opera7 sends
> Accept-Charset: windows-1252,utf-8,utf-16,iso-8859-1;q=0.6,*;q=0.1
> whereas IE6 and Tango don't send anything. Both Netscape 7
> and Opera clearly express a preference for UTF-8 over iso-8859-1.

All your clients support ISO-8859-1, and it makes things easier for me
(I don't have a fully working UTF-8 editor on the web server host), so
ISO-8859-1 seemed like a good choice.  ISO-8859-1 is mentioned more
prominently than UTF-8 in HTML 3.2 and HTTP 1.1, so it should be more
interoperable too.

But okay, I have changed to UTF-8 by default.
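
For illustration, here is a minimal Python sketch of how a server could choose
between UTF-8 and ISO-8859-1 from the Accept-Charset values quoted above. It
is not how idn.php works; the function names, the offered list, and the UTF-8
default are assumptions made for this example. Note that by q-values alone
"UTF-8, *" rates both charsets at 1.0, so the sketch breaks ties by the
server's own order of preference.

```python
def parse_accept_charset(header):
    """Return a {charset: q} mapping parsed from an Accept-Charset value."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset without an explicit q parameter has q=1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def pick_charset(header, offered=("utf-8", "iso-8859-1"), default="utf-8"):
    """Pick one of the offered charsets (hypothetical helper).

    Ties are broken by the order of `offered`, i.e. the server's preference.
    """
    if not header or not header.strip():
        return default  # no header sent (as with IE6 in the report above)
    prefs = parse_accept_charset(header)

    def quality(charset):
        if charset in prefs:
            return prefs[charset]
        if "*" in prefs:
            return prefs["*"]
        # RFC 2616, section 14.2: with no "*" present, ISO-8859-1 gets q=1
        # when it is not mentioned; every other unlisted charset gets q=0.
        return 1.0 if charset == "iso-8859-1" else 0.0

    best = max(offered, key=quality)  # max() keeps the first of equal items
    return best if quality(best) > 0 else default


print(pick_charset("UTF-8, *"))
print(pick_charset("windows-1252,utf-8,utf-16,iso-8859-1;q=0.6,*;q=0.1"))
```

Both header values quoted above select utf-8 here: the Opera one because
utf-8 (q=1) outranks iso-8859-1 (q=0.6), the Netscape one only through the
tie-break.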

>>Standards compliant browsers handle charset conversions in copy/paste.
>
> Well, yes, they handle character encoding conversion in copy/paste.
> They convert from the encoding used in the clipboard to their
> internal (unicode-based) encoding. That's why
> you should avoid confusing the user with 'charset' stuff.

It is a debugging tool for libidn, so I expect users to at least be
aware of charset stuff.

As it happens, one of the browsers I use to test the page uses Mule
for the internal encoding, and has incomplete Unicode support, so
moving the page to UTF-8 forces me to select ISO-8859-1 and reload the
page.

Honestly, the only advantage of moving to UTF-8 for the page I can see
is if there is a browser out there that supports UTF-8 but not
ISO-8859-1.  Since HTML 3.2 and HTTP 1.1 use ISO-8859-1, that browser
would be broken anyway.

> "The following string must only contain characters that can be
> represented in ISO-8859-1."

Thanks, I'm using it now.
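
As a rough illustration of what that restriction means in practice, the
following Python sketch tests whether a string can be represented in
ISO-8859-1. The helper name is hypothetical and this is not the check the
page itself performs.

```python
def fits_iso_8859_1(text):
    """Return True if every character of text is encodable in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(fits_iso_8859_1("r\u00e4ksm\u00f6rg\u00e5s"))  # Latin-1 letters only
print(fits_iso_8859_1("\u20ac"))  # the euro sign is not in ISO-8859-1
```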

> But there are cases where you produce garbage on your own.
> For example, if I input &uuml;<u">.josefsson.org (that is, the literal
> text &uuml; followed by an actual u-umlaut, written here as <u">),
> and switch on UseSTD3ASCIIRules, I get:
> /usr/local/bin/idn: idna_to_ascii_from_locale() failed with error 3.
> which I guess means IDNA_CONTAINS_LDH = 3.
> Now if I switch off UseSTD3ASCIIRules and use the same input,
> what I see as a result is xn--<u">-8ya.josefsson.org. The correct
> result is of course xn--&uuml;-8ya.josefsson.org, which is in the
> source, but not visible. So you have to fix the source to
> be xn--&amp;uuml;-8ya.josefsson.org.

Ah, true.  I'll see if there is a handy function for escaping...
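
To make the escaping problem concrete, here is a small Python sketch, not
the actual idn.php code. It assumes the Punycode step alone is enough for
this input (no nameprep, no STD3 rule checks), and the helper names are made
up for the example. The point is that the ACE output can legitimately
contain "&", so it has to be HTML-escaped before it is written into the page.

```python
import html


def to_ace_label(label):
    """Punycode-encode one label and add the xn-- prefix (sketch only)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def render_result(domain):
    """Return the ACE form of domain, escaped for inclusion in HTML."""
    ace = ".".join(to_ace_label(part) for part in domain.split("."))
    return html.escape(ace)


# The literal text "&uuml;" followed by a real u-umlaut, as in the report.
name = "&uuml;\u00fc.josefsson.org"
print(render_result(name))  # xn--&amp;uuml;-8ya.josefsson.org
```

Without the html.escape() call, the page source would contain
xn--&uuml;-8ya.josefsson.org, and a browser would display the entity as a
u-umlaut instead of showing it literally.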