Re: [idn] Why follow IDNA with UTF-8?
At 00:32 01/07/16 +0000, Adam M. Costello wrote:
>Martin Duerst <duerst@w3.org> wrote:
>
> > In HTML 4.0, there is a provision that says that URIs that contain
> > non-ASCII characters should be interpreted based on conversion to
> > UTF-8
> >
> > The Web and the W3C took RFC 2277 seriously. We are ready for UTF-8,
> > we don't need ACE at all.
>
>Not quite. HTML 4.01 still says that URIs are defined by RFC 2396. So
>consider href="http://IDN.com/foo/". The provision you mention will
>convert it to http://%HH%HH%HH.com/foo/, but that's not a syntactically
>valid URI according to RFC 2396, which says the host part can contain
>only letters, digits, hyphens, and dots, not percent signs.
Yes. I have already written a draft to make the necessary update.
Please see http://search.ietf.org/internet-drafts/draft-ietf-idn-uri-00.txt.
And the philosophy of URIs is that you don't check more than necessary,
and many applications follow that philosophy.
>And nowhere
>does the HTML spec say that the %HH encoding should not be used on the
>host part
Yes, on purpose. The U in URI stands for Uniform (among other things).
>(or should be undone before the name is looked up).
That's not the business of the HTML spec, obviously.
Regards, Martin.
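
For readers who want to see the conversion discussed above in concrete
terms, here is a minimal Python sketch. It is an illustration only, not
the wording of any spec: the host name "bücher.com" and both function
names are hypothetical, and the host check is a simplified reading of
the "letters, digits, hyphens, and dots" rule quoted from RFC 2396.

```python
import re
from urllib.parse import urlsplit

# Simplified host rule: only letters, digits, hyphens, and dots.
# (An approximation of the RFC 2396 restriction quoted in the thread.)
HOST_RE = re.compile(r"[A-Za-z0-9.-]+")


def escape_non_ascii(uri: str) -> str:
    """Replace each non-ASCII character with the %HH form of its UTF-8 bytes."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def host_is_plain(uri: str) -> bool:
    """True if the host part uses only letters, digits, hyphens, and dots."""
    host = urlsplit(uri).netloc
    return HOST_RE.fullmatch(host) is not None


escaped = escape_non_ascii("http://bücher.com/foo/")
print(escaped)
print(host_is_plain(escaped))
```

The escaped form carries percent signs in the host part, so the
simplified check rejects it. That is the syntactic mismatch Adam
describes, and the one the draft is meant to address.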