
Re: [idn] Why follow IDNA with UTF-8?



Martin Duerst <duerst@w3.org> wrote:

> In HTML 4.0, there is a provision that says that URIs that contain
> non-ASCII characters should be interpreted based on conversion to
> UTF-8
>
> The Web and the W3C took RFC 2277 seriously.  We are ready for UTF-8,
> we don't need ACE at all.

Not quite.  HTML 4.01 still says that URIs are defined by RFC 2396.  So
consider href="http://IDN.com/foo/".  The provision you mention will
convert it to http://%HH%HH%HH.com/foo/, but that's not a syntactically
valid URI according to RFC 2396, which says the host part can contain
only letters, digits, hyphens, and dots, not percent signs.  And nowhere
does the HTML spec say that the %HH encoding should not be used on the
host part (or should be undone before the name is looked up).
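
To make the problem concrete, here is a rough Python sketch.  The host
name "bücher.com" and the function names are made up for illustration,
and the regular expression is only my reading of the hostname grammar in
RFC 2396 section 3.2.2 (it ignores the IPv4address alternative).

```python
import re
from urllib.parse import quote

# Hostname grammar as read from RFC 2396, section 3.2.2:
#   hostname    = *( domainlabel "." ) toplabel [ "." ]
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha    | alpha    *( alphanum | "-" ) alphanum
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME_RE = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def is_rfc2396_hostname(host):
    """Return True if host matches the RFC 2396 hostname production."""
    return HOSTNAME_RE.fullmatch(host) is not None


def html4_escape_host(host):
    """Apply the HTML 4 recommendation to a host name (hypothetical helper):
    convert to UTF-8, then write each non-ASCII byte as %HH."""
    return quote(host, safe=".-", encoding="utf-8")


idn = "b\u00fccher.com"            # made-up example IDN
escaped = html4_escape_host(idn)   # the "%HH" form of the host part

print(escaped)
print(is_rfc2396_hostname(escaped))        # escaped IDN vs. the grammar
print(is_rfc2396_hostname("example.com"))  # plain ASCII name for comparison
```

The escaped form contains percent signs, which neither domainlabel nor
toplabel allows, so the check fails for it while an ordinary ASCII name
passes.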

AMC