Re: [idn] URL encoding in html page
Soobok Lee <lsb@postel.co.kr> wrote:
> If a simple HTML page contains the following tag,
> <a href=http://www.<ML>.com>Hello World!</a>
> in which <ML> may be in a native legacy encoding or UTF-8 encoding, it
> is easy to imagine that some visitors who click that link may be led to
> wrong sites or nowhere.
Very easy to imagine indeed, because the HTML spec says that the href
attribute must contain a URI, and the URI spec says that the host must
contain only ASCII letters, digits, hyphens, and dots (or it may be a
bracket-enclosed IPv6 address literal).
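
To make the rule concrete, here is a rough sketch of the character check in Python. The name is_ascii_host is made up for illustration, and the sketch deliberately ignores the bracketed IPv6 literal case and the finer label rules (such as hyphen placement); it only tests the character repertoire described above.

```python
import re

# Hypothetical helper, not taken from any spec text: accepts a host made
# only of ASCII letters, digits, hyphens, and dots.
_HOST_CHARS = re.compile(r"[A-Za-z0-9.-]+")

def is_ascii_host(host: str) -> bool:
    """Return True if every character of host is in the allowed ASCII set."""
    return _HOST_CHARS.fullmatch(host) is not None

print(is_ascii_host("www.example.com"))      # True
print(is_ascii_host("www.b\u00fccher.com"))  # False: contains a non-ASCII letter
print(is_ascii_host("www.%E4%BE%8B.com"))    # False: '%' is not allowed in a host
```

A host written in a legacy encoding or raw UTF-8 fails this check, which is why such an href is not a valid URI to begin with.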
> Should IDNA recommend all HTML authors to use such ACEed URL for
> backward compatibility and error-free fast deployment?
Not necessary, since the HTML and URI specs already limit the host to
ASCII letters, digits, hyphens, and dots.
> Is HTTP/1.2 being planned for IDN HOST: values ?
Bruce Thomson <bthomson@fm-net.ne.jp> replied:
> Well, depending on how you want it to work, version 1.1 might be OK.
> It allows %-escaped UTF-8 I believe.
HTTP/1.1 says that the Host: header field must contain the <authority>
part of the URI (that is, <host>[:<port>] ), and the URI spec forbids
%-escapes in <authority>.
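
As an illustration only, deriving the Host: value from a URI might look like the following in Python. The function name host_header_value is hypothetical, the example URIs are invented, and the sketch assumes the authority carries no userinfo part, so that it is just <host>[:<port>].

```python
from urllib.parse import urlsplit

def host_header_value(uri: str) -> str:
    """Return the authority (<host>[:<port>]) of uri for use in a Host: header.

    Assumption for this sketch: the URI has no userinfo component.
    """
    authority = urlsplit(uri).netloc
    # Per the reading above, %-escapes are not permitted in the authority,
    # so a %-escaped UTF-8 host name is rejected rather than passed along.
    if "%" in authority:
        raise ValueError("%-escape found in authority: " + authority)
    return authority

print(host_header_value("http://www.example.com:8080/index.html"))
# prints www.example.com:8080
```

Calling it on something like "http://www.%E4%BE%8B.com/" raises ValueError, which matches the point that %-escaped UTF-8 is not an option for the Host: field.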
> Would using ACE here be a change to the HTTP spec
No. ACE host labels are honest-to-goodness valid ASCII host labels, so
you can use them wherever traditional ASCII host labels are allowed.
You don't need any special permission or invitation.
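
A small sketch of that point, with one loud assumption: it uses Python's built-in "idna" codec, which implements the ACE that was standardized after this thread (Punycode labels with an "xn--" prefix), not necessarily the ACE draft under discussion here. The helper name to_ace_host and the sample host are invented for the example.

```python
import string

# Characters allowed in a traditional ASCII host name.
ALLOWED = set(string.ascii_letters + string.digits + "-.")

def to_ace_host(host: str) -> str:
    """Convert each non-ASCII label of host to its ACE form.

    Assumption: relies on Python's "idna" codec (the later-standardized
    "xn--" Punycode encoding), used here purely for illustration.
    """
    return host.encode("idna").decode("ascii")

ace = to_ace_host("www.b\u00fccher.com")
print(ace)                               # www.xn--bcher-kva.com
print(all(ch in ALLOWED for ch in ace))  # True: only letters, digits, '-', '.'
```

The converted name passes the same character test as any traditional host name, so it can go into an href or a Host: field unchanged.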
AMC