[idn] URL encoding in html page
If a simple HTML page contains the following tag,
<a href=http://www.<ML>.com>Hello World!</a>
in which <ML> may be in either a native legacy encoding or UTF-8, it is easy to imagine that
some visitors who click that link will be led to the wrong site or nowhere.
When the encoding of <ML> is not specified by an HTML <meta> charset tag and
the author and visitors have different default character encodings,
interoperability problems will arise, no matter which architecture we
choose between IDNA and UTF-8-based IDN. Neither IDN-unaware nor IDN-aware
web browsers would be able to determine which encoding the author used.
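
To make the ambiguity concrete, here is a minimal Python sketch. The label "中文" is a hypothetical stand-in for <ML>, and GB2312 is just one assumed legacy encoding; the helper name guess_decode is made up for this illustration. The same label yields different bytes under each encoding, and a reader who assumes the wrong charset either fails outright or silently gets a different string.

```python
# Hypothetical label standing in for <ML> (the two characters U+4E2D U+6587).
label = "\u4e2d\u6587"

# What the author's editor might write into the page under two defaults.
utf8_bytes = label.encode("utf-8")
gb_bytes = label.encode("gb2312")  # assumed legacy encoding for this sketch


def guess_decode(raw, assumed):
    """Decode raw bytes under an assumed charset; return None on failure."""
    try:
        return raw.decode(assumed)
    except UnicodeDecodeError:
        return None


# A visitor's browser only sees bytes and must pick a charset.
for assumed in ("utf-8", "gb2312", "latin-1"):
    print(assumed, repr(guess_decode(gb_bytes, assumed)))
```

Only the decoder that matches the author's charset recovers the label; the Latin-1 guess succeeds without error but produces a different hostname, which is the "wrong site" case.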
IDNA-unaware browsers will always fail to resolve a natively encoded web hostname in the URL,
even when the HTML page specifies its charset in the <head> section.
In this case, IDNA backward compatibility cannot save the old browser from failure
without DNS/web server workarounds that heuristically convert native encodings to ACE on the fly.
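
A server-side workaround of that kind might look roughly like the sketch below. This is only an illustration under stated assumptions: guess_ace is a hypothetical helper, the candidate charset list is arbitrary, and Python's built-in "idna" codec produces the xn-- form from RFC 3490 rather than the bq-- prefix used in the drafts.

```python
def guess_ace(raw_host, candidates=("utf-8", "gb2312", "big5")):
    """Try each candidate charset in order and return (ace_host, charset).

    raw_host is the hostname as raw bytes, e.g. as received in a request.
    Returns (None, None) if no candidate decodes and converts cleanly.
    The candidate list and its order are assumptions for this sketch.
    """
    for charset in candidates:
        try:
            text = raw_host.decode(charset)
            # The stdlib "idna" codec applies nameprep and emits xn-- labels.
            return text.encode("idna").decode("ascii"), charset
        except UnicodeError:
            # Covers both a failed decode and a failed ACE conversion.
            continue
    return None, None


print(guess_ace("www.b\u00fccher.com".encode("utf-8")))
```

The first charset that happens to decode wins, so bytes that are valid in more than one legacy encoding can be converted to the wrong name. That is why such a workaround remains a heuristic.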
With LDH-only URLs, we had no such problems and headaches.
Someone may argue that the HTML page should contain an ACE-encoded URL like this:
<a href=http://www.bq--blahblah.com>Hello World!</a>
Should IDNA recommend that all HTML authors use such ACE-encoded URLs for backward compatibility
and error-free, fast deployment?
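
For illustration, an authoring tool could generate such ACE-only links as sketched below. The helper names to_ace and ace_link are hypothetical, and Python's built-in "idna" codec emits the xn-- prefix standardized in RFC 3490, not the bq-- form shown above, so the output differs from the example in that respect.

```python
import re

# LDH: letters, digits, hyphen (plus the dots separating labels).
LDH = re.compile(r"^[A-Za-z0-9.-]+$")


def to_ace(hostname):
    """Convert a Unicode hostname to its ASCII-compatible (ACE) form."""
    return hostname.encode("idna").decode("ascii")


def ace_link(hostname, text):
    """Build an anchor tag whose href contains only LDH characters."""
    ace = to_ace(hostname)
    assert LDH.match(ace), "ACE form should be LDH-only"
    return "<a href=http://{}>{}</a>".format(ace, text)


print(ace_link("www.b\u00fccher.com", "Hello World!"))
```

Because the resulting href is pure ASCII, it reads the same under any page charset, which is the backward-compatibility argument for writing ACE in the HTML source.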