Re: [idn] URL encoding in html page
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "IETF idn working group" <idn@ops.ietf.org>
Sent: Sunday, March 24, 2002 9:36 PM
Subject: Re: [idn] URL encoding in html page
>
> ----- Original Message -----
> From: "Soobok Lee" <lsb@postel.co.kr>
> > > Not necessary, since the HTML and URI specs already limit the host to
> > > ASCII letters, digits, hyphens, and dots.
> >
> > We experts already know this, but many ML.com registrants are unaware of this
> > unfortunate limitation of ML.com. They want to use native ML.com names in their HTML home pages.
> >
> > If we want an interoperable URI that supports native IDN, we should revise
> > BOTH the URI spec and the HTTP spec. But native IDN support brings potential
> > legacy-code versioning and code interoperability problems.
> > Could anyone provide an in-depth analysis of this caveat?
> >
>
>
> Even if we stay with the current HTTP/1.1, which allows only ASCII Host: header values,
> we could still revise the URI spec to allow native (UTF-8 or legacy-encoded) IDN in URIs.
>
> 1) With IDNA and HTTP/1.1, the web browser can encode the native IDN in the URI into its ACE form and
> then open an HTTP/1.1 session to the ACE hostname with an ACE Host: value.
>
> 2) With IDNA and a revised HTTP with UTF-8 host support, the web browser can encode the
> UTF-8 IDN in the URI into its ACE form and then open an HTTP session to the ACE hostname with a UTF-8 Host: value.
>
> 3) With UTF-8-based IDN and a revised HTTP with UTF-8 host support, the browser can check whether
> the native IDN is in UTF-8 and, if not, convert the IDN into UTF-8, and then open
> an HTTP session to the UTF-8 web host with a UTF-8 Host: value.
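
A rough sketch of what option 1) asks of the browser, in Python. This is only an illustration: ace_host_header is a made-up name, and Python's built-in "idna" codec (which implements RFC 3490 and RFC 3492) stands in for whichever ACE the working group finally adopts.

```python
from urllib.parse import urlsplit


def ace_host_header(url: str) -> str:
    """Return the ASCII Host: value a browser could send for `url` (option 1)."""
    host = urlsplit(url).hostname  # lower-cased host part, without the port
    if host is None:
        raise ValueError("URL has no host part")
    # The "idna" codec prepares each label and converts non-ASCII labels to ACE.
    # It is used here as a stand-in for the ACE under discussion.
    return host.encode("idna").decode("ascii")


print(ace_host_header("http://bücher.example/index.html"))  # xn--bcher-kva.example
```

The URI in the page stays native; only the name used for the DNS lookup and the Host: value is converted.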
>
>
> 2) and 3) may be infeasible due to HTTP's lack of a capability negotiation feature like that of ESMTP,
s/and 3)// :-) In 3), the web server surely supports a native UTF-8 Host: value.
> because a new web browser with native IDN URI support cannot tell whether the web server supports
> native IDN or only an ASCII (ACE) host in the Host: value without trying twice, once with each form
> of the Host: value (UTF-8 first, and then ACE if needed). Using an ACE Host: value is always safe in 1) and 2).
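
The "try twice" behaviour could look roughly like the sketch below. Everything here is hypothetical: host_header_candidates and first_accepted are made-up names, server_accepts stands for "send the request and see whether the server takes this Host: value", and the "idna" codec again stands in for the ACE.

```python
from typing import Callable, List, Optional


def host_header_candidates(hostname: str) -> List[bytes]:
    """Host: values to try, in order: UTF-8 first, then the always-safe ACE form."""
    utf8_form = hostname.encode("utf-8")
    ace_form = hostname.encode("idna")  # stand-in for the ACE encoding
    if utf8_form == ace_form:  # plain LDH name: one request is enough
        return [ace_form]
    return [utf8_form, ace_form]


def first_accepted(hostname: str,
                   server_accepts: Callable[[bytes], bool]) -> Optional[bytes]:
    """Return the first Host: value the server accepts, or None if none works."""
    for value in host_header_candidates(hostname):
        if server_accepts(value):
            return value
    return None


def ascii_only_server(value: bytes) -> bool:
    """A pretend legacy HTTP/1.1 server that rejects non-ASCII Host: values."""
    return all(byte < 0x80 for byte in value)


print(first_accepted("bücher.example", ascii_only_server))
```

Against a legacy server the first attempt is wasted, which is the cost of having no negotiation step.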
>
> BTW, in 1) and 2), we cannot avoid legacy versioning problems, because
> most ACE conversion would be done as "ACE(NFKC(CaseFold(legacy-to-Unicode(native label))))".
> Most home pages in East Asia are in legacy encodings, and that near-monopoly (close to 100%) won't change
> in the foreseeable future.
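
The conversion chain above, written out literally as a sketch. Assumptions: Python's codecs play the role of the legacy-to-Unicode tables, its "punycode" codec plays the role of the ACE, the "xn--" prefix is a placeholder, and legacy_label_to_ace is a made-up name. Real nameprep does more than CaseFold plus NFKC, so this is not a conforming implementation.

```python
import unicodedata

ACE_PREFIX = "xn--"  # assumption: placeholder ACE prefix


def legacy_label_to_ace(raw: bytes, legacy_encoding: str) -> str:
    """ACE(NFKC(CaseFold(legacy-to-Unicode(native label)))) for a single label."""
    text = raw.decode(legacy_encoding)          # legacy-to-Unicode
    text = text.casefold()                      # CaseFold
    text = unicodedata.normalize("NFKC", text)  # NFKC
    if all(ord(ch) < 0x80 for ch in text):      # already ASCII: no ACE needed
        return text
    return ACE_PREFIX + text.encode("punycode").decode("ascii")  # ACE


# A label as it might appear in an EUC-KR encoded home page.
print(legacy_label_to_ace("한국".encode("euc-kr"), "euc-kr"))
```

The first step is the only one that depends on the page's charset, and it is the step that has to track every legacy encoding.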
>
> New legacy codes may be created after IDN-aware browsers are distributed.
> Old legacy codes may get new code points for newly added characters.
> If IDN-aware browsers/applications are not upgraded with the new legacy-to-Unicode mappings,
> they will occasionally fail to convert a legacy-encoded IDN into its Unicode form.
> That kind of IDN failure has never been seen in LDH DNS.
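
A small sketch of that failure, using Python's codecs as stand-ins for a browser's mapping tables. Here "euc_kr" plays the old table and "cp949" the extended one; legacy_to_unicode is a made-up name. The syllable "똠" is outside the KS X 1001 repertoire, so it is encoded only in the extension.

```python
from typing import Optional


def legacy_to_unicode(raw: bytes, known_mapping: str) -> Optional[str]:
    """Decode a legacy-encoded label, or return None if the mapping has no entry."""
    try:
        return raw.decode(known_mapping)
    except UnicodeDecodeError:
        return None  # the application cannot go on to build the ACE form


label = "똠".encode("cp949")                 # bytes as written by a newer editor
print(legacy_to_unicode(label, "cp949"))     # upgraded application
print(legacy_to_unicode(label, "euc_kr"))    # application with only the old table
```

An application holding only the old table cannot even reach the NFKC or ACE steps for such a label.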
>
> Soobok Lee