Re: [idn] URL encoding in html page
Compliant browsers already have to handle Unicode, since NCRs (e.g.
&#x1234;) are always Unicode code points. All XML parsers also have
to handle Unicode (UTF-8 and UTF-16).
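
To illustrate the point about NCRs, here is a minimal Python sketch. It assumes the standard library's html.unescape as a stand-in for a browser's character-reference handling; the helper name decode_ncr and the choice of charsets are made up for this example. It shows that a numeric character reference such as &#x1234; resolves to the same Unicode code point whatever encoding the page bytes arrive in.

```python
import html


def decode_ncr(fragment: str) -> str:
    """Resolve character references in an HTML text fragment.

    html.unescape stands in here for a browser's NCR handling;
    decode_ncr is a hypothetical helper name, not a real browser API.
    """
    return html.unescape(fragment)


def ncr_code_point(raw: bytes, page_charset: str) -> int:
    """Decode page bytes with the page charset, then resolve a single NCR."""
    text = raw.decode(page_charset)
    resolved = decode_ncr(text)
    return ord(resolved)


if __name__ == "__main__":
    markup = "&#x1234;"
    # The NCR itself is pure ASCII, so it survives any ASCII-compatible
    # page encoding and always names the same Unicode code point.
    for charset in ("euc-kr", "shift_jis", "utf-8"):
        raw = markup.encode(charset)
        print(charset, hex(ncr_code_point(raw, charset)))
```

The page charset only affects how the bytes are turned into characters; the number inside the reference is interpreted as a Unicode code point in every case.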
> Legacy encodings
> will dominate even in the future, because they are compact and
> inexpensive.
While I do expect the transition to Unicode to take some time, once
some of the older browsers die off it may shift more rapidly than we
think.
Mark
—————
Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com
----- Original Message -----
From: "Soobok Lee" <lsb@postel.co.kr>
To: "IETF idn working group" <idn@ops.ietf.org>
Sent: Friday, March 22, 2002 02:04
Subject: Re: [idn] URL encoding in html page
>
> ----- Original Message -----
> From: "Bruce Thomson" <bthomson@fm-net.ne.jp>
> To: "Soobok Lee" <lsb@postel.co.kr>; "IETF idn working group" <idn@ops.ietf.org>
> Sent: Friday, March 22, 2002 6:29 PM
> Subject: Re: [idn] URL encoding in html page
>
>
> > > What if all the viewable HTML text is in English, but only the href URL
> > > contains legacy (Korean) encoded hostnames? Chinese visitors would see a
> > > clean English homepage, but fail to click through the Korean link.
> > >
> > Well, that could happen, but a META tag would solve that so easily.
> > Personally I often use a simple text editor to deal with HTML, and would
> > find it easier to use legacy encodings or UTF-8 than cut-and-paste ACE
> > from somewhere.
> > Of course the user could do it either way and it would work.
>
> Yes. Charset META tags help. But many homepages make assumptions about the
> main audience's default character encoding and very often omit the META tag
> for the encoding, like:
> <meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
>
> Moreover, an IDN URL could be used in a pure FRAMESET document that defines
> frame URLs and contains no viewable text. Such FRAMESET documents often omit
> charset META tags.
> (look into the HTML source of http://www.freeway.co.kr/ )
>
> AFAIK, 99.99999% of Korean homepages have implicit/explicit
> legacy Korean encoding (KS_C_5601-1987 or euc-kr). So do most
> Japanese/Chinese homepages.
> UTF-8/UCS-2 encodings are rarely used in global Web publishing. Legacy
> encodings will dominate even in the future, because they are compact and
> inexpensive.
>
> If we want to make IDN truly internationally interoperable, all IDN-aware
> web browsers/applications would have to contain libraries of all kinds of
> legacy-to-Unicode conversion routines. That would place too heavy a memory
> burden on handheld devices like PDAs.
>
> Moreover, legacy encodings are revised separately from Unicode. We may face
> versioning problems as tough as the stringprep/nameprep versioning problems
> for newly added Unicode code points.
> How can we guarantee the stability and integrity of IDN operations across
> all combinations of the numerous kinds and versions of IDN-aware
> applications and legacy encodings?
>
> Soobok Lee
>
>
>
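
The scenario discussed in the quoted messages above (a legacy-encoded hostname in an href, with no charset META tag) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name href_host_to_ace and the hostname "한국.example" are invented for the example, and Python's built-in "idna" codec (which follows the IDNA 2003 ToASCII rules) stands in for whatever ACE conversion an IDN-aware application would actually use.

```python
def href_host_to_ace(raw_host: bytes, assumed_charset: str) -> bytes:
    """Convert hostname bytes taken from an href into an ACE hostname.

    raw_host        -- the hostname exactly as it appears in the page bytes
    assumed_charset -- the charset the client believes the page uses
                       (from a META tag, or a locale default if it is missing)

    Hypothetical helper for illustration only.
    """
    # Step 1: legacy-to-Unicode conversion, driven by the assumed charset.
    unicode_host = raw_host.decode(assumed_charset)
    # Step 2: Unicode-to-ACE conversion with Python's built-in IDNA codec.
    return unicode_host.encode("idna")


if __name__ == "__main__":
    # A Korean page author saves the link in EUC-KR and omits the META tag.
    raw = "한국.example".encode("euc-kr")

    # A client that assumes EUC-KR recovers the intended name.
    print(href_host_to_ace(raw, "euc-kr"))

    # A client whose default is GB2312 decodes the same bytes as different
    # characters and therefore looks up a different ACE name.
    print(href_host_to_ace(raw, "gb2312"))
```

The same bytes yield two different ACE names depending only on the charset the client assumes, which is the click-through failure described in the thread. It also shows why each client would need the relevant legacy-to-Unicode table available for step 1.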