Re: [idn] Re: converter page?
On the ICU site there is a page that you might find helpful. For example, if
you input text with foreign characters, like
Počítače; Οι ηλεκτρονικοί.
and set Compound 1 to
\P{ascii} hex
you'll get
Po\u010D\u00EDta\u010De; \u039F\u03B9
\u03B7\u03BB\u03B5\u03BA\u03C4\u03C1\u03BF\u03BD\u03B9\u03BA\u03BF\u03AF.
You can also use "\P{ascii} hex/unicode" to get this format:
PoU+010DU+00EDtaU+010De; U+039FU+03B9
U+03B7U+03BBU+03B5U+03BAU+03C4U+03C1U+03BFU+03BDU+03B9U+03BAU+03BFU+03AF.
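For anyone who wants the same two escaped forms without the web page, here is a minimal Python sketch. It is not the ICU implementation; the function name `escape_non_ascii` and its `style` parameter are made up for illustration, and the `\UXXXXXXXX` form used for characters above U+FFFF is an assumption rather than something shown above.

```python
def escape_non_ascii(text: str, style: str = "hex") -> str:
    """Escape every non-ASCII code point; ASCII passes through unchanged.

    style="hex"     -> \\uXXXX (assumed \\UXXXXXXXX above U+FFFF)
    style="unicode" -> U+XXXX
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII is left alone, mirroring the \P{ascii} filter.
            out.append(ch)
        elif style == "hex":
            out.append(f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\U{cp:08X}")
        else:
            out.append(f"U+{cp:04X}")
    return "".join(out)


if __name__ == "__main__":
    sample = "Po\u010d\u00edta\u010de; \u039f\u03b9"
    print(escape_non_ascii(sample))             # hex form
    print(escape_non_ascii(sample, "unicode"))  # U+ form
```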
And if you want the character names, you can use "\P{ascii} name":
Po\N{LATIN SMALL LETTER C WITH CARON}\N{LATIN SMALL LETTER I WITH
ACUTE}ta\N{LATIN SMALL LETTER C WITH CARON}e; \N{GREEK CAPITAL LETTER
OMICRON}\N{GREEK SMALL LETTER IOTA} \N{GREEK SMALL LETTER ETA}\N{GREEK SMALL
LETTER LAMDA}\N{GREEK SMALL LETTER EPSILON}\N{GREEK SMALL LETTER
KAPPA}\N{GREEK SMALL LETTER TAU}\N{GREEK SMALL LETTER RHO}\N{GREEK SMALL
LETTER OMICRON}\N{GREEK SMALL LETTER NU}\N{GREEK SMALL LETTER IOTA}\N{GREEK
SMALL LETTER KAPPA}\N{GREEK SMALL LETTER OMICRON}\N{GREEK SMALL LETTER IOTA
WITH TONOS}.
(The "\P{ascii}" is a filter, added so that none of the above affect the
ASCII contents.)
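The name form can be approximated with Python's standard `unicodedata` module. This is only a sketch: `name_non_ascii` is a hypothetical helper, and falling back to U+XXXX for code points that have no name is my own assumption, not documented ICU behaviour.

```python
import unicodedata


def name_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with \\N{CHARACTER NAME}."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Same idea as the \P{ascii} filter: ASCII is not touched.
            out.append(ch)
            continue
        name = unicodedata.name(ch, "")
        # Assumed fallback for unnamed code points (e.g. private use).
        out.append(f"\\N{{{name}}}" if name else f"U+{ord(ch):04X}")
    return "".join(out)


if __name__ == "__main__":
    print(name_non_ascii("Po\u010d\u00edta\u010de"))
```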
Or use "Latin" to get
Počítače? Oi ēlektronikoí.
You could also use "Latin; nfd; \p{mark} remove; nfc" to strip accents,
getting:
Pocitace? Oi elektronikoi.
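The "nfd; \p{mark} remove; nfc" part of that rule can be sketched in Python with the standard `unicodedata` module. The helper name `strip_marks` is made up for illustration. The sketch covers only accent removal; the "Latin" transliteration step has no standard-library equivalent, so the input here is assumed to be in Latin script already.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose (NFD), drop combining marks, then recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    # General categories Mn, Mc and Me together correspond to \p{mark}.
    kept = "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_marks("Po\u010d\u00edta\u010de? Oi \u0113lektroniko\u00ed."))
```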
Mark
________
mark.davis@jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
----- Original Message -----
From: "John C Klensin" <klensin@jck.com>
To: "Martin Duerst" <duerst@w3.org>; "Simon Josefsson" <jas@extundo.com>
Cc: "IDN" <idn@ops.ietf.org>
Sent: Saturday, March 08, 2003 13:59
Subject: Re: [idn] Re: converter page?
> Simon,
>
> Let me make one additional suggestion, which is sort of
> orthogonal to Martin's... It would be useful, as an alternative
> to UTF-8 and the other encodings you support, to be able to put
> in a string of characters as a list of items in U+nnnn form.
> You show that form in your debugging option, but, if the
> characters going in don't match what you produce, there is no
> obvious way to provide them. I'm particularly concerned here
> about characters my browser has no way to render (e.g.,
> appropriate fonts not installed, etc.)
>
> The script/web page itself is much appreciated.
>
> thanks,
> john
>
>
> --On Saturday, 08 March, 2003 15:31 -0500 Martin Duerst
> <duerst@w3.org> wrote:
>
> > Hello Simon,
> >
> > Very nice to put up such a script.
> >
> > It would be great if the default page was served as UTF-8.
> > That way, on any recent browser, any user can just copy/paste
> > or type in their idn and submit the query, without having to
> > worry about encoding issues.
> >
> > Using various different encodings the way you do is exposing
> > your system internals in a way the Web was designed (and is
> > implemented) to abstract from.
> >
> > The 'force charset to' drop-down menu is particularly
> > dangerous, because it does not force the browser to send the
> > characters that the user has pasted or input to the server in
> > that encoding, it just forces the server to MISinterpret the
> > octets that the browser sent.
> >
> > At the top of the page, you write:
> > Report problems to bug-libidn@gnu.org, but first please
> > make sure your browser really is encoding the data you
> > type in the charset you select. If not, incorrect output
> > or an error is the proper response.
> >
> > This is heavily backwards. The browser will do the right thing
> > if you just allow it to do so, and don't allow the user to mess
> > around with it.
> >
> > Also, some browsers tend to send named or numeric character
> > references when characters in a text field are outside of the
> > encoding of the page. That as such is non-standard, and you
> > don't necessarily have to deal with it. However, you should
> > make sure that the output you send back is properly escaped.
> > For example not
> >
> > $ echo 'Dürst.josefsson.org' | /usr/local/bin/idn
> > --idna-to-ascii 2>&1
> >
> > but
> >
> > $ echo 'D&uuml;rst.josefsson.org' | /usr/local/bin/idn
> > --idna-to-ascii 2>&1
> >
> >
> >
> > Regards, Martin.
> >
> > P.S.:
> >
> > I tested this with several browsers. IE had difficulty
> > interpreting the encoding of your page correctly
> > in the first place. My current guess is that this is due to
> > the fact that you use additional double quotes in
> > <meta http-equiv='Content-Type' content='text/html;
> > charset="ISO-8859-1"' />, instead of simply
> > <meta http-equiv='Content-Type' content='text/html;
> > charset=ISO-8859-1' /> I might be wrong, but other than that,
> > I can't see any reason at the moment. (you should also make
> > sure that you properly escape the '&' in things such as
> > "&mode=toascii&charset=UTF-8").
> >
> >
> >
> > At 01:10 03/03/02 +0100, Simon Josefsson wrote:
> >> "Eric A. Hall" <ehall@ehsco.com> writes:
> >>
> >> > Anybody know of a web form that does IDNA conversion
> >> > on-the-fly? Something that will let me enter the domain
> >> > name and get the IDNA encoded form back. I find myself
> >> > needing to do some quick conversions periodically.
> >>
> >> <http://josefsson.org/idn.php>
> >
> >