Re: [idn] IDN character repertoire, nameprep
- To: Joel Rowbottom <joel@jml.net>
- Subject: Re: [idn] IDN character repertoire, nameprep
- From: John C Klensin <klensin@jck.com>
- Date: Tue, 20 Mar 2001 18:54:03 -0500
- Cc: idn working group <idn@ops.ietf.org>
- Delivery-date: Tue, 20 Mar 2001 15:54:33 -0800
- Envelope-to: idn-data@psg.com
Dear Mr. Simplistic :-)
--On Tuesday, 20 March, 2001 12:17 -0800 Joel Rowbottom
<joel@jml.net> wrote:
> OK. Maybe I'm missing the point here, but why aren't we just
> saying "OK, character set latin-1, let's stick to that and make
> sure that the accents are there"? Then perhaps passing an extra
> parm on a DNS request for a preferential character set?
First of all, in the grand scheme of things, Latin-1 is very,
very easy. Almost as easy as IA5 / "Hostname ASCII", although
even it raises problems. As you move to other scripts, you get
into much more subtle problems than "accents" (see the archives
and discussions of this WG for the last, e.g., 12 months). But,
even in Latin-1, rules about upper/lower case matching get quite
subtle, e.g., languages (sic) in which upper case letters don't
have accents.
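
A minimal Python sketch of why case matching is not a clean
one-to-one table even inside Latin-1. It uses Python's built-in
Unicode case mapping as a stand-in (an assumption; nothing above
prescribes a particular mapping), and upper_info is a made-up
helper name:

```python
def upper_info(ch):
    """Return (uppercase form, whether that form still fits in Latin-1)."""
    up = ch.upper()
    try:
        up.encode("iso-8859-1")
        return up, True
    except UnicodeEncodeError:
        return up, False

# e-acute maps to one Latin-1 character, sharp s expands to two
# letters, and y-diaeresis has an uppercase form outside Latin-1.
for ch in ["\u00e9", "\u00df", "\u00ff"]:
    print(repr(ch), upper_info(ch))
```

This does not even touch the convention mentioned above, where
the upper-case form drops the accent entirely; that would make
two distinct lower-case labels compete for one upper-case form.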
But, you see, if you permit (or require, since "permit" implies
that there will be a global default which discriminates against
all other languages) a "preferred character set" parameter, then
the DNS also needs to contain a character set tag so you can tell
if the preference matches. And, if it doesn't, you will either
need the grandmother of all converters (which will not, in many
cases, work because of non-unique inter-character-set mapping
issues) or error messages when your guess as to script and coding
doesn't match that in the DNS. That gets you DNS data structure
changes (to accommodate the tag), protocol changes (to accommodate
the parameter), and either a lot of new opportunities for
non-interoperation (error messages if script/code-tagged doesn't
match script/code-requested) or a huge converter module hung off
the side of DNS servers. And the "lookups are exact and
predictable" principle dies.
That is a fairly high price to pay in complexity for something
you think is simple, isn't it?
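
To make the converter problem concrete, here is a rough Python
sketch. It assumes a hypothetical scheme in which a stored label
carries a character set tag and the resolver tries to re-encode
it into the requester's preferred set; convert and the sample
label are illustrative names, not part of any DNS specification:

```python
def convert(octets, src, dst):
    """Re-encode octets from charset src to charset dst.

    Returns None when some character has no counterpart in dst,
    which is the case a real converter would have to report as an
    error (or paper over with a lossy guess).
    """
    try:
        return octets.decode(src).encode(dst)
    except UnicodeError:
        return None

# A label stored with a (hypothetical) Latin-1 tag.
stored = "caf\u00e9".encode("iso-8859-1")

print(convert(stored, "iso-8859-1", "iso-8859-15"))  # same octets here
print(convert(stored, "iso-8859-1", "iso-8859-5"))   # no e-acute in 8859-5
# The euro sign exists in Latin-9 but not in Latin-1.
print(convert(b"\xa4", "iso-8859-15", "iso-8859-1"))
```

Every None is either an error message back to the user or a
lookup that silently fails to match.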
> I know that I've done various bits of experimentation with a
> few people and client systems notwithstanding we can actually
> implement most of an internationalised character set now... so
> why aren't we?
Because, if your experiments are working according to the model I
think you are describing above, they are dependent on your
collection of "people" all using the same 8859-N code set and
transmitting that information out of band. If I transmit a
collection of octets to you with the eighth bit high in some of
them, you will have no clue as to whether I'm transmitting
Latin-1, Latin-9, Cyrillic, Greek, Hebrew,... You may not even
know whether I'm transmitting one of those or, say, some flavor
of JIS.
And, if you take all of those and look them up as if they are
Latin-1, either we are all going to be in very big trouble or you
are going to have to preclude registrations in character sets and
scripts for anything but Latin-1. I don't think the latter
would be considered a solution to the IDN problem by anyone
outside western Europe and North America (and, in fairness, not
by many Western Europeans and North Americans either).
john