[idn] Re: Back to work (Nameprep) (was: Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check))

To: Keith Moore <moore@cs.utk.edu>
Subject: [idn] Re: Back to work (Nameprep) (was: Re: Just send UTF-8 with nameprep (was: RE: [idn] Reality Check))
From: Martin Duerst <duerst@w3.org>
Date: Wed, 18 Jul 2001 16:46:26 +0900
Cc: Keith Moore <moore@cs.utk.edu>, idn@ops.ietf.org

At 02:28 01/07/18 -0400, Keith Moore wrote:
> > Or do you have some kind
> > of principles or tests in mind that should be applied? Or any
> > particular kind of procedure that we should follow?
>
>one idea (which I don't particularly like) is to assume that all characters
>within a single label are from a single language, and if the same glyph
>maps to different code points (indicating characters from different languages)
>then you resolve the ambiguity by using the code point that creates the
>fewest number of language changes.  I won't even begin to list the problems
>with this; I mention it only because I think that this approximates the
>behavior that is most natural for human beings.

I think this is worth trying, in order to get rid of the famous 'A' for
Latin, Greek, and Cyrillic. It's of course to be done on a per-script
basis, not per language. I wouldn't actually resolve by tweaking
codepoints (sometimes it will be very difficult to decide which
codepoint to tweak), but just by rejecting strange combinations.
You have to do a keyboard switch to get from one script to the other,
so the chance of getting a mixture accidentally isn't great.
Doing the check only on the registration side may also be a very
good idea; that may allow us to start with very tight rules and
expand them later (e.g. allow scripts separated by a hyphen,...).
It would also help a lot to address some bidirectionality problems.
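
To make the "reject strange combinations" idea concrete, here is a
rough sketch of a per-label check as it might run on the registration
side. It is only an illustration under stated assumptions: the function
names are made up for this example, and because the Python standard
library does not expose the Unicode Script property, the first word of
each character's Unicode name stands in for its script. Scripts outside
the small set listed are lumped together as "OTHER", so mixtures among
those would go undetected.

```python
import unicodedata

# Assumption of this sketch: the first word of the Unicode character name
# approximates the script. A real check would use the Unicode Script property.
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "HEBREW", "ARABIC"}


def script_of(ch):
    """Return a script name for a letter, or None for digits, hyphens, etc."""
    if not ch.isalpha():
        return None
    first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
    return first_word if first_word in KNOWN_SCRIPTS else "OTHER"


def scripts_in_label(label):
    """Collect the set of scripts used by the letters of one label."""
    return {s for s in map(script_of, label) if s is not None}


def is_single_script(label):
    """Accept a label only if all of its letters come from one script."""
    return len(scripts_in_label(label)) <= 1


def check_name(name):
    """Check each dot-separated label independently; reject, never rewrite."""
    return all(is_single_script(label) for label in name.split("."))
```

Note that the check rejects a mixed label rather than trying to guess
which codepoint to change, and that different labels of the same name
may still use different scripts.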

>another idea (which I like only slightly better) is to have two kinds
>of ACE - one (using nameprep) for name-to-whatever lookups and another
>(not using nameprep) for IDNs returned in PTR records.  That way,
>nameprep can be more aggressive about folding together codepoints with
>similar glyphs, because it doesn't affect names *returned* from DNS.
>(unfortunately, you still have to nameprep names returned in CNAME
>records, NS records, MX records, etc.)

Well, 'not using nameprep' for PTR records would actually mean
using it to make sure the mapping is right, but not putting the
result into the PTR record. It may be better to use a new
record type.

In terms of feasibility, the idea of checking that each label only
contains letters from a single script sounds much more doable,
although there are a few special cases (languages written with
Cyrillic using some Latin letters).

Regards,    Martin.