
RE: [idn] Fast nameprep vs. slow nameprep





> -----Original Message-----
> From: D. J. Bernstein [mailto:djb@cr.yp.to]
...
> Karlsson Kent - keka writes:
> > Do you expect system
> > providers to make a special "IDN mode" for the keyboards?
> 
> It seems that no effort is necessary for European locales: the existing
> keyboard interfaces already produce KC-normalized strings.

No, they don't.  I can easily enter a ¨ (a diaeresis) or
a ´ (acute accent) on the keyboard, neither of which is
in KC.  Almost(!) all the letters are likely to already be
in KC, but on this keyboard I can generate µ (micro sign)
easily, and µ (micro sign) is not in KC. 
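
This is easy to check.  Below is a minimal sketch using Python's
standard unicodedata module (my choice of tool for illustration; the
helper name in_nfkc is made up for this example).  It tests whether a
character survives normalization form KC unchanged:

```python
import unicodedata

def in_nfkc(s: str) -> bool:
    """Return True if s is unchanged by normalization form KC."""
    return unicodedata.normalize("NFKC", s) == s

# Characters that can be typed directly on the keyboard discussed above,
# plus one ordinary precomposed letter for comparison.
for ch in ("\u00a8", "\u00b4", "\u00b5", "\u00e5"):
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: in KC = {in_nfkc(ch)}")
```

The diaeresis, acute accent and micro sign all have compatibility
decompositions, so KC changes them; a precomposed letter such as å is
left alone.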

Further, "nameprepping" as it is currently specified maps
all uppercase to lowercase (nearly; it specifies an
extended form of "case *folding*" as per UTR 21, rather than
map to lowercase).  No keyboard should do that.
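
To illustrate the difference between case folding and mapping to
lowercase, here is a small sketch.  It assumes Python's str.casefold(),
which implements Unicode full case folding; that is close to, but not
necessarily identical to, the extended folding table in the nameprep
draft:

```python
# Lowercasing and case folding agree for most letters, but not for all.
word = "Stra\u00dfe"          # "Strasse" spelled with LATIN SMALL LETTER SHARP S

lowered = word.lower()        # sharp s is already lowercase, so it stays
folded = word.casefold()      # case folding expands sharp s to "ss"

print(lowered == folded)      # the two mappings disagree for this word
```

Either mapping is something a name-matching step may do; neither is
something a keyboard should do to what the user types.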

> (I have no idea why you're worried about soft hyphens. Why would a user
> type a soft hyphen inside a domain name? The keyboard interface doesn't
> have to remove them; they simply won't work.)

Current "nameprep" table says:

00AD; ; SOFT HYPHEN; Map out

and there may well be a soft hyphen (or zero width space) in
a URL to make it break across lines in a reasonable way when a visible
copy is printed (or when printing/displaying the source).  Soft hyphens
can easily be generated from the keyboard (already on some keyboards;
on more in the future).
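
A "map out" entry simply deletes the character during preparation.  A
minimal sketch, assuming a table that contains only the one entry
quoted above (the names MAP_OUT and map_out are hypothetical, and the
real table has many more entries):

```python
# Code points that are deleted during preparation.
# Only U+00AD SOFT HYPHEN is listed here, taken from the table line
# quoted above; this is not the full nameprep table.
MAP_OUT = {0x00AD: None}

def map_out(name: str) -> str:
    """Remove every character that the table marks as 'Map out'."""
    return name.translate(MAP_OUT)

# A name copied from text that contained a soft hyphen as a break hint.
copied = "exam\u00adple.com"
print(map_out(copied))
```

So a name copied from a document with a soft hyphen in it still
matches after preparation, with no help from the keyboard.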

> In Japan, on the other hand, we've heard that users already have to
> change keyboard modes to type domain names. Otherwise they don't even
> get the right dot!

There are two issues here:

1. There is still no special mode that generates "nameprepped"
strings directly from the keyboard, not even strings that are
guaranteed to be in KC.

2. As I said in my previous e-mail, the entire domain name
(IDN) should be "nameprepped" before parsing on "." to get
name parts.  KC will map fullwidth full stop to full stop,
and "nameprep" may (if so decided) further map ideographic
full stop to full stop; and that before parsing into name parts
(for ACEing or whatever).  It is not reasonable to require
keyboard mode changes (by the user) for each "." typed in an
IDN. It is even more unreasonable to require new keyboard
modes (from systems providers) that generate "nameprepped" strings.
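
The order of operations in point 2 can be sketched as follows.  This
is only an illustration under stated assumptions: KC is done with
Python's unicodedata module, the extra mapping of ideographic full
stop is the undecided option mentioned above, case folding is left out
for brevity, and split_labels is a made-up name:

```python
import unicodedata

IDEOGRAPHIC_FULL_STOP = "\u3002"

def split_labels(name: str, map_ideographic_stop: bool = True) -> list[str]:
    """Normalize the whole name first, then split it into name parts."""
    # KC maps U+FF0E FULLWIDTH FULL STOP to "." on its own.
    prepared = unicodedata.normalize("NFKC", name)
    # Assumed extra mapping: KC leaves U+3002 alone, so it needs its own rule.
    if map_ideographic_stop:
        prepared = prepared.replace(IDEOGRAPHIC_FULL_STOP, ".")
    return prepared.split(".")

# One fullwidth full stop and one ideographic full stop as separators.
typed = "www\uff0eexample\u3002com"
print(split_labels(typed))
print(split_labels(typed, map_ideographic_stop=False))
```

With the extra mapping the user gets three name parts whichever dot
the keyboard produced; without it, the ideographic full stop stays
inside a name part.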


> > If you paste in text,
> 
> Good UTF-8 domain names will be preserved by copy-and-paste. 
> There's no need for further nameprep.

???

If I paste "Åre" it should surely still be "Åre" that is
pasted in.  The "nameprepped" version is different: "åre";
and *pasting* should not make any such change.
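
In other words, the pasted text and the prepared form are two
different strings, and only the second is derived at lookup time.  A
rough sketch (lookup_key is a hypothetical helper, and casefold plus
KC only approximates what nameprep specifies):

```python
import unicodedata

def lookup_key(displayed: str) -> str:
    """Derive a comparison form; the displayed string is not modified."""
    return unicodedata.normalize("NFKC", displayed.casefold())

pasted = "\u00c5re"           # what the user pasted, and what stays on screen
key = lookup_key(pasted)      # derived only when the name is looked up

print(pasted, key)
```

The same key also comes out if the paste contained "A" followed by a
combining ring above, which is another reason to prepare at lookup
time rather than at input time.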

> 
> > Can please the keyboard interfaces be left out of this discussion.
> > It's far out of scope for the IDN WG.
> 
> Wrong. The whole point of IDNs is to let users read and write non-ASCII
> characters in domain names. The entire path from keyboard to computer to
> screen---as the charter says, ``the use of such names by humans''---is
> within the scope of the IDN WG.

I did participate in producing the latest Swedish keyboard
standard (soon to be published).  It does not specify any
"nameprepping", or KC mapping, nor are there any plans of ever
introducing such things even in a special keyboard mode.
Not that such modes are prevented, but nobody will expect
any such mode, and "Svensson" (Joe User...) would never
use it even in the unlikely event that some system provider
created such a mode and shipped it with the system.  

The IDN working group must *consider* keyboard input, sure.
However, it should not dream of *changing* the way keyboards
work; such things are definitely out of scope for *this* WG.

		Kind regards
		/kent k