Re: [idn] Comments on IDNA/stringprep/nameprep
Kent, you are coming in very late in this process. A lot of this was
debated back and forth early in the process, both on mail and in
personal contact. I suggest for a start that you review all of the
archives so that you don't simply retread issues.
Mark
—————
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com
----- Original Message -----
From: "Kent Karlsson" <kentk@md.chalmers.se>
To: <idn@ops.ietf.org>
Sent: Thursday, February 07, 2002 15:25
Subject: [idn] Comments on IDNA/stringprep/nameprep
Comments on IDNA/stringprep/nameprep
1. stringprep and nameprep should be rejoined to a
hostnameprep. They are only about host name preparation,
not any other name preparation. Similar preparations
may still take advantage of the hostnameprep document,
by declaring "deltas", small changes that may be needed
for other (Internet, DNS) names. That would likely
minimise the size of the "reuse" documents.
2. hostnameprep should be applied to the *entire*
hostname; i.e. the entire name should be 'mapped'
in the same way *before* it is parsed into parts.
3. Various FULL STOPs should be mapped to FULL STOP,
which must be allowed. Some of this is accomplished via
NFKC, but some mappings need to be added specially for
hostnameprep, e.g. IDEOGRAPHIC FULL STOP. (Note that
parsing into parts should come after the mapping and
prohibition steps.)
4. Various Pd (dash punctuation) characters should be mapped to
HYPHEN-MINUS by hostnameprep. Future keyboards may
generate HYPHEN rather than HYPHEN-MINUS (except perhaps
in "programming language mode", which few will use).
At the very least, hostnameprep should not prevent such a development.
5. Symbols/punctuation/dingbats (except the hyphen-like
dashes) should not be allowed by hostnameprep; and
all of that prohibition should be in hostnameprep,
not some of them handled differently elsewhere.
Punctuation in particular: in contexts where hostnames
are embedded, future syntaxes may use non-ASCII
punctuation adjacent to the hostname. At the very least,
such a development should not be prevented by hostnameprep.
ASCII symbols are at present excluded; non-ASCII symbols
should be excluded as well, for the same reason as
punctuation.
6. Hangul syllables (with conjoining characters, not
non-conjoining compatibility characters) that represent
the same syllable must be mapped to the same representation.
For unfortunate historical reasons, this no longer
happens automatically with NFKC (though in drafts of
NFKC it did). Mappings should be added so that "syllabically"
equivalent Hangul conjoining characters are mapped to a common
representation. Hangul compatibility letters, however, should be
prohibited: mapping those correctly is more complicated
than can be expressed in the (current form of) (host)nameprep
mappings. (A mapping table for Jamos, and a prohibition table for
Hangul compatibility characters, are available upon request.)
Future input methods (keyboards, for example) may generate only
single-letter Jamos, rather than any "cluster letter" Jamos or
precomposed Hangul syllable characters. At the very least,
hostnameprep should not prevent such a development.
7. No document associated with hostnameprep should make any
further restrictions on domain/host names than hostnameprep
itself. (In addition, duplicating some of the restrictions
elsewhere is confusing and should not be done.)
8. Note: The SC/TC issue cannot be solved at a near-impossible-
to-change (once deployed) technical level, but should instead
be solved at a policy level (which may employ software with
relatively easy to change mappings).
9. User interfaces that encounter mixed-script hostname *parts*
should be recommended to "flag" them (balloon warning, color
differentiation, blinking, bouncing automatic registrations, ...).
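
As a rough illustration of points 2 through 6, the ordering could be
sketched as below. This is only a sketch under stated assumptions, not
the actual nameprep or stringprep algorithm: the function name
hostnameprep_sketch, the one-entry tables EXTRA_FULL_STOPS and
JAMO_CLUSTERS, and the choice to reject by Unicode general category
are all hypothetical stand-ins for the real mapping and prohibition
tables. Only Python's standard unicodedata module is used.

```python
import unicodedata

# Hypothetical extra mapping beyond NFKC (point 3): IDEOGRAPHIC FULL STOP.
EXTRA_FULL_STOPS = {"\u3002"}

# Hypothetical one-entry stand-in for the Jamo mapping table (point 6):
# two CHOSEONG KIYEOK -> one CHOSEONG SSANGKIYEOK.
JAMO_CLUSTERS = {"\u1100\u1100": "\u1101"}


def hostnameprep_sketch(name):
    """Map and check the *entire* name, then split it into parts."""
    # Point 6: prohibit Hangul compatibility letters. This must happen
    # before NFKC, which would otherwise fold them into conjoining Jamos.
    for ch in name:
        if 0x3130 <= ord(ch) <= 0x318F:  # Hangul Compatibility Jamo block
            raise ValueError("prohibited Hangul compatibility letter: %r" % ch)

    # Point 6: map equivalent Jamo sequences to a common representation,
    # so that NFKC composes them into the same syllable.
    for src, dst in JAMO_CLUSTERS.items():
        name = name.replace(src, dst)

    name = unicodedata.normalize("NFKC", name)

    # Points 3 and 4: map remaining full stops and Pd dashes.
    mapped = []
    for ch in name:
        if ch in EXTRA_FULL_STOPS:
            ch = "."
        elif unicodedata.category(ch) == "Pd":
            ch = "-"
        mapped.append(ch)
    name = "".join(mapped)

    # Point 5: prohibit all other punctuation (P*) and symbols (S*).
    for ch in name:
        if ch in ".-":
            continue
        if unicodedata.category(ch)[0] in "PS":
            raise ValueError("prohibited punctuation or symbol: %r" % ch)

    # Point 2: parsing into parts comes last.
    return name.split(".")
```

For example, hostnameprep_sketch("www\u3002ex\u2010ample\uFF0Ecom")
yields the same parts as "www.ex-ample.com", and the Jamo sequences
U+1100 U+1100 U+1161 and U+1101 U+1161 both end up as the single
syllable U+AE4C. Without the JAMO_CLUSTERS step, NFKC alone leaves the
first sequence as U+1100 U+AC00.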
/Kent Karlsson