Re: [idn] I-D ACTION:draft-ietf-idn-idna-08.txt
Dan Oscarsson brings up an interesting point. The introduction section
of IDNA says:
    IDNA does not require changes to any infrastructure. In particular,
    IDNA does not require any changes to DNS servers, resolvers, or
    protocol elements, because the ASCII name service provided by the
    existing DNS is entirely sufficient.
Later, section 6.3 says:
    All IDNs served by DNS servers MUST contain only ASCII characters.
Clearly IDNA does "require" a change to the DNS infrastructure, because
it introduces some new requirements, some of which apply to DNS servers.
This apparent contradiction comes from multiple senses of the word
"require". When I wrote that paragraph in the introduction, I meant
"require" in the sense of "depend on", not "demand".
I suggest clarifying this by changing "require" to "depend on" in the
two instances quoted above, and making similar changes in the last
paragraph before 1.1:
- Proposals that were not chosen by the IDN Working Group would
- require that user applications, resolvers, and DNS servers be
- updated in order for a user to use an internationalized domain name.
+ Proposals that were not chosen by the IDN Working Group would depend
+ on user applications, resolvers, and DNS servers being updated in
+ order for a user to use an internationalized domain name.
- Rather than require widespread updating of all components, IDNA
- requires only user applications to be updated;
+ Rather than rely on widespread updating of all components, IDNA
+ depends on updates to user applications only;
Even though IDNA does make some demands on DNS servers, IDNA will still
work with DNS servers that predate IDNA, as long as the new zone files
are created in an IDNA-conformant way (with IDNs stored in their ASCII
forms). That's why we can say that IDNA does not depend on changes to
the infrastructure.
Dan Oscarsson <Dan.Oscarsson@trab.se> wrote:
> Regarding the ASCII names of DNS is sufficient you probably mean
> that it is sufficient for IDNA.
Yes, exactly.
> It is not generally sufficient.
I think the meaning is clear enough, but I wouldn't object if "for IDNA"
were added at the end of that paragraph (again, the first paragraph of
the introduction). I'll leave the decision to Paul and Patrik.
> it is stated that resolvers do not need to be changed.
Indeed, they don't need to be changed.
> But it is probably the resolver libraries that should be changed.
The spec doesn't say they shouldn't be changed; it just says they don't
need to be. Whether they should be changed is something that OS makers
can decide for themselves.
> The definition of what IDN means is complex. It is defined to be a
> domain name on which ToASCII can be applied without failing.
The internals of ToASCII are complex, but once you encapsulate that
concept, the definition of IDN is quite simple, as you have just
illustrated.
> And then in the text IDN is sometimes used to mean a ACE-encoded name
> and sometimes to mean a UCS-encoded name.
When the text says IDN, it means IDN as defined in section 2. If there
is someplace where IDN is used to mean something else, please point it
out, because it would be an error.
An IDN can contain ACE labels and non-ACE ASCII labels and non-ASCII
labels. As long as every label of a domain name passes ToASCII, it's an
IDN.
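A minimal sketch of that definition, assuming Python's standard
encodings.idna module as a stand-in for ToASCII. That module implements
the operation as later published in RFC 3490, so it may differ in detail
from this draft; the helper name is_idn is hypothetical.

```python
from encodings import idna


def is_idn(labels):
    """Return True if ToASCII succeeds on every label of the name."""
    for label in labels:
        try:
            # Stand-in for the draft's ToASCII operation (assumption:
            # the RFC 3490 version shipped with Python).
            idna.ToASCII(label)
        except UnicodeError:
            # ToASCII failed (for example, an empty or over-long label).
            return False
    return True
```

A name mixing all three kinds of label, such as
is_idn(["www", "xn--bcher-kva", "bücher"]), passes under this sketch,
while a name containing an empty label does not.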
> The basic definition for domain names should be
IDNA does not define domain names or labels. That's not its job. IDNA
defines internationalized domain names and internationalized labels in
terms of the preexisting concepts of domain names and labels.
> Section 3 says in requirement 1) that several different "dots" are
> allowed between labels when written that way. There are probably more
> "dots" in UCS that could work.
There are.
> I see no reason to forbid them, unless all are forbidden except
> U+002E.
We don't forbid them. The draft nowhere prohibits applications from
accepting additional characters as dots.
But only full stop and ideographic full stop (and their
fullwidth/halfwidth versions) are required to be accepted by all
applications. Ideographic full stop was singled out because it's the
only one that anyone ever asked for. I think that's probably because
it's the only one that belongs to a script large enough to use a modal
input method. CJK users need to switch modes in order to type the ASCII
full stop, whereas for other users the ASCII full stop is probably
within easier reach.
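As an illustration of the required set only, here is a sketch that
splits a typed name into labels on those four characters. The code
points are the ones I understand the draft to mean; the names DOTS and
split_labels are hypothetical, and an application remains free to accept
more separators than these.

```python
import re

# U+002E full stop, U+3002 ideographic full stop,
# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop
DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name):
    """Split a dotted name into labels on any of the four dots."""
    return DOTS.split(name)
```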
> Only one type of dots simplifies comparing of names.
True, but comparing names is already pretty complex (it involves Unicode
normalization, and sometimes also transcoding to Unicode).
> Only one type of "dot" should be used in each context.
> In a Latin based context only U+002E should be used.
Names can be copied from one context into a different context. I'd
rather both dots work in both contexts. I can't think of any way users
would benefit if Latin contexts refused to accept ideographic full stop.
> 3) If a name is put an a non_ASCII context it must be
> changed to the character set of that context
We distinguish between structured data (domain name slots, which contain
domain names intended to be used by programs) and unstructured data
(text intended primarily to be viewed by humans).
If an IDN is put into an IDN-unaware domain name slot, then it must
be converted to the ASCII form regardless of whether that context
supports non-ASCII characters. That's because even if the application
receiving the name knows how to represent non-ASCII strings, it doesn't
necessarily know how to compare IDNs. We must play it safe and use the
ASCII form unless we know the receiver knows how to compare IDNs.
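A sketch of that conversion, assuming Python's built-in "idna" codec,
which applies the RFC 3490 version of ToASCII to each label and joins
the results with U+002E. The ACE prefix it emits ("xn--") is the one
assigned after this draft, and the function name is hypothetical.

```python
def to_ascii_slot(name):
    """Return the ASCII form of a name for an IDN-unaware slot."""
    # The codec raises UnicodeError if any label fails ToASCII.
    return name.encode("idna").decode("ascii")
```

Names that are already ASCII pass through unchanged, so the same call
can be used for every name written to such a slot.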
If an IDN is put into unstructured text, the usual rules for putting any
string into that context apply; IDNA adds no special requirements for
IDNs in this situation.
> 4) When two names are compared they must both be in the same context
> and be normalised.
IDNA defines comparison of names very precisely.
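As I read it, the rule is that two names match if and only if their
ASCII forms (obtained by ToASCII) match under a case-insensitive ASCII
comparison. A minimal sketch under that reading, again assuming Python's
"idna" codec as the ToASCII implementation; idn_equal is a hypothetical
name.

```python
def idn_equal(a, b):
    """Compare two names by their ASCII forms, ignoring ASCII case."""
    # bytes.lower() only folds the ASCII letters A-Z.
    return a.encode("idna").lower() == b.encode("idna").lower()
```

Under this sketch "Bücher.example" and "XN--BCHER-KVA.EXAMPLE" compare
equal, while "bücher.example" and "bucher.example" do not.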
> IDNA mandates (probably) that a domain name used in a link in a html
> document must be ACE encoded. This is not what is natural for people
> to use.
It's up to the HTML folks to update the HTML spec to support IDNs
natively. I think they're already working on this. In the meantime,
authors of HTML documents can either enter the ACE themselves or use an
HTML editor/preprocessor that understands IDNA.
> IDNA must NOT restrict current usage and DNS standards making it
> impossible to update the DNS standard later with a suitable defined
> handling of non-ASCII octet values!
The door is left open for updates to DNS:
    If a signaling system which makes negotiation possible between old
    and new DNS clients and servers is standardized in the future, the
    encoding of the query in the DNS protocol itself can be changed from
    ACE to something else, such as UTF-8.
What we don't want to allow is untagged non-ASCII text in DNS messages,
because there is no standard for what it means or how it should be
compared.
> IDNA is not an update of the DNS standard.
IDNA is mainly orthogonal to the DNS standard, but it could be
considered an update in one small respect. RFCs 1034 and 1035 say:
    domain name comparisons for all present domain functions are done in
    a case-insensitive manner, assuming an ASCII character set, and a
    high order zero bit.

    we may someday need to add full binary domain names for new
    services;

    future additions beyond current usage may need to use the full
    binary octet capabilities in names

    Name servers and resolvers must compare labels in a case-insensitive
    manner (i.e., A=a), assuming ASCII with zero parity.
It seems pretty clear that the 8th bit is reserved for future use.
As far as I know, no use for the 8th bit has ever been standardized.
IDNA effectively says that the 8th bit is not to be used for
internationalization, at least not for the time being. It's still
available for other purposes (like non-textual domain names).
Patrik Fältström <paf@cisco.com> wrote:
> IDNA does NOT change the definition of DNS labels.
Right.
> It only says that labels with a specific prefix are said to be "IDN
> labels".
There is no such term as "IDN label". Labels that get modified by
ToUnicode are said to be "ACE labels".
AMC