Re: [idn] IDN examples for testing



"Martin v. Löwis" <martin@v.loewis.de> wrote:

> RFC 3490 says
>
>    In protocols and document formats that define how to handle
>    specification or negotiation of charsets, labels can be encoded
>    in any charset allowed by the protocol or document format.  If a
>    protocol or document format only allows one charset, the labels
>    MUST be given in that charset.
>
> So, I *could* interpret RFC 2616 as defining how to handle
> specification of charsets, as it is "MIME-like".  Therefore, I infer
> that sending
> 
> MIME-Version: 1.0
> Host: =?iso-8859-1?q?www=2Ebrav=E5=2Enu?=
> 
> is conforming to both RFC 3490 and RFC 2616.

As James pointed out, RFC 2616 requires a very strict syntax for the
host, which does not allow for MIME encoded-words.  But even if your
suggested Host: field did not violate RFC 2616, it would still violate
RFC 3490, which says:

    Whenever a domain name is put into an IDN-unaware domain name slot,
    it MUST contain only ASCII characters.

Since the Host: field is IDN-unaware (because it predates IDNA),
you're not allowed to put a domain name there that contains non-ASCII
characters, regardless of whether the field supports non-ASCII charsets.
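
To make the requirement concrete, here is a minimal sketch of what a
client might do before filling in the Host: field. It assumes Python and
its built-in "idna" codec, which applies the RFC 3490 ToASCII operation
label by label; the helper name host_header_value is hypothetical and
used only for illustration.

```python
def host_header_value(domain: str) -> str:
    """Return an ASCII-only form of `domain` for an IDN-unaware slot."""
    # The built-in "idna" codec applies ToASCII to each label (IDNA 2003),
    # so labels with non-ASCII characters become "xn--" ACE labels.
    ace_bytes = domain.encode("idna")
    return ace_bytes.decode("ascii")


# The name from the quoted example, written with an escape for U+00E5.
name = "www.brav\u00e5.nu"
value = host_header_value(name)

# The header now carries only ASCII characters.
print("Host: " + value)
```

The original name is not lost: an IDN-aware application can recover it
from the ASCII form with the reverse (ToUnicode) operation.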

The passage from RFC 3490 that you quote above, which is about
*encoding*, is orthogonal to the requirement about IDN-unaware slots.
The domain names put into IDN-unaware slots must contain only ASCII
characters, but those ASCII characters can be encoded using any charset
supported by the slot, be it ASCII, UTF-16, UTF-32, EBCDIC, or whatever.
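
The distinction between the characters of a name and their encoding can
be sketched as follows. This is only an illustration, assuming Python;
the codec names are Python's, and "cp037" is used here as one example of
an EBCDIC code page. The same ASCII-only name yields different byte
sequences under each charset, yet decodes to the same characters.

```python
# Start from a name that contains only ASCII characters (the ACE form
# produced by Python's built-in "idna" codec).
ace_name = "www.brav\u00e5.nu".encode("idna").decode("ascii")

# Charsets a slot might support; "cp037" is one EBCDIC code page.
charsets = ["ascii", "utf-16-be", "utf-32-be", "cp037"]

# Encode the same characters in each charset.
encoded = {cs: ace_name.encode(cs) for cs in charsets}

for cs, raw in encoded.items():
    # The bytes differ per charset; the decoded characters do not.
    print(f"{cs:10} {raw.hex()}")
    assert raw.decode(cs) == ace_name
```

In every case the characters of the name are ASCII characters; only the
bytes on the wire change.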

This freedom to use any slot-supported charset becomes more interesting
for IDN-aware slots, which can explicitly allow domain names containing
non-ASCII characters.

AMC