Re: [idn] length restrictions on IDN label
On Sun, Oct 13, 2002 at 09:39:42AM -0700, Paul Hoffman / IMC wrote:
> At 3:56 PM +0900 10/13/02, Soobok Lee wrote:
> >[ When I read the IDNA draft today, I still couldn't find
> > an answer in it to the following question about IDN label length.
> > If the following issue is already addressed in the draft, please
> >correct me. ]
>
> It is indeed covered in the draft. The input to IDNA is code points,
> not encoded characters. As you point out, different encodings give
> different lengths for the same string. The only lengths that matter
> are those that are already in STD 13.
>
> > Many internet applications impose/assume the 63-octet limit on
> >label lengths.
> > If this assumption is violated, the label will be regarded as an invalid
> > label, and may produce unpredictable errors in some implementations.
>
> Which Internet applications are you speaking of? Which encodings are
> they using? As you pointed out, different encodings give different
> lengths. Thus, no sensible application could assume a 63-octet length
> if it deals with different encodings.
UTF-8, EUC-KR, etc. are all ASCII-compatible encodings/charsets.
Applications need not give up or modify the old 63-octet restriction for
LDH labels even in UTF-8 or EUC-KR, because those encodings produce
the same octet string as pure ASCII encoding does. That is,
in an ASCII-compatible encoding of LDH characters, the number of code points and
the number of octets are equal, whereas they are not equal for
non-LDH characters such as Hangul or CJK letters (the octet length is doubled or tripled).
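
To illustrate the point, here is a minimal Python sketch that counts code points and octets for the same label in two encodings. The helper name `lengths` and the sample labels are only illustrative assumptions, not anything taken from the IDNA draft.

```python
# Compare the number of code points with the number of octets
# for the same label under two ASCII-compatible encodings.
# `lengths` and the sample labels are hypothetical, for illustration only.

def lengths(label):
    """Return (code points, UTF-8 octets, EUC-KR octets) for a label."""
    return (
        len(label),                    # code points in the string
        len(label.encode("utf-8")),    # octets when encoded as UTF-8
        len(label.encode("euc-kr")),   # octets when encoded as EUC-KR
    )

for label in ["example", "한글", "가" * 22]:
    print(lengths(label))
```

For an LDH label all three numbers agree. For a label of 22 Hangul syllables, the code point count is well under 63, but the UTF-8 form is 66 octets, so a check written in octets and a check written in code points give different answers.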
>
> > From implementers' point of view, a more precise specification is needed
> > about whether IDN label/FQDN has *NEW* length restrictions in
> >various char encodings,
> > if IDNA tries to extend the character repertoires of allowable characters.
>
> It seems likely that most implementers can understand that they must
> continue to follow the same rules that they always have for the
> length of domain names and labels.
The unit of the length restriction matters: number of code points or number of octets?
That should be made clearer. RFC 1035 uses "octets", not characters or code points.
I enclose the related RFC 1035 (STD 13) sections here.
Mockapetris [Page 9]
RFC 1035 Domain Implementation and Specification November 1987
(snip)
2.3.4. Size limits
Various objects and parameters in the DNS have size limits. They are
listed below. Some could be easily changed, others are more
fundamental.
labels          63 octets or less
names           255 octets or less
TTL             positive values of a signed 32 bit number.
UDP messages    512 octets or less
3. DOMAIN NAME SPACE AND RR DEFINITIONS
3.1. Name space definitions
Domain names in messages are expressed in terms of a sequence of labels.
Each label is represented as a one octet length field followed by that
number of octets. Since every domain name ends with the null label of
the root, a domain name is terminated by a length byte of zero. The
high order two bits of every length octet must be zero, and the
remaining six bits of the length field limit the label to 63 octets or
less.
To simplify implementations, the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less.
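
The wire format described in the two paragraphs above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `to_wire` and the constant names are hypothetical, and the labels are taken as already-encoded octet strings, so both limits are counted in octets as the RFC text says.

```python
# Sketch of the length-prefixed encoding of a domain name.
# `to_wire`, MAX_LABEL_OCTETS and MAX_NAME_OCTETS are hypothetical names.

MAX_LABEL_OCTETS = 63   # six usable bits in the length octet
MAX_NAME_OCTETS = 255   # label octets plus label length octets

def to_wire(labels):
    """Encode a list of labels (bytes objects) as a wire-format name."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(label))  # one-octet length field
        out += label            # followed by that number of octets
    out.append(0)               # null label of the root ends the name
    if len(out) > MAX_NAME_OCTETS:
        raise ValueError("name must be 255 octets or less")
    return bytes(out)

print(to_wire([b"www", b"example", b"com"]))
```

Because the length field counts octets, a label that is short in code points can still be rejected once it is encoded into a multi-octet charset.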
Although labels can contain any 8 bit values in octets that make up a
label, it is strongly recommended that labels follow the preferred
syntax described elsewhere in this memo, which is compatible with
existing host naming conventions. Name servers and resolvers must
compare labels in a case-insensitive manner (i.e., A=a), assuming ASCII
[Soobok Lee]