[idn] length restrictions on IDN label
- To: idn@ops.ietf.org
- Subject: [idn] length restrictions on IDN label
- From: Soobok Lee <lsb@postel.co.kr>
- Date: Sun, 13 Oct 2002 15:56:56 +0900
- User-agent: Mutt/1.4i
[ When I read the IDNA draft today, I still could not find
an answer in it to the following question about IDN label length.
If the draft already addresses this issue, please correct me. ]
I have a Punycode label that is 63 octets long:
L1: zq--o39AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
L2 = ToUnicode(L1) produces U+AC00 repeated 56 times (Hangul "KA" x 56):
L2:
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
But L2 can be encoded in various Unicode and legacy encodings, which
yield different lengths in octets:
UTF8 : 3 x 56 = 168 octets
UCS2 : 2 x 56 = 112 octets
UCS4 : 4 x 56 = 224 octets
KSX1001/EUC-KR : 2 x 56 = 112 octets
All of these encodings produce labels longer than 63 octets.
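The octet counts above can be reproduced with a short Python sketch. The codec names are Python's, chosen here as stand-ins for the encodings listed: "utf-16-be" and "utf-32-be" are assumed to match UCS2 and UCS4 for this BMP-only label, and the big-endian variants are used so that no byte-order mark is counted.

```python
# L2 from the message: U+AC00 repeated 56 times.
label = "\uac00" * 56

# Python codec names standing in for the encodings listed above
# (an assumption of this sketch, not something the draft specifies).
codecs = {
    "UTF8": "utf-8",
    "UCS2": "utf-16-be",
    "UCS4": "utf-32-be",
    "KSX1001/EUC-KR": "euc_kr",
}

# Encoded length of the same label in each encoding.
lengths = {name: len(label.encode(codec)) for name, codec in codecs.items()}

for name, octets in lengths.items():
    verdict = "exceeds" if octets > 63 else "fits within"
    print(f"{name}: {octets} octets ({verdict} the 63-octet label limit)")
```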
Moreover, every ACE label of a valid (<256 octets) ACE-form IDN FQDN may
convert into a valid UTF8 label of no more than 63 octets, while the
cumulative length of the UTF8 labels of that FQDN still exceeds the
255-octet limit.
Many Internet applications impose or assume a 63-octet limit on label length.
If this assumption is violated, some implementations will treat the label
as invalid and may fail in unpredictable ways.
From an implementor's point of view, a more precise specification is needed
on whether IDN labels and FQDNs have *NEW* length restrictions in the various
character encodings, if IDNA is to extend the repertoire of allowable characters.
The above case is very rare, but in any case implementors have a practical,
security-related need to impose some limits on IDN labels in non-ACE encodings
(for example, to avoid buffer overflows caused by expanded ToUnicode labels).
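One way an implementation might size its buffers is sketched below. It relies on the observation that Punycode spends at least one ASCII character per decoded code point, so a label cannot decode to more code points than it has characters after the prefix. This is a conservative bound worked out for this sketch, not a limit stated in the IDNA draft. The names max_decoded_octets and fits, and the 4-character prefix length, are assumptions.

```python
MAX_ACE_LABEL = 63   # octet limit on an ACE label
ACE_PREFIX_LEN = 4   # e.g. "zq--" as in L1; an assumption of this sketch

def max_decoded_octets(bytes_per_code_point: int) -> int:
    """Conservative upper bound on the encoded size of one ToUnicode result.

    Each decoded code point costs at least one Punycode character, so the
    number of code points is at most the label length minus the prefix.
    """
    max_code_points = MAX_ACE_LABEL - ACE_PREFIX_LEN
    return max_code_points * bytes_per_code_point

def fits(decoded: str, codec: str, buffer_size: int) -> bool:
    """Check the encoded length before copying into a fixed-size buffer."""
    return len(decoded.encode(codec)) <= buffer_size

decoded = "\uac00" * 56  # L2 from the message

print(fits(decoded, "utf-8", 63))                     # naive 63-octet buffer
print(fits(decoded, "utf-8", max_decoded_octets(4)))  # worst-case UTF8 buffer
```

With UTF8 taking at most 4 octets per code point, the bound works out to 59 x 4 = 236 octets per label, which is looser than necessary for most labels but safe to allocate.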
Cheers,
Soobok Lee