[idn] How gethostbyname() handles 8-bit characters
- To: idn@ops.ietf.org
- Subject: [idn] How gethostbyname() handles 8-bit characters
- From: "D. J. Bernstein" <djb@cr.yp.to>
- Date: 29 May 2001 18:59:11 -0000
- Delivery-date: Tue, 29 May 2001 12:13:22 -0700
- Envelope-to: idn-data@psg.com
- Mail-Followup-To: idn@ops.ietf.org
Adam M. Costello writes:
> If gethostbyname() receives 8-bit text, should it assume that it's UTF-8,
The current gethostbyname() release (if you put no-check-names into
/etc/resolv.conf) simply copies the 8-bit bytes to the DNS packet. It
doesn't care about the interpretation of the bytes as characters.
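To make "simply copies the 8-bit bytes" concrete, here is a rough sketch of packing a dotted name into DNS label format. This is not the resolver's actual code; pack_name() is a hypothetical helper, and it skips checks a real resolver would do (for example the 255-byte limit on the whole name). The point is only that bytes above 0x7f pass through untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the resolver's real code: encode a dotted
   name as DNS labels (one length byte, then the raw bytes), ending
   with a zero byte for the root. Bytes >= 0x80 are copied unchanged,
   with no interpretation as characters.
   Returns the encoded length, or 0 on error. */
size_t pack_name(const char *name, unsigned char *out, size_t outlen)
{
    size_t pos = 0;

    while (*name) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);

        if (len == 0 || len > 63) return 0;        /* empty or oversized label */
        if (pos + 1 + len + 1 > outlen) return 0;  /* keep room for the root byte */

        out[pos++] = (unsigned char)len;
        memcpy(out + pos, name, len);              /* raw copy, 8-bit clean */
        pos += len;

        name += len;
        if (*name == '.') name++;
    }

    if (pos + 1 > outlen) return 0;
    out[pos++] = 0;                                /* root label */
    return pos;
}
```

Given the name "a\xfc.de", this produces the seven bytes 02 61 fc 02 64 65 00. Whether fc means anything as a character is never asked.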
If we make the obvious specification of 8-bit bytes in DNS as UTF-8, the
current gethostbyname() semantics will be consistent with two coherent
programming models:
(1) The local character encoding might not be UTF-8. Higher-level
routines are responsible for converting from the local character
encoding to UTF-8 before calling gethostbyname(). It would be
convenient to have a central routine that does this.
(2) The local character encoding is UTF-8. No conversion is required
in this case. This is simpler than #1, and it's wildly popular
among programmers---dealing with multiple character encodings is
a royal pain. UNIX is rapidly moving from #1 to #2.
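As an illustration of the "central routine" in #1, here is a minimal sketch that assumes the local character encoding is ISO-8859-1. The function name latin1_to_utf8() is made up for this example; a general routine would have to handle whatever the local encoding actually is.

```c
#include <stddef.h>

/* Hypothetical "central routine" for model #1, assuming the local
   encoding is ISO-8859-1. Writes a NUL-terminated UTF-8 copy of `in`
   to `out`. Returns the UTF-8 length (excluding the NUL), or
   (size_t)-1 if `out` is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t outlen)
{
    size_t pos = 0;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x80) {
            /* ASCII is the same in both encodings */
            if (pos + 1 >= outlen) return (size_t)-1;
            out[pos++] = (char)c;
        } else {
            /* 0x80..0xff becomes a two-byte UTF-8 sequence */
            if (pos + 2 >= outlen) return (size_t)-1;
            out[pos++] = (char)(0xC0 | (c >> 6));
            out[pos++] = (char)(0x80 | (c & 0x3F));
        }
    }

    if (pos >= outlen) return (size_t)-1;
    out[pos] = '\0';
    return pos;
}
```

Under #1 a higher-level routine would call something like this and hand the result to gethostbyname(). Under #2 the string is already UTF-8 and goes to gethostbyname() as is.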
There are other programming models that aren't consistent with the
current gethostbyname() semantics. This might be a sufficient reason to
break compatibility if those models were as nice as #2, but they aren't.
---Dan