Re: [idn] How gethostbyname() handles 8-bit characters
- To: idn@ops.ietf.org
- Subject: Re: [idn] How gethostbyname() handles 8-bit characters
- From: "Adam M. Costello" <amc@cs.berkeley.edu>
- Date: Fri, 1 Jun 2001 02:18:21 +0000
- Delivery-date: Thu, 31 May 2001 19:21:10 -0700
- Envelope-to: idn-data@psg.com
- User-Agent: Mutt/1.3.17i
"D. J. Bernstein" <djb@cr.yp.to> wrote:
> The current gethostbyname() release (if you put no-check-names into
> /etc/resolv.conf) simply copies the 8-bit bytes to the DNS packet. It
> doesn't care about the interpretation of the bytes as characters.
What about in the normal case? Is gethostbyname() copying the bytes
because the interface takes bytes, or because the interface takes ASCII
and DNS also takes ASCII? I've tried to discover the answer, to no
avail; I think the semantics are ambiguous.
Putting no-check-names into /etc/resolv.conf is obviously a workaround,
a way to tell the library "get out of my way, I'm doing something
experimental, and I'll take responsibility for it", so I don't think we
can infer anything about the intended semantics from the behavior under
those circumstances.
It would be instructive to see how gethostbyname() behaves in a
locale for which the LDH characters do not occupy their ASCII code
positions. But I know of no such locale on any system that implements
gethostbyname().
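
To make the thought experiment concrete, here is a small sketch of what "LDH characters not at their ASCII code positions" looks like at the byte level. It uses Python's cp037 codec (an EBCDIC code page) purely as a stand-in charset; it is not a claim that any system with gethostbyname() runs in such a locale, and the helper names are hypothetical.

```python
# Illustration only: compare ASCII with an EBCDIC code page (cp037),
# where letters, digits, and hyphen sit at different code positions.
LDH = "abcdefghijklmnopqrstuvwxyz0123456789-"


def ldh_positions_differ(charset: str) -> bool:
    """Return True if any LDH character encodes differently than in ASCII."""
    return any(ch.encode(charset) != ch.encode("ascii") for ch in LDH)


def raw_bytes_for(name: str, charset: str) -> bytes:
    """Bytes a caller using `charset` would hand to a byte-copying resolver."""
    return name.encode(charset)


if __name__ == "__main__":
    # Latin-1 keeps LDH at the ASCII positions; cp037 does not.
    print(ldh_positions_differ("latin-1"))
    print(ldh_positions_differ("cp037"))
    # A resolver that blindly copied these bytes would not be sending ASCII.
    print(raw_bytes_for("example.com", "cp037"))
```

If the interface really "takes bytes", the cp037 bytes would go into the packet unchanged; if it "takes ASCII", they would have to be rejected or converted first. That difference is exactly what such a locale would reveal.
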
My gut feeling is still that if gethostbyname() is going to start
accepting 8-bit names, it should interpret them according to the charset
of the current locale (unless "no-check-names" is in effect, which
disables any interpretation and lets the caller specify the raw bytes).
If application programmers want it to assume UTF-8, they merely need to
set the locale to one that uses UTF-8. And if your predictions are
correct, that will be the default anyway.
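
A rough sketch of the behavior I have in mind, written in Python rather than C for brevity. The function name, its parameters, and the choice of UTF-8 as the output encoding are all assumptions made for illustration; nothing here describes what any existing resolver library does.

```python
import locale
from typing import Optional


def name_to_wire(name: bytes,
                 charset: Optional[str] = None,
                 check_names: bool = True) -> bytes:
    """Hypothetical pre-processing step for a locale-aware resolver.

    With check_names disabled, the caller's bytes pass through untouched
    (the "get out of my way" case). Otherwise the bytes are interpreted
    in the given charset, defaulting to the current locale's charset.
    """
    if not check_names:
        return name  # no interpretation: the caller specifies raw bytes
    if charset is None:
        # Assumption: the locale's preferred encoding stands in for
        # "the charset of the current locale".
        charset = locale.getpreferredencoding(False)
    text = name.decode(charset)  # raises UnicodeDecodeError on invalid input
    # Assumption: the interpreted name is re-encoded as UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    # The same Latin-1 input, with and without interpretation.
    print(name_to_wire(b"caf\xe9.example", "latin-1"))
    print(name_to_wire(b"caf\xe9.example", "latin-1", check_names=False))
```

Under this sketch, a program running in a UTF-8 locale gets its bytes back unchanged, which is the sense in which application programmers "merely need to set the locale".
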
AMC