
[idn] How gethostbyname() handles 8-bit characters



Adam M. Costello writes:
> If gethostbyname() receives 8-bit text, should it assume that it's UTF-8,

The current gethostbyname() release (if you put no-check-names into
/etc/resolv.conf) simply copies the 8-bit bytes to the DNS packet. It
doesn't care about the interpretation of the bytes as characters.
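
As a rough illustration of that byte-transparent behavior (a sketch,
not the resolver's actual code), the hypothetical helper encode_name()
below packs a dotted name into DNS wire format: one length byte per
label, then the label bytes copied verbatim. The function name, its
signature, and its error convention are assumptions made for this
example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from any resolver library.
 * Encodes a dotted name into DNS wire format: each label is preceded
 * by a length byte, and the name ends with a zero-length root label.
 * Label bytes are copied as-is; nothing here treats them as characters.
 * Returns the encoded length, or 0 on error. */
size_t encode_name(const char *name, unsigned char *out, size_t outlen)
{
    size_t pos = 0;

    while (*name) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);

        if (len == 0 || len > 63)
            return 0;                 /* empty or over-long label */
        if (pos + len + 2 > outlen)
            return 0;                 /* need room for label and root byte */

        out[pos++] = (unsigned char)len;
        memcpy(out + pos, name, len); /* 8-bit bytes pass through untouched */
        pos += len;

        name += len;
        if (*name == '.')
            name++;                   /* skip the label separator */
    }

    if (pos + 1 > outlen)
        return 0;
    out[pos++] = 0;                   /* root label */

    return pos <= 255 ? pos : 0;      /* DNS limits an encoded name to 255 bytes */
}
```

A label containing the bytes 0xC3 0xA9 comes out of this routine
exactly as it went in; whether those bytes mean U+00E9 in UTF-8 is
decided entirely by whoever produced them.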

If we make the obvious specification of 8-bit bytes in DNS as UTF-8, the
current gethostbyname() semantics will be consistent with two coherent
programming models:

   (1) The local character encoding might not be UTF-8. Higher-level
       routines are responsible for converting from the local character
       encoding to UTF-8 before calling gethostbyname(). It would be
       convenient to have a central routine that does this.

   (2) The local character encoding is UTF-8. No conversion is required
       in this case. This is simpler than #1, and it's wildly popular
       among programmers---dealing with multiple character encodings is
       a royal pain. UNIX is rapidly moving from #1 to #2.
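
The "central routine" in model #1 could look something like the sketch
below, which assumes a POSIX system with iconv() and nl_langinfo().
The names name_to_utf8() and gethostbyname_local() are hypothetical,
introduced only for this example. The caller is assumed to have run
setlocale(LC_CTYPE, "") so that nl_langinfo(CODESET) reports the
local encoding.

```c
#include <iconv.h>
#include <langinfo.h>
#include <netdb.h>
#include <string.h>

/* Hypothetical helper: convert `name` from the encoding `codeset` to
 * UTF-8, writing a NUL-terminated result into `out`.
 * Returns 0 on success, -1 on failure (unknown encoding, invalid
 * input, or output buffer too small). */
int name_to_utf8(const char *codeset, const char *name,
                 char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", codeset);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)name;          /* iconv() does not modify the input */
    size_t inleft = strlen(name);
    char *dst = out;
    size_t outleft = outlen - 1;      /* reserve one byte for the NUL */

    size_t rc = iconv(cd, &in, &inleft, &dst, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return 0;
}

/* Hypothetical wrapper for model #1: convert from the local encoding,
 * then hand the UTF-8 bytes to the unchanged gethostbyname(). */
struct hostent *gethostbyname_local(const char *name)
{
    char utf8[1024];

    if (name_to_utf8(nl_langinfo(CODESET), name, utf8, sizeof utf8) != 0)
        return NULL;
    return gethostbyname(utf8);
}
```

Under model #2 the local encoding is already UTF-8, so the conversion
step changes nothing and programs can call gethostbyname() directly.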

There are other programming models that aren't consistent with the
current gethostbyname() semantics. This might be a sufficient reason to
break compatibility if those models were as nice as #2, but they aren't.

---Dan