Re: [idn] UNIX moving to UTF-8
Just to set the record straight,
- Java uses a variant of UTF-8 in class files.
- That variant is not conformant to the UTF-8 spec.
- It is only for internal use within the "compiled" class files and in
serialization. (And while it is theoretically possible for a non-Java
program to pick apart serialized Java classes, it would be extremely rare.)
- UTF-8 is supported by Java for interchange, of course, but the APIs all
use UTF-16 interfaces.
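To make the difference concrete, here is a minimal sketch that assumes a modern JDK (Java 8 or later). It uses StandardCharsets and UncheckedIOException, neither of which existed in 2001. It relies on DataOutputStream.writeUTF, which is documented to emit "modified UTF-8", the same variant the JVM specification uses for CONSTANT_Utf8 entries in class files. The class name ModifiedUtf8Demo and its helper methods are hypothetical, made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Java's internal variant: DataOutputStream.writeUTF emits modified UTF-8,
    // preceded by a 2-byte big-endian length, which is stripped here.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Standard, conformant UTF-8, as used for interchange.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated uppercase hex, e.g. "C0 80".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String nul = "\0";
        // U+1F600 is outside the BMP, so the String (UTF-16) holds it
        // as a surrogate pair: two char code units.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println("U+0000 modified:  " + hex(modifiedUtf8(nul)));   // C0 80
        System.out.println("U+0000 standard:  " + hex(standardUtf8(nul)));   // 00
        System.out.println("U+1F600 modified: " + hex(modifiedUtf8(emoji))); // ED A0 BD ED B8 80
        System.out.println("U+1F600 standard: " + hex(standardUtf8(emoji))); // F0 9F 98 80

        // The API counts UTF-16 code units, not characters.
        System.out.println("UTF-16 code units: " + emoji.length());                    // 2
        System.out.println("code points:       " + emoji.codePointCount(0, emoji.length())); // 1

        // Standard UTF-8 round-trips losslessly through the UTF-16 String API.
        System.out.println(new String(standardUtf8(emoji), StandardCharsets.UTF_8).equals(emoji)); // true
    }
}
```

The two non-conformant points show up directly in the output. U+0000 becomes the overlong pair C0 80 instead of 00. A supplementary character is written as two 3-byte encodings of its surrogates (6 bytes) instead of one 4-byte sequence.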
Mark
----- Original Message -----
From: "Steve Hanna" <steve.hanna@sun.com>
To: "Mark Davis" <mark@macchiato.com>
Cc: "Eric A. Hall" <ehall@ehsco.com>; "Keith Moore" <moore@cs.utk.edu>;
"D. J. Bernstein" <djb@cr.yp.to>; <idn@ops.ietf.org>
Sent: Friday, January 26, 2001 07:18
Subject: Re: [idn] UNIX moving to UTF-8
> Java uses 16-bit Unicode internally and UTF-8 for most external
> exchanges (such as class files). This has been true since JDK 1.0.
> Support for adding other encodings has been present since JDK 1.1. Not
> that it makes much difference, but I wanted to set the record straight.
>
> -Steve
>
> Mark Davis wrote:
> >
> > People are clearly moving to Unicode. Exactly which UTF they choose (8,
> > 16, 32) is not as important, since they all can be converted to each other
> > very
> > efficiently and without loss.
> >
> > It is however an overstatement to say that all environments are headed
> > towards UTF-8. Certainly Windows, JavaScript, and Java use UTF-16; these
> > are not unimportant. I'm not sure about the Mac, but I believe they are also
> > UTF-16.
> >
> > Mark
> >
> > ----- Original Message -----
> > From: "Eric A. Hall" <ehall@ehsco.com>
> > To: "Keith Moore" <moore@cs.utk.edu>
> > Cc: "D. J. Bernstein" <djb@cr.yp.to>; <idn@ops.ietf.org>
> > Sent: Thursday, January 25, 2001 19:53
> > Subject: Re: [idn] UNIX moving to UTF-8
> >
> > >
> > > Keith Moore wrote:
> > > >
> > > > > For extensive evidence that it's happening:
> > > >
> > > > that's not anything of the sort. you cited evidence that a few
> > > > energetic folks are implementing and modifying existing tools to
> > > > support UTF-8.
> > > >
> > > > that's a far cry from widespread adoption of UTF-8 by real users.
> > >
> > > I think it's pretty clear that UTF-8 is the direction that most
> > > environments are heading to, if they aren't there already.
> > >
> > > - Solaris 7 and higher use UTF-8 for the local (Unicode) interfaces
> > > - Windows 2000 and CE use Unicode and UTF-8
> > > - MacOS 9 (or X?) and higher same
> > > - etc... don't discount the Linux i18n efforts either, it is an
> > > extreme likelihood at the very least if not a statement of direction
> > >
> > > End-user apps that support UTF-8 include Office 2000 and of course
> > > most of the modern web/mail front-ends
> > >
> > > --
> > > Eric A. Hall                               http://www.ehsco.com/
> > > Internet Core Protocols   http://www.oreilly.com/catalog/coreprot/
> > >
>