Re: floating point




Jon,

  Fixed point can often suffice, e.g. 16 integral and 16 fractional bits
allow you to represent numbers as large as 65535 (plus a fraction) in
steps as small as 1/65536 in a 32-bit number.  Similarly, 32-bit integral
and fractional parts can represent numbers as large as 4294967295 in
steps as small as 1/4294967296 (around .0000000002) in a 64-bit (or two
32-bit) number.
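
  As a rough illustration of the 16.16 case, here is a minimal sketch in
Python.  The names (FRAC_BITS, to_fixed, fx_mul, etc.) are made up for
this example, it only handles non-negative values, and it does not model
32-bit overflow, since Python integers are unbounded:

```python
# 16.16 fixed point: a value is stored as an integer equal to value * 2^16.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 65536, the fixed-point representation of 1.0


def to_fixed(x):
    """Convert a non-negative real number to 16.16 fixed point."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 16.16 fixed-point value back to a Python float."""
    return f / ONE


def fx_add(a, b):
    """Addition is plain integer addition."""
    return a + b


def fx_mul(a, b):
    """The raw product carries 32 fractional bits; shift back down to 16."""
    return (a * b) >> FRAC_BITS


a = to_fixed(3.25)
b = to_fixed(1.5)
print(to_float(fx_add(a, b)))  # 4.75
print(to_float(fx_mul(a, b)))  # 4.875
```

  On real 32-bit hardware the intermediate product in the multiply needs
64 bits before the shift, which is the main thing to watch for.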

  Mark's suggestion of M * B ^ E is a little more complicated than it
needs to be; in particular, IEEE floating point fixes B at 2
(which is coincidentally easy for computers to calculate...).
IEEE floating point has lots of details, but if you ignore things like
Infinity, NaN and extremely small (denormalized) numbers it is fairly
straightforward: a 32-bit IEEE FP number has 1 sign bit, 8 exponent
bits, and 23 mantissa bits.  The exponent is stored with a bias of 127,
so the resulting number is
(sign) * 1.(mantissa) * 2^(exponent - 127).
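
  To make the layout concrete, here is a small Python sketch that pulls
the three fields out of a 32-bit float and puts them back together.  The
function names are made up for this example, and rebuild() only handles
normal numbers (no zero, Infinity, NaN or denormals).  Note that the
8-bit exponent field is stored with a bias of 127 in the IEEE
single-precision format:

```python
import struct


def decode_float32(x):
    """Split x (rounded to 32-bit float) into (sign, exponent, mantissa) fields."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, the fraction after the "1."
    return sign, exponent, mantissa


def rebuild(sign, exponent, mantissa):
    """Recompute the value of a normal number from its three fields."""
    return (-1.0) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


fields = decode_float32(-6.5)
print(fields)            # (1, 129, 5242880)
print(rebuild(*fields))  # -6.5
```

  Here -6.5 is -1.625 * 2^2, so the stored exponent is 2 + 127 = 129 and
the mantissa field holds .625 * 2^23 = 5242880.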

  Bill