Friday, August 29, 2014

Deconstructing Floats: frexp() and ldexp() in JavaScript

While working on my SqueakJS VM, it became necessary to deconstruct floating point numbers into their mantissa and exponent parts, and to assemble them again. Peeking into the C sources of the regular VM, I saw that they use the frexp() and ldexp() functions from the standard C math library.

Unfortunately, JavaScript does not provide these two functions. But surely someone must have needed them before me, right? Sure enough, a Google search turned up a few implementations. However, an hour later I was convinced that none of them was actually fully equivalent to the C functions. They were imprecise: deconstructing a float using frexp() and reconstructing it with ldexp() did not result in the original value. But that is the basic use case: for all float values, if
[mantissa, exponent] = frexp(value)
then
value = ldexp(mantissa, exponent)
must hold, even if the value is subnormal. None of the implementations (even the complex ones) really worked.
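Written as a formula, this is the contract of C's frexp(): with m standing for the mantissa and e for the exponent, the mantissa has a magnitude in [0.5, 1) (or is zero when the value is zero), and the pair reconstructs the value exactly. The two examples below, 1 and the smallest subnormal, are picked here for illustration:

```latex
\begin{aligned}
\text{value} &= m \cdot 2^{e}, \qquad \tfrac{1}{2} \le |m| < 1 \quad (\text{or } m = 0 \text{ when value} = 0) \\
1 &= 0.5 \cdot 2^{1} \\
2^{-1074} &= 0.5 \cdot 2^{-1073} \quad (\text{the smallest subnormal, } \approx 4.94 \times 10^{-324})
\end{aligned}
```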

So I had to implement it myself. Here is my implementation (also available as a JSFiddle):
function frexp(value) {
    if (value === 0) return [value, 0];  // also keeps the sign of -0
    var data = new DataView(new ArrayBuffer(8));
    data.setFloat64(0, value);
    var bits = (data.getUint32(0) >>> 20) & 0x7FF;  // 11-bit biased exponent
    if (bits === 0) { // denormal
        data.setFloat64(0, value * Math.pow(2, 64));  // exp + 64
        bits = ((data.getUint32(0) >>> 20) & 0x7FF) - 64;
    }
    var exponent = bits - 1022;  // remove bias 1023, add 1 for a mantissa in [0.5, 1)
    var mantissa = ldexp(value, -exponent);
    return [mantissa, exponent];
}

function ldexp(mantissa, exponent) {
    // multiply in up to 3 steps so that no single power of two overflows or underflows
    var steps = Math.min(3, Math.ceil(Math.abs(exponent) / 1023));
    var result = mantissa;
    for (var i = 0; i < steps; i++)
        result *= Math.pow(2, Math.floor((exponent + i) / steps));
    return result;
}

My frexp() uses a DataView to extract the exponent bits of the IEEE-754 float representation. If those bits are 0, the value is subnormal. In that case I normalize it by multiplying by 2^64, reading the bits again, and subtracting 64. After subtracting the bias, the exponent is ready, and it is used to get the mantissa by canceling out the exponent from the original value.
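A small sketch of the exponent extraction described above, assuming a standard JavaScript engine such as Node.js. The helper name `biasedExponent` is hypothetical and not part of the original code; it relies on DataView reading big-endian by default, so the first 32-bit word holds the sign, the 11 exponent bits and the top of the fraction:

```javascript
// Hypothetical helper: returns the raw 11-bit biased exponent field of a float64,
// using the same DataView approach as frexp() above.
function biasedExponent(value) {
  const data = new DataView(new ArrayBuffer(8));
  data.setFloat64(0, value);                 // big-endian by default
  return (data.getUint32(0) >>> 20) & 0x7ff; // drop 20 fraction bits, mask off the sign
}

console.log(biasedExponent(1));      // 1023: 1.0 = 1.0 * 2^(1023 - 1023)
console.log(biasedExponent(6));      // 1025: 6.0 = 1.5 * 2^(1025 - 1023)
console.log(biasedExponent(5e-324)); // 0: subnormal, the exponent field is all zeros

// Subnormal case: scale by 2^64 into the normal range, read the field, subtract 64 again.
const tiny = 5e-324;                 // 2^-1074, the smallest positive float
const bits = biasedExponent(tiny * Math.pow(2, 64)) - 64;
console.log(bits);                   // -51
console.log(bits - 1022);            // -1073, i.e. tiny = 0.5 * 2^-1073
```

Because 2^64 is a power of two, the scaling step is exact, so reading the field of the scaled value and subtracting 64 gives the same exponent the subnormal would have had with unlimited range.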

My ldexp() is pretty straightforward, except that it needs to be able to multiply by very large and very small numbers. The smallest positive float is 0.5 × 2^-1073, and to get its mantissa we need to multiply it by 2^1073. That is larger than 2^1023, the largest power of two a float can hold. By multiplying in steps we can deal with that. Three steps are needed for e.g. ldexp(5e-324, 1023+1074), which would otherwise result in Infinity.
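To see why the steps matter, the sketch below compares a single multiplication by Math.pow(2, exponent) with the stepwise version, once where the scale factor overflows and once where it underflows. It assumes a standard JavaScript engine such as Node.js; `factorExponents` and `scaleInSteps` are hypothetical helper names that mirror the loop in the ldexp() above:

```javascript
// Hypothetical helper: the powers of two that ldexp() above multiplies by, in order.
function factorExponents(exponent) {
  const steps = Math.min(3, Math.ceil(Math.abs(exponent) / 1023));
  const parts = [];
  for (let i = 0; i < steps; i++) {
    // floor((e + i) / steps) for i = 0..steps-1 sums to exactly e (Hermite's identity).
    parts.push(Math.floor((exponent + i) / steps));
  }
  return parts;
}

// Hypothetical helper: multiplies the mantissa by each factor in turn, like ldexp().
function scaleInSteps(mantissa, exponent) {
  return factorExponents(exponent)
    .reduce((acc, p) => acc * Math.pow(2, p), mantissa);
}

console.log(factorExponents(2097));                 // [ 699, 699, 699 ]

// Overflow: 2^2097 is Infinity, although 2^-1074 * 2^2097 = 2^1023 is finite.
console.log(5e-324 * Math.pow(2, 2097));            // Infinity
console.log(scaleInSteps(5e-324, 2097) === Math.pow(2, 1023));        // true

// Underflow: 2^-2000 rounds to 0, although 2^1023 * 2^-2000 = 2^-977 is a normal float.
console.log(Math.pow(2, 1023) * Math.pow(2, -2000)); // 0
console.log(scaleInSteps(Math.pow(2, 1023), -2000) === Math.pow(2, -977)); // true
```

Splitting the exponent into nearly equal parts keeps every individual factor within the float range, while the parts still add up to the requested exponent.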

So there you have it. Hope it's useful to someone.

Correction: The code I originally posted here for ldexp() still had a bug: it did not check for exponents that were too small. I fixed it above and updated the JSFiddle, too. Also, Nicolas Cellier noticed other rounding and underflow problems; his suggestions for ldexp() are now used above.


Michaeö said...

I'm trying to understand your code completely.
One question: why do you subtract 1022 from the exponent?
exponent = bits - 1022

Thanks for your help.

Greetings Michael

Vanessa said...

Hi Michael,
for a moment I thought you had found a bug ... but I think it's correct after all:
The exponent is stored with a bias of 1023, but then there is another implicit -1: the IEEE encoding implies a mantissa in the range 1...2, whereas frexp() returns a mantissa in the range 0.5...1. Halving the mantissa means adding 1 to the exponent, which is why I only subtract 1022, not 1023. This is needed so that mantissa * 2 ^ exponent equals the original value.
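Written out for a positive normal float whose stored exponent field is b and whose stored fraction bits are f (notation introduced here for illustration, with subscript 2 meaning binary), halving the implied mantissa is what turns the bias of 1023 into 1022:

```latex
\begin{aligned}
\text{value} &= 1.f_2 \cdot 2^{\,b - 1023} = 0.1f_2 \cdot 2^{\,b - 1022} \\
6 &= 1.5 \cdot 2^{1025 - 1023} = 0.75 \cdot 2^{1025 - 1022}
\end{aligned}
```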

Michaeö said...

Hi Bert,

That sounds correct. Your first answer confused me.
I found your blog because I'm looking for a way to convert a 32/64-bit float to a 16-bit float, and vice versa.

Michaeö said...

Me again. Now I understand the code completely.
For German readers, I found a very helpful article about it.

nicolas cellier said...

I'm not surprised; even Microsoft can get it wrong when it comes to underflow, see for example

nicolas cellier said...

By the way, the code shown here exhibits a double rounding problem in case of underflow.

See the Squeak image-side fallback implementation - the one from Kernel-nice.900.mcz -