Be careful with JS numbers (greweb.fr)
124 points by gren on Jan 13, 2013 | 58 comments



Essential rule: ALWAYS treat large ID numbers in JSON as strings. Produce them as strings on the server, so the browser treats them as strings during the JSON decoding process.

(Fortunately, if they're IDs that need to stay exact, you probably don't need to do any math on them! Although a coworker of mine once had to write a bitwise XOR function in JavaScript that operated on large ID numbers represented as strings. That was fun...)
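A quick sketch of why this matters, using a hypothetical ID just above 2^53:

```javascript
// Parsed as a number, a large ID silently loses precision;
// parsed as a string, it survives intact.
var asNumber = JSON.parse('{"id": 9007199254740993}');
var asString = JSON.parse('{"id": "9007199254740993"}');
asNumber.id;  // 9007199254740992 -- off by one
asString.id;  // "9007199254740993" -- exact
```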


Been burned by this myself, because while the person who created the ID knew it was a string, the person using it down the line thought it looked like a number and treated it like one.

My advice is to store those IDs with a leading "$" character or something similar. It removes the temptation to treat them like numbers.


Or convert them to hex before sending to client. And you can convert them back to decimal on the server side.


"Eich: No one expects the rounding errors you get - powers of five aren't representable well. They round poorly in base-two. So dollars and cents, sums and differences, will get you strange long zeros with a nine at the end in JavaScript. There was a blog about this that blamed Safari and Mac for doing math wrong and it's IEEE double -- it's in everything, Java and C"

-Peter Seibel's interview with Brendan Eich, "who, under intense time pressure, created JavaScript", Coders at Work, p. 136


Some languages have arbitrary-precision types or use BCD. Next time you see someone using floating-point types for money, please stop them. There are too many financial libraries out there like OFX4J which get this wrong (or at least did the last time I looked).


Python has Decimal for that. Unfortunately, the people who wrote the platform I use - which deals with money - decided they're "too slow".


> Unfortunately, the people who wrote the platform I use - which deals with money - decided they're "too slow".

I've had the same problem. It's… annoying (also bullshit: in the worst case — CPython <= 3.2 — basic arithmetic operations take ~20µs on my machine, aside from divisions which take a bit more, and if that shows up in profiling, use cdecimal, CPython 3.3 [which integrated cdecimal], or PyPy)


If you don't have decimal types, then use fixed point. Or in simpler terms store cents (as integers) not dollars (as floats).

Then add a decimal point when you display.

If you are dealing with stocks or interest I suggest storing tenths or hundredths of a cent to avoid rounding errors.
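A minimal sketch of the cents approach (the amounts are made up):

```javascript
// Floats accumulate representation error:
0.1 + 0.2;                 // 0.30000000000000004

// Integer cents stay exact:
var cents = 1999 + 1;      // $19.99 + $0.01, pure integer math
(cents / 100).toFixed(2);  // "20.00" -- decimal point added only for display
```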


Shitty when your language doesn't have integers then, eh?


JavaScript numbers (all 64-bit floats, actually) can store exact integers up to 53 bits.

Which is enough for even the United States national budget, let alone more prosaic uses.
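The 53-bit boundary is easy to demonstrate:

```javascript
// Integers are exact up to 2^53; the next odd integer cannot be represented
Math.pow(2, 53);                           // 9007199254740992 -- still exact
Math.pow(2, 53) + 1 === Math.pow(2, 53);   // true: adding 1 is silently lost
```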


> Next time you see someone using floating point types for money

... apply that logic to their paycheck and let them deal with being underpaid by a bizarre amount because of rounding.

Or apply it to their investment accounts so the errors accumulate for a few decades.

Nothing like an object lesson to remedy bad behavior.


A lot of these issues are equally applicable to integer overflow (termination of loops and the "real web application disaster"). As a result, this is a deeper issue that extends beyond JavaScript into most languages—in general, programmers have to be aware that numbers in the runtime model of a language may not always work like you would expect theoretical, mathematical numbers to work.

Of course, floating point imprecision results in more surprising behaviors than integer overflow on the whole, but the danger is still there no matter what language you use (not counting the languages with full numeric towers, like Scheme).


This is not the only area in JS where it pays to be careful with numbers.

For example, the parseInt function mentioned in the article actually does magic base determination. Your string will be parsed into a base-10 number, unless it begins with '0x', in which case base 16 is used, or it begins with a leading '0', in which case it is treated as base 8.

This last case has stung me on several occasions when parsing user input into numbers: the user puts a leading zero, and you magically end up with octal conversions. (I believe this whole octal thing has been deprecated in recent JS implementations.)

In any case it's sensible to specify the 'radix' whenever using the parseInt function, as in parseInt(numberString, 10);
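For illustration (note the leading-zero case varies by engine: pre-ES5 engines return octal, ES5-compliant ones parse it as decimal):

```javascript
parseInt('0x1f');     // 31 -- hex is auto-detected
parseInt('08');       // 0 in older engines (octal), 8 in ES5+
parseInt('08', 10);   // 8, in every engine
```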


If the user deliberately puts in 0x it's probably because the number is more convenient in hex. Definitely get rid of octal though. The octal designation should have had a letter in it right from the start.


Yet another reason to use JSLint (it makes the second parameter of parseInt not optional)


you can also just use Number() to be safe


How is it different?


Think of parseInt() as doing what it says: parsing out the number value. Number() can be considered a cast.

    parseInt('12', 10);  => 12
    Number('12');        => 12
    parseInt('12x', 10); => 12
    Number('12x');       => NaN
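A few more contrasts along the same lines:

```javascript
parseInt('12.9', 10);  // 12   -- stops at the first non-digit
Number('12.9');        // 12.9 -- full numeric cast
parseInt('', 10);      // NaN
Number('');            // 0    -- a surprising corner case
```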


This bit a lot of Twitter integrations hard when tweet IDs exceeded the non-lossy integer range of JavaScript floats.

All of a sudden, tweet ids got rounded off to point to completely different entries!

https://dev.twitter.com/docs/twitter-ids-json-and-snowflake

Other JSON parsers may also be affected.
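Twitter's fix was to send an id_str field alongside the numeric id; with a hypothetical tweet ID above 2^53 you can see why the string survives and the number doesn't:

```javascript
// Hypothetical tweet ID above 2^53 (not a real tweet)
var tweet = JSON.parse(
  '{"id": 266231109036904449, "id_str": "266231109036904449"}');
String(tweet.id) === tweet.id_str;  // false -- the numeric id was rounded
```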


Wow, I didn't know this! When I mentioned Twitter IDs it was just an example.

So it can happen to anyone and it's a real issue for any JSON API.


For quite some time there were API calls in Facebook that would return the IDs as numbers, not strings. It wasn't the entire API, though; only some calls would show the behaviour. As you can imagine, PHP did not like it. It was rather irritating...


This is one reason I stick to statically typed languages. I do a lot of work with cash values and without an explicit decimal type, the shit hits the fan when you hit a float issue.

I get a lot of flak on here for that opinion which is odd.


Dynamically typed languages are not the same as weakly typed languages. Dynamically typed languages, like Ruby, still have data types like String, Float, etc. I've seen people use floats in statically typed languages before and end up with rounding errors simply because they chose Float instead of Decimal.


I am fully aware of the differences between weak typing and dynamic languages.

The issue I have with dynamic languages with types is that the type is usually inferred based on the operation. It's very easy to make mistakes and for those mistakes not to be apparent until the shit hits the fan.

Fully statically typed languages put these concerns right in your face and make you think about them.


> Fully statically typed languages put these concerns right in your face and make you think about them.

That depends slightly - if your compiler uses Damas-Milner type inference, it's possible that the compiler will assign the most general type signature possible to your function, which will not prevent this type of error, as operations such as addition work on either integral or floating point numbers.

What you're really after is a strongly typed language that also has explicit type declarations (or a programmer that's reliable enough always to include them even when they are optional).


Java and Python offer a counterexample. In Java this is allowed, and the result requires knowledge of detailed conversion rules:

    int i = 10;
    int j = 20;
    String s = "Test " + i + j;
In Python the equivalent is an error.

The 'in your face' difference between a statically typed language and dynamic one is the time difference between the compilation error and an execution error. One tool for working effectively with dynamically typed languages is to keep that time difference short with tools like unit tests.


> The issue I have with dynamic languages with types is that the type is usually inferred based on the operation.

That's about as wrong as can be.

In fact, it's the exact opposite (type inference can only be a property of statically typed languages)


It's not really about static typing or not; it's really about JS having some really strange and counter-intuitive behaviours in a lot of cases (numbers, default method scope, truthy values, ...). Part of it is probably due to the fact that the syntax looks so much like C and Java that you'd expect it to behave the same. Part of it seems to be that it's hard to fix things without breaking existing codebases.


Every language has its own quirks, some even weirder than JS's. The floating point "errors" are common to a hundred programming languages; that's why BigNum classes/extensions exist.


The problem in this case is not floating-point approximation. It is that some features (parseFloat vs parseInt, ++, 45.0 being printed as 45 in most browsers) make the user believe that JS has an integer type.
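A couple of the behaviours that feed the illusion:

```javascript
String(45.0);   // "45" -- the trailing .0 vanishes
5 / 2;          // 2.5  -- no integer division like C or Java
(5 / 2) | 0;    // 2    -- bitwise ops truncate to 32-bit integers
```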


Square peg in a round hole, you wouldn't try to understand Haskell with a Ruby background either.

Since JavaScript has no declared types per se, it doesn't make much sense to think of an 'integer type'. JavaScript only has a number primitive, which is an IEEE 754 double.


This is about JS having a nonexistent numeric tower. A statically typed language which only had floats would have the same issues.


Dynamically typed languages still have types. Such as BigNum.


Yes and they are cumbersome.


Python has a decimal module, so this is not exclusive to statically typed languages.

http://docs.python.org/2/library/decimal.html


Yes, but for conversion and sending over the wire it becomes a pain in the arse as a language extension.


Dynamically typed vs. weakly typed languages aside, you still only have one choice in the browser. Sure, you can write in something which has all the great features you love, but it will still be cross-compiled to JavaScript, with all of its pitfalls, right?


Simple solution: I wouldn't write anything that requires numeric accuracy to run in the browser.


> Sure, you can write in something which has all the greatest features you love, but it will still be cross compiled to JavaScript, with all of its pitfalls right?

By this logic, no bignum library could exist because machine code doesn't have bignums, and everything gets compiled to machine code and/or run by an interpreter that's been compiled to machine code.


For some reason, people here think Javascript is the best shit ever.


I got hit by JavaScript rounding on a project once. Funny thing is, IE8 was okay with it and Chrome was the one that caused issues.

Turns out that 1000000 * 8.2 = 8199999.999999999

Sometimes we learn the hard way. The value was getting written into an XML document and pushed over HTTP to an embedded device which was expecting an int, and it caused the device to crash. We fixed both bugs.
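For the record, the product itself reproduces anywhere with IEEE doubles, and rounding before serializing sidesteps it:

```javascript
1000000 * 8.2;              // 8199999.999999999
Math.round(1000000 * 8.2);  // 8200000 -- safe to hand to something expecting an int
```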


    // Scheme
    (print (* 1000000 8.2))
    > 8199999.999999999

    // Python
    >>> 1000000 * 8.2
    8199999.999999999

    //PHP
    > 1000000 * 8.2
    > 8199999.999999999


    # Ruby 1.8.7
    1.8.7 :022 > 1000000 * 8.2
     => 8200000.0 

    # Ruby 1.9.2
    1.9.2p320 :001 > 1000000 * 8.2
     => 8199999.999999999


I can't reproduce the PHP example:

  $ uname -p
  x86_64
  $ php -v
  PHP 5.4.6-1ubuntu1 (cli) (built: Aug 22 2012 21:13:52)
  Copyright (c) 1997-2012 The PHP Group
  Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies
  $ php -r 'var_dump(1000000 * 8.2);'
  float(8200000)
  $


My fault, I think I copied from the wrong REPL.


C/C++:

   printf("%lf\n", 1000000 * 8.2);
   8200000.000000
Here, printf() is doing the necessary rounding to hide the fact that 8.2 isn't exactly representable. It's interesting that other languages don't do something similar when outputting a float to text.


They do, you just have to request it:

    //Python
    >>> repr(1000000 * 8.2)
    '8199999.999999999'
    >>> "%lf" % (1000000 * 8.2)
    '8200000.000000'
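JavaScript has the same pair of behaviours:

```javascript
String(1000000 * 8.2);       // "8199999.999999999" -- shortest round-trip form
(1000000 * 8.2).toFixed(6);  // "8200000.000000"    -- printf-style rounding
```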


1. IDs are strings, not numbers. This is the real problem Twitter had. The ID size was only vaguely related to the number of tweets.

2. An event counter is not going to reach 2^50.


    I have determined by experience that 9007199254740995
    (which is 2^53+3) is the smallest not representable
    integer in Javascript.
Here's something to drop in your browser console as an amusing illustration of this:

     9007199254740994 === 
    (9007199254740995 - 1)
and

     9007199254740995 === 
    (9007199254740995 - 1)


He’dn’t’ve had to “determine” it, had he known the basics of IEEE-754.


Hehe, fun one! And also:

    9007199254740995 - 1 - 1 - 1 - 1


Interestingly, GWT gets around this problem entirely by emulating the 'long' datatype as two 32-bit numbers. It slows down calculation a little, but you always get the correct values.

https://developers.google.com/web-toolkit/doc/latest/DevGuid...
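A minimal sketch of the technique (not GWT's actual code): keep the high and low 32-bit halves as separate numbers and propagate the carry by hand.

```javascript
// Emulated 64-bit add: each value is [hi, lo], both unsigned 32-bit words
function add64(a, b) {
  var lo = (a[1] + b[1]) >>> 0;          // wrap low half to 32 bits
  var carry = lo < a[1] ? 1 : 0;         // wrapped around => there was a carry
  var hi = (a[0] + b[0] + carry) >>> 0;  // wrap high half too
  return [hi, lo];
}

add64([0, 0xFFFFFFFF], [0, 1]);  // [1, 0] -- carry ripples into the high word
```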


Emscripten does the same thing for uint64_t etc


For people using npm, check out the 'int' and 'num' modules.


Minor nit: isn't the smallest integer not representable in JS -1000000000000000000000000000000000000 or something?


Is this in the JS standard, or is this in just some implementations?


http://interglacial.com/javascript_spec/a-8.html

The Number type has exactly 18437736874454810627 (that is, 2^64 - 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic, except that the 9007199254740990 (that is, 2^53 - 2) distinct "Not-a-Number" values of the IEEE Standard are represented in ECMAScript as a single special NaN value.


> Javascript doesn’t have integer type but lets you think it has

Actually, JS has integer typed arrays.

    new Uint8Array([1,100,300])
    [1, 100, 44]

    new Uint16Array([-1,2,300])
    [65535, 2, 300]

    new Uint32Array([-1,2,300])
    [4294967295, 2, 300]
Here's the spec:

https://www.khronos.org/registry/typedarray/specs/latest/#7



