Hacker News
An error message if you put more than 2^24 items in a JS Map object (searchvoidstar.tumblr.com)
133 points by norescue on Aug 27, 2021 | 89 comments


Definitely specific to the V8 JavaScript engine. I was able to put 40 million keys in a map in Firefox/SpiderMonkey. It was pretty fast and only took a few seconds.

The pitfall was that because I did it in the Developer Tools console, the CPU and memory usage would shoot through the roof from the autocompleter if I wrote any code that would cause a string representation of the map to be printed as a helpful hint.


A year or two ago, I helped the Cloud TPU team fix an obscure error when creating more than 200 TPUs.

Apparently no one had ever tried to create more than 200 TPUs before (https://battle.shawwn.com/swarm-training-v01a.pdf). I was pretty proud of that.

(As a bonus, I actually wanted to use all 200 TPUs. It wasn't merely a stress test. Big data indeed!)


Here I was having fun with my p3.16xlarge at work today, that's awesome!


If you like playing with ridiculous amounts of hardware, definitely try out TPU VMs with TFRC: https://blog.gpt4.org/jaxtpu

You can create up to 5 TPU v3-8's in europe-west4-a. Each core is roughly equivalent to a V100, so it's sort of like having 5 p3.16xlarge's. (The TPU has 96 CPUs and 330GB of system RAM, whereas a p3.16xlarge has 64 CPUs and 488GB of system RAM according to https://aws.amazon.com/ec2/instance-types/p3/).

The best part IMO is that you don't have to figure out how to link the cores together. JAX does that for you automatically. You get 100GigE from TPU to TPU too, which you can also leverage: https://twitter.com/theshawwn/status/1406171487988498433


The program is for researchers, not for play...


Are you so sure there's a difference?


One is a strict subset of the other


Really? A “strict subset”, come on…


Wow, that docdroid site is toxic waste. Don't click on anything but the back button if you follow that link.


Sorry. As an AI outsider, I wasn't sure how to publish to arxiv, so I used the first PDF hosting site I could find. I've mirrored it to https://battle.shawwn.com/swarm-training-v01a.pdf and edited my original comment with that instead. Thanks for pointing that out.


While lol-worthy in some ways (especially that error message - wow!): 16 million is a very "human-scale" number. I'm (somewhat) surprised that's the limit, and that I haven't heard about it before.


I've run up against this while building a gigantic LZ dictionary for full-text search[0], and I got around it[1] by using an npm module called big-associative that provides drop-in big map and big set primitives.

But even in that code I think there was some error or inefficiency that I had to customize to make it usable. In the end I was able to get around the 16 million limit on entries with good efficiency.

[0]: https://github.com/i5ik/futzz

[1]: https://github.com/i5ik/futzz/blob/master/src/futzz.js#L3
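
As a rough illustration of the general idea behind a "big map" workaround, here is a minimal sketch that spreads entries across several native Map objects so that no single one approaches the roughly 2^24-entry limit discussed in the article. This is an assumption about how such a wrapper could work, not the implementation used by big-associative or futzz; the class name ShardedMap and the maxPerShard parameter are hypothetical.

```javascript
// Hypothetical sketch, NOT the big-associative implementation.
// Entries are spread over several native Maps so that no single Map
// grows past `maxPerShard` entries.
class ShardedMap {
  constructor(maxPerShard = 2 ** 23) {
    this.maxPerShard = maxPerShard; // kept well below the ~2^24 cap
    this.shards = [new Map()];
  }

  // Return the shard that currently holds `key`, or undefined.
  _find(key) {
    for (const shard of this.shards) {
      if (shard.has(key)) return shard;
    }
    return undefined;
  }

  has(key) {
    return this._find(key) !== undefined;
  }

  get(key) {
    const shard = this._find(key);
    return shard ? shard.get(key) : undefined;
  }

  set(key, value) {
    const existing = this._find(key);
    if (existing) {
      existing.set(key, value); // overwrite in place, no growth
      return this;
    }
    let last = this.shards[this.shards.length - 1];
    if (last.size >= this.maxPerShard) {
      last = new Map(); // newest shard is full, open another one
      this.shards.push(last);
    }
    last.set(key, value);
    return this;
  }

  delete(key) {
    const shard = this._find(key);
    return shard ? shard.delete(key) : false;
  }

  get size() {
    return this.shards.reduce((total, shard) => total + shard.size, 0);
  }
}
```

The trade-off in this sketch is that every lookup may probe each shard in turn, so cost grows with the number of shards; hashing the key to pick a shard would avoid that but only works for keys that can be hashed cheaply, such as strings.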


author of blogpost here. interestingly the actual thing (not mentioned in blog post) that I was developing was for text searching too...resulting library I made is here https://github.com/GMOD/ixixx-js/


Cool, man. I just forked it! Happy to take a look inside...figure out how it works; use the best stuff for mine...kidding. But I ran into issues with using LZ, and was also using a trie for the index structure (or was on the way to).


Do you have any details on your customizations? That’s interesting.


Let me take a look and double check that. It might have been a prior library, or maybe I did modify big-associative. I'll investigate. Be back here later

edit: So I looked and I think I didn't modify big-associative...could have been a library I used before I found that one.

Here's big-associative's repo: https://github.com/samchon/big-associative


Is it just me or does this site prevent the browser back button from working? I hate that!


Same here. Sometimes I wish the web was simpler and the history API was not a thing. Or barring that, that the browser would pop up a permissions prompt when a page attempted to access the history API.


Speaking of, there should probably also be a permission prompt when the site wants to use a 1GB hash map.


The back button in Chromium type browsers should ignore history entries added before a page receives a user gesture [0]. That did not seem to happen here!

You can try it yourself by loading a YouTube video via pasting into the URL box and letting it autoplay a few videos. The back button should take you to the new tab page. If you click anywhere on the page then those autoplay entries will not be skipped by the back button.

[0] https://bugs.chromium.org/p/chromium/issues/detail?id=907167


It's tumblr-powered, so ¯\_(ツ)_/¯


Several years ago, I wanted to make a huge lookup table in Node.js during a research project, but the program stopped making progress after a while. This incident was detailed in my Quora answer: https://www.quora.com/What-would-you-need-64GB-of-RAM-for/an...

I incorrectly concluded that Node.js has a limitation on how much memory it could use. After reading this article, I now understand that the real reason is that I was setting too many properties on an Object, exceeding V8's limitation.


TIL You can write javascript numbers with underscore separators https://v8.dev/features/numeric-separators



And C# since 7.0

https://github.com/dotnet/csharplang/blob/main/proposals/csh...

First time I saw it was in C#, I love it personally


It is lifted from Ada. A language ahead of its time.



And Java since Java 7, which was released in 2011.


Note that different languages have slightly different syntax rules.

0_0: Legal in Java, legal in Python, illegal in JavaScript.

0__0: Legal in Java, illegal in Python, illegal in JavaScript.

0x_0: Illegal in Java, legal in Python, illegal in JavaScript.

0_, 0_.0, 0._0: Illegal in all three.

It wouldn't surprise me if C# and C++(14) have differing edge cases too.
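
The JavaScript column of the list above can be checked directly in an engine that supports numeric separators. The sketch below is only an illustration; the helper name parsesAsJs is made up, and it relies on the Function constructor raising a SyntaxError when handed source that does not parse.

```javascript
// Returns true if `src` parses as a JavaScript expression in this engine.
// The function is only compiled, never called.
function parsesAsJs(src) {
  try {
    new Function("return (" + src + ");");
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// 1_000 is included as a known-good control case.
const literals = ["1_000", "0_0", "0__0", "0x_0", "0_", "0_.0", "0._0"];
for (const literal of literals) {
  console.log(literal.padEnd(6), parsesAsJs(literal) ? "legal" : "illegal");
}
```

In an engine that follows the numeric separators specification, only the control case 1_000 should be reported as legal.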


0_0, 0__0, 0x_0, 0_ and 0_.0 are legal in Rust. 0._0 is the only one of those that’s illegal (“`{integer}` is a primitive type and therefore doesn't have fields”).


To clarify the special case in Rust ( 0._0 ), it's due to the definition of identifiers: https://doc.bccnsoft.com/docs/rust-1.36.0-docs-html/referenc.... _0 is an identifier, as it's an underscore followed by one alphanumeric character. Thus, 0._0 means "the field _0 of 0". Since it's an identifier starting with _, you won't get any warnings if you don't use it. Small example: https://play.rust-lang.org/?version=stable&mode=debug&editio...


O_o


Interesting, I saw it first in Kotlin.


Interestingly, why do they specify base 10 integers as: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*

Isn't digit (["_"] "0")* simpler?


012345 is the old way of doing octal notation in Python, which has its roots in C. I'm guessing they want to prevent ambiguity by making it so that it's an error for "012345" to be a decimal literal.
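
To see what the quoted grammar rule accepts and rejects, it can be transliterated into a regular expression. This is a hand translation of the rule as quoted in the question, not code taken from CPython, and the names below are made up for the example.

```javascript
// nonzerodigit (["_"] digit)*  |  "0" (["_"] "0")*
const PY_DECIMAL_INTEGER = /^(?:[1-9](?:_?[0-9])*|0(?:_?0)*)$/;

function isPythonDecimalInteger(text) {
  return PY_DECIMAL_INTEGER.test(text);
}

for (const sample of ["1_000", "0", "0_0", "00", "012345", "0__0", "1_"]) {
  console.log(sample.padEnd(7), isPythonDecimalInteger(sample));
}
```

The second alternative is what lets 0, 00 and 0_0 through while still rejecting 012345, which matches the explanation that a leading zero followed by other digits is kept illegal to avoid confusion with old-style octal.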


I always thought this was one of the worst notation conventions in modern languages.


I've felt for 35 years they could do everyone a favor and break everything that depends on a leading zero to represent octal.


Not only that, octal notation is actually valid in non strict-mode versions of JavaScript (which is still the default unless you opt-in with "use strict"). So it would be backwards incompatible outside of strict mode, and very confusing if changing from non-strict to strict-mode changed the semantics without giving an error/warning.
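
The difference between the two modes can be observed with a small experiment. This is a sketch only; the helper evalBody is a made-up name, and it uses the Function constructor because functions created that way are non-strict unless their body opts in with a "use strict" directive.

```javascript
// Compile and run a function body, reporting either its value or the error name.
function evalBody(body) {
  try {
    return { ok: true, value: new Function(body)() };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// Legacy octal literal in non-strict code: accepted and read as base 8.
const sloppy = evalBody("return 010;");
// The same literal in strict code: rejected when the body is parsed.
const strict = evalBody("'use strict'; return 010;");
// The explicit 0o prefix works in both modes.
const modern = evalBody("'use strict'; return 0o10;");

console.log(sloppy, strict, modern);
```

So moving code from non-strict to strict mode turns a legacy octal literal into an early error rather than silently changing its value.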


Yes, it gives an error -- if you want decimals, you can't use leading 0's:

    Python 3.6.3 (v3.6.3:2c5fed8, Oct  3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: 01
      File "<ipython-input-1-3351d58b1d3b>", line 1
        01
         ^
    SyntaxError: invalid token


    In [2]: 0o1
    Out[2]: 1

    In [3]: 0o10
    Out[3]: 8


And Haskell with the NumericUnderscores GHC extension.

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/nume...


Since 2019 in the leading browsers (Firefox and Chromium); 2020 if you use something else like Edge or Samsung Internet; 2021 for Chrome or Firefox on mobile... and of course you can forget about MSIE or Opera Mobile support as usual.

https://caniuse.com/mdn-javascript_grammar_numeric_separator...


You can also write JavaScript numbers with E notation (i.e. multiplied by 10 to the power of x), which is basically what I do with all numbers above 1,000 or below 0.001 now. It has the rare benefit, for big / small numbers, of being both more readable and more concise.

Example:

1e3 === 1000;

1e-3 === 0.001;

45.2e6 === 45200000;


D has had this for quite some time, C++ now too (with quote characters instead of underscores)


And also in Ruby.


I always thought Perl was the originator of this style, which would explain why it's been in Ruby forever.

I wonder where it started if it existed before Perl.


Ada and ALGOL dialects.


By default, Node has a relatively low limit on the amount of memory it will use (I believe 1GB). As you get close to this, it thrashes as it tries to garbage collect more and more frequently.

You can increase the limit by running `node --max-old-space-size=8192`. But that doesn't seem to fix this specific issue, either with the Map or the Object!


Ran into something similar years ago on the Blackberry where if you created more than 2^15 objects, performance could drop by something like 90%. There was no error but the performance hit was bad enough to make the app unusable.


16,777,216 keys ought to be enough for anyone.


Hmm, one key for even every possible Unicode code point (including surrogates, noncharacters and reserved ones, all the way from U+0000 to U+10FFFF) makes 1,114,112 keys. That’d be a big keyboard with rather poor random access characteristics. Allow 20×20mm for each key (seems a fairly typical sort of size) and that’s almost 446m². I dunno what you’ll use your 16,777,216 keys for (I agree, it ought to be enough for anyone), but as a full-sized keyboard it’s going to have a surface area of at least two thirds of a hectare. You could build a giant hamster ball a bit over 46m in diameter and lay the keys out on the inside, travelling the world as you spin the ball in order to find the right key. (The Unicode 13 hamster wheel keyboard might be more reasonable, its 143,859 keys requiring a diameter of less than five metres. Perhaps more manageable, but some keys might be harder to access as you roll through cities since the ball is no longer big enough to almost ignore many houses.)
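
For anyone who wants to check the keyboard arithmetic, here is a small sketch. It assumes the 20×20mm key from the comment and treats the hamster ball as a sphere whose inner surface area is π·d²; the function names are made up.

```javascript
const KEY_SIDE_M = 0.02; // 20 mm per side, as in the comment

// Total surface needed for `keys` square keys, in square metres.
function keyboardAreaM2(keys) {
  return keys * KEY_SIDE_M * KEY_SIDE_M;
}

// Diameter of a sphere whose surface area (pi * d^2) equals `areaM2`.
function ballDiameterM(areaM2) {
  return Math.sqrt(areaM2 / Math.PI);
}

const cases = {
  "all Unicode code points": 0x10ffff + 1, // 1,114,112
  "2^24 Map keys": 2 ** 24,                // 16,777,216
  "Unicode 13 characters": 143859,
};

for (const [label, keys] of Object.entries(cases)) {
  const area = keyboardAreaM2(keys);
  console.log(label, area.toFixed(1) + " m^2", ballDiameterM(area).toFixed(2) + " m");
}
```

With these assumptions the 2^24-key ball comes out a little over 46 m across and the Unicode 13 ball under 5 m, in line with the figures in the comment.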


I wonder how many octaves and what frequency you’d get to on a piano with this many keys?

This is 1,398,101.333333333 octaves, so I guess even starting at 1 Hz the top frequency is 2^1398101 Hz.

Gamma rays are about 10^22 so the frequency is probably so high as to blow up the universe or something.
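
The piano numbers can be reproduced with a few lines. The sketch assumes 12 keys per octave and a 1 Hz bottom note, as in the comment; the constant names are made up.

```javascript
const KEYS = 2 ** 24;            // 16,777,216 keys
const SEMITONES_PER_OCTAVE = 12;

// Each octave doubles the frequency.
const octaves = KEYS / SEMITONES_PER_OCTAVE;
const wholeOctaves = Math.floor(octaves);

// As a 64-bit float the top frequency overflows to Infinity.
const topFrequencyAsDouble = 2 ** wholeOctaves;

// Number of decimal digits in 2^wholeOctaves, computed without building the number.
const decimalDigits = Math.floor(wholeOctaves * Math.log10(2)) + 1;

console.log(octaves, topFrequencyAsDouble, decimalDigits);
```

The digit count comes out at a little over 420,000, so the top note sits hopelessly far beyond the roughly 10^22 Hz quoted for gamma rays.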


I’d suggest smaller intervals, like a cent instead of a semitone, but the numbers are so absurdly large that dividing it by a hundred doesn’t really help anything. You’d need a good deal more than 64-bit floating point numbers for any pertinent calculations, 10³⁰⁸ is pretty tiny; and the theoretical string tensions involved would be quite something.

Perhaps an organ is the right answer instead, with one key per pipe rather than the usual multiplexing. The Wanamaker Organ makes a good start with 28,750 pipes across 484 ranks (though it multiplexes them across a paltry six manuals plus pedalboard), but you’re going to need to come up with a lot more sounds, and the speed of sound is low enough that it’s not going to work very well done acoustically. (Hmm… y’know, there’s actual scope for an art piece here, an organ but producing light instead of sound, like lidar is to sonar. I think you’d struggle to produce anything the human eye would appreciate, but it’s a fun concept.) Or perhaps it should be splitting each manual into one for each combination of sounds it controls. That might well go over the 2²⁴, I’m not thinking about the numbers too exactly.


Ah yes, the Gamma Ray, one of my favourite phenomena. Giving off as much energy as our dear sun radiates in her lifetime.

Well, maybe we use 10e-absurdlyLargeNumber as our base note then, just to stay safe. Hm, how many years will we have to wait until the lowest note has gone through one cycle?


The most elaborate joke I've read on HN so far, well done! Thank you and the others who rolled with it.


It was actually a case of me going and doing something else and then returning to this tab and reading first of all the comment that I responded to; yet having forgotten the context, I immediately thought of keyboard keys; and the concept amused me so I ran with it and it ended up somewhere a bit different from where I expected!


What a lucky accident.


Thank you. I’ll be here all week.



It's easy to imagine an app reaching that number of keys pretty easily. Like games, scientific simulations or GIS apps.


What if I'm trying to model every atom in a multi-cellular being?


That question doesn’t show understanding of the relative sizes of atoms and cells.

https://www.thoughtco.com/how-many-atoms-in-human-cell-60388...:

“According to an estimate made by engineers at Washington University, there are around 10¹⁴ atoms in a typical human cell”

⇒ it isn’t necessary to use “multi-cellular” there by a wide, wide margin.


It would only be 800TB to store every atom in a cell as a long?

Samsung is making 512 GB RAM sticks.
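
The 800TB figure follows from the 10¹⁴ atoms-per-cell estimate quoted above if each atom is stored as one 8-byte long. The sketch below only redoes that arithmetic; it assumes decimal units (1 TB = 10¹² bytes, 1 GB = 10⁹ bytes), and the constant names are made up.

```javascript
const ATOMS_PER_CELL = 1e14; // estimate quoted in the parent comment
const BYTES_PER_LONG = 8;    // one 64-bit value per atom

const totalBytes = ATOMS_PER_CELL * BYTES_PER_LONG;
const terabytes = totalBytes / 1e12;

const STICK_BYTES = 512e9;   // one 512 GB module, decimal GB assumed
const sticksNeeded = Math.ceil(totalBytes / STICK_BYTES);

console.log(terabytes + " TB", sticksNeeded + " sticks");
```

That is one 64-bit value per atom with no positions, bonds or velocities, so it is a floor on storage rather than a workable model.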


I think you’d have a hard time modeling a single celled organism


Hell, we can’t even simulate a single more complex protein.


The difference between multi cell and single cell is not the hard part.


Don’t use JS


Indeed - you'll want to upgrade to Basic.


PHP is webscale.


MongoDB! Great video, by the way. Link: http://www.mongodb-is-web-scale.com/


Firefox warns me of a potential security risk on that site.

(edit) on proceeding it redirects to a site that won't be found.


I feel like if you are working with genome levels of data maybe you shouldn't be using JS to do the work. JS is a great tool in some cases but not really a scientific one. I guess you use whatever you know.


(Not being sarcastic at all) I remember the times when bringing JavaScript anywhere near data analysis would be considered a cute joke.


Use what you're comfortable with if it'll do the job. In this case turns out it won't, but it was perfectly plausible it might've.


The problem, of course, is when you've worked on the assumption that it will do the job (hey, it was perfectly plausible!) and after six months' work you run into a showstopper like this.


[flagged]


Please don't do this here.


The actual mistake here was trying to do bioinformatics in javascript.


Why? This is Hacker News. Because we can.


I always read these with my arms crossed like "pffffff yeah obviously"


There's no limit for array size in C++, or for any core structure. Other than the size types, which naturally scale up as we get newer architectures (obviously 2^64 for most people at the moment).


I think C++ pointers being 64 bit will stick with us forever, unless some fundamental paradigm shift happens in what C++ is used for. I also think the size of short, int, long long, float and double will not change anymore, ever. The reasons we had for changes in the past will not happen anymore from now on, and the cost of changes only increases over time. We might see the sizes (and IEEE-754 floats) be officially standardized at some point, like two's complement is standardized now.


I mean, yes, size_t will stay at 64 bits for at least a while; 2^64 bytes is 16 exabytes. Once this is insufficient it will slowly be switched to 128 bits. Some systems already use 128 bits for storage-related arithmetic.
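
The 16 exabyte figure is easy to confirm with exact integer arithmetic. The sketch uses BigInt because 2^64 is not exactly representable as a JavaScript Number, and it reads "exabyte" in the binary sense (2^60 bytes); the constant names are made up.

```javascript
const ADDRESSABLE_BYTES = 2n ** 64n; // distinct byte addresses with 64-bit pointers
const BYTES_PER_EXBIBYTE = 2n ** 60n;

const exbibytes = ADDRESSABLE_BYTES / BYTES_PER_EXBIBYTE;

// Largest value a 64-bit unsigned size type can hold.
const maxSize64 = ADDRESSABLE_BYTES - 1n;

console.log(exbibytes.toString(), maxSize64.toString());
```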


The C++ standard may not impose a limit, but any given C++ implementation will. Likewise, the limit in the OP here isn't part of the Javascript standard, just a Chrome implementation detail.


that would be..? std::numeric_limits<size_t>::max()?


Yes, size_t should be the limiting factor.


If we're saying "Pfffff" at things, let's take a moment to appreciate that by the magic of not actually storing them Rust gets to store all of nothing in no space.

Rust's arrays of something can only have up to std::isize::MAX elements. But by choosing to store nothing instead of something, we can go larger to std::usize::MAX with e.g. [(); std::usize::MAX] and that'll actually work, at runtime, even on a modest computer since you don't need space for std::usize::MAX of anything.

If usize::MAX of a zero size type isn't enough for you, might I interest you in an Iterator that contains not merely usize::MAX but infinity of any Default type of your choosing? And if you don't need that many, you can always just take() fewer of them no charge.


Right there with you


If the user has the memory there should be no hard limit. Oh wait… there’s the pesky issue of memory overcommit…


It COULD be written to be infinitely scalable in size (up to allowed memory) but that comes with a speed cost.

I remember something about NSArray switching its layout (and speed characteristics) after adding a certain number of items, but I can't find a reference now.




