
Oh yeah! When I showed up at my current employer, I had a laptop with 32GB and 2 PCI SSDs in RAID0. Almost immediately, I had to upgrade to 64GB.

I'm a data scientist and regularly work with multiple datasets simultaneously, which is what drives the RAM usage. Both Python and R rely on in-memory processing. Loading on/off disk is substantially slower and does not fit with what I am trying to do. For really large datasets I also have a 28-core Xeon with 196GB that I can remote into, but it is nice to not have constraints on my laptop.

Of course, you could go with Hadoop or Spark to process some of these datasets, but that requires quite a bit of overhead, and it's easier (and cheaper) to just buy more RAM.




Same story for me, give or take a few percent. I have to recommend dask for Python though; it made out-of-memory errors largely disappear for me. It allows parallel processing of disk-sized datasets with (almost) the convenience of in-memory datasets.
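In case it helps, here's a minimal sketch of that out-of-core workflow; the file pattern and column names are invented for illustration:

    # dask.dataframe reads lazily in chunks, so the full dataset never has to fit in RAM.
    import dask.dataframe as dd

    df = dd.read_csv("transactions-*.csv")          # hypothetical files, read lazily

    # Looks like pandas, but the work is split into tasks and run in parallel.
    totals = df.groupby("customer_id")["amount"].sum()
    print(totals.compute())                         # .compute() triggers the actual execution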


Holy cow, dask looks useful


The most impressive part is how easy it is to use. It's basically just a reuse of the numpy API.


They replicate the Pandas API, not the numpy one. Just for clarity's sake.


What a peculiar comment. Their homepage has examples of both - with Numpy first.
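To illustrate both interfaces (the sizes and data here are made up, just to show the two styles):

    import dask.array as da
    import dask.dataframe as dd
    import pandas as pd

    # NumPy-style: a large random array processed in 5k x 5k chunks.
    x = da.random.random((20_000, 20_000), chunks=(5_000, 5_000))
    print(x.mean(axis=0)[:5].compute())

    # Pandas-style: wrap an existing frame and use the familiar groupby calls.
    pdf = pd.DataFrame({"name": ["a", "b", "a", "b"], "x": [1.0, 2.0, 3.0, 4.0]})
    ddf = dd.from_pandas(pdf, npartitions=2)
    print(ddf.groupby("name")["x"].mean().compute())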


Really? It's been a while since I've used it, and I remember a good portion of the documentation talking about how they replicate some, but not all, of Pandas' APIs (because of the sheer number of them).

They've probably updated it since then...


yup it really is


I'm a C++ programmer working in games and I run out of 64GB of RAM in my workstation daily. I can't wait until we finally get all upgraded to 128 or 256GB of RAM as standard.


Well, that's why we consumers have to buy better hardware and more RAM? As an old former Amiga programmer, I have always, to this day, been a less-is-more kind of guy: make code run faster and make the program use less RAM.


Good in theory unless you need all of the data at once. There are things we do now that wouldn't have been possible (in the same sense) 25 years ago without a lot of work. We might use languages that are 200x slower, but they might be 10x more productive. That's a winning tradeoff for many people.


Nope, it has nothing to do with what you as a customer get as a final product. Loading the main map of the game uses about 30GB of RAM in the editor + starting the main servers in a custom configuration will use that amount again. Systems like fastbuild can use several gigabytes when compiling. None of this has anything to do with the client, which will run with as little as 4GB of RAM.


Buy a 2 x Xeon workstation with 768GB of RAM?


Once your datasets go out of the bounds of a single reasonable machine, it's time to switch to an Apache Spark cluster (or similar).

You can still write your data analysis code in Python, but you get to leverage multiple machines and an intelligent compute engine that knows how to distribute your computation across nodes automatically, keeping data lineage information so that computation is moved closest to where the data is located.
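Roughly, in PySpark (the dataset path and column names here are placeholders):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("example").getOrCreate()

    # Spark splits the input into partitions and distributes them across executors.
    df = spark.read.parquet("s3://bucket/clicks/")           # hypothetical dataset

    daily = (
        df.groupBy(F.to_date("timestamp").alias("day"))
          .agg(F.countDistinct("user_id").alias("users"))    # aggregation runs near the data
    )
    daily.show()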


You know, sometimes you are in that uncomfortable spot where you have too much data for a single laptop but too little to justify running a whole computing cluster.

That is the kind of spot where you max out everything you can max out and just go take a break when something intensive is running.


This - honestly, depending on the task, hundreds of GB can still be "single computer" territory, because setting up a cluster just isn't worth it in terms of time, money, and administration overhead. However, parallel + out-of-core computation doesn't necessarily imply a cluster: single-node Spark or something like dask works fine if you're in the Python world.
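For the single-node case, something like this is enough (the path is illustrative):

    from pyspark.sql import SparkSession

    # local[*] uses every core on the one machine; no cluster to administer.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("single-node")
        .getOrCreate()
    )

    df = spark.read.parquet("/data/large-dataset/")   # hypothetical local path
    print(df.count())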


Setting up an ad hoc (aka standalone) Spark cluster with a bunch of machines you have control over is a ridiculously trivial task though. It's as easy as designating one machine x as the master (start-master.sh) and starting all the others pointed at it (start-slave.sh spark://x:7077); they become slaves of x. Then you just submit jobs to x and that's all.
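Submitting to it from Python is then just a matter of pointing at the master (the host name below is a placeholder; 7077 is the standalone master's default port):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")   # the machine running start-master.sh
        .appName("adhoc-cluster-demo")
        .getOrCreate()
    )

    # Trivial job, just to confirm work is spread over the workers.
    print(spark.range(1_000_000_000).selectExpr("sum(id)").collect())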


Spark is slow though. On the other hand, Pandas is also extraordinarily slow :D


Then you remote into a workstation, as someone else in this thread said they did.


Running distributed like that always has a cost, both in inefficiency of the compute and in person-time.

If you can still run on one machine, it's almost always a win. 32GB is a perfectly reasonable amount of memory to expect. 64GB isn't outlandish at all for a workstation.


Really depends on the computation... what you say only makes sense for certain niches of computation.


What laptop are you using? I think my ThinkPad W540 is capped at 32GB.


Thinkpad P50


Cloud is an option for really large memory requirements. You can provision machines with nearly 2TB of RAM in AWS, and it's pretty cost-effective if you only spin them up when you actually need them.
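A rough sketch of that spin-up / tear-down pattern with boto3 (the AMI, key pair, and region are placeholders; x1.32xlarge is one of the ~2TB-RAM instance types):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",          # placeholder AMI
        InstanceType="x1.32xlarge",      # roughly 2TB of RAM
        KeyName="my-key",                # placeholder key pair
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # ...run the job, then terminate so you only pay for the hours you used.
    ec2.terminate_instances(InstanceIds=[instance_id])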



