
It's also a lack of understanding of the problem you're trying to solve.

For example, at my previous job we had a dedicated data store for lookups such as IP -> zip code and latitude/longitude -> zip code.

The company decided to use Oracle Coherence and store all the data in memory, because it would be fast. To store all of that information they needed 16 m3.medium machines.

Last year they celebrated a great optimization success, because they managed to replace the 16 x m3.medium machines with just 3 x c3.2xlarge machines running MongoDB (the data was ~12GB).

I did a POC and put the data in PostgreSQL with proper columns and indices (I just needed to install ip4range and PostGIS), and the whole dataset fit in 600MB! Queries took at most 2ms on a cold cache, and were generally sub-millisecond, because all the data fit in RAM.
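The core idea behind the range-based approach can be sketched outside the database too. This is a minimal illustration in Python (not the actual PostgreSQL/ip4range setup; the sample ranges and zip codes are made up): store each range once as a pair of integer endpoints, sort by range start, and binary-search the query IP instead of materializing every address.

```python
import bisect
import ipaddress

# Hypothetical sample data: (range start, range end, zip code).
# IPs are stored as integers and ranges are sorted by start.
ranges = sorted(
    (int(ipaddress.ip_address(lo)), int(ipaddress.ip_address(hi)), zip_code)
    for lo, hi, zip_code in [
        ("10.0.0.0", "10.0.0.255", "94103"),
        ("10.0.1.0", "10.0.1.255", "10001"),
    ]
)
starts = [r[0] for r in ranges]

def zip_for_ip(ip: str):
    """Find the range containing ip via binary search, O(log n)."""
    n = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, n) - 1
    if i >= 0 and ranges[i][0] <= n <= ranges[i][1]:
        return ranges[i][2]
    return None

print(zip_for_ip("10.0.1.42"))  # -> 10001
```

This is essentially what a range-aware index does for you: one row per range rather than one row per address, which is why the data shrinks so dramatically.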




It seems more of a problem with the people than technologies.

Why would you need 16 m3.medium machines (60GB RAM total) to store 600MB of data? I've used Oracle Coherence and other grid technologies, and something doesn't sound right here. It's just a couple of distributed Java HashMaps we're talking about.

Likewise if 600MB of data is expanding to 12GB in MongoDB then something is very, very wrong with the design of your schema.


Yes, the problem was people. Too much politics, and that's why I left.

Why more data? In the case of IP geolocation, neither technology understood IPs, let alone being able to create a proper index for ranges.

So in the case of Mongo, to get good performance they decided to generate every possible IPv4 address and map it to a zip code. To increase efficiency they stored every IP as a 64-bit integer.
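The original schema isn't shown, but the integer encoding itself is straightforward: an IPv4 address is a 32-bit number, so it fits easily in a 64-bit integer key. A sketch of the conversion (Python's stdlib does the parsing):

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    # An IPv4 address is 4 bytes, i.e. a 32-bit unsigned integer,
    # so it fits comfortably in a 64-bit integer column/key.
    return int(ipaddress.ip_address(ip))

print(ip_to_int("1.2.3.4"))  # 1*2**24 + 2*2**16 + 3*2**8 + 4 = 16909060
```

The catch is scale, not the encoding: materializing every possible IPv4 address means on the order of 2^32 (~4.3 billion) keys, which is exactly why the dataset ballooned compared to storing ranges.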

In Coherence they did the same thing, but I guess less efficiently (I didn't look at how it was done, since at the time Coherence was in the process of being eliminated). I'm guessing maybe they stored it as a string?

Also note that Coherence is a distributed cache that's supposed to withstand a couple of nodes going down, so a lot of data was duplicated.


> It seems more of a problem with the people than technologies.

Isn't that what "lack of understanding of the problem you're trying to solve" means?

Though in this case, if the problem had "IP4 ranges" and "geographic data and computation functions" in its scope, then MongoDB is quite inadequate compared to PostgreSQL.





