Hacker News | jfxberns's comments

It is still a valid pattern. As disks get bigger, it gets harder to move the data and more efficient to move the compute to the data.


"Does anyone have a use case where data is on a single machine and map reduce is still relevant?"

No! MapReduce is a programming pattern for massively parallelizing computational tasks. If you are doing it on one machine, you are not massively parallelizing your compute and you don't need MapReduce.
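
For anyone who hasn't seen the pattern itself, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) using word count. The function names are my own, not Hadoop's API, and running it on one machine only illustrates the shape of the pattern; the payoff comes when the map and reduce calls are spread across many machines.

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: emit a (key, value) pair for every word in one chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)

def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

chunks = ["move the compute", "to the data"]
counts = word_count(chunks)
```

Because each map call only sees its own chunk and each reduce call only sees one key, nothing in the logic has to change when the work is distributed.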


"A lot of people fail to understand the overheads and limitations of this kind of architecture. Or how hard it is to program, especially considering salaries for this skyrocketed. More often than not a couple of large 1TB SSD PCIe and a lot of RAM can handle your "big" data problem."

It's not that hard to program, but it does take a shift in how you attack problems.

If your data set fits on a few SSDs, then you probably don't have a real big data problem.

"Moving Big Data around is hard. Managing is harder."

Moving big data around is hard. That's why you have Hadoop: you send the compute to where the data is, which requires a new way of thinking about how you do computations.
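
A toy sketch of what "send the compute to the data" means, assuming a hypothetical `Node` class that stands in for a cluster machine holding one partition. This is not how Hadoop is implemented; it only shows that the function travels to each partition and a small summary travels back, instead of the raw records.

```python
class Node:
    """Hypothetical stand-in for a cluster machine holding one data partition."""

    def __init__(self, records):
        self.records = records

    def run(self, fn):
        # fn executes "on the node"; only its return value crosses the network.
        return fn(self.records)

def local_summary(records):
    # Runs next to the data and returns two numbers, however large the partition.
    return sum(records), len(records)

nodes = [Node([1, 2, 3]), Node([4, 5]), Node([6, 7, 8, 9])]

# Ship the function to every node and collect the small partial results.
partials = [node.run(local_summary) for node in nodes]

# Combine the partials centrally to get the global mean.
total = sum(s for s, _ in partials)
count = sum(n for _, n in partials)
mean = total / count
```

Each node returns two numbers regardless of partition size, so the network cost stays flat as the data grows.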

"Before doing any Map/Reduce (or equivalent), please I beg you to check out Introduction to Data Science at Coursera https://www.coursera.org/course/datasci"

Data science does not solve the big data problem. Here's my favorite definition of a big data problem: "a big data problem is when the size of the data becomes part of the problem." You can't use traditional linear (sequential) programming models to handle a true big data problem; you have to have some strategy to parallelize the compute. Hadoop is great for that.

"A large telco has a 600 node cluster of powerful hardware. They barely use it."

Sounds more like organizational issues and poor planning and execution than a criticism of Hadoop!


Two years ago in January 2010, Yangon had their first Barcamp. People were afraid to talk politics. Most of the Internet was firewalled. Aung San Suu Kyi was under house arrest.

Two years later, Myanmar is awakening and filled with hope for the future.


I don't know; ML has a lot of applications and the bar for most people to be able to implement it is rather high. Lowering the bar so "mere mortals" can have some serious infrastructure and data that's a mere API call away seems pretty huge.


The problem is that if you're not confident enough to get these systems working yourself, you're probably not going to be confident enough in your business to pay by the sip for someone else's API.


I am surprised that O'Reilly's "Learning Python" hasn't received more nods. Granted, it's long (1200 pages as I recall), but it takes the reader on the full journey through all of Python's multitude of features in a very structured way, with each chapter building on the foundations the earlier chapters lay.

I recommend "Dive Into Python" to people who want to start to get a taste of Python, but to anybody who has developed an appetite for Python and wants to really, really grok how Python works, I recommend O'Reilly's "Learning Python" and tell them to take the time to read it cover-to-cover.

