"A lot of people fail to understand the overheads and limitations of this kind of architecture. Or how hard it is to program, especially considering salaries for this skyrocketed. More often than not a couple of large 1TB SSD PCIe and a lot of RAM can handle your "big" data problem."
It's not that hard to program, but it does take a shift in how you attack problems.
If your data set fits on a few SSDs, then you probably don't have a real big data problem.
"Moving Big Data around is hard. Managing is harder."
Moving big data around is hard, and that's why you have Hadoop: you send the compute to where the data is, which requires a new way of thinking about how you do computations.
"Before doing any Map/Reduce (or equivalent), please I beg you to check out Introduction to Data Science at Coursera https://www.coursera.org/course/datasci"
Data science does not solve the big data problem. Here's my favorite definition of a big data problem: "a big data problem is when the size of the data becomes part of the problem." You can't use traditional sequential, single-machine programming models to handle a true big data problem; you need some strategy to parallelize the compute. Hadoop is great for that.
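To make the "send the compute to the data" idea concrete, here is a rough sketch in plain Python, not Hadoop code. The PARTITIONS list and the function names are made up for illustration: each inner list stands in for a block of data sitting on a different node, and threads stand in for the nodes. The point is only that the map step runs per partition and just the small partial counts get merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: each list stands in for a data block on a different node.
PARTITIONS = [
    ["big data is big", "data moves slowly"],
    ["send compute to data"],
    ["big clusters need planning"],
]


def map_partition(lines):
    """Map step: count words in a single partition, next to where it lives."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results; only these summaries travel."""
    return left + right


def word_count(partitions):
    # Threads are a stand-in for cluster nodes in this sketch.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_counts, partials, Counter())


totals = word_count(PARTITIONS)
print(totals.most_common(3))
```

The shift in thinking is that you write map_partition and reduce_counts so they never need to see the whole data set at once; a framework like Hadoop then takes care of running the map step on the machines that hold each block.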
"A large telco has a 600 node cluster of powerful hardware. They barely use it."
Sounds more like organizational issues and poor planning and execution than a criticism of Hadoop!