I've used Hadoop at petabyte scale (2+ PB input; 10+ PB sorted for the job) for machine learning tasks. If you have such a thing on your resume, you will be inundated with employers who claim to have "big data", and at least half of those datasets will be under 50 GB, with a good chunk of those under 10 GB. You'll also see multiple (shitty) 16-machine clusters, any of which -- for any task -- could be destroyed by code running on a single decent server with SSDs. To say nothing of Hadoop jobs running in EMR, which is glacially slow (slow disk, slow network, slow everything).
Also, Hadoop is so painfully slow to develop in that it's practically a full-employment act for software engineers. I imagine it's similar to early EJB coding.
> Also, Hadoop is so painfully slow to develop in that it's practically a full-employment act for software engineers.
It's comical how bad Hadoop is compared even to the CM Lisp described in Daniel Hillis's PhD dissertation. How do you devolve all the way from that down to "It's like map/reduce. You get one map and one reduce!"?
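To make the complaint concrete: the whole programming model really is just one mapper and one reducer per job, with a shuffle in between. Here's a toy sketch in plain Python (not the actual Hadoop Java API; `map_phase`, `reduce_phase`, and the word-count functions are illustrative names) — anything fancier than this shape has to be contorted into chains of such jobs:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Apply the single mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def reduce_phase(pairs, reducer):
    """Group pairs by key (the 'shuffle'), then apply the single reducer per key."""
    pairs = sorted(pairs, key=itemgetter(0))
    return {key: reducer(key, [v for _, v in group])
            for key, group in groupby(pairs, key=itemgetter(0))}

# The canonical word-count job: one map, one reduce, and that's all you get.
def wc_mapper(line):
    for word in line.split():
        yield word, 1

def wc_reducer(word, counts):
    return sum(counts)

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(map_phase(lines, wc_mapper), wc_reducer)
print(counts)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Compare that to CM Lisp or NESL, where arbitrary nested data-parallel operations compose directly instead of being shoehorned into map/shuffle/reduce stages.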
Programming is very faddish. It's amazing how bad commonly used technologies are. I'm so happy I'm mostly a native developer and don't have to use the shitty web stack and its shitty replacements.
What really puzzles me is that Doug Cutting worked at Xerox PARC and Mike Cafarella has two (!) CS Master's degrees and a PhD, and is a professor at the University of Michigan. It's not like they were unaware of the previous work in the field (Connection Machine languages, Paralations, NESL).