I applaud the effort, but is this really "big data"? The largest data sets they seem to test are ~150GB, which would fit comfortably on my MacBook Pro several times over. Many of the systems being tested are designed to scale efficiently once data grows past 5TB, so I'm dubious about the median response time results: things that work well for small datasets (where small means < 1TB) easily fall apart when you scale them up even a little.
These technologies aren't useful just for storing data. Yes, you can do that on your MacBook. They're useful when you need answers quickly or need to perform complex analysis. For example, Spark (the engine behind Shark) lets you run things like logistic regressions at scale in just a few seconds. As far as I know, you can't load 150GB of data into memory in R on your MacBook and then run a logistic regression a few seconds later.
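To make that concrete, here's a minimal sketch of the kind of distributed logistic regression I mean, using Spark's MLlib from Python. It assumes a running Spark cluster; the file path and column names are hypothetical placeholders, not anything from the benchmark.

    # Minimal sketch: distributed logistic regression in PySpark.
    # Assumes a running Spark cluster; path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("lr-example").getOrCreate()

    # Load a dataset far too large to fit in R's memory on a laptop.
    df = spark.read.parquet("hdfs:///data/events.parquet")

    # Pack the numeric feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(
        inputCols=["feature_a", "feature_b", "feature_c"],
        outputCol="features",
    )
    train = assembler.transform(df)

    # Fit the model; the computation is spread across the cluster's executors.
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
    print(model.coefficients)

    spark.stop()

The point isn't the handful of lines of code, it's that the fit runs in parallel across the cluster instead of being bound by one machine's RAM.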
Yeah, I was thinking that the first two data sets are around the size I would see on my piddly little SQL Server. 100m rows is starting to get into the ballpark.
Big Data is less about the size and more about how you want to deal with the data. Straightforward structured queries are the purview of an RDBMS, whereas complex analytics is where "Big Data" frameworks like those evaluated come in.