I applaud the effort, but is this really "big data"? The largest data sets they seem to test are ~150GB, which would fit comfortably on my MacBook Pro several times over. Many of the systems being tested are designed to scale efficiently once data grows past 5TB, so I'm dubious about the median response time results: things that work well for small datasets (where small means < 1TB) easily fall apart when you scale them up even a little.
These technologies aren't useful just for storing data. Yes, you can do that on your MacBook. They're useful when you need answers quickly or need to perform complex analysis. For example, Spark (the engine behind Shark) lets you run things like logistic regressions at scale in just a few seconds. As far as I know, you can't load 150GB of data into memory in R on your MacBook and then run a logistic regression a few seconds later.
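To make that concrete, here's a minimal sketch of the kind of distributed logistic regression I mean, using Spark's MLlib from Python. It assumes a running Spark cluster; the file path and column names are hypothetical placeholders, not anything from the benchmark.

    # Minimal sketch: distributed logistic regression in PySpark.
    # Assumes a running Spark cluster; path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("lr-example").getOrCreate()

    # Load a dataset far too large to fit in R's memory on a laptop.
    df = spark.read.parquet("hdfs:///data/events.parquet")

    # Pack the numeric feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(
        inputCols=["feature_a", "feature_b", "feature_c"],
        outputCol="features",
    )
    train = assembler.transform(df)

    # Fit the model; the computation is spread across the cluster's executors.
    model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
    print(model.coefficients)

    spark.stop()

The point isn't the handful of lines of code, it's that the fit runs in parallel across the cluster instead of being bound by one machine's RAM.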
Yeah, I was thinking that the first two data sets are around the size I would see on my piddly little SQL Server. 100m rows is starting to get into the ballpark.
Big Data is less about the size and more about how you want to deal with the data. Straightforward structured queries are the purview of an RDBMS, whereas complex analytics is where "Big Data" frameworks like those evaluated come in.