I was being fairly liberal with the word "index": by partitioning your data by time and selecting only the relevant partitions in queries, time effectively becomes an "index" that lets you avoid scanning more of your data set than you have to.
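A minimal sketch of that idea (partition pruning), assuming a hypothetical layout of one partition per calendar day. The names `build_partitions` and `query_sum` are made up for illustration and are not any particular database's API. The point is that a time-bounded query only opens the partitions inside its range.

```python
from datetime import date, timedelta

def build_partitions(rows):
    """Group (day, value) rows into one hypothetical partition per day."""
    partitions = {}
    for day, value in rows:
        partitions.setdefault(day, []).append(value)
    return partitions

def query_sum(partitions, start, end):
    """Sum values for days in [start, end], touching only partitions in that range.

    Returns (total, number_of_partitions_scanned)."""
    total = 0
    scanned = 0
    day = start
    while day <= end:
        part = partitions.get(day)
        if part is not None:
            scanned += 1
            total += sum(part)
        day += timedelta(days=1)
    return total, scanned
```

With a year of daily partitions, a three-day query scans three partitions and never reads the other 362.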

> table scans are fast in modern columnstores

I guess that depends on your expectations of 'fast'. Even with our smallish dataset, both BigQuery and Keen had multi-second response times, frequently 10+ seconds. That was totally unacceptable for user-facing analytics. And we had a lot of customers making a lot of queries, so it started to get expensive fast.

I'm sure 10-second responses would be very 'fast' for terabyte-sized data volumes. But that's not the problem we were trying to solve.




Yes, the problem is they just aren't a good fit for your data size.

Keen isn't a columnstore; it's a custom database built on top of Cassandra. It takes JSON records, splits them into compressed batches with each unique property stored in the CQL data model, and processes them with Storm workers. That's an outdated architecture compared to modern columnstores, which can now handle unstructured/nested data really well.
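A rough sketch of the "split JSON records into per-property compressed batches" step described above. This is not Keen's actual code: it ignores Cassandra, CQL, and Storm entirely, handles only top-level properties (no nested data), and the helper names are hypothetical. It just shows why storing each property separately lets a query decompress only the columns it needs.

```python
import json
import zlib

def shred(records):
    """Split JSON records into one column (list) per unique top-level property.
    Missing properties are stored as None so all columns stay row-aligned."""
    keys = sorted({k for r in records for k in r})
    return {k: [r.get(k) for r in records] for k in keys}

def compress_batch(columns):
    """Compress each column independently, as a column-oriented store might."""
    return {k: zlib.compress(json.dumps(v).encode("utf-8")) for k, v in columns.items()}

def read_column(batch, key):
    """Decompress only the requested column; the other columns are never touched."""
    return json.loads(zlib.decompress(batch[key]).decode("utf-8"))
```

For example, shredding `{"user": "a", "event": "click"}` and `{"user": "b", "event": "view", "ms": 12}` gives three columns, and reading `ms` back yields `[None, 12]` without decompressing `user` or `event`.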

BigQuery is designed for throughput rather than latency. There is a minimum of 3-5 seconds to schedule your query across the server pool before it even starts processing. It's also a single shared cluster for all customers, so performance is variable, but the trade-off is that scanning 100TB also takes only seconds.
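A back-of-the-envelope model of that trade-off: latency as a fixed scheduling overhead plus scan time. The 4-second overhead is picked from the 3-5 s range mentioned above; the 50 TB/s aggregate throughput is a purely illustrative assumption, not a measured BigQuery figure.

```python
def query_latency_s(bytes_scanned, overhead_s, throughput_bytes_per_s):
    """Simple latency model: fixed scheduling overhead plus scan time
    at a constant aggregate throughput (both parameters are assumptions)."""
    return overhead_s + bytes_scanned / throughput_bytes_per_s

OVERHEAD_S = 4.0        # assumed, within the 3-5 s scheduling cost
THROUGHPUT = 50e12      # hypothetical 50 TB/s across a large shared pool

small = query_latency_s(1e9, OVERHEAD_S, THROUGHPUT)    # 1 GB
large = query_latency_s(100e12, OVERHEAD_S, THROUGHPUT)  # 100 TB
```

Under these assumptions a 1 GB query takes about 4.00002 s and a 100 TB query about 6 s: for small data sets the fixed overhead is essentially the whole response time, which is why a throughput-oriented engine feels slow for user-facing analytics on small data.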



