
I should have been more explicit: it won't be linear-ish. As you fill up any per-box sort of system, you start contending (pretend that you simulate 100 analysts) while the BigQuery results will be closer to linear. It does suck that to benchmark that "accurately", you have to make the queries different enough so that no reuse occurs (unless that's reasonable!).
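To make the contention point concrete, here is a toy model, not a benchmark. It assumes a fixed-capacity system runs queries in "waves" of at most `slots` at a time, while an idealized elastic system gives every query its own resources. The slot count and per-query time are made-up numbers, and the function names are hypothetical.

```python
import math

def fixed_capacity_makespan(num_queries, slots, seconds_per_query):
    """Wall-clock time to finish all queries when only `slots` can run at once."""
    waves = math.ceil(num_queries / slots)
    return waves * seconds_per_query

def elastic_makespan(num_queries, seconds_per_query):
    """Idealized case: every query gets its own resources, so they all run together."""
    return seconds_per_query if num_queries > 0 else 0

# Assumed numbers for illustration only: 8 slots, 5 seconds per query.
for analysts in (1, 10, 100):
    print(analysts,
          fixed_capacity_makespan(analysts, slots=8, seconds_per_query=5.0),
          elastic_makespan(analysts, seconds_per_query=5.0))
```

With one analyst the two look identical, which is why a single-user benchmark hides the difference; it only shows up once the simulated analysts outnumber the slots.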

Again, thanks for doing this!



When we looked at BigQuery, it seemed that if you prepay you get essentially the same effect you're describing. You're given a certain number of "units" of compute, and if you exceed your available concurrent units you end up with the same compute resource contention you would get with an improperly scaled Snowflake warehouse or Redshift cluster.

If you're willing to just pay per gigabyte scanned with BigQuery you can scale near linearly I'm sure (although I haven't actually tried it), but you could accomplish the same thing using Snowflake's API to add warehouses as concurrent query load increases. That's what we do (although we just pre-allocate and suspend the warehouses because you only pay when they're on).
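A rough sketch of the pre-allocate-and-resume idea, under stated assumptions: the warehouse names, the queries-per-warehouse figure, and the `plan_resume` helper are all hypothetical, and the code only builds the SQL text rather than talking to Snowflake. `ALTER WAREHOUSE ... RESUME IF SUSPENDED` is standard Snowflake SQL; how the statements get executed, and how idle warehouses get suspended again, is left out.

```python
import math

def plan_resume(concurrent_queries, queries_per_warehouse, pool):
    """Return the statements that resume just enough pre-allocated warehouses.

    `pool` is an ordered list of warehouse names that already exist
    (hypothetical names here). The result is capped at the pool size.
    """
    needed = min(len(pool), math.ceil(concurrent_queries / queries_per_warehouse))
    return [f"ALTER WAREHOUSE {name} RESUME IF SUSPENDED" for name in pool[:needed]]

# Assumed example: 20 concurrent queries, roughly 8 per warehouse, pool of 4.
for statement in plan_resume(20, 8, ["WH_1", "WH_2", "WH_3", "WH_4"]):
    print(statement)
```

Since suspended warehouses don't bill, the pool can be sized for the worst case and only the resumed ones cost anything.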

Redshift does suffer from this problem because the compute is tied to the data, but Redshift Spectrum is attempting to rectify that as well. I don't know anything about its performance though.


We're definitely going to revisit this periodically and I would love to address this in the next iteration. Would you mind creating an issue at https://github.com/fivetran/benchmark describing the concurrency trade-off that we're not accurately capturing?



