> seems like you're trying to say that this situation requires 30K IOPS. That would be really naive :)

If you want fast reads, it'd take at least that many IOPS for a naive solution where each timeseries has its own block that you're appending to.

> A typical RDBMS naturally batches parallel writes, so a bunch of 3-4 HDDs providing totally 600+ IOPS would easily serve your situation.

You could do that if you only cared about writes. There are two problems with such an approach.

First, reads would be unbearably slow: reading a single timeseries for an hour at, say, 10s resolution would take 360 operations (one per batch), or around 3.6 seconds assuming a 10ms seek time. Wanting to read a hundred timeseries at once over a day is not unusual, and that would take 8640 seconds.
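
For concreteness, a back-of-envelope sketch of that arithmetic (assuming the 10ms seek time and 10s resolution above; the numbers are illustrative, not benchmarks):

    package main

    import "fmt"

    func main() {
        const seek = 0.010 // assumed cost of one random HDD read, in seconds

        // Naive layout: samples land wherever the write batch happened to be,
        // so reading a series back costs roughly one seek per sample.
        samplesPerHour := 3600 / 10 // 10s resolution -> 360 samples
        fmt.Printf("1 series, 1 hour:  %.1f s\n", float64(samplesPerHour)*seek)

        samplesPerDay := samplesPerHour * 24 // 8640 samples per series per day
        fmt.Printf("100 series, 1 day: %.0f s\n", float64(100*samplesPerDay)*seek)
    }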

Second, it's not likely to be good disk-space-wise. As each point is written individually you can't do intra-timeseries compression, so you're probably talking at least 16 bytes per sample, and probably nearer 100-200 bytes if you write the metric name each time.
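
Roughly where those byte counts come from (an illustrative layout, not any actual on-disk format):

    // Illustrative only: the minimum an uncompressed, individually
    // written sample needs.
    type rawSample struct {
        timestampMs int64   // 8 bytes
        value       float64 // 8 bytes
    }
    // 16 bytes before you identify the series; writing a full metric name
    // like http_requests_total{job="api",instance="10.0.0.1:9090"} with
    // every sample easily pushes that to 100-200 bytes.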

Contrast that with the approach taken by something like Prometheus. We build up 1kB chunks, which hold ~780 samples on average as Gorilla compression gets us down to ~1.3 bytes/sample. We batch these chunks up per time series across several hours, so accessing 6 hours of a series is only one disk operation. With the above example of reading 100 timeseries for a day, that's only 4 seconds, roughly three orders of magnitude faster than what simple write batching allows for.
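
And the same back-of-envelope arithmetic for the chunked layout (again assuming 10ms per random read and the ~780 samples per 1kB chunk figure above):

    package main

    import "fmt"

    func main() {
        const seek = 0.010 // assumed cost of one random read, in seconds

        // ~1kB chunks at ~1.3 bytes/sample hold ~780 samples, i.e. roughly
        // 2 hours at 10s resolution; chunks for one series are batched
        // together on disk, so ~6 hours of a series costs one read.
        hoursPerRead := 6.0
        readsPerSeriesDay := 24.0 / hoursPerRead // 4 reads per series per day

        totalReads := 100 * readsPerSeriesDay
        fmt.Printf("100 series, 1 day: %.0f reads, ~%.0f s of seeking\n",
            totalReads, totalReads*seek)
        // vs. ~8640 s for the one-seek-per-sample layout sketched above.
    }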

Obviously this is ignoring many details like caching, but the general point holds.


