Timeseries Indexing at Scale (krylysov.com)
96 points by gsky on Sept 1, 2024 | 14 comments


It's a nice index structure, with the metric name as part of the key. It lets you skip the must-have optimizations (like bitmap indexes or parallel index scans) of the regular structure, where the metric name is just another tag. It kinda shards the index and makes the per-tag id-sets a lot smaller.

I'm now thinking about employing the same structure in hisser[1]. It currently uses a regular index structure and a fairly complex parallel index scan on top of lmdb[2].

[1]: https://github.com/baverman/hisser
[2]: https://github.com/baverman/hisser/blob/master/hisser/lmdb_s...


Another good property: it makes tag value autocompletion in a query builder trivial. A regular structure forces you to partially evaluate the query to get meaningful results. With a metric-prefixed index it could be perfectly fine to return full scan results.
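For illustration, a minimal Go sketch of that prefix scan (the "<metric>;<tag>=<value>" key shape and the in-memory sorted slice are assumptions standing in for a real LSM/B-tree iterator):

  package main

  import (
  	"fmt"
  	"sort"
  	"strings"
  )

  func main() {
  	// Sorted index keys, shaped like "<metric>;<tag>=<value>" (assumed layout).
  	keys := []string{
  		"cpu.total;env=prod",
  		"cpu.total;env=staging",
  		"cpu.total;host=a",
  		"cpu.total;host=b",
  		"mem.used;env=prod",
  	}
  	sort.Strings(keys)

  	// Autocomplete values of tag "env" for metric "cpu.total":
  	// scan the contiguous run of keys sharing the prefix.
  	prefix := "cpu.total;env="
  	i := sort.SearchStrings(keys, prefix)
  	for ; i < len(keys) && strings.HasPrefix(keys[i], prefix); i++ {
  		fmt.Println(strings.TrimPrefix(keys[i], prefix)) // prod, staging
  	}
  }

Because the metric name leads the key, the scan stays inside one metric's slice of the key space instead of touching every series that has an "env" tag.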


One thing that isn't clear to me is how, in the next-generation solution, the time series ID is obtained when a metric is written, in order to build the inverted index. The user only supplies the metric name and tags; how do you know that a particular combination should correspond to ID 1, 2, 3, etc.?


Not sure if I got the question, but during indexing of e.g. cpu.total{env=prod,host=a} [id: 1] you insert cpu.total;env=prod and cpu.total;host=a entries with id 1. Then you get cpu.total{env=prod,host=b} [id: 2] and you end up with 3 entries:

cpu.total;env=prod => 1,2
cpu.total;host=a => 1
cpu.total;host=b => 2

If you query avg(cpu.total{env=prod}) you get [1, 2]. If you query avg(cpu.total{env=prod,host=a}) you get intersection([1, 2], [1]) which is [1].
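A minimal Go sketch of this index, assuming the key layout above; the map-based store and sorted-list intersection are illustrative, where a real engine would persist sorted posting lists in something like RocksDB or LMDB:

  package main

  import "fmt"

  type Index struct {
  	postings map[string][]uint64 // "metric;tag=value" -> sorted series ids
  }

  func NewIndex() *Index { return &Index{postings: map[string][]uint64{}} }

  // Add indexes one series: one entry per tag, all pointing at the same id.
  func (ix *Index) Add(metric string, tags map[string]string, id uint64) {
  	for k, v := range tags {
  		key := metric + ";" + k + "=" + v
  		ix.postings[key] = append(ix.postings[key], id)
  	}
  }

  // Query intersects the posting lists of all requested tag filters.
  func (ix *Index) Query(metric string, tags map[string]string) []uint64 {
  	var result []uint64
  	first := true
  	for k, v := range tags {
  		ids := ix.postings[metric+";"+k+"="+v]
  		if first {
  			result, first = append([]uint64(nil), ids...), false
  			continue
  		}
  		result = intersect(result, ids)
  	}
  	return result
  }

  // intersect merges two sorted id lists, keeping common elements.
  func intersect(a, b []uint64) []uint64 {
  	var out []uint64
  	for i, j := 0, 0; i < len(a) && j < len(b); {
  		switch {
  		case a[i] < b[j]:
  			i++
  		case a[i] > b[j]:
  			j++
  		default:
  			out = append(out, a[i])
  			i++
  			j++
  		}
  	}
  	return out
  }

  func main() {
  	ix := NewIndex()
  	ix.Add("cpu.total", map[string]string{"env": "prod", "host": "a"}, 1)
  	ix.Add("cpu.total", map[string]string{"env": "prod", "host": "b"}, 2)

  	fmt.Println(ix.Query("cpu.total", map[string]string{"env": "prod"}))              // [1 2]
  	fmt.Println(ix.Query("cpu.total", map[string]string{"env": "prod", "host": "a"})) // [1]
  }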


> during indexing e.g. cpu.total{env=prod,host=a} [id: 1]

Right, how do you know at write-time that cpu.total{env=prod,host=a} is ID 1?

The user only supplies the metric name and tags.


Like, I can't imagine these are autoincrementing IDs. Because then every time a user writes a new value for cpu.total{env=prod,host=a}, you'd end up with a new unique timeseries ID.


Nice article. I think all of their choices make sense. I also imagine that, given it's based on RocksDB, moving off Go would also eliminate the CGo overhead, which might be important for saving CPU in their case.


Nicely written!

I think this should be a principle now (adapted from Jeff Atwood's famous quote):

Any infra application that can be written in Rust, will eventually be written in Rust


Any CPU- or memory-bound one :). Lots of infra apps are just fine in Go, too.


Fair point.


I had assumed that Datadog was using Elasticsearch as a storage solution, including for metrics. Seems not.



One of my wishlist items: a RocksDB storage engine for Postgres.


Probably not in the near future. To even benefit from it, they would probably need to complete the rewrite to a threaded design. There is a proposal to do this, but it notes many roadblocks, since the fork-based model currently employed has left quite a few places in the code that still rely on it.

Secondly, there are a bunch of features from the SQL standard that are more or less complicated to get right with RocksDB under the hood, most importantly collations. This is because the bespoke table engines can create collation-dependent indexes (i.e. control how data is sorted on disk), whereas mapping an index onto a key-value store like RocksDB requires rewriting strings into a binary encoding that remains sorted per the collation (this is why MyRocks only supports the binary, latin1_bin and utf8_bin collations).
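To make the ordering problem concrete, here is a small Go sketch using golang.org/x/text/collate; this only illustrates the idea of order-preserving collation keys, not how MyRocks or Postgres actually encode anything:

  package main

  import (
  	"bytes"
  	"fmt"

  	"golang.org/x/text/collate"
  	"golang.org/x/text/language"
  )

  func main() {
  	c := collate.New(language.German)
  	var buf collate.Buffer

  	// Under German collation, "Äpfel" sorts before "Zebra" (Ä groups with A),
  	// and the collation keys preserve that order bytewise.
  	k1 := c.KeyFromString(&buf, "Äpfel")
  	k2 := c.KeyFromString(&buf, "Zebra")
  	fmt.Println(bytes.Compare(k1, k2) < 0) // true

  	// But raw UTF-8 bytes put "Äpfel" after "Zebra" (0xC3 > 'Z'), which is
  	// why an engine limited to plain byte order supports only *_bin collations.
  	fmt.Println(bytes.Compare([]byte("Äpfel"), []byte("Zebra")) < 0) // false
  }

The catch is that such keys are one-way: you can't recover the original string from the key, so the engine has to store it separately or restrict itself to encodings where bytes already sort correctly.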

See this page on limitations of MyRocks (the RocksDB MySQL engine): https://docs.percona.com/percona-server/5.7/myrocks/limitati...



