It's a nice index structure, with the metric name as part of the key. It lets you skip the otherwise must-have optimizations (like bitmap indexes or parallel index scans) that a regular structure, where the metric name is just another tag, requires. It effectively shards the index by metric and makes the per-tag id-sets a lot smaller.
I'm now thinking about employing the same structure for hisser[1]. Currently it uses a regular index structure and a rather complex parallel index scan on top of lmdb[2].
Another good property: it makes tag value autocompletion in a query builder trivial. A regular structure forces you to partially evaluate the query to get meaningful results; with a metric-prefix index it can be perfectly fine to return full-scan results.
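To make that concrete, here's a minimal Python sketch of autocompletion as a plain prefix scan over the index keys (the "metric;tag=value" key format and the in-memory sorted list are just illustrations, not hisser's actual lmdb layout):

    # Tag-value autocompletion as a prefix scan over sorted index keys.
    # An in-memory stand-in for an lmdb cursor; illustration only.
    import bisect

    index_keys = sorted([
        "cpu.total;env=prod",
        "cpu.total;host=a",
        "cpu.total;host=b",
        "mem.used;host=a",
    ])

    def complete_tag_values(metric, tag, typed=""):
        """Return tag values for `metric` whose value starts with `typed`."""
        prefix = f"{metric};{tag}={typed}"
        start = bisect.bisect_left(index_keys, prefix)
        values = []
        for key in index_keys[start:]:
            if not key.startswith(prefix):
                break
            values.append(key.split("=", 1)[1])
        return values

    print(complete_tag_values("cpu.total", "host"))  # ['a', 'b']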
One thing that isn't clear to me is how, in the next-generation solution, the time series ID is obtained when a metric is written, in order to build the inverted index. The user only supplies the metric name and tags; how do you know that a particular combination should correspond to ID 1, 2, 3, etc.?
Not sure if I got the question, but
during indexing of, say, cpu.total{env=prod,host=a} [id: 1] you insert the entries cpu.total;env=prod and cpu.total;host=a with id 1.
Then you get cpu.total{env=prod,host=b} [id: 2] and you end up with three entries:
cpu.total;env=prod => 1,2
cpu.total;host=a => 1
cpu.total;host=b => 2
If you query avg(cpu.total{env=prod}) you get [1, 2].
If you query avg(cpu.total{env=prod,host=a}) you get intersection([1, 2], [1]) which is [1].
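In code, the whole thing is roughly this (a toy in-memory sketch of the scheme described above, not any particular implementation):

    # Metric-prefix inverted index: keys are "metric;tag=value",
    # values are sets of series ids.
    from collections import defaultdict

    index = defaultdict(set)

    def add_series(series_id, metric, tags):
        for tag, value in tags.items():
            index[f"{metric};{tag}={value}"].add(series_id)

    def query(metric, tags):
        # Intersect the id-sets of every tag in the filter.
        id_sets = [index[f"{metric};{tag}={value}"] for tag, value in tags.items()]
        return set.intersection(*id_sets) if id_sets else set()

    add_series(1, "cpu.total", {"env": "prod", "host": "a"})
    add_series(2, "cpu.total", {"env": "prod", "host": "b"})

    print(query("cpu.total", {"env": "prod"}))               # {1, 2}
    print(query("cpu.total", {"env": "prod", "host": "a"}))  # {1}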
Like, I can't imagine these are autoincrementing IDs, because then every time a user writes a new value for cpu.total{env=prod,host=a} you'd end up with a new unique time series ID.
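My guess (just an assumption on my part, nothing from the article) is that the name+tags combination gets canonicalized into a series key, and a new id is taken from a counter only the first time that key is seen, roughly like:

    # Assumption: ids are stable because the series key is deduplicated,
    # not because each write increments a counter.
    series_ids = {}   # canonical series key -> id
    next_id = 1

    def series_id(metric, tags):
        global next_id
        key = metric + ";" + ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        if key not in series_ids:
            series_ids[key] = next_id
            next_id += 1
        return series_ids[key]

    # Repeated writes of the same combination reuse the same id.
    assert series_id("cpu.total", {"env": "prod", "host": "a"}) == 1
    assert series_id("cpu.total", {"host": "a", "env": "prod"}) == 1
    assert series_id("cpu.total", {"env": "prod", "host": "b"}) == 2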
Nice article. I think all of their choices make sense. I also imagine that, given that it's based on RocksDB, the overhead of CGo would be eliminated as well, which might matter for saving CPU in their case.
Probably not in the near future; to even benefit, they would probably need to complete the rewrite to a threaded design. There is a proposal to do this, but it notes many roadblocks, since the fork-based model currently employed has left quite a few places in the code that still rely on it.
Secondly, there is a bunch of features from the SQL standard that are more or less complicated to get right with RocksDB under the hood, most importantly collations. The bespoke table engines can create collation-dependent indexes (i.e. how data is sorted on disk), whereas mapping an index onto a key-value store like RocksDB requires rewriting strings into a binary encoding that stays sorted according to the collation (this is why MyRocks only supports the binary, latin1_bin and utf8_bin collations).
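A toy Python illustration of the problem (casefold() standing in for a real collation sort key; this is not MyRocks' actual encoding):

    # A KV store sorts index keys by raw bytes, so a case-insensitive
    # collation needs its strings rewritten into a binary "sort key" first.
    names = ["alpha", "Beta", "gamma"]

    raw_order = sorted(names, key=lambda s: s.encode("utf-8"))
    collated  = sorted(names, key=str.casefold)  # what the SQL index should return
    sort_keys = sorted(names, key=lambda s: s.casefold().encode("utf-8"))

    print(raw_order)  # ['Beta', 'alpha', 'gamma'] -- byte order puts 'B' before 'a'
    print(collated)   # ['alpha', 'Beta', 'gamma']
    print(sort_keys)  # ['alpha', 'Beta', 'gamma'] -- matches only after the rewrite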
[1]: https://github.com/baverman/hisser
[2]: https://github.com/baverman/hisser/blob/master/hisser/lmdb_s...