Wow, this is a fantastic solution to some questions I've had rattling around in my head for years about the optimal bucket choices to minimize error given a particular set of buckets.
Do I read right that circllhist has a pretty big number of bin sizes and is not configurable (except that they're sparse so may be small on disk)?
I've found myself using high-cardinality Prometheus metrics where I can only afford 10-15 distinct histogram buckets. So I end up
(1) plugging my live system data, from both normal operations and outage periods, into various numeric algorithms that propose optimal bucket boundaries. These algorithms tell me that I could get great accuracy if I chose thousands of buckets, which, thanks for rubbing it in about my space problems :(. Then I write some more code to collapse those into 15 buckets while minimizing error at the places I care about (like p50, p95, p99, p999 under normal operations and under irregular operations); there's a rough sketch of that collapse step below this list.
(2) making sure I have an explicit bucket boundary at any target that represents a business objective (if my service promises no more than 1% of requests will take >2500ms, setting a bucket boundary at 2500ms gives me perfectly precise info about whether p99 falls above/below 2500ms)
(3) forgetting to tune this and leaving a bunch of bad defaults in place, which often leads to people saying "well, our graph shows a big spike up to 10000ms, but that's just because we forgot to tune our histogram bucket boundaries before the outage; we actually have to refer to logs to see the timeouts at 50 seconds"
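To make the collapse step from (1) and the SLO pinning from (2) concrete, here is a rough Python sketch. The greedy strategy, the quantile targets, and the 2.5s SLO boundary are illustrative assumptions, not the exact algorithm I use:

    import bisect
    import numpy as np

    def collapse_buckets(candidates, samples, max_buckets=15,
                         slo_bounds=(2.5,), quantiles=(0.5, 0.95, 0.99, 0.999)):
        # Pick <= max_buckets boundaries out of a fine-grained candidate set:
        # keep the SLO boundaries exactly, pin a boundary near each target
        # quantile of the observed samples, then spread any remaining slots
        # evenly across the candidates.  (Illustrative only; a real optimizer
        # would minimize interpolation error at the target quantiles directly.)
        candidates = sorted(set(candidates) | set(slo_bounds))
        chosen = set(slo_bounds)
        for q in quantiles:
            target = float(np.quantile(samples, q))
            i = min(bisect.bisect_left(candidates, target), len(candidates) - 1)
            chosen.add(candidates[i])
        remaining = max_buckets - len(chosen)
        if remaining > 0:
            step = max(1, len(candidates) // remaining)
            chosen.update(candidates[::step][:remaining])
        return sorted(chosen)

    # Example: latencies in seconds, SLO boundary pinned at 2.5s.
    latencies = np.random.lognormal(mean=-1.5, sigma=1.0, size=10_000)
    fine = [round(0.001 * 1.1 ** k, 4) for k in range(120)]  # ~1ms up to ~80s
    print(collapse_buckets(fine, latencies))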
Prometheus is developing a similar automatic log-linear histogram bucket type. The goal is to make it as cheap as a 10-15 bucket histogram while not requiring pre-defined buckets.
Correct me if I’m wrong, but I thought the sparse histogram effort in Prometheus will still use single metric line counters for its storage abstraction?
I think it’s a great addition to the product and will excitedly use it, but it’s a pretty big difference from a histogram-centric DB like Circonus’ or a schema’d one like Monarch.
I’ve used these log-linear histograms in a few pieces of code. There is some configurability in the abstract - you could choose a different logarithm base.
In practice, none of the implementations seem to provide that. Within each set of buckets for a given log base you have reasonable precision at that magnitude. If your metric is oscillating around 1e6 you shouldn’t care much about the variance at 1e2, and with this scheme you don’t have to tune anything to account for that.
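For concreteness, here is a toy sketch of the log-linear idea: a power-of-base band gives the magnitude, and a linear subdivision inside that band gives the within-magnitude resolution. The base and the number of linear steps are the knobs that exist in the abstract; the function name and defaults here are made up:

    import math

    def log_linear_bucket(value, base=10, linear_steps=90):
        # Which power-of-`base` band does the value fall in, and which of
        # `linear_steps` equal-width sub-buckets inside that band?
        exponent = math.floor(math.log(value, base))
        band_low = base ** exponent
        band_width = base ** (exponent + 1) - band_low
        step = int((value - band_low) / band_width * linear_steps)
        return exponent, min(step, linear_steps - 1)

    # A metric oscillating around 1e6 never touches the 1e2 band, so the
    # resolution you get at one magnitude costs nothing at the other.
    print(log_linear_bucket(1_234_567))  # (6, 2)
    print(log_linear_bucket(137))        # (2, 3)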
There are a large number of subtle tradeoffs around the bucketing scheme (log vs. log-linear, choice of base), the memory layout (sparse, dense, chunked), and the amount of configurability across the histogram space (circllhist, DDSketch, HDRHistogram, ...). A good overview is the discussion here:
As for the circllhist: There are no knobs to turn. It uses base 10 and two decimal digits of precision. In the last 8 years I have not seen a single use-case in the operational domain where this was not appropriate.
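Roughly, the idea in a few lines of Python (ignoring zero, negative values, and the edge cases a real implementation has to handle):

    import math

    def two_digit_bucket(value):
        # Bucket a positive value by its first two significant decimal digits
        # plus a base-10 exponent, i.e. the bucket [m * 10^e, (m + 1) * 10^e)
        # with m between 10 and 99.
        exponent = math.floor(math.log10(value))
        mantissa = int(value / 10 ** (exponent - 1))  # first two significant digits
        return mantissa, exponent - 1

    print(two_digit_bucket(0.00273))  # (27, -4) -> the bucket [2.7e-3, 2.8e-3)
    print(two_digit_bucket(4812.0))   # (48, 2)  -> the bucket [4800, 4900)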
Right - it’s why I said “in the abstract.” You could do it and still have a log-linear format. Base 10 works great for real-world distributions.
Thanks for making and open-sourcing circllhist. I’ve been close to the whole “what’s the OTel histogram going to be” discussion for the last several months and learned a lot from it. That discussion is what introduced me to circllhist and got me using them.