This. Any new large query or aggregation in the Borgmon/Prometheus model requires re-solving federation and maintaining yet more runtime configuration. That might technically be scalable, in that you *could* do it, but you have to maintain it and pay the labor cost, and it's not practical beyond a certain size or system complexity. It's also friction: you can only run the queries you can afford to set up.
That's why Google spent all that money to build Monarch. At the end of the day Monarch is vastly cheaper in person time and resources than manually-configured Borgmon/Prometheus. And there is much less friction in trying new queries, etc.
100% Prometheus compatible, proven to scale to 1B active series.
It's not about comparisons; every tool has its own place and feature set that may be right for you depending on what you're doing. But if you've reached the end of the road with Prometheus due to scale, and you need massive scale and perfect compatibility, then Mimir stands out.
You can set up distributed evaluation similar to how it was done in Borgmon, but you have to do it manually (or maybe write an operator to automate it). One of Monarch's core ideas is to do that behind the scenes for you.
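For illustration, the manual version is roughly a federation scrape job on a "global" Prometheus that pulls pre-aggregated series from each leaf, plus recording rules on every leaf to produce those series. This is a sketch; the job name, target names, and the `job:.*` rule-naming convention are illustrative:

```yaml
# Hypothetical federation job on a global Prometheus instance.
# Each leaf must also define recording rules (e.g. job:http_requests:rate5m)
# so only pre-aggregated series cross the federation boundary.
scrape_configs:
  - job_name: "federate"
    honor_labels: true
    metrics_path: "/federate"
    params:
      "match[]":
        - '{__name__=~"job:.*"}'   # pull only pre-aggregated series
    static_configs:
      - targets:
          - "prometheus-leaf-1:9090"
          - "prometheus-leaf-2:9090"
```

Every new global aggregate means touching the rule files on the leaves and possibly the match expressions here, which is exactly the maintenance burden being discussed.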
This still requires the query to fit on a single node at some point unless you're doing multi-level aggregation. Monarch does that. Being able to push expensive, highly-parallelizable aspects of the query down into the leaves is a massive benefit.
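To make the push-down point concrete, here's a toy sketch (my own illustration, not Monarch's actual API): each leaf computes a partial aggregate over its raw samples in parallel, and the mixer only merges one small partial per leaf instead of receiving every raw sample.

```python
# Sketch of aggregation push-down (illustrative, not Monarch's real interface).
# The expensive, parallelizable work (summing raw samples) runs on the leaves;
# the mixer does a cheap merge of per-leaf partial results.

from concurrent.futures import ThreadPoolExecutor

leaves = [
    {"a": [1.0, 2.0], "b": [3.0]},   # leaf 1: raw samples per series
    {"a": [4.0], "c": [5.0, 6.0]},   # leaf 2
]

def leaf_partial_sum(samples_by_series):
    """Runs on the leaf: the expensive, highly parallel part of the query."""
    return {s: sum(v) for s, v in samples_by_series.items()}

def mixer_merge(partials):
    """Runs on the mixer: cheap merge of per-leaf partial aggregates."""
    out = {}
    for p in partials:
        for series, value in p.items():
            out[series] = out.get(series, 0.0) + value
    return out

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(leaf_partial_sum, leaves))

print(mixer_merge(partials))  # {'a': 7.0, 'b': 3.0, 'c': 11.0}
```

The mixer's input size scales with the number of leaves, not the number of raw samples, which is why no single node ever has to hold the whole query.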
Yep, in theory it's great if you can afford a team of highly qualified engineers to run it, like Google, and you're dealing with huge metric volumes. In practice I found it was not hard to crash a Monarch mixer (at least some years ago), and most orgs won't be able to afford it or take advantage of it.
I don't understand how the properties you're describing imply that Prometheus isn't scalable.
High Availability always requires duplication of effort. Scaling queries always requires sharding and aggregation at some level.
I've deployed stock Prometheus at global scale, O(100k) targets, with great success. You have to understand and buy into Prometheus' architectural model, of course.
And I've seen a system that did that. We had to predict what global aggregates we would need. Looking at a metric required finding the right instance to connect to if it wasn't in the bubbleup. Picking the right expressions to avoid double counts was hard. Want to do something fancy? No luck because of the lack of distributed querying.
The ways in which you can scale Prometheus are ways in which you can scale anything.
It does not, itself, have highly scalable properties built in.
It does not do sharding, it does not do proxying, it does not do batching, it does not do anything that would allow it to run multiple servers and query over multiple servers.
Look, I'm not saying that it doesn't work; but when I read about Borgmon and Prometheus, I understood the design goal was intentionally not to solve these hard problems, and instead to use them as primitive time series systems that can be deployed with a small footprint basically everywhere (and individually queried).
I submit to you: I could also put an InfluxDB on every server and get the same "scalability".
The difference being that I can actually run a huge InfluxDB cluster with a dataset that exceeds the capabilities of a single machine.
It seems like you're asserting a very specific definition of scalability that excludes Prometheus' scalability model. Scalability is an abstract property of a system that can be achieved in many different ways. It doesn't require any specific model of sharding, batching, query replication, etc. Do you not agree?
Ignorance? I'm a core Prometheus contributor and a 20+ year distsys veteran. Prometheus is not a database, and scalability is not well-defined. These are not controversial statements.
It is extremely unbecoming to lie about who you are on this forum.
Scalability is defined differently depending on context; in this context (a monitoring/time series solution) it is defined as being able to hold a dataset larger than a single machine, scaling horizontally.
Downsampling the data or transforming it does not meet that criterion, since that's no longer the original data.
The way Prometheus "scales" today is a bolt-on passthrough with federation. It's not designed for it at all, and it means your query will use other nodes as data sources until it runs out of RAM evaluating the query. Or not.
The most common method of "scaling" Prometheus is making a tree; you can do that with anything, so it is not inherent to the technology and thus not a defining characteristic. If everything can be defined the same way then nothing can be; the term ceases to have meaning: https://valyala.medium.com/measuring-vertical-scalability-fo...
I’ll tell you how influx scales: your data is horizontally sharded across nodes, queries are conducted cross shards.
That’s what scalability of the database layer is.
Not fetching data from other nodes and putting it together yourself.
Rehydrating from many datasets is not the storage system scaling: the collector layer doing the hydration is the thing that is scaling.
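Concretely, the distinction I'm drawing is something like this toy sketch (my own illustration, not InfluxDB's actual implementation): the database layer itself hash-routes writes to shards and fans queries out across them, rather than the operator wiring nodes together and merging results by hand.

```python
# Toy sketch of database-level horizontal sharding (not real InfluxDB code):
# writes are hash-routed to a shard, and a query fans out across all shards,
# merging partial results inside the database layer itself.

import hashlib

NUM_SHARDS = 3
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for a node

def shard_for(series_key):
    """Deterministically place a series on one shard."""
    digest = hashlib.sha256(series_key.encode()).digest()
    return digest[0] % NUM_SHARDS

def write(series_key, value):
    shards[shard_for(series_key)].setdefault(series_key, []).append(value)

def query_sum(series_keys):
    """Cross-shard query: each shard contributes its part, the layer merges."""
    total = 0.0
    for shard in shards:                       # fan out to every shard
        for key in series_keys:
            total += sum(shard.get(key, []))   # per-shard partial sum
    return total

for i in range(100):
    write(f"cpu.host{i}", float(i))

print(query_sum([f"cpu.host{i}" for i in range(100)]))  # 4950.0
```

The point is that placement and cross-shard merging live below the query interface; the user never picks a node or stitches results together.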
If you sold me a solution that used Prometheus underneath but was distributed across all nodes, perhaps we could talk.
I should add, extremely frustratedly: if you're not lying and you really are a core Prometheus maintainer, you should know this. I'm deeply embarrassed to be the one telling you.
> in this context (a monitoring/time series solution) it is defined as being able to hold a dataset larger than a single machine, scaling horizontally.
This just isn't true :shrug: Horizontal scaling is one of many strategies.
I think the disconnect is that Prometheus helps a user to shard things, but it's not automatic. Other time series databases and monitoring solutions automatically distribute and query across servers. It's like Postgres vs. NewSQL (e.g. FoundationDB, Spanner).
While Prometheus supports sharding queries when a user sets it up, my understanding is that this has to be done manually, which is definitely less convenient. This is better than a hypothetical system that doesn't allow this at all, but still not the same as something that handles scaling magically.
Prometheus supports sharding queries the way a screwdriver supports turning multiple screws at once. You can design a system yourself that includes the screwdriver, which will turn all the screws, but there's nothing inherent to the screwdriver that helps you with this. If "scalability" just means "you can use it to design something new from scratch that scales" then the term is pretty meaningless.
You pretty quickly exceed what one instance can handle for memory, CPU, or both. At that point you don't have any really good options to scale while maintaining a flat namespace (you need to partition).
This means each new query over a certain size becomes a federation problem, so the friction for trying new things becomes very high above the scale of a single instance.
Well you obviously don't issue metrics queries over arbitrarily large datasets, right? The Prometheus architecture reflects this invariant. You constrain queries against both time and domain boundaries.
Monarch can support both ad-hoc and periodic, standing queries of arbitrarily large size, and has the means to spread the computation out over many intermediate mixer and leaf nodes. It does query push-down so that the "expensive" parts of aggregations, joins, etc., can be done in massively parallel fashion at the leaf level.
It scales so well that many aggregations are set up and computed for every service across the whole company (CPU, memory usage, error rates, etc.). For basic monitoring you can run a new service in production and go and look at a basic dashboard for it without doing anything else to set up monitoring.
That's a choice Prometheus has made, not an invariant. Many scalable systems support arbitrarily large queries as they themselves scale. Prometheus pretty much prevents creating arbitrarily large datasets to begin with, so the point is kind of moot.
Requiring query authors to understand the arrangement of their aggregation layer might seem like a reasonable idea, but in practice it's quite ridiculous.
To be fair, Monarch has scaling limits that require you to be aware of how aggregation is done as well. It's amazing what it can do, but depending on how much data you have, you might need to design your schema and collection with Monarch's architecture in mind.