I don't understand how the properties you're describing imply that Prometheus is...

wbl · on May 15, 2022

And I've seen a system that did that. We had to predict what global aggregates we would need. Looking at a metric required finding the right instance to connect to if it wasn't in the bubbleup. Picking the right expressions to avoid double counts was hard. Want to do something fancy? No luck because of the lack of distributed querying.
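A minimal sketch of the double-counting problem described above, assuming a hypothetical two-level tree in which the global server ends up holding both some raw per-instance series and the pre-aggregated per-datacenter series. The label names (`dc`, `instance`, `level`) and the values are invented for illustration; this is not Prometheus code.

```python
# Hypothetical samples as seen by a global server in an aggregation tree.
# Each leaf forwards a pre-aggregated per-datacenter total; for "eu" the raw
# per-instance series were forwarded as well.
samples = [
    {"labels": {"dc": "eu", "instance": "a", "level": "instance"}, "value": 10},
    {"labels": {"dc": "eu", "instance": "b", "level": "instance"}, "value": 15},
    {"labels": {"dc": "eu", "level": "dc"}, "value": 25},  # already a + b
    {"labels": {"dc": "us", "level": "dc"}, "value": 40},  # only the aggregate exists here
]


def naive_total(samples):
    """Sum everything that matches, regardless of aggregation level."""
    return sum(s["value"] for s in samples)


def total_at_level(samples, level):
    """Sum only the series recorded at one aggregation level."""
    return sum(s["value"] for s in samples if s["labels"]["level"] == level)


naive = naive_total(samples)             # counts the "eu" instances twice
correct = total_at_level(samples, "dc")  # one series per datacenter

print(naive, correct)
```

The naive expression yields 90 while the per-datacenter one yields 65. Getting the right answer depends on knowing which level each series was recorded at, which is the kind of up-front planning the comment describes.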

dijit · on May 14, 2022

The ways in which you can scale Prometheus are ways in which you can scale anything.

It does not, itself, have highly scalable properties built in.

It does not do sharding, it does not do proxying, it does not do batching, it does not do anything that would allow it to run multiple servers and query over multiple servers.

Look, I’m not saying that it doesn’t work. But when I read about Borgmon and Prometheus, I understood that the design goal was intentionally not to solve these hard problems, and instead to use them as primitive time series systems that can be deployed with a small footprint basically everywhere (and queried individually).

I submit to you, I could also have an InfluxDB in every server and get the same “scalability”.

The difference being that I can actually run a huge InfluxDB cluster with a dataset that exceeds the capabilities of a single machine.

preseinger · on May 14, 2022

It seems like you're asserting a very specific definition of scalability that excludes Prometheus' scalability model. Scalability is an abstract property of a system that can be achieved in many different ways. It doesn't require any specific model of sharding, batching, query replication, etc. Do you not agree?

jeffbee · on May 14, 2022

Prometheus cannot evaluate a query over time series that do not fit in the memory of a single node, therefore it is not scalable.

The fact that it could theoretically ingest an infinite amount of data that it cannot thereafter query is not very interesting.

preseinger · on May 14, 2022

It can? It just partitions the query over multiple nodes?

bboreham · on May 15, 2022

Where is the code to do that?

preseinger · on May 15, 2022

Oh, I see what you mean. Sure, it's in Thanos, or Grafana, or whatever layer above, not Prometheus itself.

dijit · on May 14, 2022

I’m not defining terms arbitrarily.

https://en.wikipedia.org/wiki/Database_scalability

Scalability means running a single workload across multiple machines.

Prometheus intentionally does not scale this way.

I’m not being mean, it is fact.

It has made engineering design trade-offs, and one of those means it is not built to scale. This is fine; I’m not here pooping on your baby.

You can build scalable systems on top of things which do not individually scale.

preseinger · on May 14, 2022

Scalability isn't a well-defined term, and Prometheus isn't a database. :shrug:

dijit · on May 14, 2022

Wrong on both counts.

Sorry for being rude, but this level of ignorance is extremely frustrating.

preseinger · on May 14, 2022

Ignorance? I'm a core Prometheus contributor and a 20+ year distsys veteran. Prometheus is not a database, and scalability is not well-defined. These are not controversial statements.

dijit · on May 14, 2022

It is extremely unbecoming to lie about who you are on this forum.

Scalability is defined differently depending on context; in this context (a monitoring/time series solution) it is defined as being able to hold a dataset larger than a single machine that scales horizontally.

Downsampling the data or transforming it does not meet that criterion, since that’s no longer the original data.

The way Prometheus “scales” today is a bolt-on passthrough with federation. It’s not designed for it at all, and it means that your query will use other nodes as data sources until it runs out of RAM evaluating the query. Or not.

The most common method of “scaling” Prometheus is making a tree. You can do that with anything, so it is not inherent to the technology and thus not a defining characteristic. If everything can be defined the same way then nothing can be, and the term ceases to have meaning: https://valyala.medium.com/measuring-vertical-scalability-fo...

I’ll tell you how Influx scales: your data is horizontally sharded across nodes, and queries are conducted across shards.

That’s what scalability of the database layer is.

Not fetching data from other nodes and putting it together yourself.

Rehydrating from many datasets is not the storage system scaling: the collector layer doing the hydration is the thing that is scaling.
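To illustrate the distinction drawn above, here is a generic scatter-gather sketch in which the storage layer itself routes writes to shards and runs one query across all of them. It is an assumption-laden toy, not how InfluxDB or any other real product is implemented: the `Shard` and `Cluster` classes, the hash routing, and the `cpu.hostN` series names are all hypothetical.

```python
import hashlib
from collections import defaultdict


class Shard:
    """One storage node holding a subset of the series."""

    def __init__(self):
        self.series = defaultdict(list)  # series name -> [(timestamp, value)]

    def write(self, name, ts, value):
        self.series[name].append((ts, value))

    def partial_sum(self, prefix, start, end):
        # Each shard reduces its own data and returns only (sum, count),
        # so no single node has to hold the full dataset in memory.
        total, count = 0.0, 0
        for name, points in self.series.items():
            if name.startswith(prefix):
                for ts, value in points:
                    if start <= ts < end:
                        total += value
                        count += 1
        return total, count


class Cluster:
    """The caller talks to the cluster; shard routing is internal."""

    def __init__(self, n_shards):
        self.shards = [Shard() for _ in range(n_shards)]

    def _shard_for(self, name):
        # Stable hash so the same series always lands on the same shard.
        digest = hashlib.sha256(name.encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def write(self, name, ts, value):
        self._shard_for(name).write(name, ts, value)

    def mean(self, prefix, start, end):
        # Scatter the query to every shard, then merge the partial results.
        partials = [s.partial_sum(prefix, start, end) for s in self.shards]
        total = sum(t for t, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None


cluster = Cluster(4)
for i in range(100):
    cluster.write(f"cpu.host{i}", ts=0, value=float(i))

result = cluster.mean("cpu.", 0, 1)
print(result)  # 49.5
```

In this sketch the caller never learns which shard holds which series. In the tree setup criticized above, that routing and merging has to be done by whoever issues the query.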

If you sold me a solution that used Prometheus underneath but was distributed across all nodes, perhaps we could talk.

But scalability is not a nebulous concept.

You should refer to your own docs if you think Prometheus isn’t a database, it certainly contains one: https://prometheus.io/docs/prometheus/latest/storage/

I should add (and extremely frustratedly): if you’re not lying and you’re a core Prometheus maintainer, you should know this. I’m deeply embarrassed to be telling you this.

preseinger · on May 15, 2022

> in this context (a monitoring/time series solution) it is defined as being able to hold a dataset larger than a single machine that scales horizontally.

This just isn't true :shrug: Horizontal scaling is one of many strategies.

foota · on May 15, 2022

I think the disconnect is that Prometheus helps a user to shard things, but it's not automatic. Other time series databases and monitoring solutions automatically distribute and query across servers. It's like Postgres vs. NewSQL (aka FoundationDB, Spanner, etc.).

While Prometheus supports sharding queries when a user sets it up, my understanding is that this has to be done manually, which is definitely less convenient. This is better than a hypothetical system that doesn't allow this at all, but still not the same as something that handles scaling magically.
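To make "done manually" a bit more concrete, here is a rough sketch assuming a hand-maintained table that says which monitoring server scrapes which job. The server names, job names, and the `plan_query` helper are all hypothetical and not part of any Prometheus API.

```python
# Hypothetical, hand-maintained assignment of jobs to monitoring servers.
# Someone has to edit this table whenever a job is added or a server fills up.
ASSIGNMENT = {
    "frontend": "prom-a.example.internal",
    "backend": "prom-b.example.internal",
    "database": "prom-b.example.internal",
}


def server_for(job):
    """Look up which server holds the data for a job."""
    try:
        return ASSIGNMENT[job]
    except KeyError:
        raise LookupError(f"no server assigned for job {job!r}; update ASSIGNMENT")


def plan_query(jobs):
    """Group jobs by the server that has to be asked for them."""
    plan = {}
    for job in jobs:
        plan.setdefault(server_for(job), []).append(job)
    return plan


plan = plan_query(["frontend", "database", "backend"])
print(plan)
```

Every query that spans more than one server has to go through a step like `plan_query` and then merge the answers itself. An automatically sharded system hides that step from the user.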

hiptobecubic · on May 15, 2022

Prometheus supports sharding queries the way a screwdriver supports turning multiple screws at once. You can design a system yourself that includes the screwdriver, which will turn all the screws, but there's nothing inherent to the screwdriver that helps you with this. If "scalability" just means "you can use it to design something new from scratch that scales" then the term is pretty meaningless.

lokar · on May 14, 2022

Lots of systems provide redundancy with 2X cost. It's not that hard.