
netingle · on March 23, 2024

> it wasn't as simple as it says

mind elaborating? we built Loki for some pretty massive scale but I've always tried to make it work at super small scale too. what went wrong?

netingle · on March 15, 2023

Hi! Tom from Grafana Labs here, super excited to welcome the Pyroscope team and can’t wait to see what they achieve.

Continuous profiling is the next big thing IMO - easier to get started with than distributed tracing and delivers immediate value.

anbotero · on March 15, 2023

I’m really, really liking this.

I’ve been testing the whole stack on a local server, finding kinks, documenting workflows, because I hope I move us to this soon. Now with Pyroscope I would love to try it out even more. I kid you not, we were just testing Datadog Continuous Profiling for our legacy Ruby application not two days ago and it was quite lackluster.

Not saying this will be better (have yet to try), but I’d prefer to support this and report feedback for a brighter future for our community.

Keep it up!

Rperry2174 · on March 15, 2023

Ryan here from Pyroscope -- Awesome to hear you're interested in migrating over. Would love for you to try out our Ruby integration and let us know what you think: https://pyroscope.io/docs/ruby/.

We've added a lot of features like tags/labels, integration with tracing: https://github.com/pyroscope-io/otel-profiling-ruby, integration with CI/CD (in Rails), etc.

Feedback very welcome!

felixge · on March 15, 2023

> we were just testing Datadog Continuous Profiling

Felix from Datadog here :). We'd love to hear your thoughts on our profiler if you're willing to share them. My e-mail is in my profile.

PS: Congrats to Ryan, Dmitry and team :).

anbotero · on March 17, 2023

Hey! It was alright, but at least for Ruby it didn’t report on the memory allocation of the calls, which is what we were looking for in the first place. We do not use it anywhere else, we only tried it for this pesky legacy system.

abuani · on March 15, 2023

This is exciting news Tom! Looking forward to seeing some posts in the future on how using Pyroscope reduced spend by finding perf improvements.

netingle · on June 14, 2022

It's very much our aim to make this mix of self-hosted and cloud services as easy as going all-cloud; but I agree we're not quite there yet.

Do you mind if I ask what isn't super-easy about linking self-hosted Loki search queries with SaaS Prometheus? You should be able to, e.g., add a Prometheus data source to your local Grafana (or securely expose your Loki to the internet and add a Loki data source to your Cloud Grafana).

sandstrom · on June 15, 2022

Honestly I haven't tried that much, but didn't find anything in the docs so I assumed it wasn't a prioritized area.

In our particular scenario, we'd probably want to run Loki + Grafana locally, and then hosted Prometheus + hosted Grafana for metrics.

But would be great if we could just tell the two about each other, and under which domains they exist. That way, Prometheus-grafana could construct URLs that linked straight into Loki-grafana (that we host) for e.g. the same interval, or the same label filter (GET params).

But it would only work if I (the end-user) had access to both. That way, we don't have to expose Loki to the internet. But linking would still work.

There are quite a lot of services that do this with GitHub and commits. You can link from e.g. Bugsnag to GitHub by only telling Bugsnag your org and repo names. But Bugsnag won't have read access to GitHub (they also have another integration method which does require access, but that's not the one I'm talking about here).

Those types of "linking into a known URL pattern of another service" integrations are easy to setup and very easy to secure.
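A minimal sketch of the "link into a known URL pattern" idea, in Python. The GitHub commit URL shape (`https://github.com/{org}/{repo}/commit/{sha}`) is real; the self-hosted base URL and the `/explore` path with `from`/`to`/`selector` query parameters are hypothetical placeholders, not Grafana's actual Explore URL format. Neither function needs credentials for the target service; only the end user's browser has to be able to open the link.

```python
from urllib.parse import urlencode


def github_commit_url(org: str, repo: str, sha: str) -> str:
    """Link to a commit knowing only the org and repo names (no GitHub access needed)."""
    return f"https://github.com/{org}/{repo}/commit/{sha}"


def log_ui_link(base_url: str, start_ms: int, end_ms: int, labels: dict[str, str]) -> str:
    """Link into a self-hosted log UI for the same time range and label filter.

    HYPOTHETICAL URL pattern: the '/explore' path and the 'from', 'to' and
    'selector' parameter names are placeholders, not a real Grafana format.
    """
    # Build a LogQL-style stream selector such as {app="api",env="prod"}.
    selector = "{" + ",".join(f'{k}="{v}"' for k, v in sorted(labels.items())) + "}"
    params = urlencode({"from": start_ms, "to": end_ms, "selector": selector})
    return f"{base_url.rstrip('/')}/explore?{params}"


if __name__ == "__main__":
    print(github_commit_url("acme", "shop", "abc123"))
    print(log_ui_link("https://logs.example.internal", 1_000, 2_000, {"app": "api"}))
```

The metrics side only needs to know the other side's base URL and how to fill in the interval and labels, which is why this kind of integration is easy to secure: nothing is exposed beyond what the user can already reach.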

netingle · on March 30, 2022

I don't think so! I think that's being used in Tempo, but I'm not sure.

number101010 · on March 31, 2022

We are definitely investigating columnar formats in Tempo to store traces. We expect it to drastically accelerate search as well as open up more complex querying and eventually metrics from distributed tracing data.

However, we are currently primarily targeting Parquet as our columnar format in object storage.

Expect an announcement soon!
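A toy illustration of why a columnar layout helps search: a filter on one field only has to read that field's column. The `Span` fields here are hypothetical and this is not Tempo's Parquet schema; real columnar formats like Parquet add compression and per-column-chunk statistics on top of this basic idea.

```python
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    service: str
    duration_ms: int


def to_columns(spans: list[Span]) -> dict[str, list]:
    """Pivot row-oriented spans into one list per field (a columnar layout)."""
    return {
        "trace_id": [s.trace_id for s in spans],
        "service": [s.service for s in spans],
        "duration_ms": [s.duration_ms for s in spans],
    }


def slow_trace_ids(columns: dict[str, list], min_ms: int) -> list[str]:
    """Find traces that have a span lasting at least min_ms.

    Only the 'duration_ms' column is scanned; 'trace_id' is read just for the
    matching rows, and 'service' is never touched.
    """
    hits = [i for i, d in enumerate(columns["duration_ms"]) if d >= min_ms]
    return [columns["trace_id"][i] for i in hits]
```

In a row layout the same search would deserialize every field of every span; with columns, the cost scales with the fields the query actually uses.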

netingle · on March 30, 2022

We tried to address this question on the Q&A blog post: https://grafana.com/blog/2022/03/30/qa-with-our-ceo-about-gr...

It doesn't have to mean the end for Cortex, but others will have to step up to lead the project. We've tried to put other maintainers in place to kick start this.

sciurus · on March 30, 2022

I was going to ask what the migration path was from Cortex to Mimir, but I see you've documented that at https://grafana.com/docs/mimir/latest/migration-guide/migrat... . Thanks for the work you've done to make this easy.

pracucci · on March 30, 2022

This video also shows a live migration from Cortex to Mimir (running in Kubernetes): https://www.youtube.com/watch?v=aaGxTcJmzBw&ab_channel=Grafa...

netingle · on March 30, 2022

(Tom here; I started the Cortex project on which Mimir is based and lead the team behind Mimir)

Thanos is an awesome piece of software, and the Thanos team have done a great job building a vibrant community. I'm a big fan - so much so we used Thanos' storage in Cortex.

Mimir builds on this and makes it even more scalable and performant (with a sharded compactor and query engine). Mimir is multi-tenant from day 1, whereas I believe this is a relatively new thing in Thanos. Mimir has a slightly different deployment model from Thanos, but honestly even this is converging.

Generally: choosing Thanos is always going to be a good choice, but IMO choosing Mimir is an even better one :-p
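A toy sketch of the general idea behind a sharded query engine, not Mimir's implementation: hash each series' label set to one of N shards, let each shard compute a partial `sum by (job)` independently (in a real system, in parallel), then merge the partial results. The series and label names are hypothetical.

```python
import hashlib
from collections import defaultdict

Series = tuple[dict[str, str], float]  # (label set, current value)


def shard_of(labels: dict[str, str], num_shards: int) -> int:
    """Stable shard assignment from a hash of the sorted label set."""
    key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def sharded_sum_by(series: list[Series], by: str, num_shards: int) -> dict[str, float]:
    # 1. Partition the series across shards.
    shards: dict[int, list[Series]] = defaultdict(list)
    for labels, value in series:
        shards[shard_of(labels, num_shards)].append((labels, value))

    # 2. Each shard computes its own partial sum, independently of the others.
    partials = []
    for members in shards.values():
        partial: dict[str, float] = defaultdict(float)
        for labels, value in members:
            partial[labels[by]] += value
        partials.append(partial)

    # 3. Merge the partial sums into the final result.
    total: dict[str, float] = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)
```

This only works because `sum` is associative; something like a quantile can't be computed exactly by merging per-shard quantiles, so not every query can be split this way.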

AndyNemmity · on March 30, 2022

Okay, but why? I am using Thanos today. It works, it's complex, when it breaks, it's a bit of a challenge to fix, but it happens. It doesn't break often.

It does the job. Mimir, which is based on Cortex, using either Mimir, or Cortex, what benefit am I getting?

I get asked every few months about moving off of Thanos to Cortex, and today now Mimir, and I don't have any substantial reason to do so. It feels like moving for the sake of moving.

I need to see some real reasoning as to why I am going to add value to move everything to Mimir.

netingle · on March 30, 2022

Sounds like Thanos is working well for you, so in your position I wouldn't change anything.

There are a bunch of other reasons why people might choose Mimir; perhaps they have outgrown some of the scalability limits, or perhaps they want faster high cardinality queries, or a different take on multi-tenancy.

Do remember Cortex (on which Mimir is based) predates Thanos as a project; Thanos was started to pursue a different architecture and storage concept. Thanos storage was clearly the way forward, so we adopted it. The architectures are still different: Thanos is "edge"-style IMO, Mimir is more centralised. Some people have a preference for one over the other.

AndyNemmity · on March 30, 2022

That's fair, thanks for the input. The only reason we implemented Thanos in the first place was a particular feature that we needed at the time of implementation. Now using it in an extremely large environment, I haven't seen any scalability limits. Speed of queries isn't a driver of anything.

Multi-tenancy certainly is, but we have our own custom multi-tenancy solution on top of it that we built ourselves. I'd like to get rid of that ultimately, but we're not utilizing whatever multi-tenant features exist at the moment. Perhaps that will be a driver.

Appreciate your thoughts.

daviziko9 · on March 31, 2022

We were struggling with Cortex a couple of years ago, then we tried VictoriaMetrics and haven't looked back. It runs pretty much unattended; we just monitor disk space to make sure we still have room to keep pouring in metrics. When a component crashes (not often), it recovers without us really noticing.

notacoward · on March 30, 2022

Multi-tenancy is something that shouldn't be underestimated. A lot of people think it's just a checklist item until (a) they need it or (b) they try to implement it in an existing system. Kudos for making it a day-one feature.

vladvasiliu · on March 30, 2022

While I agree with your point in the general case, would you mind elaborating on the specific case of Prometheus?

My understanding is that the recommended best-practice for Prometheus is to deploy as many of them as necessary, as close to the monitored infrastructure as possible.

What use case would require deploying a single Mimir (and so, presumably, a single Prometheus cluster) to serve multiple tenants? Why not just deploy a dedicated Prometheus / Mimir stack per client?

notacoward · on March 31, 2022

I don't know Prometheus, but I would imagine the answer depends on just how many clients you have. Probably doesn't matter if you're talking just a few. If it's a lot, then separate instances can be very expensive in terms of operational complexity and waste due to resource fragmentation. Multi-tenancy is good for bringing both of those back under control. Is there something about Prometheus that would negate that?
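A minimal sketch of what multi-tenancy buys you: one shared instance, with every read and write scoped by a tenant ID, instead of one instance per client. Cortex and Mimir carry the tenant ID in the `X-Scope-OrgID` HTTP header; everything else here (the class, the in-memory storage) is hypothetical. The sketch trusts the header blindly; in a real deployment something trusted in front of the service has to authenticate the caller and set it.

```python
from collections import defaultdict

TENANT_HEADER = "X-Scope-OrgID"  # header Cortex/Mimir use to carry the tenant ID


class MultiTenantStore:
    """HYPOTHETICAL shared store: one instance, data partitioned by tenant ID."""

    def __init__(self) -> None:
        self._data: dict[str, list[float]] = defaultdict(list)

    @staticmethod
    def _tenant(headers: dict[str, str]) -> str:
        tenant = headers.get(TENANT_HEADER)
        if not tenant:
            # Refuse unscoped requests rather than guessing a tenant.
            raise PermissionError(f"missing {TENANT_HEADER} header")
        return tenant

    def write(self, headers: dict[str, str], sample: float) -> None:
        self._data[self._tenant(headers)].append(sample)

    def query(self, headers: dict[str, str]) -> list[float]:
        # A tenant only ever sees its own partition; unknown tenants get nothing.
        return list(self._data.get(self._tenant(headers), []))
```

Capacity is pooled across all tenants, so there is one thing to operate and no per-client headroom sitting idle, which is the fragmentation argument above.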

vladvasiliu · on April 1, 2022

For one, it doesn't really support authentication (although it's on the roadmap).

I'm no Prometheus expert, but since you're pretty much expected to be running a bunch of servers anyway, the operational complexity has to be handled even for just one client.

You do have a point on resource fragmentation, but IME Prometheus' resource usage is fairly predictable, so you could probably mitigate that to a point.

netingle · on March 30, 2022

I agree! Which is why I put one in the blog post ;-) https://grafana.com/blog/2022/03/30/announcing-grafana-mimir...

krnlpnc · on March 30, 2022

I'm not seeing a comparison to Thanos

alrlroipsp · on March 30, 2022

Why would you? Parent says it's a comparison of Mimir and Cortex.

krnlpnc · on March 30, 2022

Re-read the full thread...

>>Grafana Labs needs to make a convincing comparison chart of some kind between Mimir, Thanos, and Cortex.

>I agree! Which is why I put one in the blog post ;-)

netingle · on Feb 2, 2022

For now, yes. Long term we're trying to offer everything we do both on premise and in the cloud. It's a bit tricky, so we can't say when....

chosenken · on Feb 2, 2022

Would it be possible to have a split offering, with both on prem and cloud? In my mind I would prefer to have things like Prometheus, logs, and metrics stored on prem, mainly due to the volume of logs and metrics we create. Then use Grafana Cloud for Grafana dashboards, Loki logs, and incident management that pull directly from my on-prem data stores. I bring this up as it may be cost prohibitive for us to store our metrics in the cloud (we make so many metrics and logs!), but I would love to offload hosting the front end: Grafana Cloud takes care of managing and maintaining Grafana dashboards and the backend database, authentication, updates, etc. I'm fine hosting Prometheus and Loki locally, have been for a long time! I just get annoyed having to host Grafana and set it up, set up the database, configure auth, etc.

bboreham · on Feb 2, 2022

I’m pretty sure that is doable today: Hosted Grafana with data sources pointing at your on-prem Prometheus and Loki.

https://grafana.com/docs/grafana-cloud/fundamentals/gs-visua...

(I work for Grafana Labs, but not on this part)
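A minimal sketch of registering an on-prem Prometheus as a data source through Grafana's HTTP API (`POST /api/datasources`), using only Python's standard library. The stack URL, token and on-prem hostname are hypothetical. The request is only built here; `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.request


def datasource_request(
    grafana_url: str, api_token: str, name: str, ds_type: str, ds_url: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Grafana data source."""
    body = {
        "name": name,
        "type": ds_type,    # e.g. "prometheus" or "loki"
        "url": ds_url,      # where Grafana's backend will send queries
        "access": "proxy",  # queries go through Grafana's backend, not the browser
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

With `"access": "proxy"`, Grafana's backend makes the queries, so for a hosted Grafana the on-prem Prometheus or Loki URL has to be reachable from Grafana Cloud.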

BeefWellington · on Feb 3, 2022

> It's a bit tricky, so we can't say when....

I'm curious about this part, and I can absolutely understand if you don't want to answer but I do have the following question:

Why is it tricky to ensure an application can run on a cloud-deployed system as well as on a local Kubernetes/Docker Swarm/newfangled containerization mechanism of choice/etc.?

Specifically I'm wondering what barriers you're running into that are pushing the focus to go cloud only.

mikewave · on Feb 2, 2022

Is there any hope of a Grafana Cloud data access proxy that runs on prem and enables us to give the Cloud access to databases we cannot expose?

netingle · on Feb 2, 2022

Yes! It’s something we’ve been mulling for a while, and I was just talking to one of the PMs about it this morning. This year for sure I hope.

netingle · on April 1, 2021

Grafana Labs | Full-time | 100% remote (world) | Software Engineer

Grafana Labs is hiring! Come work on Grafana, Prometheus, Cortex, Loki, Tempo and more: lots of open source, with both a SaaS and an enterprise team. We use Golang, JS, TypeScript, Kubernetes, Jsonnet, Tanka, CUE and more.

We're growing fast, have lots of happy customers and many exciting projects in the pipeline. We need good engineers to help us take Grafana and Prometheus to the masses.

netingle · on Oct 28, 2020

PRs welcome!

corford · on Oct 28, 2020

You just raised $50 million....

buro9 · on Oct 28, 2020

We also have open roles https://grafana.com/about/careers/#jobs

corford · on Oct 29, 2020

Not sure why this is being downvoted... Hiring people to fix things is a lot more reasonable (and a good idea!) than asking for free labour when you have $50 million in the bank.

nullsense · on Oct 28, 2020

Bug fixes appreciated