I can't recommend serious use of an all-in-one local Grafana Loki setup (utcc.utoronto.ca)
126 points by ink_13 on April 28, 2023 | 93 comments


Ed here (@slim-bean on GitHub), one of the original authors and the lead of the Loki project.

The criticisms brought by Chris are valid, and it's good feedback. I just wish it had been delivered in a more constructive manner.

I personally know what Loki is capable of, I've watched it grow from something tiny to something I'm incredibly proud of and in awe of.

I run Loki on several Raspberry Pis ingesting 20-100GB a day; I also run Loki clusters on thousands of cores ingesting hundreds of TB a day. It's an amazingly flexible and capable project, built by an amazing team at a company I'm incredibly proud to work for.

That being said....

> Loki doesn't feel like it's been built to be operable by people who don't know its code and its internal details

How true this is... while I know for a fact there are hundreds to thousands of folks out there who are successfully running Loki (and thank you to those who share your success stories, it means the world to us), it can be very rough around the edges...

But we are working hard to improve this, we are doing the best we can.

All I really ask of folks who want to give feedback about Loki: PLEASE DO! But please be patient.

The author of this article says they tried to work with us to improve Loki and that it didn't work out. I'm truly sorry for that. These posts are full of constructive feedback, and I would really love to see it turned into issues and pull requests to make the project better.


As a counterpoint to the article, I use Loki, Grafana, and promtail to manage logs for 5 hosts running a number of different services, and I have had absolutely no issues for the last year. I haven't run into any of the problems the OP did. I suppose it's possible that he's running at a scale where certain problems happen that I won't see with my setup.

But honestly, I sort of assumed that if you are going to manage hundreds of machines, you should probably look at the more scalable configurations anyway. If you are just doing a handful of machines, the local-store-only version is more than capable.


Can you talk more about your setup and process with Loki on the RPis, please?


If you use Loki as the default logging driver with Docker and the Loki container shuts down, the rest of your containers will freeze up. This has been an issue for almost 3 years.

https://github.com/grafana/loki/issues/2361
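One mitigation that gets suggested for this class of problem is Docker's non-blocking log delivery mode, which drops log lines instead of blocking the container when the logging endpoint is unreachable or can't keep up. A hedged sketch of a daemon.json, assuming the Loki driver plugin is installed under the alias `loki` and a local Loki at the default port (both values are illustrative):

```json
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://localhost:3100/loki/api/v1/push",
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}
```

The trade-off is explicit: with a 4m buffer, some stdout logs can be lost during an outage, but containers keep running, which matches the preference expressed in the reply below.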


That sounds like a Docker issue on top of Loki... regardless of the logging provider, it should not do that by default. I can see some cases where you would never want to lose a log message, but in the vast majority of cases, sacrificing some stdout logs so the application keeps working is preferable.


Docker has a long history of issues with the daemon causing containers to freeze when dependencies go away or have issues. It's one of the reasons I prefer other container runtimes.


It was a pain to get set up initially, with all of the moving parts and the Docker plugin for it, but it has been working well for me ever since in my homelab. Smooth sailing.


Parseable is an open source Loki alternative.

- Single binary

- Written in Rust (lightweight, fast, and stable)

- Use an S3 bucket or a mount point

- Visualize with Grafana

https://github.com/parseablehq/parseable

(founder here)


Post a Show HN?


Will do soon!


Can Parseable be sharded, or can you just run a single instance?


Not yet, but we have this on the roadmap. We'll add it within a few months.


It really is amazing just how bad the various log shipping systems are for the simple use case of "I have logs on some servers and I want them to be over here." We somehow peaked at rsyslog and have been struggling ever since.

If you don't follow the one-true-architecture you will get bitten in a million ways.

* Log ingestion on the host pulls logs from the application/system/whatever, timestamps the logs itself (because when you're interested in failure states, do you really trust the log emitted by a broken app? Also because devs are famously bad at timezones), adds its own metadata, and stores them in a local outbox queue.

* Local log ingestion determines where to send logs based on service discovery and periodically updates.

* Log ingestor ships the logs to a durable queue and flushes only after getting an ACK from the queue.

* Log processor reads from the queue and ships the logs off to persistent storage, or to a dead letter queue where you get an alert if it ever has anything in it. The log processor ACKs back to the queue only once it gets an ACK from the db. Logstash used to sin in this regard.

* Persistent storage treats logs as opaque blobs from the perspective of how they're physically stored. Indexes are time-window based depending on your volume, usually daily, and shipped off to different tiers / deleted on that basis.

This stack can horizontally scale indefinitely up to (and past since the queue backups allow you to temporarily fake more throughput than you really have) the throughput of your backing database.

I loathe how complicated and brittle the ELK stack is but they get this exactly right and if you implement it it becomes nigh-impossible to lose data. The market for "ELK style architecture but not the size of a 400 lb gorilla" has got to be huge but is seemingly untapped last I checked.


Here is a question, and I mean it honestly. I'm relatively old school and have built many apps using syslog. When it comes to log mining, I've got a fairly old-school utility belt: I poke around with less, I cat through grep (really ripgrep), I cat through grep and pipe to awk and extract things. Sometimes I fire up cut. I get a ton of mileage from sort and uniq. Obviously, I fire up zcat in place of cat when needed. I also generously apply find when needed. It feels like I find what I need pretty quickly. Admittedly, I generally don't have terabytes of logs with these tools, but they handle 10s of GB shockingly well.
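The winnow-then-count workflow described above can be sketched like this (paths and log format are made up for illustration; a tiny sample log stands in for /var/log):

```shell
# Fabricate a small sample log so the pipeline is self-contained.
cat > /tmp/sample.log <<'EOF'
Apr 28 10:00:01 web1 nginx: GET /index.html 200
Apr 28 10:00:02 web1 nginx: GET /missing 404
Apr 28 10:00:03 web1 nginx: GET /missing 404
Apr 28 10:00:04 web1 sshd: Failed password for root
EOF

# Winnow to the interesting lines, strip the timestamp fields,
# then count distinct messages, most frequent first.
grep ' 404$' /tmp/sample.log \
  | awk '{$1=$2=$3=""; sub(/^ +/,""); print}' \
  | sort | uniq -c | sort -rn
```

Swap in zcat/rg and real paths as needed; the shape of the pipeline (filter, project, sort, count) is the point.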

With Splunk, ELK, and Graylog, it feels insanely pokey. I know they have the parsers and such. At times I've kind of boned up on their search syntax, but I've never gone "all in" with any of them, maybe because none of them seem like a really solid long-term solution. They seem to have a different kind of model than what I want. The time range is kind of nice, but often I won't have a time range until later. My model involves winnowing down the data I want and then extracting pieces and viewing the data in different ways. Am I just using all these tools the wrong way? Is my mental model off? Maybe it's a log consistency thing; it's always sort of a great day when you get "Error: abc failed because xyz and def." and that's the answer to everything. Many times I'll be spending time looking at logs and I'll notice that an increase in a certain behavior happened before the outage, and that's the giveaway. Then a new Grafana dashboard is created with a new metric to try to identify that before it happens again.

Loki kind of looks like it supports my method but again, I'm back to that "I haven't gone all in" with it problem. As I'm rambling, I've seen these sexy dashboards with like red/yellow/green lights and some latency graphs and cool looking stuff and then a little table of the last 20 "log messages" and maybe I'm used to looking at logs that you don't show in your dashboard or something like that.

They all feel like a square hole to my round peg. Maybe it's just me.


At FOSDEM, the talk on Loki described it as a modern version of what you and I do with syslog servers.

Mine come in to an anycast IP on the network, one file per host; the syslog server stamps the receive time at the start in "y-m-d-h-m-s+0000" format, in a y/m/d directory structure, bzip2'd after a few days.

I have a few scripts which I use to parse the logs and pull reports out (BGP drops/recover times for example), but most of the time tail/cat/sort/grep/cut/etc does the job. Where I differ is I use perl rather than awk.

Sure, it doesn't scale to millions of terabytes a second of minable personal information or whatever the average modern LAMP stack generates, but it currently records about 15G a day from 400 different devices just fine.


The whole thing is so much worse than the "old Unix" architecture of "you give the logger an address, and it pushes stuff there".

We have DNS; we don't need the log sender to have a service discovery mechanism on top of that. Set it to the log server address and be done with it; scale at that point if you need to, we know how to do it.

The log processor doesn't need a fucking queue. The log sender does, for network reliability. And that gives you the ability to restart the log processor quickly (it only needs to process the current message in transit and close) and with zero impact (as long as you're down for less time than the logger's queue can cover).

The only reason to add a queue is if you have multiple readers for logs. That also conveniently gives you a form of QoS on the log processor: if you read at an equal rate from all sources, the most spammy ones will hit their own internal queue limit first and won't cause other servers to miss logs. Even then, you might just opt for the loggers sending things to two places at once.

"Shit logs" (whether by volume or by needing message decoding) are a complex problem, but IMO most of that handling should live in the log processor. That's also a good place to resolve any GeoIP or DNS if needed.


> We have DNS, we don't need log sender to have a service discovery mechanism on top of that.

Having service discovery solves some issues.

* DNS TTL and applications holding on to DNS names indefinitely (prometheus, haproxy, nginx, and I bet your app somewhere all do this).

* Applications that don't support DNS record priorities.

* Serving different results to different clients based on their identity, rather than at random.

> Log processor doesn't need a fucking queue. Log sender does, for network reliability one

Yes, that's what the queue is for. The log sender also has a queue, but as it lives on the host itself, minimizing its use is how you avoid losing logs on server crashes. If your architecture is that the log processor accepts logs and stores them in a queue for buffering, then you've implemented the same architecture. But if that queue lives on the log processor itself, then you risk data loss if that server dies. Having a shared queue in front of the pool of log processors is simpler, has better throughput, is easier to shard, and is more reliable. Logs can't get stuck on a particular processor anymore, because its lease will end and another worker will pick the work up.


Whenever I have to deal with logs, it's either:

a) simplicity of rsyslog

b) monstrosity of ELK | Grafana | etc.

Somehow I like Prometheus (I think it's "simple"), but it's not enough for displaying and searching logs. Somehow, none of the companies I have worked for have used "simple tools" like rsyslog to handle logs. They all used cloud services (Datadog, New Relic) or self-hosted stacks (ELK, Prometheus + Grafana). I wonder why (I guess it's because "money buys you simplicity").

I just want the following:

- on each machine I want to get logs from: install the agent (a simple binary) + simple /etc/myagent.conf. The agent forwards logs to my "main log server"

- on my "main log server": install the "log processor" (again, just a binary please!) + simple /etc/mylogprocessor.conf. The "log processor" shows me a nice localhost:9090/ web interface in which I can search for logs (indexed by any field I want).

Easy, no? My use case is not thousands of machines nor Terabytes of data logs per second. I just have a few machines and I don't want to deal with multi-clustered solutions or anything like that. Just 2 binaries! Does that exist?
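For the forwarding half, plain rsyslog already gets close to this: one drop-in file per machine, one on the server. A rough sketch (the hostname and file names are placeholders; searching/indexing still needs something else):

```
# On each client: /etc/rsyslog.d/forward.conf
# @@ = TCP, single @ = UDP
*.* @@logs.example.internal:514

# On the central server: /etc/rsyslog.d/listen.conf
module(load="imtcp")
input(type="imtcp" port="514")
```

What's missing from this picture, and what the comment is really asking for, is the "nice web interface indexed by any field" part.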


We just use rsyslog to send to an ELK instance, but it's less than perfect, and it doesn't log everything we want because not every app has very good logging.

The problem I have encountered is that even "simple" setups (just my home NAS + a few devices) require some log munging to get useful info into whatever system uses it. Many apps don't have a "log in JSON" option in the first place, and near-always there is no real standard for the fields of the message either.

And also near-always I want to filter out or rate-limit some particularly spammy message or service just because I don't even want to look at it when browsing logs as it is just noise

> Easy, no? My use case is not thousands of machines nor Terabytes of data logs per second. I just have a few machines and I don't want to deal with multi-clustered solutions or anything like that. Just 2 binaries! Does that exist?

...Graylog, I guess? I looked at it and it is apparently pretty integrated, but the price at higher volumes made us do ELK for the "actual big stuff".


> it becomes nigh-impossible to lose data

I imagine if you implement it "correctly" you don't lose data, but my experience with Elasticsearch has been horrible.

I've lost data many times, for things like logs reaching an artificial maximum number of indices and ES shutting down, or ES just not being able to support the simple case of a log coming in both as JSON and as plain text; there's no setting to say "just cast to text if there's a conflict", so it drops the log, and the workaround is to find, among the many outdated ES posts out there, a piece of Ruby code to fix that one case. There were many other issues (I compiled a list of like 20 stupid things about ES and the many ways I've lost data, and gave up adding stuff).


I spent 2 days fixing a Graylog instance last week. When the Elasticsearch nodes get too big, they tend to be quite hard to work with. And of course you only log in when there's a problem, and you've forgotten everything about the setup in the meantime.


> * Log ingestor ships the logs to a durable queue

> * Log processor reads from the queue and ships the logs off to persistent storage

Why do we need the durable queue in between? Why not let the Log Ingestor ship the logs off to persistent storage?


Something needs to buffer in case of network errors, although you're right that having that as a separate element is not very useful.

A queue is useful if you want to write those logs to multiple places at once.


Because this was created for 400lb gorillas.


Quickwit is an open source Loki alternative too.

As mentioned in another comment here, it works well on billions of logs on one modest instance. And Grafana integration is on the way :)

https://github.com/quickwit-oss/quickwit

(disclaimer: I'm one of the cofounders)


Care to give the Quickwit pitch? Especially on why it is better than Loki.


Good point.

Unlike index-free solutions like Loki or Parseable, Quickwit is built on top of a modern full-text search index (tantivy). At query time, Quickwit produces much faster results (all other things being equal: CPU, memory, etc.), especially when the volume of data to analyze is large or queries are complex (high cardinality values, aggregations). Quickwit also stores data in a columnar format, so it's also good at OLAP-style queries (no joins though).

This comes with a cost during ingestion; Quickwit is more resource-hungry than Loki, but can still ingest at 20-40 MB/s on a commodity instance with 4 CPUs. Similarly, regarding storage footprint, Loki compresses logs better because it does not maintain those extra data structures. Still, a Quickwit index tends to be much smaller than an Elasticsearch index.

The next release of Quickwit (May) will be shortly followed by the publication of a benchmark against Elasticsearch/OpenSearch, and later by another one against Loki. You'll be able to see for yourself.


From my limited knowledge of Loki's internals: contrary to Loki, Quickwit uses a fully featured search engine library underneath called Tantivy (https://github.com/quickwit-oss/tantivy). Quickwit offers different services (indexer, searcher, ...) that can be run and scaled independently. It also supports indexing from various sources, including files, a REST API, Kafka, Pulsar, and Kinesis, with more planned based on community interest. Last but not least, Elasticsearch query API support is being worked on.


I installed promtail a few weeks back and ran into this bug, which had been outstanding for months (i.e., a fix had been written but had not been released): https://github.com/grafana/loki/issues/8663

Due to a buffering issue, Loki would exit in case of configuration error without printing any error message or anything at all.

There is definitely something weird about how the project is run.


I've run Loki in production for over a year... as far as I know, Loki is not designed to be run all-in-one; that's kind of a dev/local design.

N.b., I've run this in both AWS/GCP in a k8s scenario, against S3/GCS respectively as a long-term store.

I've also run Mimir, and they both work _fantastic_ in this deployment scenario (as described).


Not Loki, but regarding Tempo: I've had a good experience running the components in Kubernetes using the operator.

These components are typically built to be "cloud-native", which often means running on Kubernetes. If you already run on Kubernetes, Grafana products are typically straightforward to run.

In general, I think that is the target customer and use case for deployments outside of Grafana Cloud. The all-in-one binaries are more toys to say hello world, or maybe to test things in local development.


Nice! RedHat built the Tempo operator and I have not had a chance to use it yet. We are looking to collaborate more closely on it soon.


It seems Red Hat believes in Loki

Red Hat logging product manager says: "We made the decision to move to Loki and Vector" https://www.youtube.com/watch?v=QZ4Hv85lEJ0&t=938s


Vector is wonderful, and while I have no experience with Loki, if it's as bad as this post and thread suggest, perhaps Vector makes it more manageable by normalising and buffering everything coming into it.

For example, another comment here talks about Loki locking up Docker if it's the logging backend and the container crashes. I suspect that wouldn't be possible, or would be less likely and more manageable with Vector in the middle because it will buffer. I've also dealt with normalising logs from different sources and it can be a pain, but Vector will do some or all of that already, reducing the requirements put on Loki.


Seems like a 'low-key' bad decision from the sounds of it.


That doesn't mean much unless you're their customer. "Works well enough and we have people who know it" is a perfectly fine way to pick a tool, even if it is not technically the best one.


not sure that's a ringing endorsement, given all the other things Red Hat believes in.


They also push systemd…


For as much functionality as systemd replaced (when you think about it, it's an extremely ambitious project), it has been an extremely smooth transition.


Systemd is great.


I’ve been using quickwit.io for some local data-processing job logs, and it seems very easy to run, not very IO-intensive, and it runs fine on a single node with modest hardware holding >2 billion log rows. It has a really cool dynamic schema feature too.

I found it easier to setup and configure than Loki.

The UI is very basic for now but I’m excited to see what the future holds for this project!


Thanks a lot for your kind comment!

To complete your description of Quickwit: it is a distributed search engine for logs and traces. It's written in Rust, ingests at speed, scales horizontally, and separates compute from storage.

Last but not least, Grafana integration is planned for next month :)


While I myself use Loki for log aggregation for a small web service that I run, and have gotten a lot of value out of it, I agree with the author that the product is not friendly to use.

It's notoriously difficult to know why ingestion of certain logs failed, to the point where I run a staging monitoring environment to debug issues like these.


Grafana is a terribly run company that once made a pretty decent OSS dashboard system. Stay away from them for anything but some charts.


And even the dashboard system kinda feels like there should be something better, but somehow there isn't (or I don't know it).


Is Grafana (the product) really that bad? We've been using it for years with hundreds of dashboards and have never had any complaints*

*until recently... the new "time series" panel is a disaster


It's not bad. You can question some decisions about it, but currently, and for the last few years, it's been the absolute peak, with no competition in the industry.


No, it's not "that bad", but it also constantly annoys me with things that don't feel right, even though it generally can be made to do what I want in the end. It just feels odd to me that this is the best there is, given how big a deal has been made about monitoring in the past decade.


You only think that because you haven't seen the alternatives...


I'm building a one-man SaaS and I'm currently shopping for a hosted monitoring service. I got the free trial of Grafana Cloud, but I'm not a fan of their products in general. I know Prometheus; I have used DataDog in the past, but it seems crazy expensive and spread thin across 1001 features.

What's a good all-in-one cloud monitoring solution that might optionally deal with logs as well?

Log monitoring is not actually a priority at this stage, I just want something to track metrics, chart them and to alert me when the servers are on fire.

Something open-source adjacent and not crazily expensive would be best. As I said I know Prometheus, but for some reason it's so flexible and free form I really do not enjoy using it.


Check out Coralogix! It comes in at a fraction of the price of DataDog (typically a 40-70% cost reduction for migrating customers), no service tiering (so you instantly get support, managed onboarding etc) with a complete, end to end, open source experience (even archived logs are stored in Parquet).

For your use case, Coralogix is awesome. It comes with a managed Grafana instance, if you wish to use that interface, or a custom dashboarding solution, metric driven alarms, release tagging, and much more.

You can find out more at https://coralogix.com/platform/metrics/


Regarding the docs being a bit out of touch, I can agree, although it was helpful to find the `loki -print-config-stderr` flag (that should be the default when setting it up). It prints all of the current config options and their values. Very helpful, since there were no gRPC TLS client settings in the docs, but according to the source code there should be.

All in all, it's a pretty new player on the market, and there is not much to compare it with. Given Grafana's other products, I guess it will mature as well. There are always more mature projects like Graylog, but compared to that, Loki is pretty small. But yeah, it's got teeth, and dang, it's fast!


I don't like being negative and I always appreciate open source / free software but..

I tried using it in a small k8s cluster on Digital Ocean. Initial installation using the recommended helm package was easy enough. However, it only saved a very short period of log data. I spent a fair amount of time searching the docs and the web for how to increase the storage, with no luck. Such an obvious and common need should not be so difficult to configure. You should not have to deep-dive, reverse-engineer, and read source code in order to solve such simple problems.


In defense of Grafana Loki, it's awesome... if you have a lot of information about your telemetry data up front - i.e volumes, use cases, how you wish to query it etc.

A broader solution like Coralogix (https://coralogix.com/) is more appropriate if you're venturing into the unknown and you need more of a data discovery capability. The problem with most of THESE "all in one" platforms is that they traded flexibility and feature breadth for cost. Coralogix is a wee bit different, and it gives users an awesome set of cost optimization tools, as well as a simple pricing model, to offer data discovery without a sudden increase in costs (or worse, surprise overages!).


I’ve always viewed all in one mode for Loki as a demo or very small install version. It’s meant to be run in its component architecture as a scaling ingestion and query engine.

That said:

1) I feel like Loki is languishing and not reaching its full potential. User experience needs a lot of work.

2) Grafana is a for-profit company.

3) Grafana sees its future in the margin-rich SaaS offering.

4) open source is still supported, but only to a certain point and the rest is commercial. I wouldn’t expect material support if you’re not paying for it.


There's also SigNoz (https://signoz.io), a YC-backed company, but open source with a (recently added) paid hosted offering.


Looked interesting, so I did what I try to do every time: checked the license.

Among other licenses I found this in one of the sub folders (/EE).

> This software and associated documentation files (the "Software") may only be used in production, if you (and any entity that you represent) have agreed to, and are in compliance with, the SigNoz Subscription Terms of Service, available via email (hello@signoz.io) (the "Enterprise Terms"), or other agreement governing the use of the Software, as agreed by you and SigNoz, and otherwise have a valid SigNoz Enterprise license for the correct number of user seats. [...]

I guess it is for enterprise edition or something but it was not immediately obvious to me what parts are under EE and which parts are under the MIT Expat license.


The base license seems like an MIT-style license, with the comment that things in an ee/ directory are governed by the license in that directory; so Amazon will just rewrite only that functionality when/if they decide to eat SigNoz's lunch.

Top level license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


Grafana Loki stores every data chunk in a separate file [1]. Data chunks are created every 2 hours for every stream [2] that receives at least a single log entry during those 2 hours. This creates 12 * 30 = 360 chunk files per month for every active stream. If Grafana Loki is used for collecting logs from a thousand services, and each service generates 10 different log streams, then the number of chunk files created by Loki during a month will be 1000 * 10 * 360 = 3.6 million [3]. This sounds like a very strange design decision.

This was one of the reasons why we at VictoriaMetrics decided to start working on a better solution for logs: VictoriaLogs [4].

[1] https://utcc.utoronto.ca/~cks/space/blog/sysadmin/GrafanaLok...

[2] https://grafana.com/docs/loki/latest/fundamentals/labels/

[3] https://utcc.utoronto.ca/~cks/space/blog/sysadmin/GrafanaLok...

[4] https://www.youtube.com/watch?v=Gu96Fj2l7ls&t=1950s

[5] https://www.slideshare.net/VictoriaMetrics/victorialogs-prev...
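The arithmetic above checks out as a back-of-the-envelope estimate (this is just the multiplication from the comment, not a claim about Loki's actual chunk behavior, which the reply below notes is configurable):

```shell
# 24h / 2h chunk windows, 30 days a month
chunks_per_stream_per_month=$(( (24 / 2) * 30 ))   # 360

# 1000 services, 10 streams each
total=$(( 1000 * 10 * chunks_per_stream_per_month ))

echo "$chunks_per_stream_per_month chunks/stream/month, $total chunk files/month"
```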


As noted [1], these are configurable.

  ingester:
    chunk_retain_period: 30s
    chunk_idle_period: 5m0s
    chunk_block_size: 262144
    chunk_target_size: 1572864
    chunk_encoding: gzip
    max_chunk_age: 2h0m0s

[1] https://grafana.com/docs/loki/latest/operations/storage/file...


While this post does raise some valid points, it seems to underscore an underlying issue with a lot of open source software: the lack of comprehensive documentation and the assumption of a high level of technical know-how. The issues with Loki seem largely symptomatic of this trend. The good news is that it's a fixable problem. The bad news is that it's often not prioritized by the developers.

Moreover, the post speaks volumes about the potential pitfalls of "all-in-one" solutions. These can seem appealing due to their simplicity, but as this author has experienced, they often come with their own challenges, especially when things go wrong.

The friction with the Loki setup underscores the importance of having a good contingency plan in place for log management systems, as loss of log data can be a severe issue.

As for the author's point about the devs wanting you to use their cloud service, it's a common business model. Free or cheap software that's difficult to set up on your own, but with a paid service that makes it easy. The question is whether this trade-off is worth it for your specific use case.


> An attempt to upgrade our Loki 2.7.4 to 2.8.1 failed badly and could not be reverted, forcing us to delete our entire accumulated log data for the second time in a few months (after the first time).

This sucks, but it’s also why you take filesystem snapshots or perform a backup before upgrades.


And then you lose logs since the bad upgrade instead of until the bad upgrade. Definitely not better.


The last paragraph adds some context

> PS: Loki also has some container-ized multi-component run-it-yourself example setups. I don't have any experience with them so I have no idea if they're better supported and more reliable in practice than the all-in-one version (which isn't particularly, as we've seen). A container based setup ingesting custom application logs with low label cardinality and storing the actual logs in the cloud instead of the filesystem may be a much better place to be for using Loki in practice than 'all in one systemd journal ingestion to the filesystem'.

Author may be holding the tool wrong, using it for a scenario it was not optimized for


Loki and Mimir seem like a complete skip compared to their respective competitors when it comes to storing data/logs. Grafana itself is great, and Kibana could certainly use some competition.


Are there some more reasonable open-source alternatives?



So far, no problems using Vector to ship logs to Grafana Cloud (Loki).


That's kind of his point. They don't seem to care about local.


I understand. For us, Loki locally is just to provide parity for test environments.

However, I would have used vector anyway locally as I prefer to have a centralised collector.


While it's not a tool I love, we have been running it for over a year, without issue.

We store maybe 250GB worth of logs in each instance, and ingest an estimated 1-2k lines a second.


Would a PHP script which fires up grep to sift through rsyslogd files be feasible as an alternative?

I'm considering right now to implement this.


Syslog-ng has a feature where each line can be piped into a long-running program (e.g., Perl) that parses/matches each line as it arrives, then pumps matched lines to the necessary thing.
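A rough sketch of that feature, syslog-ng's program() destination (the script path and source name here are placeholders, and the template is illustrative):

```
# Pipe every arriving line into a long-running filter process.
destination d_matcher {
  program("/usr/local/bin/match.pl" template("${ISODATE} ${HOST} ${MSG}\n"));
};
log { source(s_network); destination(d_matcher); };
```

syslog-ng keeps the program running and restarts it if it exits, so the filter can hold state across lines.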


Loki is open source. OP should take some of that frustrated energy and create a PR to improve the docs. I don't think this is a matter of Grafana Inc not wanting people to run Loki themselves but rather a resource constraints issue.


Unfortunately, Grafana discourages contributions from the community, even for its documentation.


Loki docs are here: https://github.com/grafana/loki/tree/main/docs/sources they are OSS.

The last contribution to the docs was an hour ago (at the time of writing this comment) and came from a maintainer not employed by Grafana Labs.

Looking down the recent commits, I see lots of activity from non-Grafana employees that has been accepted.

If there are specific issues with contributing docs or code please do point me towards them.


I work on Grafana Tempo and this is not at all how I see things. I'm sorry if you've had a bad experience, but we work quite hard to field questions, PRs, and suggestions.

Every day I wake up and spend at least an hour reading and responding to issues and PRs in the Tempo repo.


I discovered that Grafana doesn't share the info how to build custom packages either – https://github.com/grafana/grafana/issues/30963


Not that I don't believe that's entirely possible, but do you have a link to something demonstrating this behavior? A _very_ cursory Google search didn't come up with anything immediately.

Like, this article might just be for show, but it's the first thing that came up ¯\_(ツ)_/¯ https://grafana.com/docs/grafana/latest/developers/contribut...


Grafana requires you to sign a CLA before they will accept any work, which can be really expensive (unless you have in-house lawyers or don't care about understanding the real ramifications of a contract): https://grafana.com/docs/grafana/latest/developers/cla/


This appears to be a seven-clause contract written in good faith to ensure Grafana Labs can continue building a product and service offering around their open source project after accepting your contribution.

Am I missing something? Do you have any specific problems with the CLA? Is there an alternative option to a CLA that ensures the original copyright owner can continue offering the code base under multiple licenses after accepting an external contribution?


Perhaps my contributions are also made in good faith with full awareness that they are subject to the licenses?

We did OSS for decades without CLAs and most projects still do not require CLAs.


Subject to which licenses? The current license? Is Grafana obligated to keep the licenses they distribute the current software under unmodified forever after you contribute? Do they lose their right to modify the license of the codebase with your contribution? If they decide to move it to GPL, do they need to get your permission first? If they decide to take the project closed source, do they need your permission? Can you sue for damages if they don't get your permission first?

Your good faith doesn’t hold up in court and I understand why they’d want to clarify ownership of the contributions. Just because we’ve always done it this way doesn’t mean people aren’t open to liability. Just because another project accepts the risk of a random contributor winning a lawsuit against them doesn’t mean Grafana should. I’m surprised CLAs aren’t more common.

I was personally surprised at how generous their CLA was with ownership rights for you and your contribution to the project. You retain a lot when contributing.


Do you also run every single EULA through your lawyers?

CLAs are becoming common because litigation is becoming common. It's a product of our times and mostly a safeguard for companies in case a person is able to slip in some malevolent code or write hate speech in docs, or if someone tries to claim copyright on docs/code.


Is there anything nonstandard or suspicious about this CLA?


The fact that it is a CLA means they want the option to close the code.

I can see someone who gives their work to an OSS project not wanting the corporation behind it to have the option to just take it and close it.

The Linux kernel uses a "Developer Certificate of Origin", which is basically just "I certify that I have the rights to what I contribute". That is enough.

A CLA is entirely to the detriment of actual OSS


I don't know. Do you really know what you're giving up by signing these? I'd have to study CLAs and hope that I am interpreting them correctly within our respective jurisdictions, or ask a lawyer.

All of this stands in the way of contributing. And this is their decision to make, of course, but it is hostile to would-be contributors.


If you don't understand the legal implications of the license and CLA, please don't start rumours like "Grafana discourages contributions from the community, even for its documentation".

I could try to explain to you what the purpose of a CLA is, but you could also easily put in the effort by searching on Google.

Make sure to also look into the implications of contributing code with an OSS license, any license. That's as much contract as a CLA is.


> Do you really know what you're giving up by signing these?

Yes. Everything. That's the point of near-every CLA.

So the corporation behind it has the option to close the code if they want to, taking your contributions with it.

Some corps might never do it, but any company is one MBA away from "what can we cut from the OSS version and move to enterprise to get more customers?"


I mean, this flatly contradicts this line in the CLA:

> Except for the license granted herein to Grafana Labs and recipients of software distributed by Grafana Labs, You reserve all right, title, and interest in and to Your Contributions. [emphasis mine]

You can always use the version of the code your contributions went into. You still own the code you wrote, and still retain the rest of theirs under its original license. You're just not entitled to future versions of the source like a copyleft license without a CLA would grant.

I understand not being excited to serve corporate interests (that's what a CLA does), but posting intellectually uncurious flamebait as a result makes for boring reading.


I do, actually. It looks like a standard CLA. It grants them a perpetual license to whatever code you're contributing and allows them to use it as they please, and not affecting any of your other rights. There's also some stuff in there about you only contributing code that you actually have the right to contribute, and how to manage edge cases around that. It's fairly standard when someone intends to be able to use your contribution as part of a greater open-core / closed-source distribution and to be able to simplify licensing matters. They have to protect their own interests and they're doing so in a manner that basically doesn't affect you at all.

Do you hire a lawyer any time you encounter an unfamiliar OSS license? I assume you don't, or you'd have a hard time using any modern packaging ecosystem.


That's a run-of-the-mill CLA.


Not sure what you mean. Grafana, as in the dashboard, does seem not to accept patches in some sense; I think it's related to their enterprise offering. Loki was quite a breeze to send patches to. Thanks chaudum!



