I can't recommend serious use of an all-in-one local Grafana Loki setup (utcc.utoronto.ca)
126 points by ink_13 on April 28, 2023 | 93 comments


Ed here (@slim-bean on GitHub), one of the original authors and the lead of the Loki project.

The criticisms brought by Chris are valid, and it's good feedback. I just wish it had been delivered in a more constructive manner.

I personally know what Loki is capable of, I've watched it grow from something tiny to something I'm incredibly proud of and in awe of.

I run Loki on several Raspberry Pis ingesting 20-100GB a day; I also run Loki clusters on thousands of cores ingesting hundreds of TB a day. It's an amazingly flexible and capable project, built by an amazing team at a company I'm incredibly proud to work for.

That being said....

> Loki doesn't feel like it's been built to be operable by people who don't know its code and its internal details

How true this is... while I know for a fact there are hundreds to thousands of folks out there who are successfully running Loki (and thank you to those who share your success stories, it means the world to us), it can be very rough around the edges...

But we are working hard to improve this, we are doing the best we can.

All I really ask of folks who want to give feedback about Loki: PLEASE DO! But please be patient.

The author of this article says they tried to work with us to improve Loki and that it didn't work out. I'm truly sorry for that. These posts are full of constructive feedback, and I would really love to see it turned into issues and pull requests to make the project better.


As a counterpoint to the article, I use Loki, Grafana, and promtail to manage logs for 5 hosts running a number of different services, and I have had absolutely no issues for the last year. I haven't run into any of the problems the OP did. I suppose it's possible that he's running at a scale where certain problems happen that I won't see with my setup.

But honestly, I sort of assumed that if you are going to manage hundreds of machines, you should probably look at the more scalable configurations anyway. If you are just doing a handful of machines, the local-store-only version is more than capable.


Can you talk more about your setup and process with Loki on the RPis, please?


If you use Loki as the default logging driver with Docker and the Loki container shuts down, the rest of your containers will freeze up. This has been an issue for almost 3 years.

https://github.com/grafana/loki/issues/2361
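One mitigation that gets suggested for this class of problem is Docker's non-blocking log delivery mode, which drops log lines instead of blocking the container when the logging endpoint is unreachable or can't keep up. A hedged sketch of a daemon.json, assuming the Loki driver plugin is installed under the alias `loki` and a local Loki at the default port (both values are illustrative):

```json
{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://localhost:3100/loki/api/v1/push",
    "mode": "non-blocking",
    "max-buffer-size": "4m"
  }
}
```

The trade-off is explicit: with a 4m buffer, some stdout logs can be lost during an outage, but containers keep running, which matches the preference expressed in the reply below.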


That sounds like a Docker issue on top of Loki... regardless of the logging provider, it should not do that by default. I can see some cases where you would never want to lose a log message, but in the vast majority of cases, sacrificing some stdout logs so the application keeps working is preferable.


Docker has a long history of issues with the daemon causing containers to freeze when dependencies go away or have issues. It's one of the reasons I prefer other container runtimes.


It was a pain to get set up initially, with all of the moving parts and the Docker plugin for it, but it has been working well for me ever since in my homelab. Smooth sailing.


Parseable is an open source Loki alternative.

- Single binary

- Written in Rust (lightweight, fast, and stable)

- Use an S3 bucket or a mount point

- Visualize with Grafana

https://github.com/parseablehq/parseable

(founder here)


Post a Show HN?


Will do soon!


Can Parseable be sharded, or can you just run a single instance?


Not yet, but we have this on the roadmap. We'll add it within a few months.


It really is amazing just how bad the various log shipping systems are for the simple use case of "I have logs on some servers and I want them to be over here." We somehow peaked at rsyslog and have been struggling ever since.

If you don't follow the one-true-architecture you will get bitten in a million ways.

* Log ingestion on the host pulls logs from the application/system/whatever, timestamps the logs itself (because when you're interested in failure states, do you really trust the log emitted by a broken app? Also because devs are famously bad at timezones), adds its own metadata, and stores them in a local outbox queue.

* Local log ingestion determines where to send logs based on service discovery and periodically updates.

* Log ingestor ships the logs to a durable queue and flushes only after getting an ACK from the queue.

* Log processor reads from the queue and ships the logs off to persistent storage, or to a dead letter queue where you get an alert if it ever has anything in it. The log processor ACKs back to the queue only once it gets an ACK from the db. Logstash used to sin in this regard.

* Persistent storage treats logs as opaque blobs from the perspective of how they're physically stored. Indexes are time-window based depending on your volume, usually daily, and shipped off to different tiers / deleted on that basis.

This stack can horizontally scale indefinitely up to (and past since the queue backups allow you to temporarily fake more throughput than you really have) the throughput of your backing database.

I loathe how complicated and brittle the ELK stack is but they get this exactly right and if you implement it it becomes nigh-impossible to lose data. The market for "ELK style architecture but not the size of a 400 lb gorilla" has got to be huge but is seemingly untapped last I checked.


Here is a question, and I mean it honestly. I'm relatively old school and have built many apps using syslog. When it comes to log mining, I've got a fairly old-school utility belt: I poke around with less, I cat through grep (really ripgrep), I cat through grep and pipe to awk and extract things. Sometimes I fire up cut. I get a ton of mileage from sort and uniq. Obviously, I fire up zcat in place of cat when needed. I also generously apply find when needed. It feels like I find what I need pretty quickly. Admittedly, I generally don't have terabytes of logs with these tools, but they handle 10s of GB shockingly well.
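The winnow-then-count workflow described above can be sketched like this (paths and log format are made up for illustration; a tiny sample log stands in for /var/log):

```shell
# Fabricate a small sample log so the pipeline is self-contained.
cat > /tmp/sample.log <<'EOF'
Apr 28 10:00:01 web1 nginx: GET /index.html 200
Apr 28 10:00:02 web1 nginx: GET /missing 404
Apr 28 10:00:03 web1 nginx: GET /missing 404
Apr 28 10:00:04 web1 sshd: Failed password for root
EOF

# Winnow to the interesting lines, strip the timestamp fields,
# then count distinct messages, most frequent first.
grep ' 404$' /tmp/sample.log \
  | awk '{$1=$2=$3=""; sub(/^ +/,""); print}' \
  | sort | uniq -c | sort -rn
```

Swap in zcat/rg and real paths as needed; the shape of the pipeline (filter, project, sort, count) is the point.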

With Splunk, ELK, and Graylog, it feels insanely pokey. I know they have the parsers and such. At times I've kind of boned up on their search syntax, but I've never gone "all in" with any of them, maybe because none of them seem like a really solid long-term solution. They seem to have a different kind of model than what I want. The time range is kind of nice, but often I won't have a time range until later. My model involves winnowing down the data I want and then extracting pieces and viewing the data in different ways. Am I just using all these tools the wrong way? Is my mental model off? Maybe it's a log consistency thing; it's always sort of a great day when you get "Error: abc failed because xyz and def." and that's the answer to everything. Many times I'll be spending time looking at logs and I'll notice that an increase in a certain behavior happened before the outage, and that's the giveaway. Then a new Grafana dashboard is created with a new metric to try to identify that before it happens again.

Loki kind of looks like it supports my method but again, I'm back to that "I haven't gone all in" with it problem. As I'm rambling, I've seen these sexy dashboards with like red/yellow/green lights and some latency graphs and cool looking stuff and then a little table of the last 20 "log messages" and maybe I'm used to looking at logs that you don't show in your dashboard or something like that.

They all feel like a square hole to my round peg. Maybe it's just me.


At FOSDEM, the talk on Loki described it as a modern version of what you and I do with syslog servers.

Mine come in to an anycast IP on the network, one file per host; the syslog server stamps the receive time at the start in "y-m-d-h-m-s+0000" format, in a y/m/d directory structure, bzip2'd after a few days.

I have a few scripts which I use to parse the logs and pull reports out (BGP drops/recover times for example), but most of the time tail/cat/sort/grep/cut/etc does the job. Where I differ is I use perl rather than awk.

Sure, it doesn't scale to millions of terabytes a second of minable personal information or whatever the average modern LAMP stack generates, but it currently records about 15G a day from 400 different devices just fine.


The whole thing is so much worse than the "old Unix" architecture of "you give the logger an address, and it pushes stuff there".

We have DNS; we don't need the log sender to have a service discovery mechanism on top of that. Set it to the log server address and be done with it; scale at that point if you need to, we know how to do it.

The log processor doesn't need a fucking queue. The log sender does, for network reliability. And that gives you the ability to restart the log processor quickly (it only needs to process the current message in transit and close) and with zero impact (as long as you're down for less time than the logger's queue can cover).

The only reason to add a queue is if you have multiple readers for logs. That also conveniently gives you a form of QoS on the log processor: if you read at an equal rate from all sources, the most spammy ones will hit their own internal queue limit first and won't cause other servers to miss logs. Even then, you might just opt for the loggers sending things to two places at once.

"Shit logs" (whether by volume or by needing message decoding) are a complex problem, but IMO most of that handling should live in the log processor. That's also a good place to resolve any GeoIP or DNS if needed.


> We have DNS, we don't need log sender to have a service discovery mechanism on top of that.

Having service discovery solves some issues.

* DNS TTL and applications holding on to DNS names indefinitely (prometheus, haproxy, nginx, and I bet your app somewhere all do this).

* Applications that don't support DNS record priorities.

* Serving different results to different clients based on their identity, rather than at random.

> Log processor doesn't need a fucking queue. Log sender does, for network reliability one

Yes, that's what the queue is for. The log sender also has a queue, but as it lives on the host itself, minimizing its use is how you avoid losing logs on server crashes. If your architecture is that the log processor accepts logs and stores them in a queue for buffering, then you've implemented the same architecture. But if that queue lives on the log processor itself, then you risk data loss if that server dies. Having a shared queue in front of the pool of log processors is simpler, has better throughput, is easier to shard, and is more reliable. Logs can't get stuck on a particular processor anymore, because its lease will end and another worker will pick the work up.


Whenever I have to deal with logs, it's either:

a) simplicity of rsyslog

b) monstrosity of ELK | Grafana | etc.

Somehow I like Prometheus (I think it's "simple"), but it's not enough for displaying and searching logs. Somehow, none of the companies I have worked for have used "simple tools" like rsyslog to handle logs. They all used cloud services (Datadog, New Relic) or self-hosted stacks (ELK, Prometheus + Grafana). I wonder why (I guess it's because "money buys you simplicity").

I just want the following:

- on each machine I want to get logs from: install the agent (a simple binary) + simple /etc/myagent.conf. The agent forwards logs to my "main log server"

- on my "main log server": install the "log processor" (again, just a binary please!) + simple /etc/mylogprocessor.conf. The "log processor" shows me a nice localhost:9090/ web interface in which I can search for logs (indexed by any field I want).

Easy, no? My use case is not thousands of machines nor Terabytes of data logs per second. I just have a few machines and I don't want to deal with multi-clustered solutions or anything like that. Just 2 binaries! Does that exist?
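For the forwarding half, plain rsyslog already gets close to this: one drop-in file per machine, one on the server. A rough sketch (the hostname and file names are placeholders; searching/indexing still needs something else):

```
# On each client: /etc/rsyslog.d/forward.conf
# @@ = TCP, single @ = UDP
*.* @@logs.example.internal:514

# On the central server: /etc/rsyslog.d/listen.conf
module(load="imtcp")
input(type="imtcp" port="514")
```

What's missing from this picture, and what the comment is really asking for, is the "nice web interface indexed by any field" part.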


We just use rsyslog to send to an ELK instance, but it's less than perfect, and it doesn't log everything we want because not every app has very good logging.

The problem I have encountered is that even "simple" setups (just my home NAS + a few devices) require some log munging to get useful info into whatever system uses it. Many apps don't have a "log in JSON" option in the first place, and near-always there is no real standard for the fields of the message either.

And also near-always I want to filter out or rate-limit some particularly spammy message or service just because I don't even want to look at it when browsing logs as it is just noise

> Easy, no? My use case is not thousands of machines nor Terabytes of data logs per second. I just have a few machines and I don't want to deal with multi-clustered solutions or anything like that. Just 2 binaries! Does that exist?

...Graylog, I guess? I looked at it and it is apparently pretty integrated, but the price at higher volumes made us do ELK for the "actual big stuff".


> it becomes nigh-impossible to lose data

I imagine if you implement it "correctly" you don't lose data, but my experience with Elasticsearch has been horrible.

I've lost data many times, for things like logs reaching an artificial maximum number of indices and ES shutting down, or ES just not being able to support the simple case of a log coming in both as JSON and as plain text; there's no setting to say "just cast to text if there's a conflict", so it drops the log, and the workaround is to find, among the many outdated ES posts out there, a piece of Ruby code to fix that one case. There were many other issues (I compiled a list of like 20 stupid things about ES and the many ways I've lost data, and gave up adding stuff).


I spent 2 days fixing a Graylog instance last week. When the Elasticsearch nodes get too big, they tend to be quite hard to work with. And of course you only log in when there's a problem, and you've forgotten everything about the setup in the meantime.


> * Log ingestor ships the logs to a durable queue

> * Log processor reads from the queue and ships the logs off to persistent storage

Why do we need the durable queue in between? Why not let the Log Ingestor ship the logs off to persistent storage?


Something needs to buffer in case of network errors, although you're right that having that as a separate element is not very useful.

A queue is useful if you want to write those logs to multiple places at once.


Because this was created for 400lb gorillas.


Quickwit is an open source Loki alternative too.

As mentioned in another comment here, it works well on billions of logs on one modest instance. And Grafana integration is on the way :)

https://github.com/quickwit-oss/quickwit

(disclaimer: I'm one of the cofounders)


Care to give the Quickwit pitch? Especially on why it is better than Loki.


Good point.

Unlike index-free solutions like Loki or Parseable, Quickwit is built on top of a modern full-text search index (tantivy). At query time, Quickwit produces much faster results (all other things being equal: CPU, memory, etc.), especially when the volume of data to analyze is large or queries are complex (high cardinality values, aggregations). Quickwit also stores data in a columnar format, so it's also good at OLAP-style queries (no joins though).

This comes with a cost during ingestion; Quickwit is more resource-hungry than Loki, but can still ingest at 20-40 MB/s on a commodity instance with 4 CPUs. Similarly, regarding storage footprint, Loki compresses logs better because it does not maintain those extra data structures. Still, a Quickwit index tends to be much smaller than an Elasticsearch index.

The next release of Quickwit (May) will be shortly followed by the publication of a benchmark against Elasticsearch/OpenSearch, and later by another one against Loki. You'll be able to see for yourself.


From my limited knowledge of Loki's internals: contrary to Loki, Quickwit uses a fully featured search engine library underneath called Tantivy (https://github.com/quickwit-oss/tantivy). Quickwit offers different services (indexer, searcher, ...) that can be run and scaled independently. It also supports indexing from various sources, including files, a REST API, Kafka, Pulsar, and Kinesis, with more planned based on community interest. Last but not least, Elasticsearch query API support is being worked on.


I installed promtail a few weeks back and ran into this bug, which had been outstanding for months (i.e., a fix had been written but had not been released): https://github.com/grafana/loki/issues/8663

Due to a buffering issue, Loki would exit in case of configuration error without printing any error message or anything at all.

There is definitely something weird about how the project is run.


I've run Loki in production for over a year... as far as I know, Loki is not designed to be run all-in-one; that's kind of a dev/local design.

N.b., I've run this in both AWS/GCP in a k8s scenario, against S3/GCS respectively as a long-term store.

I've also run Mimir, and they both work _fantastic_ in this deployment scenario (as described).


Not Loki, but regarding Tempo: I've had a good experience running the components in Kubernetes using the operator.

These components are typically built to be "cloud-native", which often means running on Kubernetes. If you already run on Kubernetes, Grafana products are typically straightforward to run.

In general, I think that is the target customer and use case for deployments outside of Grafana Cloud. The all-in-one binaries are more toys to say hello world, or maybe to test things in local development.


Nice! RedHat built the Tempo operator and I have not had a chance to use it yet. We are looking to collaborate more closely on it soon.


It seems Red Hat believes in Loki

Red Hat logging product manager says: "We made the decision to move to Loki and Vector" https://www.youtube.com/watch?v=QZ4Hv85lEJ0&t=938s


Vector is wonderful, and while I have no experience with Loki, if it's as bad as this post and thread suggest, perhaps Vector makes it more manageable by normalising and buffering everything coming into it.

For example, another comment here talks about Loki locking up Docker if it's the logging backend and the container crashes. I suspect that wouldn't be possible, or would be less likely and more manageable with Vector in the middle because it will buffer. I've also dealt with normalising logs from different sources and it can be a pain, but Vector will do some or all of that already, reducing the requirements put on Loki.


Seems like a 'low-key' bad decision from the sounds of it.


That doesn't mean much unless you're their customer. "Works well enough and we have people who know it" is a perfectly fine way to pick a tool, even if it is not technically the best one.


not sure that's a ringing endorsement, given all the other things Red Hat believes in.


They also push systemd…


For as much functionality as systemd replaced (when you think about it, it's an extremely ambitious project), it has been an extremely smooth transition.


Systemd is great.


I’ve been using quickwit.io for some local data-processing job logs, and it seems very easy to run, not very IO-intensive, and it runs fine on a single node with modest hardware holding >2 billion log rows. It has a really cool dynamic schema feature too.

I found it easier to setup and configure than Loki.

The UI is very basic for now but I’m excited to see what the future holds for this project!


Thanks a lot for your kind comment!

To complete your description of Quickwit: it is a distributed search engine for logs and traces. It's written in Rust, ingests at speed, scales horizontally, and separates compute from storage.

Last but not least, Grafana integration is planned for next month :)


While I myself use Loki for log aggregation for a small web service that I run, and have gotten a lot of value out of it, I agree with the author that the product is not friendly to use.

It's notoriously difficult to know why ingestion of certain logs failed, to the point where I run a staging monitoring environment to debug issues like these.


Grafana is a terribly run company that once made a pretty decent OSS dashboard system. Stay away from them for anything but some charts.


And even the dashboard system kinda feels like there should be something better, but somehow there isn't (or I don't know it).


Is Grafana (the product) really that bad? We've been using it for years with hundreds of dashboards and have never had any complaints*

*until recently... the new "time series" panel is a disaster


It's not bad. You can question some decisions about it, but currently, and for the last few years, it's been the absolute peak, with no competition in the industry.


No, it's not "that bad", but it also constantly annoys me with things that don't feel right, even though it generally can be made to do what I want in the end. It just feels odd to me that this is the best there is, given how big a deal has been made about monitoring in the past decade.


You only think that because you haven't seen the alternatives...


I'm building a one-man SaaS and I'm currently shopping for a hosted monitoring service. I got the free trial of Grafana Cloud, but I'm not a fan of their products in general. I know Prometheus; I have used DataDog in the past, but it seems crazy expensive and spread thin across 1001 features.

What's a good all-in-one cloud monitoring solution that might optionally deal with logs as well?

Log monitoring is not actually a priority at this stage, I just want something to track metrics, chart them and to alert me when the servers are on fire.

Something open-source adjacent and not crazily expensive would be best. As I said I know Prometheus, but for some reason it's so flexible and free form I really do not enjoy using it.


Check out Coralogix! It comes in at a fraction of the price of DataDog (typically a 40-70% cost reduction for migrating customers), no service tiering (so you instantly get support, managed onboarding etc) with a complete, end to end, open source experience (even archived logs are stored in Parquet).

For your use case, Coralogix is awesome. It comes with a managed Grafana instance, if you wish to use that interface, or a custom dashboarding solution, metric driven alarms, release tagging, and much more.

You can find out more at https://coralogix.com/platform/metrics/


Regarding the docs being a bit out of touch, I can agree, although it was helpful to find the `loki -print-config-stderr` flag (that should be the default when setting it up). It prints all of the current config options and their values. Very helpful, since there were no gRPC TLS client settings in the docs, but according to the source code there should be.

All in all, it's a pretty new player on the market, and there is not much to compare it with. Given Grafana's other products, I guess it will mature as well. There are always more mature projects like Graylog, but compared to that, Loki is pretty small. But yeah, it's got teeth, and dang, it's fast!


I don't like being negative and I always appreciate open source / free software but..

I tried using it in a small k8s cluster on Digital Ocean. Initial installation using the recommended helm package was easy enough. However, it only saved a very short period of log data. I spent a fair amount of time searching the docs and the web for how to increase the storage, with no luck. Such an obvious and common need should not be so difficult to configure. You should not have to deep-dive, reverse-engineer, and read source code in order to solve such simple problems.


In defense of Grafana Loki, it's awesome... if you have a lot of information about your telemetry data up front - i.e volumes, use cases, how you wish to query it etc.

A broader solution like Coralogix (https://coralogix.com/) is more appropriate if you're venturing into the unknown and you need more of a data discovery capability. The problem with most of THESE "all in one" platforms is that they traded flexibility and feature breadth for cost. Coralogix is a wee bit different, and it gives users an awesome set of cost optimization tools, as well as a simple pricing model, to offer data discovery without a sudden increase in costs (or worse, surprise overages!).


I’ve always viewed all in one mode for Loki as a demo or very small install version. It’s meant to be run in its component architecture as a scaling ingestion and query engine.

That said:

1) I feel like Loki is languishing and not reaching its full potential. User experience needs a lot of work.

2) Grafana is a for-profit company.

3) Grafana sees its future in the margin-rich SaaS offering.

4) open source is still supported, but only to a certain point and the rest is commercial. I wouldn’t expect material support if you’re not paying for it.


There's also SigNoz (https://signoz.io), a YC-backed company, but open source with a (recently added) paid hosted offering.


Looked interesting, so I did what I try to do every time: checked the license.

Among other licenses I found this in one of the sub folders (/EE).

> This software and associated documentation files (the "Software") may only be used in production, if you (and any entity that you represent) have agreed to, and are in compliance with, the SigNoz Subscription Terms of Service, available via email (hello@signoz.io) (the "Enterprise Terms"), or other agreement governing the use of the Software, as agreed by you and SigNoz, and otherwise have a valid SigNoz Enterprise license for the correct number of user seats. [...]

I guess it is for enterprise edition or something but it was not immediately obvious to me what parts are under EE and which parts are under the MIT Expat license.


The base license seems like an MIT-style license, with the comment that things in an ee/ directory are governed by the license in that directory; so Amazon will just rewrite only that functionality when/if they decide to eat SigNoz's lunch.

Top level license:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


Grafana Loki stores every data chunk in a separate file [1]. Data chunks are created every 2 hours for every stream [2] that receives at least a single log entry during those 2 hours. This creates 12 * 30 = 360 chunk files per month for every active stream. If Grafana Loki is used for collecting logs from a thousand services, and each service generates 10 different log streams, then the number of chunk files created by Loki during a month will be 1000 * 10 * 360 = 3.6 million [3]. This sounds like a very strange design decision.

This was one of the reasons why we at VictoriaMetrics decided to start working on a better solution for logs: VictoriaLogs [4].

[1] https://utcc.utoronto.ca/~cks/space/blog/sysadmin/GrafanaLok...

[2] https://grafana.com/docs/loki/latest/fundamentals/labels/

[3] https://utcc.utoronto.ca/~cks/space/blog/sysadmin/GrafanaLok...

[4] https://www.youtube.com/watch?v=Gu96Fj2l7ls&t=1950s

[5] https://www.slideshare.net/VictoriaMetrics/victorialogs-prev...
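The arithmetic above checks out as a back-of-the-envelope estimate (this is just the multiplication from the comment, not a claim about Loki's actual chunk behavior, which the reply below notes is configurable):

```shell
# 24h / 2h chunk windows, 30 days a month
chunks_per_stream_per_month=$(( (24 / 2) * 30 ))   # 360

# 1000 services, 10 streams each
total=$(( 1000 * 10 * chunks_per_stream_per_month ))

echo "$chunks_per_stream_per_month chunks/stream/month, $total chunk files/month"
```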


As noted [1], these are configurable.

  ingester:
    chunk_retain_period: 30s
    chunk_idle_period: 5m0s
    chunk_block_size: 262144
    chunk_target_size: 1572864
    chunk_encoding: gzip
    max_chunk_age: 2h0m0s

[1] https://grafana.com/docs/loki/latest/operations/storage/file...


While this post does raise some valid points, it seems to underscore an underlying issue with a lot of open source software: the lack of comprehensive documentation and the assumption of a high level of technical know-how. The issues with Loki seem largely symptomatic of this trend. The good news is that it's a fixable problem. The bad news is that it's often not prioritized by the developers.

Moreover, the post speaks volumes about the potential pitfalls of "all-in-one" solutions. These can seem appealing due to their simplicity, but as this author has experienced, they often come with their own challenges, especially when things go wrong.

The friction with the Loki setup underscores the importance of having a good contingency plan in place for log management systems, as loss of log data can be a severe issue.

As for the author's point about the devs wanting you to use their cloud service, it's a common business model. Free or cheap software that's difficult to set up on your own, but with a paid service that makes it easy. The question is whether this trade-off is worth it for your specific use case.


> An attempt to upgrade our Loki 2.7.4 to 2.8.1 failed badly and could not be reverted, forcing us to delete our entire accumulated log data for the second time in a few months (after the first time).

This sucks, but it’s also why you take filesystem snapshots or perform a backup before upgrades.


And then you lose logs since the bad upgrade instead of until the bad upgrade. Definitely not better.


The last paragraph adds some context

> PS: Loki also has some container-ized multi-component run-it-yourself example setups. I don't have any experience with them so I have no idea if they're better supported and more reliable in practice than the all-in-one version (which isn't particularly, as we've seen). A container based setup ingesting custom application logs with low label cardinality and storing the actual logs in the cloud instead of the filesystem may be a much better place to be for using Loki in practice than 'all in one systemd journal ingestion to the filesystem'.

Author may be holding the tool wrong, using it for a scenario it was not optimized for


Loki and Mimir seem like a complete skip compared to their respective competitors when it comes to storing data/logs. Grafana itself is great, and Kibana could certainly use some competition.


Are there some more reasonable open-source alternatives?



So far, no problems using Vector to ship logs to Grafana Cloud (Loki).


That's kind of his point. They don't seem to care about local.


I understand. For us, Loki locally is just to provide parity for test environments.

However, I would have used vector anyway locally as I prefer to have a centralised collector.


While it's not a tool I love, we have been running it for over a year, without issue.

We store maybe 250GB worth of logs in each instance, and ingest an estimated 1-2k lines a second.


Would a PHP script which fires up grep to sift through rsyslogd files be feasible as an alternative?

I'm considering right now to implement this.


Syslog-ng has a feature where each line can be piped into a long-running program (e.g., Perl) that parses/matches each line as it arrives, then pumps matched lines to the necessary thing.
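A rough sketch of that feature, syslog-ng's program() destination (the script path and source name here are placeholders, and the template is illustrative):

```
# Pipe every arriving line into a long-running filter process.
destination d_matcher {
  program("/usr/local/bin/match.pl" template("${ISODATE} ${HOST} ${MSG}\n"));
};
log { source(s_network); destination(d_matcher); };
```

syslog-ng keeps the program running and restarts it if it exits, so the filter can hold state across lines.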


Loki is open source. OP should take some of that frustrated energy and create a PR to improve the docs. I don't think this is a matter of Grafana Inc not wanting people to run Loki themselves but rather a resource constraints issue.


Unfortunately, Grafana discourages contributions from the community, even for its documentation.


Loki docs are here: https://github.com/grafana/loki/tree/main/docs/sources they are OSS.

The last contribution to the docs was an hour ago (at the time of writing this comment) and came from a maintainer not employed by Grafana Labs.

Looking down the recent commits, I see lots of activity from non-Grafana employees that has been accepted.

If there are specific issues with contributing docs or code please do point me towards them.


I work on Grafana Tempo and this is not at all how I see things. I'm sorry if you've had a bad experience, but we work quite hard to field questions, PRs, and suggestions.

Every day I wake up and spend at least an hour reading and responding to issues and PRs in the Tempo repo.


I discovered that Grafana doesn't share the info how to build custom packages either – https://github.com/grafana/grafana/issues/30963


Not that I don't believe that's entirely possible, but do you have a link to something demonstrating this behavior? A _very_ cursory Google search didn't come up with anything immediately.

Like, this article might just be for show, but it's the first thing that came up ¯\_(ツ)_/¯ https://grafana.com/docs/grafana/latest/developers/contribut...


Grafana requires you to sign a CLA before they will accept any work, which can be really expensive (unless you have in-house lawyers or don't care about understanding the real ramifications of a contract): https://grafana.com/docs/grafana/latest/developers/cla/


This appears to be a seven-clause contract written in good faith to ensure Grafana Labs can continue building a product and service offering around their open source project after accepting your contribution.

Am I missing something? Do you have any specific problems with the CLA? Is there an alternative option to a CLA that ensures the original copyright owner can continue offering the code base under multiple licenses after accepting an external contribution?


Perhaps my contributions are also made in good faith with full awareness that they are subject to the licenses?

We did OSS for decades without CLAs and most projects still do not require CLAs.


Subject to which licenses? The current license? Is Grafana obligated to keep the licenses they distribute the current software under unmodified forever after you contribute? Do they lose their right to modify the license of the codebase with your contribution? If they decide to move it to GPL, do they need to get your permission first? If they decide to take the project closed source, do they need your permission? Can you sue for damages if they don't get your permission first?

Your good faith doesn’t hold up in court and I understand why they’d want to clarify ownership of the contributions. Just because we’ve always done it this way doesn’t mean people aren’t open to liability. Just because another project accepts the risk of a random contributor winning a lawsuit against them doesn’t mean Grafana should. I’m surprised CLAs aren’t more common.

I was personally surprised at how generous their CLA was with ownership rights for you and your contribution to the project. You retain a lot when contributing.


Do you also run every single EULA through your lawyers?

CLAs are becoming common because litigation is becoming common. It's a product of our times and mostly a safeguard for companies in case a person is able to slip in some malevolent code or write hate speech in docs, or if someone tries to claim copyright on docs/code.


Is there anything nonstandard or suspicious about this CLA?


The fact that it is a CLA means they want the option to close the code.

I can see someone who gives their work to an OSS project not wanting the corporation behind it to have the option to just take it and close it.

The Linux kernel uses a "Developer Certificate of Origin", which is basically just "I certify that I have the rights to what I contribute". That is enough.

A CLA is entirely to the detriment of actual OSS


I don't know. Do you really know what you're giving up by signing these? I'd have to study CLAs and hope that I am interpreting them correctly within our respective jurisdictions, or ask a lawyer.

All of this stands in the way of contributing. And this is their decision to make, of course, but it is hostile to would-be contributors.


If you don't understand the legal implications of the license and CLA, please don't start rumours like "Grafana discourages contributions from the community, even for its documentation".

I could try to explain to you what the purpose of a CLA is, but you could also easily put in the effort by searching on Google.

Make sure to also look into the implications of contributing code with an OSS license, any license. That's as much contract as a CLA is.


> Do you really know what you're giving up by signing these?

Yes. Everything. That's the point of near-every CLA.

So the corporation behind it has the option to close the code if they want to, taking your contributions with it.

Some corps might never do it, but any company is one MBA away from "what can we cut from the OSS version and move to enterprise to get more customers?"


I mean, this flatly contradicts this line in the CLA:

> Except for the license granted herein to Grafana Labs and recipients of software distributed by Grafana Labs, You reserve all right, title, and interest in and to Your Contributions. [emphasis mine]

You can always use the version of the code your contributions went into. You still own the code you wrote, and still retain the rest of theirs under its original license. You're just not entitled to future versions of the source like a copyleft license without a CLA would grant.

I understand not being excited to serve corporate interests (that's what a CLA does), but posting intellectually uncurious flamebait as a result makes for boring reading.


I do, actually. It looks like a standard CLA. It grants them a perpetual license to whatever code you're contributing and allows them to use it as they please, and not affecting any of your other rights. There's also some stuff in there about you only contributing code that you actually have the right to contribute, and how to manage edge cases around that. It's fairly standard when someone intends to be able to use your contribution as part of a greater open-core / closed-source distribution and to be able to simplify licensing matters. They have to protect their own interests and they're doing so in a manner that basically doesn't affect you at all.

Do you hire a lawyer any time you encounter an unfamiliar OSS license? I assume you don't, or you'd have a hard time using any modern packaging ecosystem.


That's a run-of-the-mill CLA.


Not sure what you mean. Grafana, as in the dashboard, does seem not to accept patches in some sense; I think it's related to their enterprise offering. Loki was quite a breeze to send patches to. Thanks chaudum!



