Syslog is terrible

TheSwordsman · on Sept 4, 2016

> A popular sentiment is that binary logs are evil and the only way to properly log information is by using plain text.

> I don’t particularly care about the argument between plain text versus binary logs. However, if your reasoning for not wanting to use binary logs is because they are opaque and can be corrupted then you should take a close look at your log rotation or archival process. If you are archiving logs by first compressing them using something like gzip, you no longer have plain text log files.

I found that quite a few people dislike binary log files because of portability concerns, specifically when reading them depends on a platform-specific utility. Compressing them with a standard algorithm alleviates those concerns.

Johnny555 · on Sept 4, 2016

> If you are archiving logs by first compressing them using something like gzip, you no longer have plain text log files.

Well yeah, but that's because I'm already done parsing the log and/or using it for debugging. If I wasn't done with it, I wouldn't have archived it.

But I know that when I pull a 3 year old log out of the archive, I can gunzip it and will still be able to read it, no matter how much the file format has changed since then, and I don't have to dig up a 3 year old magic decoder program (that may only run on one operating system) to be able to see what's in the file.

kiallmacinnes · on Sept 4, 2016

This kinda feels like a really, really minor point of the article. The rest of the article, revolving around parseability of logs, covers a REALLY important part of running any production infrastructure.

Anyone who's run production infra for long enough has written all the regexes, mostly from scratch, mostly badly. Myself included. None of my parsers would have caught the ssh 'root from 8.8.8.8'@server log.

Applications that offer structured logging are superior (in terms of logging), and we need to ensure we build applications that support this going forward.

For those in the python world, https://pypi.python.org/pypi/python-logstash is great. It's somewhat undocumented, but it can be used without logstash, writing python logs with their original arguments to JSON on disk (which we then had an agent ship to logstash, because, disk turns out to be a decent buffer for handling network glitches).

e.g. LOG.error("Invalid user %(user)s", {"user": "Kiall"}) can be logged as "ERROR: Invalid user Kiall", or:

   {"message": "Invalid user Kiall", "level": "ERROR", "extras": {"user": "Kiall"}}.

I'd love to get the raw uninterpolated message in there too, so that I could match on that, then show the formatted message, while allowing filtering on the extras.
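A minimal sketch of that idea using only the standard library `logging` module, not python-logstash itself. The `JsonFormatter` class, the `render` helper, and the `raw_message` key are hypothetical names chosen for illustration; the other keys follow the JSON shown above.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log record."""

    def format(self, record):
        doc = {
            "message": record.getMessage(),   # interpolated text
            "raw_message": str(record.msg),   # uninterpolated template
            "level": record.levelname,
            # logging stores a single dict argument directly in record.args
            "extras": record.args if isinstance(record.args, dict) else {},
        }
        return json.dumps(doc, sort_keys=True)


def render(level, msg, args):
    """Build a record by hand and format it, without touching global logging state."""
    record = logging.LogRecord("demo", level, "demo.py", 0, msg, (args,), None)
    return JsonFormatter().format(record)


if __name__ == "__main__":
    print(render(logging.ERROR, "Invalid user %(user)s", {"user": "Kiall"}))
```

In a real application the formatter would be attached with `handler.setFormatter(JsonFormatter())`, so that `LOG.error("Invalid user %(user)s", {"user": "Kiall"})` produces the same object. Matching on `raw_message` then groups all "Invalid user" events regardless of the user name.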

pfranz · on Sept 4, 2016

Not only that, but zcat is standard on every system I've tried (which, admittedly, isn't very many) and vim reads gzipped files as if they're plaintext.

Last time I was writing a logging system the only reason I couldn't write directly to gzipped files was that I couldn't append to a gzipped file using Python (fairly important in logging). It looked like the reason was a Windows-specific workaround having to do with seeking from the end of the file.
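For reference, a sketch of appending with the `gzip` module in current Python 3, where `gzip.open` accepts append modes. Each append adds a new gzip member to the file, and readers decompress the members back to back. The helper names and the file name are made up for the example.

```python
import gzip
import os
import tempfile


def append_line(path, line):
    # Mode "at" appends a new gzip member in text mode.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(line + "\n")


def read_lines(path):
    # Reading walks through all members as one continuous stream.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.read().splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "app.log.gz")
    append_line(path, "first")
    append_line(path, "second")
    print(read_lines(path))
```

Opening a new member per line carries header overhead and compresses poorly, so a real logger would likely keep the file open and batch its writes.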

wtbob · on Sept 4, 2016

> But I know that when I pull a 3 year old log out of the archive, I can gunzip it and will still be able to read it, no matter how much the file format has changed since then, and I don't have to dig up a 3 year old magic decoder program (that may only run on one operating system) to be able to see what's in the file.

Oh, but you can rest assured that systemd will never, ever have a breaking change. Lennart Poettering can be trusted to never do something that would cause anybody any trouble! /s

And that's the problem: systemd is a great idea, and it does a lot of really great things, but the man in charge simply can't be trusted not to break things. It was one thing when he was breaking desktops with PulseAudio; it's an entirely different thing now that his decisions can break servers (how many systems have broken due to nohup no longer doing its job?), and break the ability to retrieve logs over time.

I honestly think that systemd is going to arrive at a good place eventually, and a great deal of credit will belong to Poettering. But it will also cause a great deal of harm before it gets there, and he deserves a great deal of the blame.

brainfire · on Sept 4, 2016

Can you summarize how he "[broke] desktops with PulseAudio"? From my perspective as an end user it didn't seem to go badly at all. Was I insulated by my distribution maintainers?

wtbob · on Sept 4, 2016

> Can you summarize how he "[broke] desktops with PulseAudio"?

For a long while it was flaky and buggy, with audio periodically failing. It was never the end of the world, but it was annoying, and I think it runs decently enough now.

justinsaccount · on Sept 4, 2016

> how many systems have broken due to nohup no longer doing its job

Zero? As far as I know the change that defaulted KillUserProcesses to yes was reverted by every distribution that ships systemd.

parenthephobia · on Sept 4, 2016

The options aren't freeform text or an opaque proprietary binary format.

The problems discussed in the article could be solved with any structured data format. Logs could be streams of objects encoded using e.g. JSON, bencoding, or protocol buffers.
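A small sketch of the difference, using the 'root from 8.8.8.8' user name mentioned elsewhere in this thread. The message wording and the regex are assumptions for illustration (real sshd output varies by version), and `192.0.2.7` is a documentation address.

```python
import json
import re

# Assumed sshd-style message; the exact wording varies between OpenSSH versions.
NAIVE = re.compile(r"Invalid user (\S+) from (\S+)")


def parse_text(line):
    """Typical hand-written parser: first token is the user, next is the IP."""
    m = NAIVE.search(line)
    return {"user": m.group(1), "ip": m.group(2)} if m else None


def emit_structured(user, ip):
    """Same event as a JSON object: field boundaries are explicit."""
    return json.dumps({"event": "invalid_user", "user": user, "ip": ip})


if __name__ == "__main__":
    user, ip = "root from 8.8.8.8", "192.0.2.7"
    text_line = "Invalid user %s from %s" % (user, ip)
    print(parse_text(text_line))                    # attacker-chosen "IP" wins
    print(json.loads(emit_structured(user, ip)))    # fields survive intact
```

The text parser reports the user as `root` and the address as `8.8.8.8`, both taken from the attacker-supplied user name, while the structured record keeps the real values apart.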

user5994461 · on Sept 4, 2016

First,

The article seems to mostly complain about text logs VS structured logs. That is unrelated to syslog because it is simply a log transport mechanism.

Applications should send structured logs. (e.g. JSON messages). The message may be delivered by whatever means (e.g. syslog).

---

Second,

I am surprised that the article doesn't mention the issues with syslog not being defined properly and suffering from interoperability issues.

Truth is: "syslog" refers to multiple incompatible protocols that evolved over time. See two independent specifications for example:

https://tools.ietf.org/html/rfc5424 and https://tools.ietf.org/html/rfc3164

Applications, libraries and middleware (graylog/ELK/rsyslog/syslog-ng/fluentd) have interoperability issues because of the different syslog protocols. Just because two things are speaking "syslog" doesn't mean they speak the same "syslog".

Switches/routers/appliances cause even more issues because the custom implementations tend to not format messages perfectly for any of the RFCs. ^^
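To make the incompatibility concrete, here is a rough sketch that renders the same event in both wire formats. The function names are hypothetical and the output is simplified (no structured data, no BOM, no length limits), so treat it as an approximation of the two RFCs rather than a conforming implementation.

```python
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pri(facility, severity):
    # PRI is facility * 8 + severity in both RFCs.
    return facility * 8 + severity


def format_3164(facility, severity, ts, host, tag, msg):
    # BSD style: no year, no timezone, day of month padded with a space.
    stamp = "%s %2d %02d:%02d:%02d" % (
        MONTHS[ts.month - 1], ts.day, ts.hour, ts.minute, ts.second)
    return "<%d>%s %s %s: %s" % (pri(facility, severity), stamp, host, tag, msg)


def format_5424(facility, severity, ts, host, app, msg,
                procid="-", msgid="-", sd="-"):
    # Version "1", RFC 3339 timestamp in UTC, "-" for any absent field.
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return "<%d>1 %s %s %s %s %s %s %s" % (
        pri(facility, severity), stamp, host, app, procid, msgid, sd, msg)


if __name__ == "__main__":
    when = datetime(2003, 10, 11, 22, 14, 15, 3000, tzinfo=timezone.utc)
    print(format_3164(4, 2, when, "mymachine", "su", "'su root' failed"))
    print(format_5424(4, 2, when, "mymachine", "su", "'su root' failed"))
```

A receiver expecting one layout and handed the other will misread the timestamp, hostname, or tag, which is the interoperability problem described above.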

justinsaccount · on Sept 4, 2016

Hi, author here :-)

> That is unrelated to syslog because it is simply a log transport mechanism.

Only partially: RFC5424 defines the structured data format (which unfortunately is not widely used).

syslog is also the syslog() interface, and as far as I know there is no reference implementation for encoding structured data in the format the RFC describes.
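As a rough illustration of what such an encoder involves, here is a sketch of serializing one structured-data element as described in RFC 5424 section 6.3. The function names are hypothetical, and the sketch does not validate SD-ID or parameter names against the RFC's character rules.

```python
def escape_param_value(value):
    # Inside a parameter value, '"', '\' and ']' must be backslash-escaped.
    # The backslash is handled first so later escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def encode_sd_element(sd_id, params):
    # Layout: [SD-ID name="value" name="value" ...]
    parts = [sd_id]
    for name, value in params.items():
        parts.append('%s="%s"' % (name, escape_param_value(str(value))))
    return "[" + " ".join(parts) + "]"


if __name__ == "__main__":
    print(encode_sd_element(
        "exampleSDID@32473",
        {"iut": "3", "eventSource": "Application", "eventID": "1011"},
    ))
```

The resulting string goes in the STRUCTURED-DATA position of the header, before the free-form message.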

Personally I'd rather it be something like json.

I could patch openssh to log using json, but the chances of that ever getting merged are probably 0.

> See two independent specifications for example:

Well, RFC5424 says it obsoletes 3164 at least. 3164 is just embarrassing, especially the section on timestamps: https://tools.ietf.org/html/rfc3164#section-5.1

Adding a year to the date was "not consistent with the original intent of the order and format of the fields". 5424 at least says "Use this specific form of RFC3339".

> Switches/routers/appliances cause even more issues because the custom implementations tend to not format messages perfectly for any of the RFCs

Ugh, I know it. As I said in another comment, we run syslog collection as a service so we receive logs from a large number of different devices not under our direct control.

user5994461 · on Sept 5, 2016

> about json and syslog structured data

I found the syslog design to be well done in that respect, ignoring whether it was accidental or not :D

+1 for json. Format messages as JSON for logging. Update/reconfigure applications to write json logs.

The structured data headers of syslog can be used for enriching log messages with metadata (instance id, instance ip, tags, environment, etc...).

The RFC 5xxx design allows the message (application's responsibility) and the metadata (relay's responsibility) to be managed separately, which is nice.

[Note: If ALL messages were JSON logs and the tooling could manipulate json messages directly to add/remove fields, there would be no need for a different metadata channel.]

Hnrobert42 · on Sept 4, 2016

I went in to this article thinking, "this guy is stupid." I came away thinking, "syslog is stupid."

TenOhms · on Sept 4, 2016

Working with SIEMs a lot, I agree it needs a serious revision.

1. Define an encryption standard, both symmetric and asymmetric (to prevent log tampering).
2. Define a compression standard, with scheduling options similar to cron jobs but defined in syslog.conf.
3. Overhaul the facility field, define some generic ones like "Auth", "Audit", "Kernel" etc and don't hardcode any numeric mappings. Article covers this well.
4. Make CEF the standard format for writing logs to disk or forwarding to other devices. All modern applications and kernels should be writing their logs in CEF format.

debinguy · on Sept 4, 2016

Using DAQs properly eliminates process blocking when using TCP delivery and there are connectivity issues. You can do TLS encryption natively. Rsyslog action templates are complex but extremely powerful. After reading this post I just wonder if you have ever read the Rsyslog documentation or done any large-scale deployments with it? Our network is handling over 50k log messages a second using Rsyslog, and while it's not perfect I can't think of any other standards-based system I could rely on.

justinsaccount · on Sept 4, 2016

Hi, author here :-)

We do use rsyslog. The rates we see aren't terribly high, but it is a fairly large deployment. We have a setup consisting of redundant tcp load balancers fronting redundant syslog relays feeding into additional applications and an archival box.

Basically this example in the RFC, but with a frontend load balancer cluster and additional collectors:

   +----------+         +-----+            +---------+
   |Originator|---->----|Relay|---->-------|Collector|
   |          |-+       +-----+        +---|         |
   +----------+  \                    /    +---------+
                  \     +-----+      /
                   +->--|Relay|-->--/
                        +-----+

For the pieces that we control and can use RELP things work great. The problem is that we run this as a service for a large group of heterogeneous systems. We split out logs by hostname, and it's not uncommon to wind up with an INFO.log at the end of the day because someone is sending us INFO where the hostname should be.

If every client could use rsyslog+relp and things like imfile to send us application logs, the whole system would work a lot better.
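A sketch of how an INFO.log comes about, assuming a collector that picks the output file from the field following the timestamp. The regex and the `log_file_for` helper are hypothetical and only loosely follow the RFC 3164 layout.

```python
import re

# Loose RFC 3164-style header: <PRI>, "Mmm dd hh:mm:ss", then whatever token comes next.
HEADER = re.compile(r"^<(\d{1,3})>([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (\S+) ")


def log_file_for(line):
    m = HEADER.match(line)
    # The third field is trusted to be a hostname; nothing verifies that it is one.
    return (m.group(3) if m else "unparsed") + ".log"


if __name__ == "__main__":
    print(log_file_for("<14>Sep  4 12:00:00 www1 nginx: started"))
    print(log_file_for("<14>Sep  4 12:00:00 INFO nginx started"))
```

A sender that skips the hostname and puts its log level first is indistinguishable, at this layer, from a host that happens to be named INFO.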

notaplumber · on Sept 4, 2016

OpenBSD's sendsyslog(2) made syslog_r(3) really cheap and usable virtually everywhere, and also solved fd exhaustion issues. Despite any protocol quirks, syslog has a standard place in Unix arcana.

It would be great if other systems adopted this.

http://man.openbsd.org/OpenBSD-current/man2/sendsyslog.2

ycmbntrthrwaway · on Sept 4, 2016

At least we have wtmp/utmp that can be reliably parsed, unless musl libc is used.

ibotty · on Sept 5, 2016

An aside: I like writing the log message in logfmt: https://brandur.org/logfmt

kanwisher · on Sept 4, 2016

So syslog supports structured logs via JSON natively now.

justinsaccount · on Sept 4, 2016

Hi, author here :-)

Does syslog actually support this? I mean, you can send a JSON blob as part of the MSG field, but you still need to send the

  PRI VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID

header.

You can't just send something like

    {"ts": "2016-09-..", "host": "www1", "proc": "nginx", ..}

to a syslog server and have it understand it.

We do ship some json logs in rsyslog using imfile, but that's a bit different.
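To illustrate the point about the header, here is a sketch that carries a JSON object in the MSG part of an RFC 5424-style line and recovers it on the other side. The `wrap` and `unwrap` helpers and all default values are hypothetical, and the simple split only works because structured data is "-" here.

```python
import json


def wrap(payload, pri=134, ts="2016-09-04T12:00:00Z", host="www1",
         app="nginx", procid="-", msgid="-"):
    # Header fields first, "-" for absent structured data, then JSON as MSG.
    header = "<%d>1 %s %s %s %s %s -" % (pri, ts, host, app, procid, msgid)
    return header + " " + json.dumps(payload)


def unwrap(line):
    # Seven space-separated header tokens (PRI+VERSION, TIMESTAMP, HOSTNAME,
    # APP-NAME, PROCID, MSGID, SD), then everything else is MSG.
    fields = line.split(" ", 7)
    return fields[2], json.loads(fields[7])


if __name__ == "__main__":
    line = wrap({"status": 502, "path": "/login"})
    print(line)
    print(unwrap(line))
```

The receiver still has to parse the syslog header before it ever sees the JSON, and the host and process information ends up living outside the object, in the header.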