Ehm, the contrast between my answer and everyone else's here makes me feel surprisingly greybearded, but...
Application logging has been a solved problem for decades now: log to syslog or direct-to-disk in a reasonable format, let logrotate do the job it's faithfully done for years, let the gzipped old files get picked up by the offsite backups that you're surely running, and use the standard collection of tools for mining text files: grep, cut, tail, etc.
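For concreteness, this is roughly what that setup looks like; the app name, paths, and retention numbers are placeholders, not a recommendation:

    # /etc/logrotate.d/myapp (hypothetical app name and path)
    /var/log/myapp/app.log {
        daily
        rotate 14
        compress
        delaycompress
        missingok
        notifempty
    }

    # day-to-day mining, current and rotated files alike
    tail -f /var/log/myapp/app.log
    grep ERROR /var/log/myapp/app.log
    zgrep ERROR /var/log/myapp/app.log*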
I'm a little weirded out that "my logs are too big" is still a thing, and that the most common answer to this is "glue even more complexity together".
100% agree. Don't try to build your own when there are some excellent free (and commercial) ones that are battle-tested.
grep, cut, tail, etc. work quite well if you're working on a single machine or a small number of machines.
The "ELK stack" (ElasticSearch, Logstash, Kibana) is a step up in complexity but gives you much more power than command line tools.
There are also some great commercial solutions that abstract away some of that complexity if you don't feel like rolling your own (Scalyr, Splunk, SumoLogic, etc.).
But regardless of the path you take, don't reinvent the wheel!
(Disclosure - I work for a company that provides one of the commercial solutions above - https://scalyr.com)
Unstructured logging, while easy for application developers and ops people, is a major source of headaches for data scientists/engineers.
I am a big proponent of JSON-based, semi-structured logging. Most log data today can be parsed with reasonable rigor at the source, and doing this before shipping the data to the backend saves so much future agony.
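As a sketch of what I mean (the field names are just illustrative), the application emits one JSON object per line:

    {"ts":"2015-09-24T12:01:13Z","level":"error","service":"checkout","msg":"payment declined","order_id":4711}

The file stays greppable, but it also supports precise queries, e.g. with jq:

    jq -r 'select(.level == "error") | [.ts, .service, .msg] | @tsv' app.log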
That was a good read (I wish there was more like it posted to HN). I think there's something to be said though for not adding complexity before it's needed, which isn't really covered in your article. You assume you're writing for an audience that needs to process transactions and events for thousands of servers, but I'd bet that's not the case for most of the people reading your article.
That kind of fractal complexity -- complexity added to every single layer of software architecture -- is a real pain in smaller environments where there's just one guy responsible for figuring out what the heck just went pear-shaped in the application. (Although the real-world problem I've had more often, as that guy, is nonexistent logs...)
Ideally, I'd like to see both purposes handled -- applications logging semi-structured JSON with a utility translating the output into a human-readable format in real time, or, preferably, vice versa, with applications logging unstructured data to a file that's then munged into JSON (or whatever) for machine consumption.
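A rough sketch of the first half of that, assuming the same kind of one-JSON-object-per-line format as in the comment above:

    # follow the structured log and render it human-readable on the fly
    tail -f app.log | jq -r '"\(.ts) [\(.level)] \(.service): \(.msg)"'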
I'm not such a fan of tools that are intended for humans to mine structured logs as though they were text files, because it's too easy to miss important data.
I did seriously consider suggesting that (it would be my default approach) but since I don't actually have any experience with big complex multi-server applications, I decided to keep my mouth shut.
There might be a good reason why they don't do it that way and I wouldn't have a clue.
Second that. I tried to use the ELK stack for application logs but it felt too Windows-ish: I had to do a lot of mouse-clicking to find what I needed.
I was happy at my previous job where we had a single log server which we ssh-ed into, with access to logs from all machines via NFS. It wasn't the fastest way to examine logs but it was very comfortable using all the usual grep, awk, sed, cut, etc. tools.
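For what it's worth, that workflow is hard to beat for ad-hoc questions; something like this (the paths are made up, assuming one directory per host under /logs) answers "which machines were throwing errors overnight" in one line:

    grep ERROR /logs/*/app.log | cut -d: -f1 | sort | uniq -c | sort -rn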