Ehm, the contrast between my answer and everyone else's here makes me feel surprisingly greybearded, but...
Application logging has been a solved problem for decades now: log to syslog or direct-to-disk in a reasonable format, let logrotate do the job it's faithfully done for years, let the gzipped old files get picked up by the offsite backups that you're surely running, and use the standard collection of tools for mining text files: grep, cut, tail, etc.
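For concreteness, this is roughly what that setup looks like; the app name, paths, and retention numbers are placeholders, not a recommendation:

    # /etc/logrotate.d/myapp (hypothetical app name and path)
    /var/log/myapp/app.log {
        daily
        rotate 14
        compress
        delaycompress
        missingok
        notifempty
    }

    # day-to-day mining, current and rotated files alike
    tail -f /var/log/myapp/app.log
    grep ERROR /var/log/myapp/app.log
    zgrep ERROR /var/log/myapp/app.log*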
I'm a little weirded out that "my logs are too big" is still a thing, and that the most common answer to this is "glue even more complexity together".
100% agree. Don't try to build your own when there are some excellent free (and commercial) ones that are battle-tested.
grep, cut, tail, etc. work quite well if you're working on a single machine or a small number of machines.
The "ELK stack" (ElasticSearch, Logstash, Kibana) is a step up in complexity but gives you much more power than command line tools.
There are also some great commercial solutions that abstract away some of that complexity if you don't feel like rolling your own (Scalyr, Splunk, SumoLogic, etc.).
But regardless of the path you take, don't reinvent the wheel!
(Disclosure - I work for a company that provides one of the commercial solutions above - https://scalyr.com)
Unstructured logging, while easy for application developers and ops people, is a major source of headaches for data scientists/engineers.
I am a big proponent of JSON-based, semi-structured logging. Most log data today can be parsed with reasonable rigor at the source, and doing this before shipping the data to the backend saves so much future agony.
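As a sketch of what I mean (the field names are just illustrative), the application emits one JSON object per line:

    {"ts":"2015-09-24T12:01:13Z","level":"error","service":"checkout","msg":"payment declined","order_id":4711}

The file stays greppable, but it also supports precise queries, e.g. with jq:

    jq -r 'select(.level == "error") | [.ts, .service, .msg] | @tsv' app.log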
That was a good read (I wish there was more like it posted to HN). I think there's something to be said though for not adding complexity before it's needed, which isn't really covered in your article. You assume you're writing for an audience that needs to process transactions and events for thousands of servers, but I'd bet that's not the case for most of the people reading your article.
That kind of fractal complexity -- complexity added to every single layer of software architecture -- is a real pain in smaller environments where there's just one guy responsible for figuring out what the heck just went pear-shaped in the application. (Although the real-world problem I've had more often, as that guy, is nonexistent logs...)
Ideally, I'd like to see both purposes handled -- applications logging semi-structured JSON with a utility translating the output into a human-readable format in real time, or, preferably, vice versa, with applications logging unstructured data to a file that's then munged into JSON (or whatever) for machine consumption.
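A rough sketch of the first half of that, assuming the same kind of one-JSON-object-per-line format as in the comment above:

    # follow the structured log and render it human-readable on the fly
    tail -f app.log | jq -r '"\(.ts) [\(.level)] \(.service): \(.msg)"'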
I'm not such a fan of tools that are intended for humans to mine structured logs as though they were text files, because it's too easy to miss important data.
I did seriously consider suggesting that (it would be my default approach) but since I don't actually have any experience with big complex multi-server applications, I decided to keep my mouth shut.
There might be a good reason why they don't do it that way and I wouldn't have a clue.
Second that. I tried to use the ELK stack for application logs but it felt too Windows-ish: I had to do a lot of mouse-clicking to find what I needed.
I was happy at my previous job where we had a single log server which we ssh-ed into, with access to logs from all machines via NFS. It wasn't the fastest way to examine logs but it was very comfortable using all the usual grep, awk, sed, cut, etc. tools.
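For what it's worth, that workflow is hard to beat for ad-hoc questions; something like this (the paths are made up, assuming one directory per host under /logs) answers "which machines were throwing errors overnight" in one line:

    grep ERROR /logs/*/app.log | cut -d: -f1 | sort | uniq -c | sort -rn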