I very much agree that Kubernetes is useful in an environment that doesn’t need to scale, but do tell how it enables consolidated logging and monitoring, since my medium/small shop is spending quite some time setting up our own infrastructure for it.
Installing a managed log ingestor is stupidly easy in Kubernetes. For example, on GCP here's the guide to getting it done [1]. Two kubectl commands, and you get centralized logging across hundreds of nodes in your cluster and thousands of containers within them. Most other platforms (like Datadog) have similar setups.
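The exact commands are in the linked guide; the general shape on most platforms is just a namespace plus a vendor-provided manifest, roughly like this (the URL and names below are placeholders, not the ones from [1]):

    # Placeholder shape only - substitute the manifest URL and namespace from
    # your provider's guide ([1] for GCP). Applied with two kubectl commands:
    #   kubectl apply -f logging-namespace.yaml
    #   kubectl apply -n logging -f https://example.com/vendor/log-agent-daemonset.yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      name: logging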
Infrastructure-level monitoring is also very easy. For example, if you're on Datadog, you flip KUBERNETES=true as an environment variable in the Datadog agent, and you'll instantly get events for stopped containers, with the stop reason (OOM, eviction, etc.), which you can configure granular alerting on.
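Roughly, that's one env var on the agent's DaemonSet. A sketch from memory - the image tag, API-key wiring and exact variable names may differ in current Datadog manifests, so treat this as illustrative only:

    # Illustrative Datadog agent DaemonSet fragment; check Datadog's current
    # manifests/chart for the exact image and environment variables.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: datadog-agent
    spec:
      selector:
        matchLabels:
          app: datadog-agent
      template:
        metadata:
          labels:
            app: datadog-agent
        spec:
          containers:
            - name: datadog-agent
              image: datadog/agent:latest
              env:
                - name: DD_API_KEY
                  valueFrom:
                    secretKeyRef:
                      name: datadog-secret   # example secret name
                      key: api-key
                - name: KUBERNETES
                  value: "true"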
Let's say you're in a service-oriented environment and you want detailed network-level metrics between services (request latency, status codes, etc). No problem, two commands and you have Istio [2]. Istio has Jaeger built-in for distributed tracing, with an in-cluster dashboard, or you can export the OpenTracing spans to any service that supports OpenTracing. You can also export these metrics to Datadog or most other metrics services you use.
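For concreteness: once Istio itself is installed (the two commands in [2]), the per-namespace switch is just a label, and automatic sidecar injection is what puts Envoy - and therefore the request metrics and tracing spans - in front of every pod. Something like (namespace name is an example):

    # Enable automatic Istio sidecar injection for one namespace;
    # the namespace name here is just an example.
    apiVersion: v1
    kind: Namespace
    metadata:
      name: my-app
      labels:
        istio-injection: enabled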
I will admit that these things are slightly easier with Kubernetes, but my original point was mostly that Kubernetes itself doesn't really provide any of these things in a meaningful way - you just described a bunch of separate, nontrivial systems that solve many but not all logging/monitoring needs.
I run a Filebeat container with privileges to read the stdout/stderr of all other pods, which then forwards to Elasticsearch (https://www.elastic.co/guide/en/beats/filebeat/master/runnin...). It's fairly straightforward, and Kibana + Watcher can then ship alerts to PagerDuty based on log patterns / queries / limits, etc. I think Watcher is open-source/free now?
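The "privileges" bit mostly means running as root with the host's log directories mounted in, since the runtime writes each container's stdout/stderr to files on the node. A condensed sketch of the DaemonSet (Elastic's reference manifest also sets up RBAC, a ConfigMap with the Elasticsearch output, and so on; the image tag is illustrative):

    # Condensed Filebeat DaemonSet sketch - RBAC, ConfigMap and output settings
    # omitted; see the Elastic guide linked above for the full manifest.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: filebeat
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: filebeat
      template:
        metadata:
          labels:
            app: filebeat
        spec:
          serviceAccountName: filebeat   # needs RBAC to read pod metadata
          containers:
            - name: filebeat
              image: docker.elastic.co/beats/filebeat:8.13.0   # tag is illustrative
              securityContext:
                runAsUser: 0   # root, so it can read other containers' log files
              volumeMounts:
                - name: varlog
                  mountPath: /var/log
                  readOnly: true
          volumes:
            - name: varlog
              hostPath:
                path: /var/log   # container logs live under /var/log/containers and /var/log/pods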
I also have Prometheus + Grafana, which similarly collects lots of stats from around the cluster, but I'm fairly sure I'm the only person who uses that dashboard, since the only things hooked up to Prometheus are databases and such, no internal applications (yet!).
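For what it's worth, hooking an application up is usually just pod annotations, assuming the common annotation-based scrape config (that's a convention used by many example configs and charts, not a Prometheus built-in). Names, port and image below are placeholders:

    # Example pod annotations for an annotation-driven Prometheus scrape config;
    # the app name, port and image are placeholders.
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: my-app
          image: example/my-app:1.0
          ports:
            - containerPort: 9102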
Being able to aggregate stdout/stderr across dozens of machines would previously have cost either a lot of Chef setup time or a contract with some provider. Now I get a fairly straightforward open-source stack that can be refined over time, and the YAML can be re-used very easily in any cluster. Plus, the metadata collected from Kubernetes about each log line is extremely useful (for example, out of the box you can query by Kubernetes labels for your graphs, etc.).
We are. I guess I wouldn’t consider that enough for our purposes. Actually retaining and alerting on logs - or alerting on anything, for that matter - is not out of the box unless I’m missing something.
The typical approach is to set up Fluentd for logging. You run it as a DaemonSet and have it mount the host's container log directory (/var/lib/docker/containers on Docker hosts). That gives it access to all container logs, which you then stream to your desired store.
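A minimal sketch of what that DaemonSet looks like (the image tag is illustrative, and the output side - Elasticsearch, Stackdriver, whatever your store is - still has to be configured in the Fluentd config):

    # Minimal Fluentd DaemonSet sketch; image tag illustrative, output
    # configuration for your log store omitted.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: fluentd
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: fluentd
      template:
        metadata:
          labels:
            app: fluentd
        spec:
          containers:
            - name: fluentd
              image: fluent/fluentd-kubernetes-daemonset:v1.16-debian-elasticsearch8-1
              volumeMounts:
                - name: varlog
                  mountPath: /var/log
                  readOnly: true
                - name: dockerlogs
                  mountPath: /var/lib/docker/containers
                  readOnly: true
          volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: dockerlogs
              hostPath:
                path: /var/lib/docker/containers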
Yeah - that’s far from batteries included though, and comes with many limitations, especially for non-12-factor apps. It also doesn’t begin to answer questions about what that log store is, or how to alert on the contents of logs.