> Now servers were effectively dumb boxes running containers, either on their own with Docker Compose or as part of a fleet with Kubernetes/ECS/App Engine/Nomad/whatever new thing has been invented in the last two weeks.
No joke, containers are amazing, regardless of how quickly you try to move or how often you need to deploy.
I remember a project where performance turned out to be horrible because someone was running Oracle JDK 8 instead of OpenJDK 8, and that alone was enough to cause a huge discrepancy. Here's an example of the request processing times during load tests: https://blog.kronis.dev/images/j/d/k/-/t/jdk-testing-compari...
That could have been solved by Ansible or something like it, of course, but containers get rid of that risk altogether, since the image packages the exact JDK your app needs (and the one it will be tested against).
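For example, here's a minimal Dockerfile sketch of that idea (assuming a Temurin build of OpenJDK 8; the image tag and jar path are illustrative, not from the project above):

    # Pin the exact runtime the app was tested against, so every
    # environment runs the same JDK (no Oracle vs OpenJDK surprises).
    FROM eclipse-temurin:8-jre
    # app.jar is a placeholder for your actual build artifact
    COPY target/app.jar /opt/app/app.jar
    CMD ["java", "-jar", "/opt/app/app.jar"]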
With a bit of work, using containers can be quite consistent and manageable: have Ansible or something similar set up the nodes that will run the containers; run a Docker Swarm, HashiCorp Nomad, or Kubernetes cluster (K3s is great) that's more or less vanilla; add something like Portainer or Rancher for easier management; use SkyWalking or one of the OpenTelemetry-based solutions for tracing and observability; throw in an uptime monitoring tool like Uptime Kuma; and maybe even something like Zabbix or a more modern alternative for node monitoring and alerting, and you're set. Prefer anything that's self-hostable and doesn't tie you up with restrictive licenses (this also applies to using PostgreSQL or MariaDB instead of something like Oracle, if you can).
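To make one of those pieces concrete, a minimal Uptime Kuma deployment is roughly this much Compose (the port and volume choices here are just one sensible default, not the only way to run it):

    # docker-compose.yml - minimal Uptime Kuma sketch
    services:
      uptime-kuma:
        image: louislam/uptime-kuma:1
        restart: unless-stopped
        ports:
          - "3001:3001"          # web UI
        volumes:
          - kuma-data:/app/data  # persists monitors and history
    volumes:
      kuma-data:

One container, one volume, and you get uptime dashboards and alerting without signing up for anything.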
You don't need every team branching out into completely different tools just because they're the new hotness, and you don't need to run everything on PaaS/SaaS platforms when IaaS is enough. Realistically, most of what you need can live in a Git repo with a pretty clear history of why things have changed, plus some wiki pages and/or ADRs that explain how you got here.
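As a sketch, such a repo could be laid out something like this (the directory names are hypothetical, just to show the idea):

    infra/
      ansible/    # playbooks that set up the container nodes
      compose/    # per-service Docker Compose files
      k8s/        # manifests, if you run K3s or similar
    docs/
      adr/        # Architecture Decision Records
      wiki/       # runbooks and how-we-got-here notes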
The situations in the article feel very much like corporate not caring, teams not talking to one another or coordinating, or an organization growing to a scale where direct communication no longer works without anything in place to address that. If you're at that point, you should be able to throw money and human-years of work at the problem until it disappears, provided the people who hold the bag actually care.
For what it's worth, regardless of the tech you use or the scale you're at, you can still have someone in charge of the platform (or a team, where applicable), and you can still have a DBA or a sysadmin, if you recognize their skills as important and needed.