
I have built a panel like the one I mentioned for fun with friends!

The goal of my comment was to highlight opportunities for more fun and less of what seems like toil.

Furthermore, this is an article about a telemetry solution posted on the site of that telemetry solution. They make money from this.


One person's toil is another person's fun.

And sometimes a person is paid to pretend toil is fun. We are talking about spending hours setting up telemetry instead of playing a game.

Not everyone is into gaming. I'd rather code on my side projects than use my console. Some people tweak and customize their Linux installation instead of doing work on it. Others like to work on their cars; driving is a small part of it.

I agree, and I am as guilty of procrastination as anyone. However, the author is not really procrastinating—he gets paid for this. Me, I do in fact procrastinate on setting up Minecraft server infra in the cloud. Maybe that’s precisely why the solution to this problem strikes me as inadequate:

> So, the Minecraft server should work reliably and, if it goes down, I should know well before they do

How are metrics helpful? There is so much fun that could be had in setting up an actually resilient system instead.

Why worry over metrics and alerts when you could orchestrate an infrastructure that grants you the superpower of being able to spin up a server with a copy of the world on a whim instead (or even a system that auto-starts one whenever there is demand)?


You come across as very negative about this piece, and you don't seem to understand that your definition of fun is not universal.

As you said "There is so much fun that could be had in setting up an actually resilient system instead.", maybe the author has more fun setting up alerts and metrics instead of a resilient system like you do?

The truth is that in most real-world scenarios getting alerts and metrics is much more important than building a fully resilient system (expensive, maybe overengineering at an early stage, etc.).

> However, the author is not really procrastinating—he gets paid for this.

As the first sentence in the blog post says, "One of the secret pleasures of life is to be paid for things you would do for free." I can very much understand that, as I often spend my free time working or playing with things I could use at work.


> The truth is that in most real-world scenarios getting alerts and metrics is much more important than building a fully resilient system (expensive, maybe overengineering at an early stage, etc.).

Funny, because I have the opposite opinion. Build for failure first; if it’s critical/production, then also monitor. But if an earthquake takes down an EC2 availability zone and you have no ability to spin your system back up exactly the way it was, then the avalanche of alerts and metrics falling off a cliff[0] isn’t exactly going to help you (or your mental well-being).

Generally speaking, if you build for failure first, then monitoring becomes much more useful and actionable; and simultaneously it becomes much less important for a hobby project.

[0] That's assuming you gather them from a different zone that wasn’t affected by the same downtime in the first place; speaking of which, how are you monitoring your monitors? And so on.


This thread isn't going anywhere. If your startup hasn't found paying customers, there's no need to build earthquake-resilient software. For most businesses that aren't billion-dollar companies, there isn't either.

Of course, for engineers that's a nice challenge, but it's also why engineers without business sense have a hard time building their own companies: they prioritize perfect code and overengineered infrastructure over talking to customers and building the business.


I don’t think running a container, which takes one command and one small YAML file, is either overengineering or difficult.
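
To make that concrete, here is a minimal sketch of such a YAML file, assuming Docker Compose and the community itzg/minecraft-server image (my choice of image for illustration, not necessarily what the author runs):

    # Hypothetical minimal docker-compose.yml, for illustration only
    services:
      minecraft:
        # Widely used community image for running a Minecraft server
        image: itzg/minecraft-server
        environment:
          # The image refuses to start until you accept the Minecraft EULA
          EULA: "TRUE"
        ports:
          - "25565:25565"   # default Minecraft port
        volumes:
          - ./world-data:/data   # keep the world on the host so it survives the container
        restart: unless-stopped  # Docker brings the server back if it crashes

And the one command is then "docker compose up -d".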

> As you said, "There is so much fun that could be had in setting up an actually resilient system instead." Maybe the author has more fun setting up alerts and metrics, just as you have more fun setting up a resilient system?

Adding a backup for the world files, plus already having systemd bring back a crashed server, makes the setup rather resilient. Sure, there are infinitely more things that can go wrong, but with swiftly decreasing likelihood.
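
For reference, the systemd part is essentially one Restart= directive in the unit file. A minimal sketch, with hypothetical paths and JVM flags:

    [Unit]
    Description=Minecraft server
    After=network.target

    [Service]
    User=minecraft
    # Hypothetical install path and memory settings, adjust to taste
    WorkingDirectory=/opt/minecraft
    ExecStart=/usr/bin/java -Xmx2G -jar server.jar nogui
    # Bring the server back automatically whenever it crashes
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target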

> The truth is that in most real-world scenarios getting alerts and metrics is much more important than building a fully resilient system (expensive, maybe overengineering at an early stage, etc.).

This, very much this.

>> However, the author is not really procrastinating—he gets paid for this.

> As the first sentence in the blog post says, "One of the secret pleasures of life is to be paid for things you would do for free." I can very much understand that, as I often spend my free time working or playing with things I could use at work.

Yes :-)


> How are metrics helpful? There is so much fun that could be had in setting up an actually resilient system instead.

Metrics are a means to the end of alerting. And by alerting, I mean getting pinged on my phone when something important breaks. Like, you know, the server going down.
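
To sketch the idea in Prometheus terms (an assumption on my part; the blog's stack may differ), the whole thing boils down to one alerting rule. The job name here is made up:

    # Hypothetical alerting rule; "minecraft" is an illustrative job name
    groups:
      - name: minecraft
        rules:
          - alert: MinecraftServerDown
            # 'up' is 1 while the scrape target responds, 0 once it stops
            expr: up{job="minecraft"} == 0
            for: 2m
            labels:
              severity: page
            annotations:
              summary: Minecraft server has been unreachable for 2 minutes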

> Why worry over metrics and alerts when you could orchestrate an infrastructure that grants you the superpower of being able to spin up a server with a copy of the world on a whim instead (or even a system that auto-starts one whenever there is demand)?

As somebody who has run cloud and enterprise software for almost two decades now, I can tell you that needs monitoring too. The more moving parts there are, the more things go wrong. The more things go wrong, and the more you care that they get fixed, the more monitoring you need :-)


Do you really need to be urgently made aware that it’s down if the system can simply spin up a new container and carry on as it were? You could still see that it had to do so, and investigate if you’re in the mood, but the matter of first importance is taken care of for you.

> As somebody who has run cloud and enterprise software for almost two decades now, I can tell you that needs monitoring too

To be clear, I strongly believe that if you run anything seriously in production, you must monitor it—but first you need to be able to spin it back up with minimal effort. It may take a while to get there if you just inherited a rusty legacy snowflake monolith that no one dares to breathe the wrong way near, but if you are starting anew it is a bad mistake to not have that down first considering how straightforward it is nowadays.

Then, for hobby projects of low criticality (since people in this thread mistakenly assume I mean any personal project, I have to reiterate: nothing controlling points of ingress into your house or the like), you may find that once you can spin things back up easily, monitoring becomes optional and not really that interesting anymore.


I swear I had a lot of fun doing the setup.

I am also a massive observability nerd, so YMMV :-)


I believe you! It's just that, given your affiliation, I wanted to highlight to any newbie SREs in the audience that perhaps there is a better way. I still think my approach is better, but we can each do things differently.

Indeed, if there were “official” container images out there, I might instead have run the server on Google Cloud Run or AWS App Runner, without having to take care of the Linux underneath. Or as an Amazon ECS task. I don’t have a Kubernetes cluster, but at some point I will write a version of this blog post that runs it on K8s :-)


