How I run my servers (2022) (wesleyac.com)
430 points by ingve on July 16, 2023 | 174 comments



> In order to provide isolation, I run each service as its own unix user account.

systemd's DynamicUser feature could save some time here. It can allocate a uid, then create directories for logs/state with the correct permissions.

https://0pointer.net/blog/dynamic-users-with-systemd.html
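
A minimal sketch of a unit using it (the service name and binary path are made up):

    [Service]
    ExecStart=/usr/local/bin/myservice
    # systemd allocates a transient uid/gid for the lifetime of the service
    DynamicUser=yes
    # created and chowned automatically: /var/lib/myservice, /var/log/myservice
    StateDirectory=myservice
    LogsDirectory=myservice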


It's really easy to create new "system" users with systemd-sysusers too, if you need the uid to be persistent!

You just drop a small text file (often a single line) into /etc/sysusers.d/ with the information about the user, like username, home directory and whatever, and then invoke the sysusers command or service!
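
For example (the username and home directory are placeholders):

    # /etc/sysusers.d/myapp.conf
    u myapp - "My app service account" /var/lib/myapp

    $ sudo systemd-sysusers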


Thanks for sharing this! My server setup is similar to the one described in this article, minus isolating apps with separate users. I'll give dynamic users a go next time I tweak the setup.


It's pretty standard to use separate service principals for each service/app. You should also use separate servers.


> service principals

Kerberos much? =)


Edit: the term service principal is current and isn’t specific to Kerberos

https://istio.io/latest/docs/concepts/security/#principals


It’s all fun and games until your KDC goes down!


HTTP triggered cloud functions are my new favorite thing. They can evaporate complexity if you dance around the various vendors carefully enough. This is the only cloud-native abstraction that feels "end game" to me. I still haven't been able to deploy a cloud function and get the "runner" into a state where I'd have to contact support or issue arcane console commands. I've done well over 2000 deploys by now for just one app alone with a 100% success rate.

Performance is fantastic (using isolated app service plans in Azure), and I am completely over the ideological principles against per-request billing. Absolutely, you can do it cheaper if you own the literal real estate the servers are running inside of. Paying for flat colo fees makes per-request costs look ludicrous on the surface. But, achieving all of the other attributes of simple HTTP triggered functions in a DIY context is very challenging without also spinning up a billion dollar corporation and hiring 1k more people. Compliance, audits, etc. are where it gets real super fast.

The "what about lock-in?" argument doesn't work for me anymore. HTTP triggers are a pretty natural interface to code against: "I've got an HTTP request for you to review, give me some HTTP response please". The only stuff that is vendor-specific is the actual trigger method signatures and the specific contexts beyond HTTP, such as OIDC or SAML claims. You'd really have to go out of your way to design an HTTP trigger web app/API solution that is impossible to refactor for another vendor within a week or so.

If your business is a personal blog, then yeah I get it. It's more fun to buy a VM in Hetzner and get all artisanal about it. Also, if you are operating in a totally unregulated industry, perhaps you can make some stronger arguments against completely outsourcing the servers in favor of splitting hairs on margin vs complexity.


> But, achieving all of the other attributes of simple HTTP triggered functions in a DIY context is very challenging without also spinning up a billion dollar corporation and hiring 1k more people.

I literally rolled my eyes reading that. How do you think we did before cloud computing?

I am currently in charge of multiple teams scaling and deploying innovative applications for a large industrial company. We are using Azure for everything. Our cloud costs are insane for our number of users. I used to manage applications with ten times more users for one hundredth of the cost and less complexity. It's billed to another part of the company, which is responsible for this dubious choice (I really hope someone is getting nice business trips paid by MS so it's not a complete waste), so I don't care, but how people can blindly put faith in the cloud is beyond me.


…And you're only one mistake away from the spam/abuse detection bot locking you out of your account and shutting off your business for 12-72 hours.


He said Microsoft, not Google.

For all of Microsoft's faults, you can get a person on the phone when you pay money.

There is a reason Microsoft kicks Google's ass all over the room in the enterprise space.


Interesting. When you say pay money do you mean as in paying for Azure resources, or paying EXTRA for somebody to pick up the phone?


He said Microsoft. Not Amazon.


For all the bad things I have to say about the cost, it's still Microsoft. We have a direct line to them and they are there when you need them.


I work in an extremely highly regulated industry and I don't understand your last sentence. It is in our best interest to run all our own hardware. We can't even take pictures in the office, there's an absolute 0% chance we're trusting a cloud provider with anything.


> extremely highly regulated industry

Which one? We are using this stack in fintech, and are subject to PCI-DSS, SOX, AML, SOC2, etc. Many of our customers (small US banks & credit unions) are very interested in this kind of cookie-cutter, cloud-native stack in 2023.

> We can't even take pictures in the office, there's an absolute 0% chance we're trusting a cloud provider with anything.

Sounds like you work for an F100. Our IT budget is 5 figures and we are doing business with clients who have 6 figure IT budgets at the high end. Forcing an on-prem architecture would make our solution less secure in most situations, especially for those customers who do not have the confidence to run a complex SaaS solution on their premises. Many of our customers actively understand this reality and are very open to the idea of offloading complexity to the cloud.


> Forcing an on-prem architecture would make our solution less secure in most situations, especially for those customers who do not have the confidence to run a complex SaaS solution on their premises.

Yeah. I spent a couple of years at a billion-$ company in the telco industry which was subject to all sorts of US federal and foreign regulations (because they operated in 20+ countries). They ran almost everything on-prem, but seeing how that was managed, cloud would have been 1000% more secure for them. At one point, the entire senior staff of the IT department was fired because of a security breach that was pretty clearly due to their poor decisions.

Companies do exist whose on-prem operations seem very secure, but you need really big budgets and good management to do that properly. Most places are not like that, even in the highly regulated spaces.


Sounds very similar to the Optus breach in Australia.


This was a US company. Another fun thing that happened while I was there is that they had to throw out a major codebase and start from scratch because of security compromises, i.e. people working on it that shouldn't have been allowed to.

I suppose in the end they achieve a kind of security with this behavior, but it would be a lot better to avoid such incidents in the first place - which would be perfectly possible, with good decision-making.


Ah, yes, also fintech, but different scale. The magnitude of our IT spend is very different.


I'd bet pharma or defense


Functions are an OK place to host a simple API. But don't you need more things for your apps to be useful, like databases, which add cost?


Yes, the functions use Azure SQL Hyperscale for persistence.

> which add cost

We aren't interested in free. You get what you pay for.


>You get what you pay for.

I would say it's a somewhat weak correlation that breaks down completely if you consider a wider range of architectures. Back when Google was a startup "you get what you pay for" is what people running Oracle on Solaris would tell you.

And there was a sense in which it was true. There was no more reliable and feature rich way of scaling up. But starting something like Google on top of this egregiously expensive platform would have been completely uneconomical. It just wouldn't have happened.

And I think it's the same thing today. There are good reasons for using those egregiously expensive offerings from the largest incumbents, but if you are starting something where servers are a significant cost factor then you are well advised to look at other options that are orders of magnitude cheaper, even accounting for labour.


> HTTP triggered cloud functions are my new favorite thing.

What exactly are you talking about? Request handlers in HTTP servers? Function-as-a-service cloud computing services? Plain old RPC?


What do your functions do? If some script kiddie comes along and decides to query you 10 times per minute, how long until the function becomes more expensive than a VPS?


They are authenticated via AAD and effectively serve various SSR web applications. You can't actually invoke our functions without passing authentication first.


I see. Where does authentication and authorization happen?


Authentication occurs in the IdP (AAD). Authorization occurs within our application based upon claims returned by the IdP. We don't rely on any AD-specific mechanisms to authorize users (i.e. security groups). We associate application-specific roles with the user principal name in our database.


This has inspired me, I never thought of serving static sites on lambda but it makes a lot of sense. Cheers!


But why would one need any backend logic to fulfill that? Just upload your static files to S3, enable static website hosting and you're done. Takes you a few minutes. Haven't worked with any of the other vendors but I'd assume it's equally trivial ;)



Sure, in this case I would just use GitHub pages. Hosting a site on lambda is just something to try lol


Even better if the lambda is coded in PHP /s


I have a similar setup for my personal and project websites. Some similarities and differences:

* I use Linode VMs ($5/month).

* I too use Debian GNU/Linux.

* I use Common Lisp to write the software.

* In case of a personal website or blog, a static website is generated by a Common Lisp program. In case of an online service or web application, the service is written as a Common Lisp program that uses Hunchentoot to process HTTP requests and return HTTP responses.

* I too use systemd unit files to ensure that the website/service starts automatically when the VM starts or restarts. Most of my unit files are about 10-15 lines long (see the sketch after this list).

* The initial configuration of the VM is coded as a shell script: https://github.com/susam/dotfiles/blob/main/linode.sh

* Project-specific or service-specific configuration is coded as individual Makefiles. Examples: https://github.com/susam/susam.net/blob/main/Makefile and https://github.com/susam/mathb/blob/main/Makefile

* I do not use containers. These websites have been running since several years before containers were popular. I have found that the initialization script and a Makefile have been sufficient for my needs so far.

* I use Nginx too. Nginx serves the static files as well as functions as a reverse proxy when there are backend services involved. Indeed TLS termination is an important benefit it offers. Other benefits include rate limiting requests, configuring an allowlist for HTTP headers to protect the backend service, etc.

* I have a little private playbook with a handful of commands like:

  curl LINK -o linode.sh && sh linode.sh
  git clone LINK && cd PROJECT && sudo make setup https
* The `make` targets do whatever is necessary to set up the website. This includes installing tools like Nginx, certbot, sbcl, etc., setting up Nginx configuration, setting up certificates, etc. Once the `make` command completes, the website is live on the world wide web.
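
For reference, a unit of that size looks roughly like this (the names and paths here are illustrative, not the actual files):

    [Unit]
    Description=Example website
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/run-site
    User=site
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target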


Which Common Lisp implementation do you use? If it’s SBCL, has memory usage been a problem? Edit: I see SBCL in one of your Makefiles.

I use a VPS with 512 MB of RAM, but each SBCL instance uses roughly 100 MB of RAM, so I can only have a couple services at once.

I've considered moving the lowest traffic services to CLISP, but it's missing at least one feature I use (package-local nicknames).


SBCL needs a relatively large amount of memory to start with, but beyond that it doesn't require much. I run a few small services written in Common Lisp. Instead of running a separate process for every one of them I run a single SBCL process and run each service in a different thread.

This has some downsides: firstly, they run in the same address space, so global state is shared and a serious bug in one service can affect the others. Secondly, they run as a single systemd service, so they're not easy to manage individually. Still, I've found it to work quite nicely for things that are simple and don't really need attention.


Thanks! That setup is actually my backup plan if I start to run out of memory.

Maybe it’s my imagination but it seems like CL isn’t as suited to sharing code and read-only data pages across processes (e.g. in shared libraries). Or maybe there’s a solution I just haven’t found yet…


Yes, I use SBCL. For static websites, the memory usage is not a problem because SBCL exits after generating the website.

For long running services, it does consume about 100 MB of memory. For example, I have a service running right now with an uptime of 270 days. SBCL is currently consuming 108 MB of memory. This has not been a problem either, because for low traffic websites like mine the memory consumption remains fairly stable. I have not found it to vary much.


But however will you scale to 14 billion users when, one morning, waking up from anxious dreams, you discover that in bed you have been changed into a monstrous verminous 'berg?


You will probably have different problems when this happens.


Then, Mr. Gregor Samsa, your position is not to be envied.


Like a Zucker?


Time for Kafka?


Over the years, I kept tweaking my setup and have now settled on running everything as a Docker container. The orchestrator is docker-compose instead of systemd. The proxy is Caddy instead of nginx. But same as the author, I also write a deploy script for each project I need to run. Overall I think it's quite similar.

One of the many benefits of using docker is that I can use the same setup to run 3rd party software. I've been using this setup for a few years now and it's awesome. It's robust like the author mentioned. But if you need the flexibility, you can also do whatever you want.

The only pain point I have right now is rolling deployments. As my software scales, a few seconds of downtime on every deployment is becoming an issue. I don't have a simple solution yet, but perhaps Docker Swarm is the way to go.


I do the same as you using Caddy.

To avoid downtime try using:

    health_uri /health
    lb_try_duration 30s
Full example:

    api.xxx.se {
      encode gzip
      reverse_proxy api:8089 {
        health_uri /health
        lb_try_duration 30s
      }
    }
This way, Caddy will buffer the request and give 30 seconds for your new service to get online when you're deploying a new version.

Ideally, during deployment the new version should go live and be healthy before Caddy starts using it (and the old container is killed). I've looked at https://github.com/Wowu/docker-rollout and https://github.com/lucaslorentz/caddy-docker-proxy but haven't had time to prioritize it yet.


That's neat, I wonder if there's a way to do that with nginx?

edit: closest I found is this manual way, using Lua: https://serverfault.com/questions/259665/nginx-proxy-retry-w...


If I understand you correctly, you do a sort of blue green deploy? Load balancing between two versions while deploying but only one most of the time?

How do you orchestrate the spinning up and down? Just a script to start service B, wait until service B is healthy, wait 10 seconds, stop service A, and caddy just smooths out the deployment?
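
Something like this, I imagine (a sketch; the service names and health port are made up):

    # start the new version alongside the old one
    docker compose up -d app_b
    # wait until it answers its health check
    until curl -fsS http://localhost:8089/health >/dev/null; do sleep 1; done
    sleep 10
    # Caddy's lb_try_duration smooths over the gap while this stops
    docker compose stop app_a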


Thanks for that, didn't know this is a thing in Caddy. Seems low effort, so I'll probably do that for now. I omitted it, but I'm actually using caddy-docker-proxy. It's awesome; it makes the config section part of each project nicely. Haven't seen docker-rollout though. Seems like it could be promising.


If you've got a load balancer (like Caddy) in front of your pods you can configure it to hold requests while the new pod comes up: https://twitter.com/bradleyjkemp/status/1486756361845329927

It's not perfect but it means rather than getting connection errors, browsers will just spin for a couple seconds.

The same technique is used by https://mrsk.dev/


If you have more than one backend you can also reconfigure caddy on the fly to only serve from active ones while each one is being updated


I've built up the software stack of the startup I work for from the beginning, and went directly for Docker to package our application. We started with Compose in production, and improved by using a CD pipeline that would upgrade the stack automatically. Over time, the company and userbase grew, and we started running into the problems you mention: restarting or deploying would cause downtime. Additionally, a desire to run additional apps came up; every time, this would necessitate me preparing a new deployment environment. I dreaded the day we'd need to start using Kubernetes, as I've seen the complexity this causes first-hand before, and was really wary of having to spend most of the day caressing the cluster.

So instead, we went for Swarm mode. Oh, what a journey that is. Sometimes Jekyll, sometimes Hyde. There are some bugs that simply nobody cares to fix, some parts of the Docker spec that simply don't get implemented (but nobody tells you), implementation choices so dumb you'll rip your hair out in anger, and the nagging feeling that Docker Inc employees seem incapable of talking to each other, thinking things through, or staying focused on a single bloody task for once.

But! There is also much beauty to it. Your compose stacks simply work, while giving you opportunities to grow in the right places. Zero-downtime deployments, upgrades, load balancing, and rollbacks work really well if you care to configure them properly. Raft is as reliable in keeping the cluster working as everywhere else. And if you put in some work, you’ll get a flexible, secure, and automatically distributed, self-service platform for every workload you want to run - for a fraction of the maintenance budget of K8s.

Prepare, however, for getting your deployment scripts right. I've spent quite a while building something in Python to convert valid Docker-spec compose files to valid Swarm specs, update and clean up secrets, and expand environment variables.

Also, depending on your VPS provider, make sure you configure network MTU correctly (this has shortened my life considerably, I’m sure of it).


That's encouraging, thanks. Are you able to share your python convertor script by any chance?


I've extracted a gist: https://gist.github.com/Radiergummi/fe14c4ed93c68f2928a6a275...

Let me know if that helps, or if you need more guidance. Maybe I could open source the whole thing properly, if that is useful to someone :)


What's the correct configuration of MTU?


There is no one-size-fits-all answer to that. The standard is 1500, but the usable MTU drops with each level of encapsulation, since you need those bytes for the encapsulation overhead.

There's also "Jumbo Frames" though you're not likely to encounter that day to day in a VPS.
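
If you need to clamp it, that can be set per Compose network (the 1450 here is just an example value; the right number depends on your provider's encapsulation):

    networks:
      default:
        driver: bridge
        driver_opts:
          com.docker.network.driver.mtu: "1450"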


I do the same. Swarm is the natural next step since you already have compose files, but I have decided it is not worth it until you hit scaling issues (as in many customers/users).


I built a similar setup, but I don't like pushing the images with docker save and docker load over ssh. Do you run your own registry?


Nowadays I use GitHub's package registry. I used to run my own registry in the past, along with the docker save method, but both of them are annoying to deal with. I have GitHub Pro so it's pretty much free. However, even if I need to pay for it in the future, I probably will. It's just not worth the headache.


How often do you rebuild your containers?


Whenever I have anything to deploy, so it depends on the project. On actively developed ones, it could be once or twice a day; on slower ones, maybe once every 2-3 days.


My physical server:

Podman pods (each containing a PostgreSQL database and the app), all bound to localhost on ports > 5000, with Caddy running on 443 as a reverse proxy.

I use a systemd timer to dump all the databases at 4:55 PM into a directory. Then DejaDup [1] automatically backs up $HOME (with no cache files, of course) at 5 PM daily to an external HDD. This backup includes the database dumps.
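
The timer pair for that is tiny; roughly (unit names and the dump script are illustrative):

    # db-dump.timer
    [Timer]
    OnCalendar=*-*-* 16:55:00
    Persistent=true

    [Install]
    WantedBy=timers.target

    # db-dump.service
    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/dump-all-dbs.sh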

The OS is Debian with GNOME Core [2] and a firewalld rule to allow only 80, 443 and a custom SSH port. SSH is key-based, with password auth disabled.

The most boring way but it just works :D

1 - https://flathub.org/apps/org.gnome.DejaDup

2 - https://packages.debian.org/bookworm/gnome-core


Out of curiosity, why do you run GNOME on your server?


Okay here "my" means the University server I maintain :)

GUI is needed because the office staff doesn't know SSH or CLI or Linux at all ^^

(She liked GNOME once I showed it to her tho)


Any reason to use systemd timers instead of cron jobs?


Exactly 1. Job security.


Nowadays I just use one server with a Dokku setup. It’s easy to manage, easy to deploy for devs (just git push, the Heroku way), and it has a lot of plugins so it takes max 10s to add a database or set HTTPS up.


I use this method also; super convenient to be able to git push. Adding apps and databases, managing environment variables, and managing domains is all very straightforward.


As a fan of simple setups, this looks enjoyable to work with! It is probably good enough for 99% of services.

I think I would use Ansible to set up the servers and use it for the deployment script as well.

This would document the servers and perhaps make the deployment script simpler.

I wouldn't shy away from accessing the servers manually when debugging or checking things, though.


> I think I would use Ansible to set up the servers and use it for the deployment script as well.

I do use Ansible, but I'm reconsidering whether it is worth it: while my scripts have barely changed in the last 7 years, they tend to be slow and require me to keep multiple files around.

Related to debugging: you can expose the app logs through a password-protected endpoint via nginx.


People sometimes forget that CI/CD and effective server management was common practice before the cloud :)


> The server software is written in Rust. It's statically linked, and all of the html, css, config, secrets, etc are compiled into the binary.

I’ve recently taken to doing this in Go and absolutely love how easy it makes writing and deploying software that depends on static files.
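
The pattern in Go is pleasantly small; a minimal sketch (the static/ directory name is an example):

    package main

    import (
        "embed"
        "log"
        "net/http"
    )

    //go:embed static
    var assets embed.FS

    func main() {
        // Serve the embedded static/ directory; files are reachable
        // under /static/... in the URL space.
        http.Handle("/", http.FileServer(http.FS(assets)))
        log.Fatal(http.ListenAndServe(":8080", nil))
    }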


Including secrets in the compiled binary still seems questionable - using env variables or a config file is the "standard" way for secrets, and although it adds another step before you can run, it avoids the case where you share your binary with someone and forget that you had compiled in some secret. Unpacking a binary to find strings is pretty trivial.

Having the static frontend assets baked in along with a default config is a huge boon though.


You can include encrypted secrets and deploy the key out-of-band (eg just copy the private key with scp). This is much more secure than env variables which are prone to leakage. Our open source solution for this (cross-platform, cross-language): https://neosmart.net/blog/securestore-open-secrets-format/

It supports embedding the encrypted secrets in the binary or loading them from a file. The secrets would actually be stored (encrypted) alongside the code, even versioned in git.

Eg this is the rust version on GitHub: https://github.com/neosmart/securestore-rs/tree/master


Hey! Your rust (and C# I guess) secrets library looks super cool. I'm going to look at using this in my next project. Thanks for sharing it.


Thanks for the words of gratitude, kind stranger! Glad to have potentially written something of some value to you.


I am not a fan of Go, but I find myself using it for this reason. Doing it with Rust - especially cross-compiling from mac to linux - is relatively painful, while with Go it is trivial and built into the go tool. It makes it so, so easy to remove any friction from finishing and deploying a side project.
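
For reference, the whole mac-to-linux cross-compile for a pure-Go program is one line:

    GOOS=linux GOARCH=amd64 go build -o myapp .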


Might be more complicated than you need, but I added GoReleaser to my CI/CD for my little tools. Now when I push a git tag it runs lints, runs tests, builds binaries, and updates my Homebrew and Scoop repos.

See https://github.com/bbkane/grabbit for an example

Makes it trivial to run `brew/scoop update myapp` from another computer


> Doing it with rust - especially cross compiling from mac to linux - is relatively painful

I have used cargo-dist and the process is quite smooth, it could be worth giving it a try.


I'm doing it the same way for https://play.clickhouse.com/play?user=play

But there is one question. The article says:

> I get my HTTPS certs from Let's Encrypt via certbot — this handles automatic renewal so I don't have to do anything to keep it working.

But I'm using a cross-region setup with two servers and geo-DNS. With this setup, certbot only works for the server located in the US, and I have to manually copy the certificates to the server in Europe. Any idea how to overcome this?

PS. Read about ClickHouse Playground here: https://ghe.clickhouse.tech/


Yes! I do the same. I serve my web applications from the same statically compiled service that serves the backend API. In CI, I run an npm build process then embed the output. Makes running a local test or demo instance a snap.


Interesting post. I liked the deploy script that keeps the app versioned on the server, which is helpful for doing rollbacks.

I have been running a similar setup for many years with some differences:

1. Use `EnvironmentFile` on systemd to load environment variables instead of bundling secrets into the binary.

2. Set `LimitNOFILE=65535` on the service to avoid reaching the file open limit on the app.

3. Set `StandardError=journal` and `StandardOutput=journal` so that `journalctl` can display the app logs.

4. Use Postgres instead of SQLite; DO takes regular backups for me, and Postgres maintenance is almost nil for simple apps.

5. Nginx can have password-protected endpoints, which are useful to expose the app logs without needing to ssh into the VM (see the sketch after this list).

6. Nginx can also do pretty good caching for static assets + API responses that barely change, this is very helpful for scaling the app.
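
Roughly what item 5 looks like in nginx (names and paths are illustrative):

    location /logs/ {
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
        alias /var/log/myapp/;
        autoindex on;
    }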

Lastly, I use Ansible, but I'm reconsidering whether it's worth it; replacing it seems simple, and I'd be able to keep a single deploy file that runs faster than Ansible.


Once every 2 or 3 years I configure a new VPS with the latest Ubuntu LTS server release and install the latest PostgreSQL, NGINX, Redis, and Node.js. No containers, just standard Linux user accounts for each app. Pretty boring to be honest, but I don't have a problem that requires more complexity. I once tried a more complex distributed approach with load balancers and multiple VPS providers. Turned out the added complexity was the cause of instability and downtime.


No one is talking about redundancy though. I love setups like this, but prod environments need robust forms of redundancy. Cloud Run, k8s, and their ilk are extremely distasteful, I'll grant you (the added complexity and cost are almost never worth it, and don't get me started on the painfully slow prod debug cycles...) but the redundancy and uptime they give you just can't be beat with a setup like this.

Also, none of the solutions discussed here gracefully handle new connections on the new service while waiting for all connections to terminate before shutting down the old service. Maybe some of the more esoteric Ansible setups do, idk.

I TRULY want the simplicity of setups like those discussed here, but I can't help but think it's irresponsible to recommend them in non-hobbyist scenarios.


You have to decide whether the complexity and cost of a fully redundant system is worth it and consider it against what your SLA is, especially if your redundancy increases the risk of something going wrong because of that extra complexity.

From personal experience in B2B web apps, a lot of sales/business MBA types will say they need 100% uptime, but what they actually mean is it needs to be available whenever their customers' users want to access it, and those users are business users that work 9-5, so there's plenty of scope for the system to be down (either due to genuine outage or maintenance/upgrades).

You've possibly also got the bonus that the people who use the app are different from the people who pay for it, so you've got some leeway: your system can blip for a minute and have requests fail (as long as there's no data loss), and that won't get reported up the management chain of the customer, because hitting F5 30 seconds later springs it back into life, and so they carry on with their day without bothering to fire off an email or walk over to their boss's desk to complain the website was broken for a second.

At a previous company we deployed each customer on their own VM in either AWS or Azure, with the app and database deployed. It was pretty rare for a VM to fail, and when it did the cloud provider automatically reprovisioned it on new hardware, so as long as you configure your startup scripts correctly and they work quickly then you might be down for a few minutes. It was incredibly rare for an engineer to have to manually intervene, but because our setup was very simple we could nuke a VM, spin up another one, deploy the software back onto it, and be up and running again in under 30 minutes, which to us was worth the reduced costs.


> No one is talking about redundancy though. I love setups like this, but prod environments need robust forms of redundancy

Not really, there are many kinds of apps that don't need such redundancy.

> Also, none of the solutions discussed here gracefully handle new connections on the new service while waiting for all connections to terminate before shutting down the old service. Maybe some of the more esoteric Ansible setups do, idk.

I have dealt with this in the code with shutdown hooks on the server: wait for existing requests to finish processing and reject new requests; clients will just end up retrying. Not all apps can accept this, but many can.


Is there any reason not to use Docker instead of systemd? I like managing services with a simple docker-compose.yml on my server. It has worked great so far but I wonder if there are some downsides that I am not aware of. Performance doesn’t seem to be an issue, right?


They don’t quite do the same things. Systemd will do stuff like ensure the service is restarted if it ever crashes. It can also make sure system-level dependencies are up and running (“service B depends on service A, so wait for A to be up before trying to start B”).

Performance is not an issue in most docker setups you would ever use, correct.


Actually, there are docker-compose primitives that solve just that (restart: always/on-failure and depends_on: servicename).

I think it mostly comes down to what layer of abstraction you like working at.


True, Docker-compose has a lot more overlap with systemd.

But it doesn’t have system-level dependencies. For example, in systemd I can wait for a network interface to be up and have an IP assigned by DHCP. As far as I am aware, docker compose knows about the docker network and its own containers, but not the system more broadly.

Also, you will likely want it to run for a long time, so something has to trigger the docker-compose process to start, and restart it in case the OOM killer knocks it over. That daemon stuff is what systemd is good for.


Problem is, you can't have it depend on anything outside of Docker - e.g. I can't write a docker-compose file that waits for an NFS mount.


Podman can generate systemd units for managing containers IIRC.



Performance with Docker is slightly worse, but it shouldn't be an issue for a long-running process. The main problem I've run into is that, by default, Docker logs will eventually fill the disk and crash the server. You have to change the logging system and then delete and recreate all of your containers, because there is no way to change the logging system for existing containers.
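
The usual fix is log rotation in /etc/docker/daemon.json - and, as noted, it only applies to containers created after the change:

    {
      "log-driver": "json-file",
      "log-opts": {
        "max-size": "10m",
        "max-file": "3"
      }
    }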


I use docker-compose + systemd. systemd has come in clutch when you need to add waiting for another service to come up.

I should really put my homelab setup somewhere.


The author focuses on simplicity, he tries to handle everything with a single file for the app + a single file for the database.

Unnecessary overhead gets introduced with Docker; for example, now you need to depend on a container registry, plus the authentication to pull those images from the server, plus other stuff.


FWIW there are tons of ways to use Docker without an image (I assume you meant image) registry. If you're running Docker on the server you're deploying to then that's all you need.


I guess. Still, the image needs to be built somewhere; my bet is that you will do this on the server itself, and it's unnecessary complexity.


No, not really any reason. Docker has a bit of overhead but greatly simplifies most of the things the author is doing manually with his self-described "better than the vast majority of off the shelf solutions" software.


How is setting up a Dockerfile and then a docker-compose file any simpler than just writing a unit file?

This seems like a perfect application of the init system.


My self-hosted servers are Debian on clustered Proxmox. I bake the images periodically with Ansible and Packer.

I used to have quite a few of them, then I shifted to K8s (or k3os, specifically), so now the only VMs other than the K8s nodes are my NAS, backup target, and a dev server. However, since Rancher has abandoned k3os and it’s forever stuck at upstream 1.21, I’m in the process of switching to Talos Linux for K8s. I have Proxmox running Ceph for me to provide block storage to pods.

My blog was running in a tiny EC2 with a Bitnami multi-WordPress AMI, but since everyone else sharing it with me quit, I shifted that out to GitHub Pages + Hugo. So far I like that quite a bit better, plus it’s free.


My default for websites which do not require a database is a Docker image + Google Cloud Run. Costs almost nothing, easy clickops deployment from GitHub, HTTPS managed, reasonably fast cold starts.


Just to give some specific numbers:

- 40 visits a day

- costs 0.01 USD per month

- cold start time: 350 ms; request latencies are 99%: 120 ms, 95%: 85 ms, 50%: 5 ms

- there seems to be an "idle" instance ~80% of the time

The website with source: https://github.com/PetrKubes97/ts-neural-network


Am I reading correctly that you’re running a site with 40 visits per day and it’s only costing you 1 cent per month?

So you have a container with Alpine Linux and nginx and that’s hosted in Cloud Run and mapped to your domain?

When you say “visits” do you mean the container is active 40 times per day (presumably for not very long)?

Edit: what determines when to suspend the container?

Edit again: answering the previous question: https://cloud.google.com/blog/topics/developers-practitioner...


Yes, that's exactly right. 40 visits is just what Plausible measures, and looking at the GCloud logs, it seems the container start count is in a similar range.


Generally speaking, isn't using a VPS a lot less expensive than a managed service like Cloud Run? I'm assuming the websites don't require being available 24/7 (hence the cold starts being fine)?


Really depends on the traffic, but even if your service ends up running 24x7 you'll end up paying about 10 dollars for a 1 CPU / 0.5 GB RAM instance. So for a lot of stuff Cloud Run will actually work out pretty cheap.

Don't underestimate cold starts though. It will be a long time until you need to worry about that if your backend is written in Go or Rust, but you can hit painful starts pretty quickly when it's Node.js. It really comes down to (unsurprisingly) image size.


Also - this is worth calling out

If cold starts are a problem, things like fly.io are incredible. Just set the minimum number of instances to 1, pay $5 a month, and you don't need to deal with anything.

It wasn't immediately obvious to me how Fly and Cloud Run compare, but the way I think about it now, after having used both:

In Cloud Run you don't think about any servers; you give them your image and they'll do the rest. It feels serverless in the sense that you don't think about machines.

Fly, meanwhile, is much closer to the infrastructure. It feels more like "we deploy a docker image to a VM for you", which lets you control much more granularly what's going on beyond that.

Less magic, but in a good way, because of more predictable results. Cloud Run has a few fun corners (like no CPU between requests unless you explicitly enable it) that can lead to fun side effects and just make it harder to reason about.


Ugh... I don't want to denigrate your advice (which I'm sure is great) but at this point in the comments we're getting so far away from the "keep it simple" approach in the article that it's getting ridiculous.

If you're at the point where you're concerned about cold starts and using a service like fly.io why not just skip the (unbelievable, no-one-knows-this-stuff) complexity and use a $5/month VPS?


That's a fair point, but that's essentially what fly does. If the only purpose of your VM is to serve your app (and not as a workspace for you or stuff like that), fly's great because you don't need to do any plumbing to deploy.

"Simply" using a 5$ VPS sounds great until you need to start writing systemd files, need to keep the box updated, want to deploy straight from GitHub,...


It doesn't take more effort to write a systemd unit file than to figure out the API request or navigate the GUI of your favorite cloud service.

Nowadays all Linux distros offer an unattended way to do package security updates and reboot the node.


The Google Cloud Run calculator says a 24x7 0.5 GB instance is $30 a month - a DO droplet with the same CPU/memory is $6/month.

The 0.5 GB instance has a minimum cost of $10 a month, before any kind of traffic metering or what have you.


My setup is also kept simple and "basic."

Digital Ocean, Rocky Linux or Fedora, systemd services, Bash. I usually run Rails with PostgreSQL. I might use containers more going forwards although I haven't so far.

I wrote Deployment from Scratch exactly to show how to deploy with just Bash.


For web services, I would recommend Google Cloud Run, Azure Container Instances, or AWS Fargate for running containers directly. In most cases the price per service would be much lower than $5/month - https://ashishb.net/tech/how-to-deploy-side-projects-as-web-...


The Google calculator for cloud says anything but the tiniest configuration (256M memory) has a minimum $10/month charge just to exist.

A single instance container that runs 24x7 for a month with similar CPU/memory as the $6 droplet is $30/month, before you factor in network costs.


Idk about other clouds, but AWS Fargate pretty much requires an ELB, which adds an annoying fixed cost, so for tiny services bare EC2 can be cheaper. Maybe you can amortize the ELB costs over many services, but it's still something to take into account.


I employed a workaround instead of using an Elastic Load Balancer (ELB) for Elastic Container Service (ECS) by incorporating an API Gateway. This approach helped me remain within the free tier. Although Fargate doesn't fall under the free tier, I utilized the EC2 launch type, which should operate similarly. Here's the reference to my idea on GitHub: https://github.com/jacobduba/ratemydishes/blob/fa63f9e09d34d....


Slightly related: is it feasible to run a Syncthing node on something like Cloud Run with persistent storage attached? If you have an always-on computer then it doesn't make sense, but if you just have a laptop and phone that only sync now and then, it seems like it could work. I haven't seen anyone talk about it though.

One of the motivating factors is I had a cheap VPS as my syncthing node and it just stopped working one day and won't boot. I haven't had time to debug it and find out exactly why.


I am about to embark on this myself. I was tossing up between DO's App Platform (good: no server admin; bad: emphatic lock-in) or just renting a VM like this. This pushes me towards the VM.

Setting up a Python server environment seems to be hard work, with lots of steps (gunicorn and all that), but, that said, they make the point about using Docker. So maybe Docker Compose could take a lot of the pain out of it.


Apache Libcloud supports DigitalOcean:

https://libcloud.readthedocs.io/en/stable/compute/drivers/di...

So as long as you don't mind writing your deployment scripts using Python you can make them reasonably portable (though honestly every provider has a little bit of quirkiness that needs to be worked around but it's usually trivial stuff).

Should solve 95% of that "lock in" problem.


Go for it! It isn't as complex as it seems; then you can decide whether it is worth it.

One advantage from DO is the regular backups.

Like you said, you can go for executing docker compose on the server if you want to do it fast.


Check out render.com as well.


I’ve been eyeing render.com for a pretty standard rails stack. Have you experienced any drawbacks or limitations with it so far?


Works great with Rails. I’m mostly using it with distributed Elixir natively (which other platforms can’t handle), Hasura (in docker) and some static SPAs (free).

I find it just as easy as Heroku, but cheaper. As someone who hates messing with servers and prefers to focus on my product, render is perfect for me.


Is there a way to buy droplets for a year at a time? Like $40 for a year would be a sweet deal for the $4 droplet. Especially for things like Pi-hole and WireGuard.


If your goal is to minimise costs, some of the cheaper providers that have offers on https://lowendbox.com/ will have reasonable annual discounts.


Do remember that there have been cases of some of the more budget-oriented providers just folding and the company disappearing altogether. That actually happened to me, with DediStation: https://lowendtalk.com/discussion/114949/is-dedistation-a-re...

Nowadays I'd generally go for hosting providers that have been around for some time.

Hetzner: has both great features and affordable costs (I needed to verify my ID, though)

Contabo: the prices are great, but the performance is a bit worse than other options

Time4VPS: a Lithuanian provider that I used due to them being cheaper than Hetzner, unfortunately their prices have increased noticeably, only the yearly plans are worth it

DigitalOcean, Scaleway, Vultr: all have good features, but can be expensive

Azure, AWS, GCP: too complex and enterprisey for my private needs


Contabo was pretty good in my eyes. Not as much I/O or CPU performance as a new droplet with a fancy new EPYC processor but lots of ram and storage for the price and some reasonably fast cores with little contention from what I could tell.


I consider a provider from LowEndBox disappearing one day a near-certainty. In my experience (N = ~10), an average LEB provider will have an existence half-life of around 3 years.

That's not to discourage anyone from using them; in fact, LEB VPS servers are pretty much all I use (each being ~$25 / year), and I've gotten quite good at keeping useful, tested backups, keeping redundancy in my services, and being able to stand up a new server quickly.


Happened to a friend of mine. His backups were offsite but with the same company, so he was completely screwed.


Also, those hidden costs. You don't know about them until you are charged, and they were written in the terms you never bothered to read.


The goal is to minimize costs from a big provider by buying in bulk. Something similar to spot instances.


No idea about DO, but for years I bought a t3a.micro for about $30/year. If you committed to 3 years it got even cheaper.


I used to host everything on one or two VMs, but I've switched to using a separate VM for every service.

On the one hand, it's a security thing: if one service gets breached by a catastrophic security hole, the rest of the servers should hopefully be unaffected.

But the main reason for it is ease of administration. I don't need to bother with Python virtual envs or RVM for Ruby or juggle multiple PHP versions, I can just install the version I want with apt and everything just works.

When I pass a project on to someone else, I can just give them access to the VM so they can easily migrate it.

That convenience is worth a few Euros per month. (my total hosting bill varies but I don't think it was ever more than 50€ per month)


I do the same, a VM per app, except for static websites, many of which I can host on a single VM.


I'm rocking a Linode VM with nginx and static html files. Very little attack surface, no complexity, etc.

In the past I would have templates, analytics and the works. Today I just want to have the shit up there and that's it.


Just curious: what’s the difference between the past and today? What changed for you?


There are different motivations. For one, my website at the moment just hosts some of my photographic work. So instead of messing with frameworks and/or apps I prefer to make more photos; the easiest way to avoid some kiddie taking it over because I forgot to update WordPress or something is to have just HTML.

One thing I noticed while messing with it, though, is that the effort to do the basics today is much bigger than 10 years ago. Not because things are more convoluted, but because the standards are much higher. You have to work on mobile and desktop, and the expectation of behaviour is also much higher. Animations are expected, etc.

I also solve these problems every day for work; the last thing I want is to do it for myself, I guess. The funny thing is that I'm also doing way more than I need, because I could pay for Squarespace or something and be done with it, but I also want too much control?


My setup used to be pretty similar. A few changes I've made over the last few years:

Moved to Ansible playbooks for all server config, checked into the same Git repo as the code for the project. Sure, I can set it up by hand fine, but then there's no documentation about how the server is currently set up. It's no fun to have to set it all up again by hand if you need to switch to a new server, or try to remember/figure out exactly what you did 3 years ago to set it up when something goes wrong.

Docker for the apps. Not as big of a deal with compiled languages, but for interpreted languages, getting a non-ancient version of the interpreter onto the bare metal server is a major headache, as is updating that version. Updating the Dockerfile to point to Ruby 3.2, by contrast, takes just a moment. Docker is also at least as good as bare systemd, probably better, at auto-restarting services, holding configuration secrets, isolating app access, managing logs, etc.

I use Ansible to set up and launch the containers because it fits in with all of the other setup steps and it works well with running multiple independent apps on the same server.

I've experimented with the super-integrated stuff like Cloud Run. Seems okay if you just want to run an image somewhere, but it seems to me like the complexity and potential for issues multiplies fast once you need supporting services like DBs or caches, periodic background tasks, multiple services running, etc. Might be okay in say AWS with CloudFormation or something, but I'd honestly rather go right to managed K8s instead. It's a little more complex, but quite capable and can be run on a bunch of different cloud services.


Pretty much the same setup, with a bunch of differences:

1. I'm using a single PostgreSQL instance for all apps, running on a different server; each app has its own database and db user

2. I use a minio instance for file/media uploads/serving

3. I mostly use nginx, but I'm transitioning new apps to Caddy because of the automatic integration with Let's Encrypt and the much smaller config for common purposes

4. I use a fab-classic (fabric 1x) script to deploy new versions: https://github.com/spapas/etsd/blob/master/fabfile.py

5. For backup I do a logical db backup once per day via cron (using a script similar to this https://spapas.github.io/2016/11/02/postgresql-backup/)

6. One memcached instance for all apps

7. Each app gets a redis instance (if redis is needed): https://gist.github.com/akhdaniel/04e4bb2df76ef534b0cb982c1d...

8. Use systemd for app control


Neat setup. Regarding the deploy script: I have just set up a separate VPS for proxying database queries using various Node database drivers for my own project[1], and used only GitHub Actions to manage it[2]:

- add a build script using GitHub Actions that fails the entire pipeline if the code doesn't build

- add a deploy script (essentially a few commands that ssh into your VPS and pull, install, build, and restart)

I was surprised how easy it was. Naturally you need to get your hands a bit dirty compared to managed solutions, but using AWS with Lambda, Gateway, and NAT (since I required a static IP) would have taken way more time and cost significantly more.
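
A sketch of what such a workflow can look like (the host, paths, and secret names are placeholders):

    # .github/workflows/deploy.yml
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # the pipeline fails here if the code doesn't build
          - run: npm ci && npm run build
          - name: Deploy over SSH
            env:
              SSH_KEY: ${{ secrets.DEPLOY_KEY }}
            run: |
              install -m 600 /dev/null key && printf '%s' "$SSH_KEY" > key
              ssh -i key -o StrictHostKeyChecking=accept-new deploy@vps.example.com \
                'cd /srv/app && git pull && npm ci && npm run build && sudo systemctl restart app'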

[1]: https://aihelperbot.com/

[2]: https://gist.github.com/danielwetan/4f4db933531db5dd1af2e69e...


> It's statically linked, and [...] secrets [...] are compiled into the binary

Depends a lot on what you're doing this for; particularly for personal stuff, you do you - but this particular item does give me pause.

But then I use AWS for pretty much all my personal stuff, so I guess I have the overkill mindset already?


The way I do it is to set the `EnvironmentFile` entry in the service definition.

I'd guess that the author's idea is that if someone gains access to the server, the secrets can be dumped from either the binary or the environment file, so why bother with another file?
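
For reference, that approach is just (paths and variable names made up):

    # /etc/myapp/env (root-owned, mode 600)
    DATABASE_URL=...
    API_KEY=...

    # in the unit file
    [Service]
    EnvironmentFile=/etc/myapp/env
    ExecStart=/usr/local/bin/myapp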


Do you use LightSail or is it just regular AWS?

I’ve never used LightSail but it looks like a good deal.


Nope - in fact right now I'm not running much at all; just some static hosting with S3 and CloudFront, which costs pennies a month. For a blog that's pretty much write-only, and even that not very often.

When I'm playing with more complex stuff I spin up an ECS with Fargate hosted containers. That's mostly for fun though.


Cloudformation gang represent


Almost the same, but mine are:

- NixOS:

Was hard to figure out at first, but it is the MOST no-brainer deployment setup I've found after 20+ years of doing things. It is super easy to upgrade and to add/remove things in a predictable way. Also, I love how ALL the config + security tweaks are in a single place (I just do one big NixOS config so I can see everything at once).

Another big win with NixOS is that you can do a very constrained setup, and if for some reason you need to do something that requires an install, you can do it once without polluting the system, and it disappears afterwards (like: install Node, run things, get out, and Node and its dirt are gone!).

- PG

I wish I could use only SQLite, but it lacks vital features for me (like proper stored procedures), and the ability to access the DB over Tailscale is a big plus when tracing a problem with a customer.


I just have everything running on proxmox with mostly FreeBSD vms.


Is this really enough? What about oscap-scanning regularly, EDR/XDR protection with CrowdStrike or Wazuh, dependency scanning, anti-virus and general vulnerability management?


If you only install Debian packages and the apps you compile, it is quite unlikely you need "EDR/XDR protection, dependency scanning, anti-virus".

The author has snapshots taken every 6 hours. Even if something did happen, I feel like the time to recovery would be not bad. And the time and bad energy saved not worrying about all these obnoxious and almost certainly irrelevant enterprise-grade security concerns seems greatly relieving.


Nobody outside the enterprise does this.


Anyone considered deploying with docker compose? Simple, well known and pretty flexible. I wrote some helpers to manage monitoring, logs, and deployments: lostdock.com


I'm a fan of this approach in general, but I've been looking for a way to accomplish almost the same thing, except I would like to have the version defined in a git repository.

I suppose I could git pull on a cron job, copy over the systemd unit files, and restart, but I'd like to have just a little more smarts than that. I've been working on my own tool to accomplish this but it's not super stable just yet.


NixOS, whilst being a huge time sink, delivers exactly on that promise and then some.


I switched from Ansible and YAML-hell to NixOS and never looked back. Still learning Nix but it's easy to get started with the basics and then refactor things later


I do this with GitLab and cron. Commit to main on GitLab and a CI script builds the project and sftps a zip to the staging server. A cron job checks for new zips in the directory, unzips them, then installs the update. After that it restarts the service and I have a new version ready to test in staging.


Have you considered installing a GitLab runner on your deploy machine to run jobs specifically tagged to deploy that app?

It’s a super easy setup and saves managing SSH keys in env variables etc while still being really quick!


I didn't think about doing that. It's a great idea.


I've been using Ansible to deploy ~10 services from git repos to a set of KVMs. After paying the time tax of writing the playbooks, I just update a release version variable for the service I want to update and rerun the playbook. It will even take care of the systemd restarts.

I'm pretty happy with that setup. Ansible has just enough smarts to be less tedious than shell scripting, but is dumb enough that its behavior is easy to figure out in most cases.

Now I'm tempted to try running it nightly from cron.
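
A sketch of that pattern (the hosts, repo URL, and service name are illustrative):

    # deploy.yml - bump the version variable and rerun
    - hosts: appservers
      vars:
        myservice_version: "1.4.2"
      tasks:
        - name: Check out the release tag
          ansible.builtin.git:
            repo: https://example.com/myservice.git
            dest: /opt/myservice
            version: "v{{ myservice_version }}"
          notify: restart myservice
      handlers:
        - name: restart myservice
          ansible.builtin.systemd:
            name: myservice
            state: restarted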


We are in the same boat; my Ansible scripts have barely changed in the past 7 years. Still, I have been wondering about getting rid of Ansible in favor of simple scripts that handle everything, the main reasons being speed and simplicity (1 file vs many files).


> I suppose I could git pull on a cron job, copy over the systemd unit files, and restart, but I'd like to have just a little more smarts than that.

How complex... You know that you can simply git push to just about any ssh account, right?


Of course, but the point is not just to get configuration into git, it's also to get out of the business of SSH'ing to individual servers.


I made a small Python Flask app that listens for a webhook and then does a git pull on the directory. This has worked super well for the few random website projects I host.


"a reverse proxy. The main advantage to this is that nginx can do TLS termination"

I use Apache for the same purpose, works great, always has.


Rate limiting and caching are another big advantage; setting them up with nginx is a piece of cake, setting them up in the app, not so much.


Since we're all sharing our favorite simplest solutions, I guess I'll throw https://fly.io out there. The DX is far from perfect right now so this answer is _ever-so-slightly theoretical_ but, if you know how to use Docker, `fly launch` is extremely hard to beat.


> One complaint about this setup is that paying $5/month for every service you want to run is a lot.

Oh, you sweet summer child.


I really don't understand the SQLite fad; you can run Postgres on the same instance and get a much more capable database, but also have the freedom to expand to a tiered architecture if you want.

For personal things I just use an EC2 instance w/ docker compose.


Because Postgres is another thing you have to run, maintain and monitor, and you have to make a judgement whether that extra complexity is worth it - for a lot of simple projects, it's not.

SQLite is a library you use in your application process that writes to a file. There's a bit of care you have to take to ensure you back up that file safely, but there's no extra monitoring or maintenance above what you do for your app anyway.

I agree that running Postgres isn't terribly difficult, but no matter how simple you try to make a small Postgres instance, SQLite is simpler.


I run a setup that's very similar to the author's, but I run Postgres.

For a simple setup, Postgres maintenance is practically nil.


Didn't we have a thread recently about the dangers of using Let's Encrypt?

If on AWS, and if your servers are behind a load balancer, just install a certificate on those. Isn't that a little better?


> If on AWS, and if your servers are behind a load balancer, just install a certificate on those. Isn't that a little better?

My take from the post is that the author's goal is simplicity; AWS isn't simple, and AWS load balancers are expensive.


I have a similar setup of Nginx proxy, Backends are Go binaries (without Docker) and Postgres. If it's for a pet project the free tier on GCP suffices.


I do the same. OpenBSD, Go, SQLite, and rcctl.



