11 years of hosting a SaaS (ghiculescu.substack.com)
334 points by ghiculescu on June 15, 2023 | 219 comments



> Use managed services for as long as possible

Big agree here.

Yes, you can save stupid money by handrolling postgres on an extremely beefy Hetzner server, or you can pay someone else and keep building your product: https://onlineornot.com/self-hosting-vs-managed-services-dec...

This isn't to say, "don't bother learning how to do it yourself", but more "learn to pick your battles".


It may be a generational thing, a matter of familiarity with computing and computers. For someone who's lived through the 80s, "handrolling postgres" doesn't sound nearly as scary as you imagine.

I expect the cost/benefit analysis of "handrolling postgres on a beefy Hetzner server" vs "navigating the menus and options of AWS services" would be different for different teams.


I'm looking at the ansible playbooks to set up my favourite beefy baremetal Hetzner server (128GB RAM, Ryzen 9 5950X 16-core, 450GB fast NVMe SSD, 2x 3.5TB NVMe SSDs, 155€/month):

- Install Debian 11 while booted in rescue mode.

- Set up root file system encryption using cryptsetup and dropbear (to enter the key during boot over SSH). Involves chroot and some fun commands.

- Set up an encrypted ZFS mirror filesystem for the two additional SSDs.

- OpenSSH hardening and Teleport installation.

- Kubernetes installation (K3s).

- Connecting Kubernetes to my Argo CD instance or an existing Kubernetes cluster.

And then through GitOps:

- Installation of openebs-zfspv

- Installation of kube-prometheus-stack helm chart

- Installation of (many) PostgreSQL instances and other crap

I have been playing with Linux servers for 20 years and I find this fun and rewarding. But I do understand people saying that baremetal Hetzner is not for everyone. Especially if you start to have requirements such as "data must be encrypted at rest".


Our Terraform config for RDS is about 50 lines. We get a much smaller instance for our money, but ultimately figuring out all of what you posted isn't a good use of my time (yet).
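
For a sense of scale, the managed-service side really is that small in any tool. A minimal boto3 sketch (not the parent's actual Terraform, which isn't shown; identifiers and sizes below are made up):

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")

    # Hypothetical values; the point is how little you specify
    # compared to hand-rolling a server.
    rds.create_db_instance(
        DBInstanceIdentifier="app-db",
        Engine="postgres",
        DBInstanceClass="db.t3.medium",
        AllocatedStorage=100,           # GiB
        MasterUsername="app",
        MasterUserPassword="change-me",
        MultiAZ=False,
        StorageEncrypted=True,          # encryption at rest, one flag
        BackupRetentionPeriod=7,        # daily snapshots kept for a week
    )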


I agree, if you don't have the time and don't find it fun, managed services in the cloud is a better idea.


Why are you running cluster orchestration software on a single machine?


Kubernetes is not cluster orchestration software. It's app orchestration software that also supports multiple nodes. If you don't understand the difference, you're missing the value proposition of k8s.


Well, yeah. I'm not sure I do see the value prop of k8s.

The fact that it supports multi-node means you get all of the drawbacks of a multi-node system without any of the benefits. It's single-node deployment, but worse.


My clusters don't necessarily have one machine, but I have a few clusters with a single node.

K3s is pretty lightweight and Kubernetes is much more than cluster orchestration, so the pros outweigh the cons.


If you skip Kubernetes the setup is not that complicated.


Yes, that list reminds me of the exaggerated posts about "look how hard it is to install Firefox on Linux!!". Claiming that setting up a Debian Postgres server necessarily entails knowing ZFS and Kubernetes is quite a reach.

Not sure about Debian, but I believe Ubuntu Server will let you set up an mdadm mirror, LUKS (with LVM), and install and enable a Postgres server with a few buttons in the install wizard. It can even fetch SSH authorized keys from a GitHub account, covering by far the most important SSH hardening step (disabling passwords). Most hosting providers will also offer a one-click deploy that may similarly add your keys and do other common config.

A better example of something that hosted databases make a lot easier out of the box would be backup, replication, and monitoring.


I'm not sure. I'd rather use a lightweight Kubernetes, or perhaps Nomad, than do everything that Kubernetes does without it. That sounds even more complicated. But I agree that for a single PostgreSQL instance isolated from everything, Kubernetes is overkill.


OMG, you do not need this bloatload just for PG hosting. Harden SSH, harden your PG configs, and voilà :)


> - Installation of (many) PostgreSQL instances and other crap

And you're missing the part about fault tolerance and failover, which is the most complicated bit.


pg_auto_failover has your back - https://github.com/hapostgres/pg_auto_failover


And you absolutely can't be sure your SaaS vendor is doing encryption and hardening either.


I've been on unmanaged MySQL for ~8 years now. Considered switching to managed but I'm not seeing any performance or stability issues, so I guess I'll just keep this train going until it craps out on me, then restore a backup onto a managed service, say sorry for the downtime, and that'll be that.


Do you know how long the downtime might be? Have you tested your backup recently?

GitLab had a long downtime because the backup was huge and on the other side of the country. The backup server was on a low-speed network.

https://www.arcserve.com/blog/lessons-learned-gitlabs-massiv...

How much money would you lose if you were down for one week? How many customers would you lose?

How much credibility would you lose?

For my peace of mind, I can't afford a SPOF when I know one is lingering.


I get what you're saying here, but it's again the comparison with GitHub and extremely large sites that's the problem. Most of us don't run Google/FB/GitHub-scale sites; the backup will probably fit on an external HDD, and in some cases could even be downloaded from S3 in an hour.


That's what I can't be comfortable with: "would".

How long does it take to try it? A day?

Well then try it. Either it'll work flawlessly on the first try, or you'll learn that the backup you have doesn't include logins, passwords, and the security configuration that goes with them. Or that the dump you took lost some data because it wasn't in the right encoding.

Or the tape drive you're using needs specific drivers that aren't available on the web anymore because the company's website closed.

... This is a work of fiction. Any similarity to actual events might be purely coincidental...
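
The nice thing is that a restore drill is scriptable, so "a day" becomes a recurring job. A minimal sketch for Postgres, assuming pg_dump/pg_restore on the PATH, local auth, and a disposable scratch database (all names are made up):

    import subprocess

    # Hypothetical names; adjust for your own setup.
    SOURCE_DB = "app_production"
    SCRATCH_DB = "restore_drill"
    DUMP_FILE = "/tmp/drill.dump"

    def run(*cmd):
        subprocess.run(cmd, check=True)

    # 1. Fresh custom-format dump (schema + data; note roles/passwords
    #    are NOT included - pg_dumpall -g covers those separately).
    run("pg_dump", "--format=custom", "--file", DUMP_FILE, SOURCE_DB)

    # 2. Restore into a throwaway database.
    run("dropdb", "--if-exists", SCRATCH_DB)
    run("createdb", SCRATCH_DB)
    run("pg_restore", "--dbname", SCRATCH_DB, DUMP_FILE)

    # 3. Sanity check: a business-critical table should have rows.
    out = subprocess.run(
        ["psql", "--dbname", SCRATCH_DB, "--tuples-only",
         "--command", "SELECT count(*) FROM users;"],
        check=True, capture_output=True, text=True,
    )
    assert int(out.stdout.strip()) > 0, "restored database looks empty"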


Yeah, I also don't see the point of having everything “managed”.

RDS is crazy expensive compared to self-hosting, and if I have the DB on-prem it's much faster as well. And the admin overhead is honestly not that big if you are running just one DB.

If you are at Google scale, of course things change, but I think 80% of workloads don't need any managed AWS stuff, replication, multiple nodes, Kubernetes, etc. Just periodic backups, and it runs fine.

But people nowadays just like throwing money around I guess, instead of trying to set it up for themselves.


I don't think "modern" stacks are sane enough to "handroll" anymore. Sure, you can do it, but look at the poster in this thread who details the setup of a Debian server.

Kubernetes, "Argo CD", zero-trust, the sheer amount of "management" is off the chart.

"Installation of kube-prometheus-stack helm chart".. "Installation of openebs-zfspv"..

It's not postgres that's the problem here.


A lot of "modern" stacks are just complexity for the sake of complexity. Google is doing it, so clearly our 5-man startup will face the same scaling problems, or something like that.

Many of the problems these tools solve are problems that wouldn't exist building things the old fashioned way. If you stick relatively close to the metal, operating this stuff is pretty easy.

However it's notable that a very valid reason to prefer managed services as a SaaS is to cover your ass if things go wrong. Your SLA violation is their SLA violation.


The complex stacks are insane to operate yourself but very simple to operate if you use a managed offering instead, and they do provide genuine value.

I can set up a new Golang app on ECS with a load balancer and database, with a CI/CD pipeline, with zero-downtime updates, in about 30 minutes. Most of that time is waiting for AWS to give me a load balancer. Our work applications have been running with this setup for over 2 years, and the only thing I've done with infra in that time is adjust instance sizes and bump a MySQL version.


To be fair, I can set up a new service on bare metal in minutes too, mostly because I don't need to set up everything from scratch.

I don't really need to set up a database or load balancer or anything like that because it already exists on the server. Just create a new database schema, new systemd service, new nginx rule.


I agree

And more to the point, learning how to "handroll" Postgres could be beneficial. You could have learned about options for limiting the amount of memory, etc.

Sure, managed is easier; use it when you can easily afford it. But before that, it's better to see how things are actually going (memory usage, disk usage, bottlenecks, etc.).


Yep, I'd always prefer the freedom and power of hosting my own PG instance on some robust VM offering to fiddling with clumsy AWS menus.


If you're managing AWS infra through the web console, you're definitely not doing things per AWS-prescribed best practice.


Or you could use SQLite until you need Postgres. I have to admit I reach for Postgres immediately, when in many cases SQLite would have served me just as well.


SQLite seems to be gaining popularity with even larger projects, which is surprising to me. As I see it, the big value prop of SQLite is that it runs in-process, which for a webapp is worth almost nothing?

Other than that, it's not like queries are any simpler and the "simple" type system is, in my opinion, not a feature. I get that some might disagree with that.

Is there some other reason why you would prefer it?


It has an extremely low barrier of entry while providing the features of a relational database when all you need is a local data store. The files are trivially easy to transport using standard tools when needed. I've been in back-end automation/integration for my entire career and use these kinds of things all the time. The overhead of maintaining a full networked RDBMS isn't always something I want (or need) to do.
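
To illustrate the low barrier: the whole "server" is one file and a stdlib import. A minimal Python sketch (file and table names are arbitrary):

    import sqlite3

    # One file on disk is the whole database; no server process.
    conn = sqlite3.connect("jobs.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            finished_at TEXT
        )
    """)
    conn.execute("INSERT INTO runs (name, finished_at) VALUES (?, ?)",
                 ("nightly-import", "2023-06-15T03:00:00Z"))
    conn.commit()

    for row in conn.execute("SELECT id, name FROM runs"):
        print(row)
    conn.close()

And because it's just a file, "transporting" the database is a cp or an scp away.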


Here’s a great article explaining some of the benefits of using SQLite in production: https://blog.wesleyac.com/posts/consider-sqlite

I use it in my production SaaS serving around 4 million requests per month on one of the lowest DigitalOcean tiers. The big ones for me were cost, operating simplicity and performance. I don’t need a separate process or server running which has saved me some money and time, and the app’s workload doesn’t need a ton of inserts so the speed is blazing fast.


"4 million requests per month"

There are 2,592,000 seconds in a month. So, 1.5 requests per second?


Oh whoops, it's actually 4 million per day! Good catch. (That works out to roughly 46 requests per second.)


I agree with you about the worse type system not being a feature. Also, missing the JSON features of Postgres is inconvenient.

Only one file to back up or deploy is the biggest advantage of SQLite, IMO.


It may not quite have all of the JSON features of Postgres, but the JSON handling in SQLite has recently become way more usable. More than usable, for sure.


Hat tip to both the amazing native JSON support from Postgres, and the SQLite module with JSON functions: https://www.sqlite.org/json1.html
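
For a taste of those functions from Python, where the bundled SQLite typically ships with the JSON functions built in, a minimal sketch:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute(
        "INSERT INTO events (payload) VALUES (?)",
        ('{"user": "ada", "action": "login", "meta": {"ip": "10.0.0.1"}}',),
    )

    # json_extract pulls values out of the stored JSON text.
    row = conn.execute(
        "SELECT json_extract(payload, '$.user'),"
        "       json_extract(payload, '$.meta.ip') FROM events"
    ).fetchone()
    print(row)  # ('ada', '10.0.0.1')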


Check out Cloudflare D1


It looks neat but I am still not sure what the appeal is exactly. Is it cost reduction?


Cost reduction and simplicity - costs nothing if no one is using it


I’m super intrigued by SQLite these days.

11 years ago you used Postgres because that’s what Heroku told you to use.


Yes, especially with data, simplicity is key as well.

Where possible, go with simple but abstracted cloud storage, cloud tables, and then a managed cloud DB. We use Azure mostly right now, but our storage system works across Azure Storage, Amazon S3, Google Cloud, and others. For tables, Azure Tables mainly. For a database with filtering/paging, better performance, and ACID compliance, CosmosDB currently, which is a dream with the differing APIs (SQL, Mongo, Cassandra, Tables style). The more you can avoid vendor or dev lock-in the better, so simple formats/messaging/routing and abstracted specifics/implementations.

When you store data in storage or a cloud DB, the scaling is "infinite" and you can also snapshot or back up to another one; you never worry about data.

The front ends and APIs are mostly repos pushed to app/web services, and everything else is in data storage. Super simple, and anywhere you need some special service, it can be serverless or a dedicated setup: maybe an RDBMS, chat server, network server, or WebRTC/socket endpoint that interacts with the simple side. These are managed as well if possible, though not always. Additionally, build cheap and scale horizontally on web/real-time servers. Vertical scaling and sharding are for suckers.

Side note: CosmosDB is like a combination of NoSQL, document databases, and GraphQL, and it is ACID compliant; you can do REST or SQL, and it can even wrap MongoDB and Cassandra and make them ACID compliant. It really feels like the best way. Not many have all that plus ACID compliance. Not even Amazon Redshift has that; DynamoDB does if specified. Google Firestore does if specified. I used to be big on RDBMS Oracle, then MSSQL, then PostgreSQL, and those are great for backing/reporting etc., but CosmosDB combines all the power of RDBMS, NoSQL, and document databases, is ACID compliant, and gives little worry about scale. It is vendor lock-in to Azure, which you can route around with platform abstraction, but currently it can't be beat. As you've got that clean API layer you could change later, but the best way is limited/clean and, if possible, non-breaking-change API layers/signatures.
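
For what that SQL-style API looks like in practice, a minimal sketch assuming the azure-cosmos v4 Python SDK (endpoint, key, and database/container names are placeholders):

    from azure.cosmos import CosmosClient

    # Placeholders: endpoint/key/database/container are hypothetical.
    client = CosmosClient("https://myaccount.documents.azure.com:443/",
                          credential="<account-key>")
    container = client.get_database_client("appdb") \
                      .get_container_client("orders")

    container.upsert_item({"id": "o-1", "customerId": "c-42", "total": 99.5})

    # The SQL-ish dialect mentioned above, with parameters.
    for item in container.query_items(
        query="SELECT c.id, c.total FROM c WHERE c.customerId = @cid",
        parameters=[{"name": "@cid", "value": "c-42"}],
        enable_cross_partition_query=True,
    ):
        print(item)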


Really do not share your experience with CosmosDB; some of its attributes that currently make me miserable include...

- Scaling is not infinite; it's up to 20 GB per partition key (1), which can't be changed after document creation.

- One set of global indices, no equivalent to DynamoDB's secondary indices.

- Still can't run their Docker container on Mac (2) natively.

- Weird SQL-like dialect that's required for all but the simplest queries. JOINs are spectacularly awkward.

- Tooling is horrific. Following from the previous point, no existing tooling works with it (and nobody is building tooling for a DB with such minimal market share). For example, I needed to manually delete 30 or so documents/rows yesterday; the only way to achieve this is with 30 separate click-to-deletes in their UI.

- Minimises analytics options. There exists a plethora of business-intelligence-type tools that will happily sit on top of most common DBs. None of them like CosmosDB. So you're stuck with Synapse Link or whatever MS calls it now.

Overall it seems to combine the worst aspects of both RDS and document stores, with the worst aspects of both traditional and serverless infrastructure.

(1) https://learn.microsoft.com/en-us/azure/cosmos-db/partitioni... (2) https://learn.microsoft.com/en-us/azure/cosmos-db/local-emul...


The partition key limitation is something you can work around with smart partitioning and horizontal scaling. We do the same already with storage/tables, at an even smaller limit, to prevent large data blocks and for speed/lookups/mapping.

Tooling could be improved, and will be; it is still fairly new, and the Azure Cosmos DB Emulator is not bad.

There is a CosmosDB Synapse setup that allows more analytics/intel on top, like you said, but as with other NoSQL it takes a bit to get worked in.

I actually like the flexibility of query types, and that they include SQL, as it makes it a bit more standard and somewhat less vendor lock-in. You can use the other flavors as well: Mongo/Cassandra/Tables syntax. For filtering, the SQL side isn't bad, but most of what we do is flat/associative and not heavily normalized. For most of our data we are also very cache-heavy, to reduce DB hits and retries.

ACID compliance is huge and there are some design considerations.

Which cloud DB do you use the most? DynamoDB?


Most experience with DynamoDB, which isn't without challenges, and has a steepish learning curve to get the most out of it.

I work at a pre-product-market-fit startup though, so hyper-scalability is not really a consideration right now, and I would much prefer to be running Postgres.


> Where possible go with simple but abstracted cloud storage

This is the total opposite of simplicity. Simple is PG and backup setup. DONE.

Maybe you mean "comfort"?


Managed, and no need to back up, thanks to snapshots. PG still has management concerns: size, logs, access, scale, etc. Nothing you have to do with cloud storage. I do love me some PostgreSQL but use it less and less except for reporting or heavily filtered needs. Horizontal over vertical, for comfort and simplicity.


Having no skin in the game, I feel this is a very interesting view, especially when you are offering a managed service built on top of another.

Go through enough levels of dependencies, and someone who can do a few layers of the onion in-house can sweep in with competitive pricing/offerings, à la Sherlocking.


Which one do you recommend?

Those get quite expensive.


I personally use RDS


This is one of the best ops writeups I've read. People are rarely so vulnerable and relatable. Usually it's all "here's how we do things perfectly at our unicorn company".

The most important lesson was about reducing the alerts and sleeping better. It's important to be able to leave your laptop at the office and not worry about things going wrong. This post gave me flashbacks of when I made websites for others and hosted them myself. Never again!


Great post, I've used all these services once or more at various points in my career. I'm still at the Digital Ocean stage, as I've found AWS is often overly complicated (or at least too much for me as a primarily solo founder). I'm curious about this line:

> We were doubling the size of our customer base every 9 months, and pretty soon this meant we needed more servers.

I've found that a moderately sized 4-core 'droplet' from Digital Ocean handles nearly anything I can throw at it (at least in terms of the web application world). I see your product is doing employee management/scheduling; is there heavy compute/resources associated? Or is it more due to privacy requirements (i.e. you need a siloed instance/server per client)? Not trying to sound smug at all, just genuinely curious where the load comes from.


It seems like server performance has improved a lot over the last decade. The move from HDD -> SSD -> NVMe alone makes a huge difference for anything that touches disk. And my memory is that it was pricier to buy machines with more cores or RAM vs today.

It wouldn't surprise me if today's budget VPS does the work of several machines from 10/11 years ago, and costs less too!


Great questions.

There is some heavy compute - most of the real value is in our pay calculations and compliance features which are all based on algorithms that calculate correct pay based on legal requirements.

But mostly it's just having lots of concurrent users. For a while a few droplets was plenty, and then one day we hit a tipping point where it was overwhelming.


Would be interested to know what your hosting bill is like.

We recently moved everything in-house and started hosting our own servers. Having physical access to the machine (and the network HW) pretty much eliminated all of our dev ops (1). We expected the HW to fail often and such, but in 2 years the only maintenance we had to do was add more disks to the RAID array and resize the partition a few times. Even though we have redundant everything, nothing has ever failed and the server has never gone offline (except when it reboots every Sunday at 3 AM for a minute or two).

(1): How? Well instead of running small containers/micro boxes/services that we could scale up and down, since we owned the hardware it meant that efficient utilization wasn't a requirement for us anymore. We don't really care how much we utilize our own HW, so we just smooshed everything onto huge VMs in two huge 100 core boxes with affectionate names. We used super basic networking features (VLANs, firewall port forward) to setup the network. To simplify dev ops even further, there is only access to the servers from specific ethernet ports in the building.


> since we owned the hardware it meant that efficient utilization wasn't a requirement for us anymore

This is a great point. Hyper focus on optimizing micro-containers is costly and sucks up a lot of time. You can still deploy like this in the cloud though. On AWS make a three-year reservation of a c6a.32xlarge to get 128 vCPU for $16k annually. Hetzner is cheaper, 80 vCPU for about €2,506 annually.

[1] https://instances.vantage.sh/aws/ec2/c6a.32xlarge

[2] https://www.hetzner.com/dedicated-rootserver/rx170


This is the way. Unless you are huge, self-hosting and even colo are not competitive with long-term VM commitments at hyperscalers.

They get big discounts on everything you'd pay retail for, and their margins on barebones VMs are not that large.


> On AWS make a three-year reservation of a c6a.32xlarge to get 128 vCPU for $16k annually.

Plus more expensive networking in the cloud.


One of our 100 core boxes is Windows Server and it looks like they want $79,949 annually for it.


I wanted to add more info but couldn't edit. We bought that server for $12,000 with a 5-year warranty, and got our Windows Server licenses out of an MS bundle which cost about $400.

I think that brings the ROI after 5 years, compared with AWS, to something like 3200%, minus energy, if nothing goes wrong.

(I know that there are better options than running Windows Server on this box, but it seems to perform the same, it's far easier for inexperienced devops to administer, and in the long term it's actually around $1500/yr cheaper than running supported RHEL. Our other box is running supported RHEL, but after testing Windows Server and having no issues we'll probably switch it for the cost savings).


> We expected the HW to fail often and such, but in 2 years ... nothing has ever failed and the server has never gone offline (except when it reboots every Sunday @ 3AM in a min or two).

I began my career by scripting Asterisk using Python. I created a daemon using Twisted and that process ran for 3 years without fail (as in, the same Linux OS process was running for 3 years). This was on commodity hardware.


It's around 150k USD/month at the moment.

Would love to learn more about your setup. Email me if you're interested in chatting. Email in my profile.


Our deployment is much smaller than yours! Our upfront hardware cost including carpentry and HVAC was around $80k, our monthly cost in cloud was around $5k/mo and now we pay only about $1k/mo in energy (which includes the existing $500/mo bill from the office). We could reduce this further with solar panels, but we share the tall-ish building with 13 other suites and we couldn't get approval to take up most of the roof.


Holy shit that's more than I expected for your type of business. I'm running a SaaS too with half a million users and a lot of data processing, but my hosting costs are way lower... under $10k/mo


What does the spend break down as?


FYI - tanda.co gives a 403 Forbidden error


IME, if a drive is going to fail, it's going to fail in the first hundred hours or so of use. In my entire life (of using hundreds of disks), I've only had three failures. Two of those were on the same server, at the same time, in the same RAID array and took out the whole array -- the data was gone forever. Thank the gods for backups.

So, just make sure you take backups pretty regularly because you never know.


Does geolocation of the server not matter for your application? I am curious if there are any latency issues with this solution as opposed to a hosted option.


The progression from Heroku to AWS is very common as SaaS businesses grow. However, one thing that's missing from all "alternatives to Heroku" is what I call life after deployment, or Day-2 operations. Many solutions show you a sleek demo of how easily you can deploy a Hello World app to their platform, but leave you in a straitjacket once the app is deployed and you need tools to keep it running, like changing an Nginx (or equivalent) setting, or using persistent storage, or a Postgres contrib.

Making decisions about infrastructure is as much about day-2 as it is about the initial deployment. I see fly.io and render in this camp: shiny day-1 docs and demos, and then wishing you good luck when you need a `rails c` to see something in the DB.


I hate what Salesforce has done to Heroku but after a year of running our saas on aws, render, and fly - here I am, back on Heroku. It sucks, but slightly less than the alternatives.

Fly has potential, but they've changed/grown so much that most docs are out of date, everything is buggy, support is not very responsive, and their security posture leaves a lot to be desired. Not a fun place to be when production issues pop up.


(I work at Heroku) Do you have more details on what sucks? Anything we're not already tracking to fix in our public roadmap? https://github.com/heroku/roadmap/issues


Glad to hear there's a roadmap. In my PaaS thrashing I've got quite a few notes:

- no wildcard subdomain SSL

- poor metrics, nothing per instance

- poor dyno granularity (jumps from 2.5GB RAM to 14, no CPU/storage control)

- no transparency on what each dyno actually is

- external Postgres replication disabled (deal breaker)

- no first-class Postgres metrics and access logs/alerts (Kibana recommended, but not great)

- no external Postgres backups (e.g. S3)

- no deletion locks on dynos and add-ons, esp. databases!!!

- no warning when add-ons like databases are being deleted as a result of apps being deleted

- deleting an add-on also irreversibly deletes all replicas and backups

- painfully inconsistent naming for databases through their connection string env

- dyno types for build processes use Perf-M? Not configurable

- no Lambda or GitHub Actions style computing

- no scheduled scaling

- unable to choose aws-us regions

- hard 30s timeout limit

- limited to 1 API key per user; no labels, configurable permissions, usage logging

- no HTTP/2

- frustrating enterprise offering; massive over-sell, near zero value


> dyno types for build processes use Perf-M? Not configurable

What are you looking for here? Larger build dynos? Heroku provides the build service for free so we use perf-m dynos to get fast builds with reasonable cost (for us).


I'm building Argonaut to be a layer on top of AWS/GCP to make them less sucky. The primary use case is startups that have a bunch of services and have outgrown Heroku etc. I'd love to chat if we can help - you get the stability of AWS and a push-to-deploy dev experience for your apps.


(Render CEO) What can Render do better?


This is precisely the niche Cloud 66 tries to fill: it caters to growth businesses, providing initial ease without sacrificing control. This allows companies to grow without being forced out before they are ready. (Disclaimer: I work for Cloud 66)


Does Cloud 66 have any resources on migrating a large (~300 GB) Postgres DB off Heroku, with minimal downtime? This has been our biggest sticking point when thinking about leaving Heroku.


Moving data is always the trickiest part! We get a lot of Heroku customers who need help with their data migration. My honest answer is there will always be some downtime while the switch over is happening, but we know of a few ways to reduce it. Unfortunately Heroku PG databases don't allow outside replication setup, so we can't really baseline your data and then close the replication gap with a shorter downtime. But we do support multi-DB solutions so we can run against an old and new DB at the same time while data is gradually being moved over. If you're interested, ping me at hello-at-cloud66.com if you want to discuss further :)


We'll keep that in mind, we're currently locked into a (ridiculous) enterprise contract with them. But it expires in November :)

We're considering using this https://bucardo.org/Bucardo/ based on this https://www.porter.run/blog/migrating-postgres-from-heroku-t...

Going to give it a shot when I have some free time.


Hello! Author of that little blog post there. Happy to be hit up with thoughts/feedback when you give it a shot!


Thank you so much for writing that. I was starting to lose hope of getting our data out of Heroku without some insane amount of downtime.


Please feel free to DM me should you need any tips with this. Email - me[at]rudimk.com.


That's what I'm struggling with. You're not wrong, but I feel like it's way too easy to go to the other extreme and suddenly you're way in over your head.


The title is correct. But any of your changes or lessons are perfectly fine. (Sure, a bit awkward to do SSH from cafes.) Maybe reading the Google SRE PDF (or equivalent) would have been a bit more useful. At my wife's business, one of the team members has to log in to DO/AWS every month and send a screenshot showing the account/CC is in good standing.


Can't you do `rails c` on Fly via the `fly ssh console` command?


The point is you can do it from Heroku UI.


Vercel is great


How's the Rails deployment experience with Vercel?


Reacting to his comment "However one thing that’s missing from all “alternatives to Heroku” is what I call life after deployment or Day-2 operations."

Since that's generic enough to also count Vercel as a Heroku alternative, albeit for Node and not Rails.


> But the worst Digital Ocean incident we ever had was when they turned all our droplets off all at once. The credit card entered into the account had expired, there was no backup card, and the contact email on the account went to a shared inbox that was not monitored. So for probably a month we were getting and ignoring billing alerts, until we really paid attention when everything was offline and not responding to SSH. This wasn’t totally their fault, but at the time it just all felt like a dodgy, shaky setup.

This wasn't *at all* their fault and there's nothing dodgy about it. This was entirely your fault, and even mentioning this as a "partly" negative for DO is imo very wrong. I imagine any other service provider, including Heroku, would have acted the same way. You can't expect to mooch off a month of service without paying and without consequences. If you don't pay your bills, service will be interrupted. It's as simple as that.


> The credit card entered into the account had expired, there was no backup card, and the contact email on the account went to a shared inbox that was not monitored.

The use of an unmonitored email is the one that really gets to me.


Yeah. I'm not proud of it.


It’s not whether or not you are proud that has people upset. They gave you a month of free service and you ignored their attempts to contact you. You are using this incident to paint them in a negative light when they probably gave you more than you deserved in the situation. Your attitude is very entitled for the size of customer you were. If they sent you text messages or letters I would assume you would have ignored those as well based on your prior actions. What exactly would you want them to do here? Free service until they can make contact with you?


I'm not sure I see any criticism of DO in OPs post. They said _their_ setup was dodgy, and _they_ aren't proud of not checking the inbox. Other than that they just explained what happened, no? Looks like a bit of misunderstanding to me


Would you explain why you're trying to implicate DO as being in any way "dodgy" or at fault for your own failure to pay them? You got a month of service for free, and they attempted numerous times to contact you about your bill, you just didn't read their emails.


I think they meant their own setup is dodgy, not DO's


I don't consider email to be a legal form of communication. Cutting services to a business effectively disables their business (especially servers). You'd think they would send at least one letter.


> I don’t consider email to be a legal form of communication.

The flipside of requiring a provider to submit paperwork to terminate your service, is the situation where in order to setup a new server with DO you'd have to file paperwork yourself. Can you imagine sending a paper letter every time you need to spin up a VM?

Which one do you prefer? paperwork in both directions or no paperwork?


Yep, that’s how things are done with my colo. Real papers. Real signatures. I don’t need to send any letters to spin up a vm though. I just click a button. For new servers, I walk in and plug it in.


For physical hosts that makes perfect sense. It would just be annoying with VM-exclusive providers like DO.


It’s very common that EULAs and other contracts explicitly state that the contractual parties agree that email is a valid means for written notifications.

I used to work for a SaaS firm. Every once in a while the main office would receive written and signed letters from (primarily) German users when they wanted to terminate their service. (We accepted that after confirmation using an authenticated email address, of course.)


> I don’t consider email to be a legal form of communication

Email is legal in all countries, as far as I'm aware.


Lawyers and courts do, however, so what you consider really doesn't matter.


Yeah, Tanda's own terms of services[1] are more restrictive than DO's. They "restrict or suspend" all services (or else terminate the agreement) if the client is 14 days late. This vs ignoring billing alerts for a month and being surprised when the service is shut off?

[1] https://www.tanda.co/terms-conditions/


I'd say that your service is dead unless proven otherwise. Your bills are unpaid unless proven otherwise. I'd be alerted by absence of regular emails telling me that my card was billed for the hosting service.


It turns out that humans are very bad at noting the absence of a signal.

Computers, on the other hand, are perfectly happy to use an expired timer to generate mail, a dashboard icon, or any other shout for attention that you please -- as long as you actually set it up. Paying attention to it is still a human problem.


> I'd say that your service is dead unless proven otherwise.

I'd say you're dead right -- this is the only safe assumption. And any email client smart enough to check incoming email against a regexp could be extended to raise a not-easily-ignored alert if no pattern match (indicating success in processing a payment) had arrived by some chosen date of the month. Yet I've never come across such a feature in Gmail, Fastmail, or Thunderbird. Maybe someone sometime has hacked such a thing into Gnus ...
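
A rough sketch of that dead-man's-switch idea in Python, assuming IMAP access and a subject line your provider uses consistently (host, credentials, and subject below are hypothetical):

    import imaplib
    from datetime import date, timedelta

    # Hypothetical account details and subject pattern.
    HOST, USER, PASSWORD = "imap.example.com", "billing@example.com", "app-password"
    SUBJECT = "Your receipt from DigitalOcean"
    WINDOW_DAYS = 35  # a bit over one billing cycle

    since = (date.today() - timedelta(days=WINDOW_DAYS)).strftime("%d-%b-%Y")

    with imaplib.IMAP4_SSL(HOST) as imap:
        imap.login(USER, PASSWORD)
        imap.select("INBOX", readonly=True)
        # Look for a successful-payment email within the window.
        _, data = imap.search(None, f'(SINCE {since} SUBJECT "{SUBJECT}")')
        if not data[0].split():
            # No payment confirmation arrived: escalate however you like.
            print(f"ALERT: no payment confirmation in the last "
                  f"{WINDOW_DAYS} days - check the card on file!")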


Whatever happened to companies actually reaching out with a phone call to an actual human in order to resolve these issues before they become a problem? A provider that simply throws a switch because a credit card expired is not one that I would trust. So, yeah, DO (and every other provider that operates that way) deserves a little bit of shade for actions that clearly demonstrate a lack of care for their customers.


They were paying all of $40 a month, I'd imagine that doesn't buy them very much customer support from any decently sized cloud provider.


But it kind of is.

Merchant payment services exist that can automatically obtain the updated card-on-file credential when it is about to expire.

Both Visa and Mastercard offer this directly, or you can get the same service from a PSP.


It’s definitely not the merchant’s responsibility to catch when your card changes.

We have this at work thanks to Stripe, and it’s wildly inconsistent. But beyond that, it won’t do anything for a closed/canceled card/cardholder account.

That situation was fully on OP. This could have happened at GCP, AWS, Azure, Linode, OVH, Rackspace, Oracle Cloud, you name it.


> It’s definitely not the merchant’s responsibility to catch when your card changes.

Well, given that the tech exists, they're not just leaving money on the table by not using it (to update cards that just expired), they're also burning money on pointless support, support which wouldn't need to happen in the first place had they updated the month/year of expiration.

I mean: we're literally talking about credit cards that often keep the exact same owner name and the very same number, with just the MM/YY of expiration changed. It's not rocket science to update that in a DB after an API call. And what's the catch in at least trying? If it works, you saved everybody time. If it doesn't work, it's no worse than your current "solution".

I understand politicians and lawyer-minded people saying: "technically it's not our responsibility" but they're wasting everybody's time, leaving money on the table and wasting money on support.

It's just poor judgment to react like that instead of thinking as to how life could be made better for everybody, starting with your paying customers (which, btw, are the reason you exist).


> Well, given that the tech exists, they're not just leaving money on the table by not using it (to update cards that just expired), they're also burning money on pointless support, support which wouldn't need to happen in the first place had they updated the month/year of expiration.

To lend a bit of insight as someone who works in the payments industry: This functionality is generally not free.

If most of your customers come and update their expired cards already - or you offer a service which is so essential to your customers that they generally would freely expend the effort to do so if notified, it doesn't make any sense from a financial standpoint to pay the fees to subscribe to card issuer updates.


Would it be a better practice to use that tool? Yes.

Are they at fault because their customer dropped the ball so badly about paying their bill? No.


It doesn’t matter who’s at fault. They missed a chance to provide better service and lost a customer as a result.


Sorry, I thought your comment was in support of alberth's upthread assertion that they bear some of the fault in this.


These are regionally restricted, on top of users being able to opt out of such services both at the Visa/Mastercard level and at their local banks. This means you cannot rely on the service entirely and have to create alternative flows even if the update system works perfectly. Both Visa and Mastercard document the manual methods developers can use, which still require human monitoring/input. They also both warn people about monitoring this at the merchant level so you minimize disruptions, late fees, cancellations, etc.

It's not as seamless as you're making it sound.


Totally agree! Super dick move for dumping on DO when this was all your fault.


And then there's Twitter, who didn't pay GCS and AWS for months and kept operating normally.


Yeah, Twitter is totally comparable in that case; $50 a month is exactly the same thing as however many millions of dollars Twitter is paying to GCS/AWS with presumably custom contracts. But just to humor you: https://techcrunch.com/2023/06/14/twitter-is-being-evicted-f...


> many Millions of Dollars Twitter is paying to GCS/AWS

But they aren't paying anymore!


Apropos of nothing, $27k/mo for an office for 300 people!?


I can guarantee you they’re not ignoring emails from their providers, if this is the case.

Account management involves a lot of discussion and your account manager will keep the service running if they think the bill will eventually get paid.


I read the "dodgy, shaky" parts as describing their own setup, not Digital Ocean's.


Digital Ocean has an invoices API: https://docs.digitalocean.com/reference/api/api-reference/#o...

No need to use email at all.
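
Right, a small poller removes the human from the loop entirely. A rough sketch against DO's documented v2 billing API using `requests` (the token is a placeholder; verify the field semantics against their docs):

    import requests

    TOKEN = "do_api_token"  # hypothetical personal access token
    resp = requests.get(
        "https://api.digitalocean.com/v2/customers/my/balance",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    balance = resp.json()

    # Per DO's docs, account_balance is a string dollar amount; treating
    # any positive value as "you owe money" (an assumption - verify).
    if float(balance.get("account_balance", "0")) > 0:
        print("ALERT: outstanding DigitalOcean balance:",
              balance["account_balance"])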


I don’t think it’s unreasonable to expect that multiple channels of communication will be tried before cutting off service, especially not for services provided to a business.


For a business that makes, *checks notes*, 50 bucks a month in revenue for DO? If that were the case, prices would be astronomically high, because there would be tons of scammers and bad actors abusing the system, leaving their servers running for months, because apparently DO has to send them a letter via snail mail to Antarctica and wait for their response. All of this would then have to be paid for by legitimate customers for DO to break even, which means a simple droplet would probably cost around $50 alone. So no, thank you.

For larger businesses with thousands of dollars of revenue per month? Sure. For small $50 charges? No way.


Also, the more low-level customer support workers you add to your cloud service, the more insecure it becomes, and typically the worse the support experience becomes (hiring people who actually have a good handle on how cloud software works ain't cheap).

If you have an account manager or sales folk in the cloud org, IOW have a high spend, you get the Twitter experience. If you don't, and are a self-serve customer spending a pretty low amount like in this case, and missed a payment and ignored the account emails reminding you to pay… what exactly is the provider supposed to do? I mean this in the nicest way possible, but they aren't so desperate for your business that they're going to beat down your door so you can pay before you get shut down. Important to keep in mind: if you don't have a long payment history, you don't look that different from a user who's just trying to get something for free, and without paying for the communication channel, you can also resemble a company that just went bankrupt or something (which often creates a huge risk for the cloud provider if the cloud creds get sold by some disgruntled ex-employee to crypto miners).

Source: I worked in this area at a major public cloud provider


"For small $50 charges? No way."

Why not? If the phone companies here can extend that courtesy to normal consumers then I find it ridiculous to suggest that a cloud provider couldn't afford to do the same.


That’s exactly how it works for providers in Germany… as a foreigner I find it hilarious. You do get back-charged on the extra costs, tho.


Why's it hilarious? If you stopped paying for something, how long should you get that thing for? What if you don't pay the back-charges?


I find it hilarious because I find it extremely dysfunctional, for everyone involved, and yet… it's true. I can tell you from personal experience: my partner somehow signed up for a stupid online accounting tool that we never used, and we only found out 2 years later because we received a letter from a debt collection agency asking for the 2 years we “owed” for a service we never actually used, but that they kept “providing” us. And somehow, even though we never paid, they never stopped. We very likely had to pay a lot more than we would have if they had just stopped providing the service at some point. So yeah, it just baffles me to think this is how things are supposed to work in this country, and neither the provider nor we could do anything about it.


Well, the service provider was ready to provide you with it. They kept your account, your data and ensured the service is available for you to use. Why should they care about whether you use the service or not? It's like renting an apartment, then never bothering to move in. Why should the landlord care whether you move in or not? After all, it's your responsibility to book services you use and cancel the ones you don't use. You made the contract, so you should pay for it. Nobody is forcing you to keep the contract beyond what's agreed upon, but you can't retroactively say "hey, by the way, I didn't live in that apartment for three years, can I please have my money back?"


I think you misunderstood what I was trying to say. My original point was that in Germany they are mandated to keep the lights on until I explicitly cancel the service, regardless of whether I pay on time or not. I didn't complain about them charging me for something I didn't use. I just find it ridiculous that they had to keep providing me the service even though I didn't pay. I think from a provider's perspective it's shitty to keep the lights on while not getting paid. And I find it laughable that's how service providers work in Germany: it's difficult for both providers and clients alike to cancel a contract, even if the client doesn't pay, and I think that makes everything worse for everyone.


Not getting something you didn't pay for is not dysfunctional. It's what turned slavery-based, Feudal and Socialist countries into modern ones.

Your example seems to be unrelated, although it sounds frustrating.


But that's not what happened. I was getting something I didn't pay for; that's what I find dysfunctional. Because that means there's a lot of extra effort in the country spent on getting unpaid services paid. My personal experience just adds the cherry on top that I was also not using it. So it was wasted effort for the provider and society in general.


> I was getting something I didn't pay for; that's what I find dysfunctional

Fair enough, but that's why I was saying your experience seemed to not be relevant to the topic.


It's not as simple as that. Do you know how quickly a business loses customers without being flexible? It's also reputation-damaging for the market to know that DO draws a hard line.


What is being flexible? They 1) gave a month of unpaid bills, 2) emailed the customer N times, 3) gave the option to enter a backup credit card. The customer made it so none of these worked. I expect that DO deals with a lot of non-paying customers. I wouldn't put it on them to, what, figure out where they live and drive up to their house?


They literally sent an email every single day for a month. And they are flexible enough that if you do write them they will happily extend that grace period.

It's like saying a car is a shitty transportation service because it stops working once the gas runs out.

Not paying attention to the reserve light, instrument messages, and pings is solely your problem. More so because you are running a business…


The car should arrange for a singing telegram to be delivered to your home, office AND your child’s daycare to notify you before shutting off its engine due to lack of fuel. In my country this is required by law.


Hilarious comment. Thanks for brightening my day.


Not exactly. Some countries have laws in place that prevent a provider from cutting service to a business due to non-payment. Instead they must follow a process, including physical paper notices, before removing service.

DO should have done something like that as well, just cutting off service because you sent some emails is kinda shitty.


> DO should have done something like that as well, just cutting off service because you sent some emails is kinda shitty.

DO should abide by the law of the country they operate in. If that means they can shut off services for non-payment then that's what they're entitled to do. By the sounds of it they gave plenty of warning and 30 days service they weren't getting paid for.

Email is the expected form of business communication in 2023. I don't think any of the hosts I use even have my physical address. Maybe they could SMS me. I doubt they have my phone number either though.

I would expect a billing warning notice when I SSH into the service though. Maybe that was in place for the 30 days before they shut the customer out. I certainly hope so.


> I would expect a billing warning notice when I SSH into the service though. Maybe that was in place for the 30 days before they shut the customer out. I certainly hope so.

I would not expect Digital Ocean to alter the MOTD configuration of my Linux box because I forget to check my emails. That would be an invasion of privacy in my opinion.


I SSH'd plenty. There was no billing notice.

I think that + allowing customers to add more than one credit card to an account are reasonable suggestions. Maybe they do that now - this was 7 years ago.

I'm not trying to shirk responsibility though. We didn't, for example, pass the buck by telling our customers our site was offline because of DO.


You did describe them as "dodgy" though.

Not proactively updating expired cards is dodgy. Supplying an unmonitored email address as a contact address is dodgy. Cutting service after non-payment, email warnings, and grace period is normal and expected.


How do you deal with customers that don't reply to email notifications and don't have a phone number on file?

I can appreciate how terrible it must have been to suddenly find all your servers shut down, and I understand how easy it is to set up billing and forget, especially with the day-to-day stress of running a startup, but what would you have done if you were DO?

> our site was offline because of DO.

Or maybe it's because of the bank that expired your credit card?


These laws are usually for utilities providers to residential customers. The aim is to stop people in poverty freezing to death in winter, not to protect a business with poor operating practices.


> physical paper notices

What is this? Germany?


What countries are those?


If I could have my time again I'd focus on enablement of the broader engineering group through an incident management process, readiness exercises, some aggregate incident analyses for the org to learn from and leaning into observability.

Infrastructure is kind of a solved problem for common use cases today, just requires the expertise.

With our ducks in a row, I'd next look to a GRC function for the compliance bits, whilst splitting the platform engineers' time between embedding engagements and tooling investments.

You're on the right path, man. I'd love to know what I know now back then, but unfortunately time doesn't work like that.


> Infrastructure is kind of a solved problem for common use cases today, just requires the expertise.

This is the problem, however, for many (older) companies. They either don't care, or quite literally don't know about the infrastructure solutions out there that can save literally thousands of hours of headache per year. Sure, many companies with legacy systems have a "don't fix what isn't broken" mindset, but from what I've seen, I always ask: if shipping and modifying new versions of a system takes hours or even days to complete, is the system really not 'broken'? I guess I never realized it, but having automated and clean infrastructure with tests and uptime metrics is a must-have for me on anything I build going forward. Take 2-3 weeks to save months of headache.


Thanks. Means a lot, particularly coming from you.


> We started on Heroku, because in 2012 if you did any Ruby on Rails tutorial that included deploying your app, you ended up with a Heroku account.

Couldn't be more accurate. Heroku and Rails is almost as much of a throwback as Node and Express. You just had to be there. And it was great. Web dev was always a hobby for me as a teen, but then I turned to the Rails book as a means of learning a professionally designed system when I wanted to get serious as a dev. It definitely served me well. Heroku, at the time, offered a very streamlined and accessible way to integrate Rails. It was a great time to learn.


I sometimes wonder how my career (and life) would've differed if, 11 years ago, I had chosen to learn Rails instead of Laravel because of familiarity with PHP via WordPress. From there it went on to C# (Windows), JS (frontend), Scala/Go/Node.js backend + JS frontend. Lots of JS.

Here in Japan, there were a lot of opportunities to pick up Ruby/Rails along the way, but I stuck to the JS trajectory, partially because I didn't want to 'start over' with Rails _now_. There were only a few times when not knowing Ruby/Rails meant I was limited to specific tasks, so it was never really career-limiting. It did mean that I actively avoided working in Ruby shops, for better or worse.

If I had picked rails instead then, would I have transitioned into more of a backend engineer with some frontend duties instead of the reverse?

Not that it matters, but there definitely is a tendency in our industry to 'look down' on frontenders as not real engineers and thus not consider them for leadership positions.


> Not that it matters, but there definitely is a tendency in our industry to 'look down' on frontenders as not real engineers and thus not consider them for leadership positions.

I think that's funny because there seem to be a ton of backenders that can't do frontend at all. And then they want to look down on FE when they can't do it themselves? It's not just basic HTML and CSS if you're building a complex app.

I do both (FE & BE) so... I've seen it all and enjoy it all. Not sure one is easier than the other.


I had the same experience with some backend engineers; they constantly looked down on the front-end engineers, ridiculing JavaScript and CSS. And while all those frontend engineers picked up backend languages and became fullstack developers, those backend engineers couldn't do the most basic frontend things to save their lives.


Meh. FE, BE, system, embedded; they all go deep. A good engineer can learn one or more in any order, but only fools look down on a discipline they don't understand.


To be fair to BE-focused folks, the Web exploded at quite a high speed from being a bad document environment you did ugly hacks and cargo-culting on to make something interesting, to an actual platform.

And additionally, during the time it became viable, a lot of FE folks still had to keep battling IE6 in their daily lives, so online documentation still lagged and had a clear smell of cargo-culting. Heck, even today you see people complaining about JavaScript here on HN.

But if you're left behind today, you've got to blame yourself. Early real-time Google suggestions using AJAX came back around 2005(?), and if you didn't take notice, and still missed that people were doing decent real-time games by 2010, you were doing your best to live under a rock.


... while quite a few don't understand the fundamentals of the infrastructure they consume, and end up in undesirable places (with major impact on performance, costs, or a combination of both). I fairly recently worked for a few months at a web3/blockchain/NFT/<other buzzwords>/crypto-wallet company, and the fact that I had to explain in ELI5 mode how different types of databases work (and how critical such a choice is), name resolution (a.k.a. DNS), global vs local load balancers of different types, which measurements and logs to enable and where, the reality of redundancy and recovery times on-prem (NO - you cannot rely on moving VMs over L2-connected DCs!) and in the cloud, etc., to "app" (whatever end) folks, with no understanding of how these fundamental services [kept] impact[ing] their clients, was an experience I will certainly not remember too dearly. Even a comprehensive explanation of reading email client headers, to an office full of FE and BE engineers who failed our security tests (joyfully clicked on anything having a link in a message, 'cause it was presented as an "API PoC"), was revealing in regards to the missing basics. /rant


Kernel/compiler/OS/formal verification engineer here.

This all seems silly.


I agree. My daily tasks are more FE than BE simply because there are more FE things to do.


It's not too late to go back! Rails and Laravel are still insanely popular and productive for getting work done.

I'm also disappointed that real frontend expertise is not valued as highly as a traditional backend engineer's. Think about someone like an architect. You could say that they are glorified artists (yes, I know they do more than that), and they are highly valued for their artistic input as well as their professional advice; but it's also not unheard of to have an architect solely provide artistic direction and a structural engineer provide support.


> Node and Express. You just had to be there. And it was great.

I agree, and Node and Express are still great when going with just JS. Node and Koa more for me now, but still great. Socket.io for real-time. All are great for getting things up quickly, keeping them simple, and shipping.


For sure. If it ain't broke don't fix it. Node and express are still great. I still use them. Heroku and rails would still be great to this day, too.


Node and Express are definitely winners for my use case.


In the context of your comment, what became better than Node and Express?


> At our first meeting our [AWS] account manager brought along his solutions architect. I had never met a solutions architect, so I didn’t really know what they did.

> Eventually I realised that I wasn’t the problem.

I had the same experience. These free "Solution Architects" are a disgrace to the architecture profession. They are salespeople, paid/rated/motivated to upsell you, not to do what is best for you. Ours went over the in-house architect's head, with zero feedback, and told our CTO/CIO how shitty the architecture was. Fun. Stay away from them.


If something is free etc...

I've always taken the advice of any prepaid 'experts' with a huge grain of salt, not just AWS's.


I'm sure it's interesting, but I am making a conscious choice not to read or linger on user-hostile articles. As soon as I want to scroll, I am interrupted by a page takeover requiring action, subsequently compelling me to lose interest.

Did you want me to read it or click buttons for you? Too bad your UI gets in the way of the U.


Substack is definitely scummy in this regard, it's not really the author's choice (although a conscious decision was made to host it on Substack).


I read a Substack article almost every day and still consistently get a little jolt of rage when the overlay fades in.

I guess they do it because it works, and if the average person felt the same way, they wouldn't be doing it?


> if the average person felt the same way, they wouldn't be doing it?

You might think this, but I tend to doubt it's true... people making these decisions end up being in their own little bubble, and don't really have a good idea of what people's actual response is.

That said, I find Substack's pop-up tolerable: I am after all getting someone's work for free, and I'd rather have a simple pop-up that I know I can get rid of than ads or other aggressive forms of pop-ups.


Yeah I'm aware of it. But it's less annoying than Medium.


Firefox's reader view gets rid of all that on top of reducing other distractions


So does closing the tab.


Well, I do not feel that any of the 3 pieces of advice he gives is good advice.

First, the longer you stay on Heroku, the more complex it is to exit when you really need to, and the less flexible you are in the meantime.

Second, he wishes he had paid for a PIT team sooner; but couldn't this money have been better used investing in marketing or sales, like he probably did?

The guy has obviously succeeded as a business owner; would that still be the case if he had implemented this advice? We will never know, but what we do know for sure is that not implementing this advice made him successful.


> We will never know, but what we do know for sure is that not implementing this advice made him successful.

"Made him successful" implies a causality that is a bit too strong.

Indeed, we will never know for sure.

Maybe implementing this advice would have impeded the development of other critical areas of his business.

Maybe it would have helped make his business more successful, as he would have had a more reliable product.

Or maybe the business impact would have been neutral, but would have resulted in better quality of life/less stress for him and his employees.

But in general, the way I read this article is: they made good decisions overall but, like everything in the world, not optimal ones (switching platforms too early, making some big mistakes like the credit card one, etc.).

It's a very interesting read nonetheless, with clear takeaways:

* choose boring tech you know and focus on your product, not the tech, especially in the early days

* grow your infrastructure and complexity with your product's needs

* accept that you will mess up, but properly learn from it, and grow your organizational knowledge, structure, and processes accordingly.


I think he meant to stay with managed services as long as possible. Which makes sense: in the early stage, the focus has to be functionality rather than cost/performance optimisation.


I think our success is despite a lot of decisions (specifically the ones in the post), not because of them.


This really resonates. It's so easy to get caught up in chasing trends and overlook delivering customer value/stability.

Good luck on the next 11 years :)


I read this article with interest, having run a SaaS for the last 8 years or so (solo-founder). I find it intriguing that the OP needed so much compute power. Computers are really powerful these days.

As a counterpoint of sorts, here's my "hosting journey":

* run everything on a single physical server rented at Hetzner (DE)

* [... several years pass, business grows ...]

* switch to ansible, learn it, spend a week or so to write automation for a 3-server setup, also learn terraform and write terraform configs for setting up a Digital Ocean system from scratch

* run production on a 3-server setup at Hetzner DE, run a staging system at Hetzner FI, also serving as a possible quick manual failover, for a total of 6 physical servers, test re-initializing systems from scratch regularly, test setting up a Digital Ocean system from scratch regularly

* [... several years pass ...]

* that's it — I really can't see a need for more in the near future.

But then, my software is not in Ruby on Rails, and I have no experience with that platform. I use Clojure and ClojureScript and I was careful to design everything to be rather client-heavy at the start. I also never wanted to depend on PaaS systems, mostly to avoid lock-in, but also because I don't buy the "just use our magic database offering and forget about database problems forever" sales pitch. You can sweep possible problems and complexity under the rug and hide them, but you can't run away from them. I also do not use Postgres (collective gasp in the audience), because having a single centralized point of failure is not something I want in my setup.

I also never needed significant sysadmin/devops support. Granted, I do have some experience, but these things do not require dedicated teams, unless you are YouTube. A little ansible+terraform goes a long way, as does buying an hour or two of consulting from an experienced sysadmin.

Those physical servers that I use are significantly faster than the over-subscribed cloud VM instances that you usually get from AWS and the like. And they have 64GB of RAM, not some measly amounts. If I need more servers, it takes on the order of hours to get additional ones, but I'm not sure what I'd use them for.

My total hosting bill is on the order of 350€/month and is boringly predictable.


Just curious, if you feel comfortable sharing, what do you use as your database?

I'd agree Postgres is not the right answer to every problem, but the documentation of its failure modes and the mitigations thereof make it a "good" answer to "most" problems.

I'm curious what the "problem" (in access pattern terms, doesn't have to be business terms) and "solution" (i.e. persistence technology) is in your setup!


Sure. I use RethinkDB, which works well, but has been pretty much abandoned at this point. It's a good database, but it wasn't "in fashion" like many worse solutions were.

I am working on replacing it with FoundationDB. I want to have a fully distributed database with strict serializable semantics (see https://jepsen.io/consistency), and there is very little out there that gets the job done. FoundationDB is really impressive and works really well. I'm worried that it isn't "fashionable", though.

As for access patterns, I'm not sure if I understand the question, but I'll offer one thought: if you're writing an app, you don't need a "query language". You'll quickly learn what your queries are, and the right approach is to restructure your data to fit your access patterns. Your "queries" will be written in your programming language of choice, not in the database's "query language".

I feel that the idea of a "query language" is stuck in our heads back from the days when the boss would come and tell you to produce a custom report from the database. It's just not how app databases are used these days.
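
For a rough illustration, a minimal sketch with FoundationDB's Go bindings (the key layout and values here are made up for the example):

    package main

    import (
        "fmt"
        "log"

        "github.com/apple/foundationdb/bindings/go/src/fdb"
    )

    func main() {
        fdb.MustAPIVersion(710)
        db := fdb.MustOpenDefault()

        // The "query" is plain Go inside a transaction; keys are laid
        // out to match the access pattern (here: look up a user field).
        name, err := db.Transact(func(tr fdb.Transaction) (interface{}, error) {
            tr.Set(fdb.Key("user/42/name"), []byte("alice"))
            return tr.Get(fdb.Key("user/42/name")).MustGet(), nil
        })
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s\n", name)
    }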


Appreciate the glimpse!

I should have elaborated: by access patterns I was referring to the proportion of readers to writers, the distribution of load over time, the distribution of transaction sizes, and so on.

Although your discussion of query language is an interesting one – that the goal is essentially efficient (de-)serialization with "retrieval-from-other-process" costs that are minimal for your workload and subset of query space.


So what are the new Herokus? What do you recommend for starting a serious project? AWS App Runner? How about something open source?


Azure App Service is a thing that everyone just skips over, listing competitors with an annual revenue smaller than some individual Azure App Service deployments...

I mean seriously, railway.app has just fifteen employees in total, and revenue in the single-digit millions.

Reminds me of when someone posted a "cloud storage vendors poster" with hundreds of vendors -- but not Azure -- some with revenues smaller than the cost of an Azure Storage Account one of my customers was using... to back up a single server. By accident.


I think Railway nails the user experience, but basically every time I use it something goes wrong in a small or a big way (build fails randomly, UI glitches, random 404s in the control panel that fix themselves minutes later).

Used it for a few clients but it's tough to keep using it with all the issues, so I'm looking at Render and other alternatives. But if Railway could smooth out the reliability / stability, I wouldn't have any reason to switch.


PS: I'm not calling out Railway specifically, I just like to laugh at how the second biggest public cloud is just a figment of our collective imaginations.


Check railway.app, fly.io, render.com

Possibly qovery.com as well.


fly.io is the least bad.

But honestly nothing these days is as good as Heroku was 10 years ago, if you just want to put something online and don't care how it gets there.


Vercel is very easy for serverless apps.


I see Render being mentioned on HN quite a bit as the new Heroku. I’m still using Heroku for my personal projects but have been meaning to check Render out.


Digital Ocean App Platform is really good for what I run :)


I'm a cofounder of withcoherence.com, a PaaS developer experience on top of your own cloud. We are not open source, but we support both AWS and GCP and provide a free tier of usage. Happy to answer any questions, feel free to get in touch!


FL0.com is new and looks v solid


https://Flightcontrol.dev is a relatively new one that provides a PaaS-like experience on your own AWS account

*I'm cofounder & CEO


Can it be that between docker and digital ocean giving you a gig of ram for $6/mo, people are just DIYing it much more often these days?


It really makes me super bitter how people like the author of the article who basically fumble their way through stuff are still successful, even though they seem to have done a lot of stuff the wrong way.

Calling out D.O. for a fairly transparent process is very wrong. They did what they had to, and OP was oblivious to a fairly standard process of "hey, your CC expired, let's do something about it before we turn off your servers", which OP seems to have ignored several times in a row.

All of the story felt like he was just stumbling from one hype to another, and from one easily avoidable mistake to another. Even when I had only 7 years of experience in total (now I have 21.5), I was paranoid enough to look out for DB ID turnover, and I absolutely would have made sure to upgrade the DB server once I noticed it hitting 80% load consistently.

But that's life for you. A technically excellent guy still lives paycheck to paycheck, meanwhile an absent-minded guy who is easily hyped has a successful business. [sighs deeply] ...Moving on.

Finally, choosing AWS but still only using EC2 is kind of non-intuitive to me; why do that at all? Maybe because they allow transparent upgrading that makes for less sysadmin work? If so then fair, but that's still like buying a Ferrari to drive on long empty roads and never going above 90 km/h.


This story resonates with me; apparently we are in phase two, and we need dedicated team members to take care of this. Thanks for sharing your lessons; this was hugely valuable. Don't let the negative comments get you down: building things is hard, and it's all a big learning game. Congratulations on building a great piece of software.


Thanks! Reach out if you ever need a hand.


A recent observation, and this could be a biased one. You mentioned:

> we’ve spent a bit of time learning more about moving off the cloud to a managed data center. But the nice thing is not feeling like we need to.

That SaaSes are moving (or thinking about it, or have already completed the move) to running their own infrastructure, away from managed services like DigitalOcean and AWS, as 37signals (Basecamp & Hey) did.


> We didn’t take advantage of any other platform features, we just treated AWS like any other VPS.

This is a lot of switching around for a plain VPS.


Yeah, if you aren't going to use AWS managed services, then what are you doing on AWS? EC2 is great, but if that's all you're using, you're probably overpaying.


After 20 years of dev, with devops skills nearly maxed out (including an AWS specialization) and many products built/maintained/scaled, ... I can completely relate. I really do see hosting costs differently now compared to earlier in my career, when I was more junior/mid-level and still argued about "cool" tech/languages/architectures.

What you want from day one is completely managed hosting for the tech stack you work with, without being hard-tied to it forever.

Right now I am starting some new/modern Meteor.js stuff again. There is Meteor Galaxy for hosting, which "just works" for that stack: you bring your own MongoDB (I prefer Atlas itself; they even have serverless now), and everything is taken care of, including CI/CD/monitoring/... It's a few minutes of initial configuration, and you never think about it again; done correctly, you should get horizontal autoscaling of some kind automatically nowadays.

Yes, this is significantly more expensive than using AWS directly (which they use under the hood), and even though I am personally highly trained in this stuff plus terraform/cdk/..., I don't want all this work anymore when I can shell out a few hundred bucks per month instead. If Galaxy ever becomes a problem for some reason, I _still_ can deploy the app stack to some VPS provider, but I'd use existing automation (like meteor-up in this case) instead of really digging into typical devops topics for it.

It leaves a bad taste in the mouth as an engineer to shell out "more than needed" for infrastructure, but my rule of thumb now is that I am happy to eat that frog as long as the potential cost saving is less than 2 infrastructure-engineer FTEs; that's my trigger to _maybe_ discuss it.


I get a 403 Forbidden when I visit https://www.tanda.co/


It seems that the new PIT team was unable to manage the HN traffic.


They should switch back to DigitalOcean, when things were manageable.


Something that would be interesting to compare and contrast is actually using your own hardware to host the service(s). Servers are super reliable these days; it doesn't feel like a huge step from the VPS approach to simply having a bunch of them.

At least one tradeoff is that you'd have to be physically in a location with adequate bandwidth / latency but for various use cases it might be very cost effective during some phase of the startup.

I also wonder how the AI-induced focus on specialized hardware might change the calculus.


great writeup and congrats on 11 years!


> The team had several people in every time zone

Kind of hard to believe


Crap sentence, but people in Australia, US, and UK can cover every time zone.


Alex! Great to see you on HN.


Hey mate!!


Those are such noob mistakes. No bigint OIDs or, worse, no UUIDv4: even int64 will overflow. Maybe not on their platform, but on a platform with millions of new inserts a day, int64, despite being a large number, isn't future-proof. If anyone knows a better 128-bit alternative to UUIDv4, please respond.
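
For illustration, a minimal Go sketch (using the github.com/google/uuid package) of minting random 128-bit keys instead of relying on a central counter:

    package main

    import (
        "fmt"

        "github.com/google/uuid"
    )

    func main() {
        // uuid.New() returns a random v4 UUID: 128 bits, no shared
        // sequence to overflow or to coordinate between writers.
        id := uuid.New()
        fmt.Println("new primary key:", id)
    }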

Side remark: that's what I was trying to tell the dgraph guy. If you have an int64 ID for ALL your transactions and all your IDs, it will eventually overflow. It's a single incrementing value for ALL actions: insert a new record, new ID; retrieve this record, new ID; etc.

Everything this guy writes about is a newbie mistake, stemming from a lack of "far sight", aka thinking 5 steps ahead. Thinking 5 steps ahead is what I had to learn along the way, because if you don't, you end up with problems like these.

What annoys me is that I'm actually seeking a job now and can't find one that fits my needs, despite being way past those Kinderkrankheiten (teething problems), because those jobs nowadays require you to be an AWS zealot, which I'm not. I'm a follower of the holy church of K.I.S.S.: keep it simple, stupid. Simple and organized: the 2 key pillars of good design. As simple as possible, but no simpler. Thinking 5 steps ahead is the hard and time-consuming part.

I know this article is about Ruby. But as a Go zealot (Go can handle 500 million visitors per month, or over 5000 concurrent HTTP/1.1 connections, on a 13-year-old 4-core 32GB RAM computer), I would NOT go with AWS Lambda and complicate my life and, most importantly, the development process, or as I call it, the solution delivery process.
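
A minimal sketch of the kind of server I mean (net/http runs each connection on its own goroutine; the port and handler are made up for the example):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // net/http serves each connection on its own goroutine, so a
        // few thousand concurrent keep-alive connections are cheap,
        // even on old hardware.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }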

This guy's problem is that he believed the hype, picked the wrong tool for the job and had no clue about infrastructure/operations.

There's a reason why Ruby isn't as popular as it was 10-15 years ago. Performance matters. The split between backend and frontend happened around 2013. Having a classic website on low-performing Ruby, with such a bad workflow (e.g. for a blog, you can't start with Comment; you have to start with Blog, then Post, then Comment), is archaic. With Go and ent I start wherever I want and I get ×100 the performance. If I just need to consume the data and don't care about SEO, I can write a SPA easily and the load is on the client. If I need SEO I'm back to the scaling problem, but the Go backend won't be the part that needs scaling. And should you magically get more than 500 million visitors per month, simply adding load balancing, and doing the same for the database, would do the trick: BAM, you've now tripled the number of backend requests you can handle. Doing this for the SSR JS frontend is where it gets expensive. So either try to static-render with Svelte, if possible, to solve this problem and load balance, or get locked into complex auto-scaling cloud nonsense. And this is where it gets REALLY expensive.

Your Ruby on Rails app might be good enough for a PoC, but that's about it. If you really expect large volumes of traffic, better pick a well-performing language from the start and think about infrastructure. And for the love of god, don't pick k8s, because when shit hits the fan, you won't be able to debug it. Keep it simple and transparent. You can have your cake and eat it too. Just think it through from start to end.


> What annoys me is that I'm actually seeking a job now and can't find one that fits my needs, despite being way past those Kinderkrankheiten (teething problems), because those jobs nowadays require you to be an AWS zealot, which I'm not

Perhaps it's more to do with your attitude?

This comment comes across as very condescending and needlessly aggressive to someone admitting their mistakes made whilst growing their successful business and graciously sharing them so others can learn or commiserate on similar mistakes.


lambda is great for three things (minimal handler sketch after the list):

- scaling to zero for low use services

- reliably managing other infrastructure, like pools of ec2 or ephemeral ec2 spot

- getting 1000 cpu cores for 5 seconds to do latency-sensitive, data-heavy lifting
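
to make the scale-to-zero point concrete, a minimal go handler sketch (assumes the github.com/aws/aws-lambda-go package; the event shape is just a placeholder):

    package main

    import (
        "context"

        "github.com/aws/aws-lambda-go/lambda"
    )

    // handler runs only when invoked; an idle service costs nothing.
    func handler(ctx context.Context, event map[string]interface{}) (string, error) {
        return "ok", nil
    }

    func main() {
        lambda.Start(handler)
    }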

the elephant in the room with cloud is always egress bandwidth billing. egress heavy apps just shouldn’t live on aws.

for me, the fun with aws is not in figuring out how to make cloud lasagna and then writing thought-leader blogs about how tasty it is, but in understanding which parts of aws are actually better than the alternatives, and how to compose them into good systems.

s3, dynamo, ec2, ec2 spot, lambda, r53. egress bandwidth aside, these things are for great good.

add in cf workers+r2 for egress bandwidth heavy components.


Your comment feels like one I would have written in my past, but I think there is some possible context or understanding I might be able to share based on some of my XP.

>No bigint OIDs or, worse, no UUIDv4: even int64 will overflow. Maybe not on their platform, but on a platform with millions of new inserts a day, int64, despite being a large number, isn't future-proof. If anyone knows a better 128-bit alternative to UUIDv4, please respond.

Sane defaults are important. If the DB defaults to a particular value, it's likely because it's fine for most use cases. It's an excellent thing to have enough customers and utilization to require something more than int for an ID. That said, if there is little to no performance difference, I'd argue that this should be made the default upstream rather than the default recommendation or "best practice."

>Thinking 5 steps ahead is what I had to learn along the way, because if you don't, you end up with problems like these.

Not everyone is able to do that kind of systems-level thinking across an entire product stack, and that's OK. What's more important is not building things you will never use, or might use one day but don't need today.

>I'm a follower of the holy church of K.I.S.S.: keep it simple, stupid. Simple and organized: the 2 key pillars of good design. As simple as possible, but no simpler. Thinking 5 steps ahead is the hard and time-consuming part.

I would spend more time thinking about how to market to folks who want to reduce their cloud investment. My current org is way over-engineered, both in infra and in engineering in general. We have been working to reduce complexity and cost, and it's had real big wins for us as an org. KISS is an excellent philosophy when applied pragmatically.

>I know this article is about Ruby.

The author could have replaced Ruby with "Language-X" and the article would have been just as accurate. It's often not the language that's slow; more often it's the DB schema, data models, and business logic.

> But as a Go zealot (Go can handle 500 million visitors per month, or over 5000 concurrent HTTP/1.1 connections, on a 13-year-old 4-core 32GB RAM computer)

If all it has to do is reply "Hello World", plenty of languages and frameworks could hit similar numbers. But if all those frameworks are blocked by a de-optimized external request, it doesn't matter. Here, the author's app was definitely having DB scaling issues, and this could happen with any language, runtime, or framework. It will happen sooner with certain combinations than others, yes, but utilization and customer feedback are more important than early optimization.

>This guy's problem is that he believed the hype, picked the wrong tool for the job and had no clue about infrastructure/operations.

I think that's a harsh assessment. We all buy into some hype, and we all generally try to make the best decision with the information we have at the time. Instead, work backwards: put yourself in the author's shoes, assume the idea made sense at the time, and then ask "What had to be true, or seem true at the time, to reach the conclusion they did?"

>There's a reason why Ruby isn't as popular as it was 10-15 years ago. Performance matters.

It matters, until it doesn't. Take https://www.tiobe.com/tiobe-index/ as an example: the top spot is held by a known "slow" language (certainly comparable to Ruby), and JavaScript, PHP, and VB all beat out Go in utilization. "Performance matters" is a statement that requires context. Sometimes the "performance" of your developers matters more than your runtime.

>With Go and ent I start wherever I want and I get ×100 the performance. If I just need to consume the data and don't care about SEO, I can write a SPA easily and the load is on the client.

And now the current trend is to reduce the amount of load on the client, because it turns out 2MB of JS to render an SPA is not the best experience on a lot of devices and networks. It's important to understand who your users are and the operating context they bring.

>And should you magically get more than 500 million visitors per month, simply adding load balancing, and doing the same for the database, would do the trick: BAM, you've now tripled the number of backend requests you can handle.

Horizontal scaling isn't dependent on language. You need to design your API and choose your persistent storage correctly to make this task easy. But in reality, most folks could just use caching and be fine.
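
As a sketch of that last point, a deliberately tiny read-through TTL cache (all names are made up; a real one would not hold the lock while loading):

    package main

    import (
        "sync"
        "time"
    )

    type entry struct {
        val     string
        expires time.Time
    }

    // Cache is a tiny read-through TTL cache; for many apps this absorbs
    // enough read load that horizontal scaling can wait.
    type Cache struct {
        mu sync.Mutex
        m  map[string]entry
    }

    func NewCache() *Cache { return &Cache{m: make(map[string]entry)} }

    // Get returns the cached value, or calls load (e.g. the slow DB
    // query) and caches the result for ttl. Holding the lock across
    // load keeps the sketch short; a real cache would avoid that.
    func (c *Cache) Get(key string, ttl time.Duration, load func() string) string {
        c.mu.Lock()
        defer c.mu.Unlock()
        if e, ok := c.m[key]; ok && time.Now().Before(e.expires) {
            return e.val
        }
        v := load()
        c.m[key] = entry{val: v, expires: time.Now().Add(ttl)}
        return v
    }

    func main() {
        c := NewCache()
        _ = c.Get("home", time.Minute, func() string { return "rendered page" })
    }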

>Your Ruby on Rails app might be good enough for a PoC, but that's about it.

Based on what, your personal opinion? It's a perfectly fine language for a back-end service and I say that as someone who has no interest in learning Ruby.

>If you really expect large volumes of traffic, better pick a well-performing language from the start and think about infrastructure.

Maybe. We use Scala at work because "it can handle large volumes of streaming data", and yet it's some of the slowest parts of our stack, because it's hard for folks to "do right." Sometimes you need a language whose main feature is being easy to use and hire for.

>And for the love of god, don't pick k8s, because when shit hits the fan, you won't be able to debug it. Keep it simple and transparent.

Yes and no. K8s is basically the next generation of LAMP for the cloud. It's fine, just understand what you are investing in and what it isn't. We use aws at work w/o k8s and it's a complex nightmare to manage in some respects. My k8s cluster at home however is fantastic and fun to tinker on, and causes me less stress with more uptime than my AWS clusters at work. I think it's more important to understand the problem space, and how a tool might fit in or not, than to hope it just solves problems magically.

The cloud isn't magic, it's just someone else's computers.



