When I was doing databases, I formulated this joke-theorem of modern computing: whatever reasonable and widespread database/language you use to model your application, if you do things sensibly, it will work well. Two corollaries: 1. It's a lot more about taste and less about technology than we may believe. 2. Good programmers can bend an imperfect tool to work well for a given use case, while bad programmers will mess everything up even with perfect tools, so in general it is better to spend time becoming a better programmer than to spend time learning all the cool new tools.
This idea, if true (and I believe it is), tells us how lucky we are compared to past programmers. Databases and languages today are so advanced that they can do almost everything, even if they are not the best at everything. And hardware and RAM are so generous that they will mask a lot of potential issues unless you are operating at a big scale.
In terms of hardware issues, I've had a single EC2 instance running application servers and MySQL (yup - I ran a decently traffic-intensive site like this). All software ran on a virtual instance. I configured everything necessary to run on machine startup, and the worst thing AMZN ever did was occasionally reboot the host machine my instance was running on. I was never down, and there were "planned restarts" that I'd get the odd email about.
In the event some real failure happened, I had a clone of the AMI and could easily dump it onto new hardware - after I knew about it of course.
All that said - that situation wasn't my job - and if I'm being paid to have uptime, I'd better cover my ass and have a hot failover on standby.
Since I'm speaking AWS here - of course I used EBS for the virtual filesystem, so that a machine crash didn't lose my SSD-based filesystem. As I'm sure many here know, you can run a database on an EBS-backed filesystem (and I have to assume you can even mount a swap file there too). I thought it was crack-smoking when someone first suggested that could easily be done.
"But if it's that simple - why is everyone running all these N-tier setups and living with all this unnecessary complexity?!?!"
It's mostly right, but the problem arises when the requests per second are not average. Even then, most people just do some unscalable stuff on the cloud for a minimum of 10x the cost.
The industry really has become bloated. Now we have DevOps and Agile Coaches and Cloud Engineers and microservices and JavaScript Everywhere(TM).
It's all bullshit. Is DevOps really so complex that a medium-sized company needs an entire team?
Earlier today I was trying to set up some elaborate Postgres database backup and couldn't get it to work, so I just put the command in the crontab and called it a day. Took 1 minute.
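For reference, the kind of crontab one-liner I mean looks roughly like this (the `mydb` database name and backup path are placeholders; note that `%` must be escaped in crontab entries):

```
# m h dom mon dow  command
0 3 * * * pg_dump --format=custom mydb > /backups/mydb-$(date +\%F).dump
```

It's not a real backup strategy (no WAL archiving, no off-host copies, no restore testing), but it beats having nothing while you figure out the elaborate version.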
> Is DevOps really so complex that a medium-sized company needs an entire team?
Depends on your goals, but most people probably don't need it.
> I just put the command in the cron tab and called it a day
Obviously this will lead to data loss if data is written and then a corruption event or other issue occurs between cron runs. The road to over-complicated infrastructure is paved with mistakes and outages. The home-lab server running on a Pi in my closet can use cron jobs and scripts, but at work we maintain higher standards, because our data loss would cost more than a few extra days of engineer time.
Everything should be as simple as possible, but let's not pretend everyone is a chump because their engineering tolerances don't allow for data loss.
Where big iron might be something between a raspberry pi with an NVMe hat and a gaming PC.
My 6th gen i5 (4 cores) can render ~15k page views per second with database queries, or several times that serving cached pages from nginx. 150 requests/second wouldn't be noticeable if I were using it for gaming at the same time.
Another team's product at our company runs inside a Kubernetes cluster consisting of lots of microservices. CPUs are at 80% all the time. Redis, RabbitMQ, MySQL, Elasticsearch, etc. There's no number crunching, mostly CRUD with some business logic on top. It requires a separate operations team to manage it (otherwise this Rube Goldberg machine easily explodes).
A few days ago I asked our ops team, what's our rps? ~500. Out of curiosity, I ran a few benchmarks to stress test my monolithic pet project in Go with a single DB for everything and it easily handled 15k/s on my home computer, too, without any special optimizations. I understand its load is totally different from our production server, and my benchmark is most likely flawed, but the order of magnitude of difference is interesting.
I was told the production cluster also has additional 500 rps just for communication between the microservices. I suspect there's just too much overhead in their setup: communication between services, serialization of data between different DBs, etc. If it was a single modular monolith with a single DB, I suspect it could be faster and easier to maintain... Another issue is that they also sell it as an on-premise solution. And with the complexity growing, it now often fails to run on clients' infra.
We had pretty much the same deal at my last job. We had somewhere around 50 services that ought to have been one. Batch processes would take hours as they made 10s of millions of internal API calls (which could also fail of course) instead of running for a minute or two just doing the work with a couple SQL queries. Who knows how many hours wasted debating the boundaries and contracts between services when those layers didn't need to exist in the first place. Millions a year spent on hardware. All in the name of "scalability" when we also had a few hundred rps, and the thing obviously had scaling issues exactly because it was being done that way. Very frustrating.
Not to disagree, but if your company has around 300,000 LoC in business logic and your pet project has 1,000 LoC, the processing times are going to be significantly different.
It has more business logic, true, but the average request is just the usual "retrieve an object from the DB, modify a few columns, and save back". Just like in my project. They have a higher LoC because they have more use cases, but a single use case alone is not very complex and is comparable to my project's code in its complexity. So I think it's fair to compare the two when it comes to rps.
Unless it's a consultancy project it's less about billable hours and more about internal busywork and resume fodder.
In a ZIRP/startup environment this kind of BS was rewarded by the market, and thus "engineers" would take any opportunity to build up such unproductive over-engineering skills to polish their resumes for the next opportunity.
> Where big iron might be something between a raspberry pi with an NVMe hat and a gaming PC.
At my last startup (with ~3 technical employees), I was talking to the CTO about our cloud spend and planning for properly scaling infra after we got our first customer. We weren't "a SaaS website", but (hand-waving) we did complex physics modeling and analytics on data streams. The (physicist) CTO had heard of k8s and cloud stuff and wanted me to investigate how we should scale our cloud. He was convinced it'd be expensive to do "all those physics calculations with 64-bit numbers".
I showed him our entire cloud - gateway, database(s), physics modeling, metrics collection, log aggregation, etc - running at 1000x estimated 1-customer load from a MacBook. I offered to set him up a personal raspberry pi to stress test before a k8s cluster.
We ended up with a sensible single medium EC2 instance running everything, with some extra stuff for fail-over. AFAIK the only change made after I left was using a cloud-vendor DB.
There are some caveats to that; for example, you can't get anywhere near that throughput with TLS unless you have hardware offload (I've seen claims that TLS is ~1% overhead, which seems to be 15 years out of date; for cached content, TLS handshakes cause more like a 90% reduction in throughput on my computer). Unfortunately Intel segments this, so that offload isn't on consumer chips. You might be able to use a cheap firewall device with QAT as a reverse proxy to terminate TLS, though (QAT is on some cheap Atom processors); I don't have one to measure what it can do. Obviously that's not something a large corporation would do, but for a small bootstrapped project, it ought to take you way farther than you could need.
Also batching database queries is important for high throughput.
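To make the batching point concrete, here's a minimal sketch. It uses Python's built-in sqlite3 purely for illustration; with Postgres you'd get the same effect with your driver's batch/`executemany` support or `COPY`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(f"event-{i}",) for i in range(10_000)]

# One prepared statement executed over all rows, instead of
# issuing 10,000 separate INSERT round trips.
conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 10000
```

Over a network connection to a real database server, the round-trip savings dominate, which is why batching matters far more there than in this in-memory toy.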
The best thing happening in the Postgres ecosystem right now is the separation of compute and storage. Breaking that link opens up Postgres as a generic query-processing engine that can sit in front of multiple different table and storage types, and allows joining between all these different types of data. It really will enable you to use Postgres as a generic database for everything.
My theory is that every new database tech starts as a new domain-specific DB, then over time gets merged into the generic engines to serve the 99%.
I failed a system design interview for saying "use Postgres for the whole solution": app data to support about a million users, and a job/task table to support asynchronous tasks. The throughput was a few tasks per user per day.
I feel they were expecting a more complex solution with Kafka queues.
Heh! Interviews don't go well when you say "just use Postgres".
One thing that postgres does not solve very well is reactive UIs. Postgres does have listen/notify but it requires much more boilerplate and infrastructure to set up than Firebase/Firestore. For example I'd like to be able to run a query and get notified that the results have updated when any of the affected rows change.
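To illustrate the boilerplate involved, a typical hand-rolled setup looks something like this (the `todos` table and `todos_changed` channel are made-up names):

```sql
-- Fire a NOTIFY with the new row whenever `todos` changes.
CREATE OR REPLACE FUNCTION todos_notify() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('todos_changed', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER todos_notify_trigger
AFTER INSERT OR UPDATE ON todos
FOR EACH ROW EXECUTE FUNCTION todos_notify();
```

Even with this in place, each client still has to LISTEN on the channel, figure out whether the change affects its query, and re-run it - which is exactly the per-query reactivity Firestore gives you for free.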
Solving this is high on my list with PGlite (https://github.com/electric-sql/pglite). I have a bunch of ideas/thoughts that I hope to get to at some point (IVM, streaming queries, result diffs).
When PGlite (or SQLite) is used with Electric (https://electric-sql.com/), we already provide good reactive primitives. But we hope to improve this so that where possible the full queries don't have to be re-run.
Agree. If you want a fully reactive database (and not just hierarchy-based reactivity like Firebase) but want to keep your ACID + relational data modeling, check out https://convex.dev .
Htmx doesn't help with this problem (knowing whether the underlying data has changed in order to update the UI). It's a common problem in any kind of collaborative app.
Replace Redis by using UNLOGGED tables and TEXT as a JSON data type in PostgreSQL. Use PostgreSQL with SKIP LOCKED for message queuing instead of Kafka.
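The SKIP LOCKED queue pattern mentioned above is roughly this (the `jobs` table and its columns are illustrative):

```sql
-- An unlogged table skips WAL writes, trading crash durability
-- for speed (a Redis-like tradeoff).
CREATE UNLOGGED TABLE jobs (
  id      bigserial PRIMARY KEY,
  payload text NOT NULL,
  done    boolean NOT NULL DEFAULT false
);

-- Each worker claims one pending job. SKIP LOCKED means workers
-- never block on rows another worker has already locked.
BEGIN;
SELECT id, payload FROM jobs
WHERE NOT done
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ... process the job, then mark it done before COMMIT:
-- UPDATE jobs SET done = true WHERE id = <claimed id>;
COMMIT;
```

Because the row lock is held until COMMIT, a crashed worker's job simply becomes claimable again - no visibility timeouts or acking protocol needed.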
Coming from Timescale, the proposal is not to use the best kind of database for each workload, but to use one database with multiple extensions for workloads that Postgres may or may not be the best fit for, moving the multiple layers one level down. You may not get the best performance (database people are especially sensitive on that topic, at least for heavy loads), and the extensions may or may not introduce incompatibilities, inconsistency, or compromised stability.
Shaving complexity is a good goal, but is partially negated by adding it again closer to the core.
Anyway, maybe for most (counting unique installations or companies, not the number of dedicated servers) the workload won't be heavy enough to deserve the extra complexity of handling many different database servers.
Postgres is the only RDBMS that is not backing companies into corners because so many products are forks of it or "speak" Postgres. You can completely swap out your database without changing any of your code. There is no other database that lets you shop around (or upgrade to a more specialized product) like that.
I would much rather inherit and optimize someone's postgres DB with specific scaling challenges than a prematurely optimized multi-database solution built for "web scale".
And on the third hand, Timescale is not advised as your production OLTP if you're using Timescale cloud, so it's a strange double standard they are communicating here, since they cluster totally differently than normal OLTP Postgres workloads? Or maybe I skimmed it too fast, and I haven't used Timescale in a while.
100s of engineers all coding against the same Postgres database becomes a pain. Lack of decentralization leads to silly outages due to people sharing resources they really shouldn't.
Not the parent but I've seen this scenario a few times. Essentially what you get is random groups of devs (often you never even knew they existed or were messing with your DB) who mount a DoS attack (not their intention but that's the result). Long lived transactions, unindexed queries, unexpected deadlocks, etc etc. Worst thing is their reaction is that your database sucks. So no amount of education will fix the problem. They think you're the problem. So you end up with a defensive architecture where the blast radius for any single act of stupidity is limited.
In my experience it's really hard to prevent issues like this. RDBMSs kind of push you toward keeping everything in the same database to take advantage of foreign keys and other features, when in reality different data types will need their own source of record. I consider myself more experienced than most, and I still fell into this trap in recent memory, failing to anticipate the scaling requirements for one of my tables until it was too late.
I understand the sentiment here (and agree PSQL will get you quite far) but this post is completely devoid of any actual content/meaning, examples etc. This is a literal shitpost.
James did put a lot of thought into this post. I saw multiple iterations of it before it was published. I think calling it a shitpost is being unkind to him.
I think we just find that some developers don't like reading long posts, so he kept this one short :-)
But if you want something with more length/depth, here is another one we recently published:
Author here, this is obviously a high level post! There are many, many posts about how Postgres with different extensions can get you different things.
I didn't want to go down that path for this article because I didn't want it to end up as a sales pitch. The article you describe needs to be written, but it's a follow-up in my mind (and much more technical, for a slightly different audience).
One of the biggest concerns about any DB is speed of query execution.
Today a decent app/site can generate terabytes of machine data that you need to analyze.
There is a massive difference in usability of a query that takes < 5 secs vs even 5 minutes. So, some benchmarks of speed vs major use cases would have been ideal.
I understand, general purpose DBs can't be the best, but I can compromise for 80%. If it's 50% or less, then I really have to question that choice.
If you're generating terabytes of analytical data you should be using a different stack for analyzing that than you do for running your production workload.
(You could still use PostgreSQL, just not the same PostgreSQL servers that are running your user-facing features.)
Postgres storage engines are not yet mature enough to allow higher throughput for insert-heavy scenarios (ones that are not necessarily analytics-focused). They also don't have a way to offload storage to cheaper storage tiers.
OrioleDB, pg_analytics, and Hydra are getting there, but nothing is out-of-the-box Postgres yet.
Compression is also nearly non-existent, except for TOAST.
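For what it's worth, the one TOAST-level knob that does exist (Postgres 14+, and only if the server was built with lz4 support) looks like this; the `events`/`payload` names are illustrative:

```sql
-- Applies only to TOASTed (compressed or out-of-line) values
-- of this column; it's not table-wide columnar compression.
ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;
```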
This is very much a work-in-progress in the ecosystem, though. We (ParadeDB) also have `pg_lakehouse`, which is a version of `pg_analytics` designed to work over cloud object stores (AWS S3, GCS, etc.) for "cheaper storage".
I don't know if columnar engines will make it to core Postgres (I doubt it), but I do see the work of OrioleDB making it to core someday. We've joined an "alliance", alongside the core OrioleDB folks and some people from Timescale and Yandex, to push for it.