Yep, there's a premium on making your architecture more cloudy. However, the strongest case for Use One Big Server is not necessarily your big monolithic API server, but your database.
Use One Big Database.
Seriously. If you are a backend engineer, nothing is worse than breaking up your data into self-contained service databases, where everything is passed over REST/RPC. Your product asks will consistently want to combine these data sources (the people asking don't know how your distributed databases look, and oftentimes they really do not care).
It is so much easier to do these joins efficiently in a single database than fanning out RPC calls to multiple different databases, not to mention dealing with inconsistencies, lack of atomicity, etc. etc. Spin up a specific reader of that database if there needs to be OLAP queries, or use a message bus. But keep your OLTP data within one database for as long as possible.
You can break apart a stateless microservice, but there are few things as stagnant in the world of software as data. Keeping it in one place will keep you nimble for new product features. And the boxes that cloud vendors offer today for managed databases are giant!
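To make the join point concrete, here's a rough sketch of the kind of cross-domain query product asks tend to turn into (table and column names are made up for illustration):

    -- One query, one round trip, transactionally consistent.
    -- With per-service databases this becomes several RPC fan-out calls
    -- plus client-side joining and reconciliation.
    SELECT o.id, o.total, c.email, s.carrier, s.eta
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    LEFT JOIN shipments s ON s.order_id = o.id
    WHERE o.created_at > now() - interval '7 days';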
> Seriously. If you are a backend engineer, nothing is worse than breaking up your data into self contained service databases, where everything is passed over Rest/RPC. Your product asks will consistently want to combine these data sources (they don't know how your distributed databases look, and oftentimes they really do not care).
This works until it doesn't and then you land in the position my company finds itself in where our databases can't handle the load we generate. We can't get bigger or faster hardware because we are using the biggest and fastest hardware you can buy.
Distributed systems suck, sure, and they make querying across systems a nightmare. However, by giving those aspects up, what you gain is the ability to add new services, features, etc. without running into Scotty yelling "She can't take much more of it!"
Once you get to that point, it becomes SUPER hard to start splitting things out. All of a sudden you have 10,000 "just a one-off" queries against several domains that break when you try to carve a domain out into a single owner.
I don't know how complex your project is, but more often than not the feeling of doom that comes from hitting that wall is bigger than the actual effort it takes to solve it.
People often feel they should have anticipated and avoided the scaling issues altogether, but moving from a single DB to a master/replica model, and/or shards or other solutions, is fairly doable, and it doesn't come with worse tradeoffs than if you had sharded/split services from the start. It always feels fragile and bolted on compared to the elegance of the single DB, but you'd also have had many dirty hacks to make a multi-DB setup work properly.
Also, you do that from a position where you usually have money, resources and a good knowledge of your core parts, which is not true when you're still growing full speed.
I can't speak for cogman10, but in my experience, when you start to hit the limits of "one big database" you are generally dealing with some really complicated shit, and refactoring to dedicated read instances, shards, and other DB hacks is just a short-term solution to buy time.
The long term solutions end up being difficult to implement and can be high risk because now you have real customers (maybe not so happy because now slow db) and probably not much in house experience for dealing with such large scale data; and an absolute lack of ability to hire existing talent as the few people that really can solve for it are up to their ears in job offers.
The other side of this is that once you actually can’t scale a single DB, the project has proved its value and you have a solid idea of what you actually want.
Designing, let alone building, something scalable, on the other hand, is a great way to waste extreme effort up front when it’s completely superfluous. That’s vastly more likely to actually kill a project than some growing pains, especially when most projects never scale past a single reasonably optimized database.
You're not wrong. Probably more than 95% of applications will never outgrow one large relational database. I just think that this leads to an unfortunate, but mostly inevitable issue of complexity for the few that do hit such a level of success and scale.
Alex DeVrie (author of 'The DynamoDB Book') says his approach is to essentially start all new projects with DynamoDB.
Now I don't really agree with him, yet I can't fully say he's wrong either. While we won't need it most of the time, reaching for a tool like this before we need it provides more time to really understand it when/if we reach that point.
@ithrow, yeah I know he is clearly biased which is why I don't really agree with him. I do however think it would have helped me to start using/learning before I needed it since the paradigm is so foreign to the relational model that is now second nature.
DynamoDB (and Mongo) is nice, right up until you need those relations. I haven’t found a document oriented database that gives me the consistency guarantees of a RDBMS yet.
You must not have looked at MongoDB. We have been delivering fully consistent ACID transactions since 4.0, which shipped several years ago. Yes, Jepsen did find some issues with the initial release of ACID transactions, and yes, we fixed those problems pretty rapidly.
> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations.
> Updates
> 2020-05-26: MongoDB identified a bug in the transaction retry mechanism which they believe was responsible for the anomalies observed in this report; a patch is scheduled for 4.2.8.
Your initial claim was that these issues were addressed in 4.0.
Jepsen's report refutes your claim, and demonstrates that MongoDB had serious reliability problems even in 4.2.6.
Frankly, your insistence on pulling the wool over everyone's eyes, especially on a topic that's easily verified, does not help build up trust in MongoDB.
I can see the source of confusion. Apologies. I mentioned that ACID transactions were released in 4.0 but did not explicitly mention when the problems arose, which of course was in 4.2, actually released a year later. The version numbers are clearly referenced in the Jepsen article.
This is the core culture of MongoDB: cutting corners to optimise things a little more and cater to a NoSQL crowd. Its entire mindset is fundamentally different from what you'd get in a proper relational database, and ignoring those things isn't going to do any software you write any favours.
It's been a long time since I've used Mongo, so I don't know whether it still only supports eventual consistency, but DynamoDB does support transactions and strongly consistent reads, though that comes at the cost of reduced read throughput.
DynamoDB also supports relations, but they aren't called relations because they don't resemble anything like relations in traditional relational databases.
You may already know this, but just to clarify, DynamoDB isn't really a document-oriented database. It's both a key/value database and a wide-column database, so in that sense it's closer to Redis and Cassandra than Mongo, but there's definitely a lot of misinformation on this front.
> The long term solutions end up being difficult to implement and can be high risk because now you have real customers (maybe not so happy because now slow db) and probably not much in house experience for dealing with such large scale data; and an absolute lack of ability to hire existing talent as the few people that really can solve for it are up to their ears in job offers.
This is a problem of having succeeded beyond your expectations, which is a problem only unicorns have.
At that point you have all this income from having fully saturated the One Big Server (which, TBH, has unimaginably large capacity when everything is local with no network requests), so you can use that money to expand your capacity.
Any reason why the following won't work:
Step 1: Move the DB onto its own DBOneBigServer[1]. Warn your customers of the downtime in advance. Keep the monolith as-is on the current OriginalOneBigServer.
Step 2: OriginalOneBigServer still saturated? Put copies of the monolith on separate machines behind a load-balancer.
Step 3: DBOneBigServer is still saturated, in spite of being the biggest Oxide rack there is? Okay, now go ahead and make RO instances, shards, etc. Monolith needs to connect to RO instances for RO operations, and business as usual for everything else.
Okay, so Step 3 is not as easy as you'd like, but until you get to the point that your DBOneBigServer cannot handle the loads, there's no point in spending the dev effort on sharding. Replication doesn't usually require a team of engineers f/time, like a distributed DB would.
If, after Step 3, you're still saturated, then it might be time to hire the f/time team of engineers to break up everything into microservices. While they get up to speed you're making more money than god.
Competitors who went the distributed route from day one have long since gone out of business because while they were still bugfixing in month 6, and solving operational issues for half of each workday (all at a higher salary) in month 12, and blowing their runway cash on AWS for the first 24 months, you had already deployed in month 2, spending less than they did.
I guess the TLDR is "don't architect your system as if you're gonna be a unicorn". It's the equivalent of you, personally, setting your two-year budget to include the revenue from winning a significant lottery.
You don't plan your personal life "just in case I win the lottery", so why do it with a company?
^ This. Not so long ago, I worked in the finance department of a $350M company as one of the five IT guys, and we had just begun implementing Step 2 after OriginalOneBigServer had shown its limits. DBOneBigServer was really big though, 256 GB RAM and 128 cores if I remember correctly. So big in fact that I implemented some of my ETL tasks as stored SQL procedures to be run directly on the server. The result? A task that would easily take a big fraction of OriginalOneBigServer's memory and 15 hours (expected to grow in step with revenue) now runs in 30 minutes.
It's worth noting that when I left, we were still nowhere close to saturating DBOneBigServer.
Maybe unicorn is not the right word? If your app has millions of DAUs choking your DB, you should at least be tracking toward your next big investment round or some other success milestone.
Otherwise, your product is on its way to failure, so good thing you did One Big DB...
These services didn’t need additional rounds of funding and aren't the kind of thing that would scale like a unicorn.
Some services might only be transient (like services based around a particular sports league or TV series) or regional (like government sites or, again, sports leagues).
Not every service out there has aspirations to “change the world”. Some exist to fill a niche. But sometimes that “niche” still covers millions of people.
> I don't know what's the complexity of your project, but more often than not the feeling of doom coming from hitting that wall is bigger than the actual effort it takes to solve it.
We've attempted, and failed at, multiple multi-year projects to "solve" the problem. I'm sure there are simpler problems that are easier to disentangle. But not in our case.
I can share some. Had a similar experience as the parent comment. I do support "one big database" but it requires a dedicated db admin team to solve the tragedy of the commons problem.
Say you have one big database. You have 300 engineers and 30-50 product managers shipping new features every day, accountable to the C-suite. They are all writing queries to retrieve the data they want. One more join, one more N+1 query. Tons of indexes to support all the different queries, to the point where your indexes exceed the size of your tables in many cases. Database maintenance is always someone else's problem, because hey, it's one big shared database. You keep scaling up the instance size because "hardware is cheap". Eventually you hit the m6g.16xlarge. You add read replicas. Congratulations, now you have an eventually consistent system. You have to start figuring out which queries can hit the replica and which ones always need fresh data. You start getting long replication lag, but it varies and you don't know why. If you decide to try to optimize a single table, you find dozens or 100+ queries that access it. You didn't write them. The engineers who did don't work here anymore....
I could go on, and all these problems are certainly solvable and could have been avoided with a little foresight, but you don't always have good engineers at a startup doing the "right thing" before you show up.
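For what it's worth, if that one big database happens to be Postgres, the "indexes exceed the size of the tables" symptom is at least easy to measure with the standard catalog functions; a rough sketch:

    -- Largest index footprints compared to their tables
    SELECT relname AS table_name,
           pg_size_pretty(pg_relation_size(relid)) AS table_size,
           pg_size_pretty(pg_indexes_size(relid))  AS index_size
    FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_indexes_size(relid) DESC
    LIMIT 20;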
I think this hits the nail right on the head, and it's the same criticism I have of the article itself: the framing is that you split up a database or use small VMs or containers for performance reasons, but that's not the primary reason these things are useful; they are useful for people scaling first and foremost, and for technical scaling only secondarily.
The tragedy of the commons with one big shared database is real and paralyzing. Teams not having the flexibility to evolve their own schemas because they have no idea who depends on them in the giant shared schema is paralyzing. Defining service boundaries and APIs with clarity around backwards compatibility is a good solution. Sometimes this is taken too far, into services that are too small, but the service boundaries and explicit APIs are nonetheless good, mostly for people scaling.
> Defining service boundaries and APIs with clarity around backwards compatibility is a good solution.
Can't you do that with one big database? Every application gets an account that only gives it access to what it needs. Treat database tables as APIs: if you want access to someone else's, you have to negotiate to get it, so it's known who uses what. You don't have to have one account with access to everything that everyone shares. You could even give each team its own schema to make those boundaries explicit.
It would be easier to create different databases to achieve the same thing. Those could be in the same database server, but clear boundaries is the key.
Indeed! And functions with security definers can be useful here too. With those, one can define a very strict and narrow API, with functions that write or query tables that users don't have any direct access to.
Look at it as an API written in DB functions, rather than in HTTP request handlers. One can even have neat API versioning through, indeed, the schema, and give different users (or application accounts) access to different (combinations of) APIs.
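A minimal sketch of that idea in Postgres (schema, role, and function names are purely illustrative): revoke direct table access and expose one narrow function instead.

    -- The billing tables are off-limits to the checkout app's role
    REVOKE ALL ON ALL TABLES IN SCHEMA billing FROM app_checkout;

    -- The "API": a function owned by the billing team's role, executed
    -- with the owner's privileges thanks to SECURITY DEFINER
    CREATE FUNCTION billing.record_payment(p_order_id bigint, p_amount numeric)
    RETURNS void
    LANGUAGE sql
    SECURITY DEFINER
    AS $$
        INSERT INTO billing.payments (order_id, amount)
        VALUES (p_order_id, p_amount);
    $$;

    GRANT EXECUTE ON FUNCTION billing.record_payment(bigint, numeric) TO app_checkout;

Versioning then becomes a matter of shipping billing.record_payment_v2 (or a whole new schema) alongside the old one.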
The rest is "just" a matter of organizational discipline, and a matter of teams to internalize externalities so that it doesn't devolve into a tragedy of the commons — a phenomenon that occurs in many shapes, not exclusively in shared databases; we can picture how it can happen for unfettered access to cloud resources just as easily.
But here's the common difference: through the cloud, there's clear accounting per IOP, per TB, per CPU hour, so incentive to use resources efficiently is can be applied on a per-team basis — often through budgeting. "Explain to me why your team uses 100x more resources than this other team" / "Explain to me why your team's usage has increased 10-fold in three months".
Yet there's no reason to think that you can only get accounting for cloud stuff. You could have usage accounting on your shared DB. Does anyone here have experience with any kind of usage accounting system for, say, PostgreSQL?
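Even without a dedicated system, pg_stat_statements aggregated per role seems like a plausible starting point, assuming each team or application connects with its own role; roughly (column names vary a bit between Postgres versions, e.g. total_exec_time vs. total_time):

    -- Rough per-role resource accounting (requires the pg_stat_statements extension)
    SELECT r.rolname,
           sum(s.calls)            AS calls,
           sum(s.total_exec_time)  AS total_ms,
           sum(s.shared_blks_read) AS blocks_read
    FROM pg_stat_statements s
    JOIN pg_roles r ON r.oid = s.userid
    GROUP BY r.rolname
    ORDER BY total_ms DESC;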
I think we're getting hung up on database server vs. database as conceptual entity. I think separation between the entities is more important (organizationally) and don't think it matters as much whether or not the server is shared.
These are real problems, but there can also be mitigations, particularly when it comes to people scaling. In many orgs, engineering teams are divided by feature mandate, and management calls it good-enough. In the beginning, the teams are empowered and feel productive by their focused mandates - it feels good to focus on your own work and largely ignore other teams. Before long, the Tragedy of the Commons effect develops.
I've had better success when feature-focused teams have tech-domain-focused "guilds" overlaid. Guilds aren't teams per-se, but they provide a level of coordination, and more importantly, permanency to communication among technical stakeholders. Teams don't make important decisions within their own bubble, and everything notable is written down. It's important for management to be bought in and value participation in these non-team activities when it comes to career advancement (not just pushing features).
In the end, you pick your poison, but I have certainly felt more empowered and productive in an org where there was effective collaboration on a smaller set of shared applications than the typical application soup that develops with full team ownership.
In uni we learnt about federated databases, i.e. multiple autonomous, distributed, possibly heterogeneous databases joined together by some middleware to service user queries. I wonder how that would work in this situation, in place of one single large database.
Federated databases hardly ever come up in these kinds of discussions about 'web scale'. Maybe because of latency? I don't know.
Sure. My point is that the organization problems are more difficult and interesting than the technical problems being discussed in the article and in most of the threads.
Introducing an enormous amount of overhead because you can't train your software engineers to use acceptable amounts of resources (rather than just accidentally crashing a node and not caring) is a little ridiculous.
For whatever reason I've been thrown into a lot of companies at that exact moment when "hardware is cheap" and "not my problem" approaches couldn't cut it anymore...
So yes, it's super painful, and requires a lot of change in processes, mindsets, and it's hard to get everyone to understand things will get slower from there.
On the other hand, micro-services and/or multi-DB setups are also super hard to get right. One of the surprises I had was all the "caches" that each service started silently adding on its little island once they realized the performance penalty of fetching data from half a dozen services during the more complicated operations. Or the way DB abuse from one group could slow down everyone, and service abuse of the core parts (e.g. the "user" service) would impact most of the other services. More than a step forward, it felt a lot like a step sideways: continuing to do the same stuff, just in a different way.
My takeaway was that teams that are good at split architectures are usually also good at monoliths, and vice versa. I feel for the parent who got stuck in the transition.
Sure, you'll get to m6g.16xlarge; but how many companies actually have OLTP requirements that exceed the limits of a single server on AWS, e.g. u-12tb1.112xlarge or u-24tb1.metal (that's 12-24 TB of memory)?
I think these days the issues with high availability, cost/autoscaling/commitment, "tragedy of the commons", bureaucracy, and inter-team boundaries are much more likely to be the drawback than lack of raw power.
You do not need that many database developers; it's a myth. Facebook has 2 dedicated database engineers managing it. I work at the United Nations, and there is only 1 dedicated database developer on a 1000+ person team.
If you have a well-designed database system, you do not need that many database engineers.
I do not disagree at all that what you are describing can happen. What I'm not understanding is why they're failing at multi year attempts to fix this.
Even in your scenario you could identify schemas and tables that can be separated and moved into a different database or at maturity into a more scalable NoSQL variety.
Generally, once you get to the point that is being described, it means you have a very strong sense of the queries you are making. Once you have that, it's not strictly necessary to even use an RDBMS, or at the very least a single database server.
> Even in your scenario you could identify schemas and tables that can be separated and moved into a different database or at maturity into a more scalable NoSQL variety.
How? There's nothing tracking or reporting that (unless database management instrumentation has improved a lot recently), SQL queries aren't versioned or typechecked. Usually what happens is you move a table out and it seems fine, and then at the end of the month it turns out the billing job script was joining on that table and now your invoices aren't getting sent out.
> Generally once you get to the point that is being described that means you have a very strong sense on the of queries you are making.
No, just the opposite; you have zillions of queries being run from all over the place and no idea what they all are, because you've taught everyone that everything's in this one big database and they can just query for whatever it is they need.
It's resource intensive - but so is being in a giant tarpit/morass. Adding client query logging is cheaper and can be distributed. I just double checked, and neither Oracle nor Postgres warn 'never use it in production'
And if you have logs, you can see what actually gets queried, and by whom, and what doesn't get queried, and by whom.
That will also potentially let you start constructing views and moving actual underlying tables out of the way to where you can control them.
Which can let you untangle the giant spaghetti mess you're in.
But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.
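In Postgres terms that can be as blunt as turning on statement logging with a log_line_prefix that records who is asking; a sketch of the knobs involved (tune the thresholds to taste):

    -- Log every statement, tagged with timestamp, user, database and application_name
    ALTER SYSTEM SET log_statement = 'all';
    ALTER SYSTEM SET log_line_prefix = '%m user=%u db=%d app=%a ';
    SELECT pg_reload_conf();

    -- Or, less noisy: only log statements slower than 500 ms
    ALTER SYSTEM SET log_min_duration_statement = '500ms';

From those logs you can build exactly the "who queries what" picture, then start swapping real tables for views.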
> It's resource intensive - but so is being in a giant tarpit/morass.
Agreed, but it means it's not really a viable option for digging yourself out of that hole if you're already in it. Most of the time if you're desperately trying to split up your database it's because you're already hitting performance issues.
> Adding client query logging is cheaper and can be distributed.
Right, but that only works if you've got a good handle on what all your clients are. If you've got a random critical script that you don't know about, client logging isn't going to catch that one's queries.
> But then, that's just me having actually done that a few times. You're welcome to complain about how it's actually unsolvable and will never get better, of course.
I've done it a few times too, and it's always been a shitshow. Query logging is a useful tool to have in some cases, but it's often not an option, and even when it is, it's not a quick or easy fix. You're far better off not getting into that situation in the first place, by enforcing proper datastore ownership and scalable data models from the start, or at least from well before you start hitting the performance limits of your datastores.
If you are in the hole where you really cannot add load to your database server but want to log the queries, there is a technique called zero impact monitoring where you literally mirror the network traffic going to your database server, and use a separate server to reconstruct it into query logs. These logs identify the queries that are being run, and critically, who/what is running them.
I've seen this too. I guess 50% of query load were jobs that got deprecated in the next quarterly baseline.
It felt like a system was needed to allocate query resources to teams, some kind of scarce, tradeable tokens maybe, to incentivise more care and consciousness of the resource from the many users.
What we did was have a few levels of priority managed by a central org. It resulted in a lot of churn and hectares of indiscriminately killed query jobs every week, many that had business importance mixed in with the zombies.
Do you think it would make it better to have the tables hidden behind an API of views and stored procedures? Perhaps a small team of engineers maintaining that API would be able to communicate effectively enough to avoid this "tragedy of the commons" and balance the performance (and security!) needs of various clients?
This is so painfully, painfully true. I’ve seen it borne out personally at three different companies so far. Premature splitting up is bad too, but I think the “just use one Postgres for everything” crowd really underestimates how bad it gets in practice at scale.
Maybe it’s all a matter of perspective? I’ve seen the ‘split things everywhere’ thing go wrong a lot more times than the ‘one big database’ thing. So I prefer the latter, but I imagine that may be different for other people.
Ultimately I think it’s mostly up to the quality of the team, not the technical choice.
I’ve seen splitting things go bad too. But less often and to a lesser degree of pain than mono dbs - a bad split is much easier to undo than monodb spaghetti.
However I think it’s “thou shall” rules like this blog post that force useless arguments. The reality is it depends, and you should be using your judgement, use the simplest thing (monodb) until it doesn’t work for you, then pursue splitting (or whatever). Just be aware of your problem domain, your likely max scale, and design for splitting the db sooner than you think before you’re stuck in mud.
And if you’re building something new in an already-at-scale company you should perhaps be starting with something like dynamo if it fits your usecase.
We have over 200 monolith applications, each accessing overlapping schemas of data with their own sets of stored procedures, views, and direct queries. Migrating a portion of that data out into its own database generally requires refactoring a large subset of the 200 monolith apps to no longer get all the data in one query, but rather a portion of the data with the query and the rest of the data from a new service.
Sharding the data is equally difficult, because even tracing who is writing the data is spread from one side of the system to the other. We've tried to do that through an elaborate system of views, but as you can imagine, those are too slow and cover too much data for some critical applications, so they end up breaking the shard. That, in and of itself, introduces additional complexity into the evolution of the products.
Couple that with the fact that, even with these solutions, a large portion of the organization is not on board (why can't we JUST buy more hardware? Get JUST bigger databases?), and these efforts end up being sabotaged from the beginning because not everyone thinks it's a good idea. (And if you think you are different, I suggest just looking at the rest of the comments here on HN that provide 20 different solutions to the problem, some of which are "why can't you just buy more hardware?")
But, to add to all of this, we also just have organizational deficiencies that have really harmed these efforts. Including things like a bunch of random scripts, checked in who knows where, that are apparently mission critical and read/write across the entire database. Generally for things like "the application isn't doing the right thing, so this cron job run every Wednesday will go in and fix things up". Quite literally 1000s of those scripts have been written.
This isn't to say we've been 100% unsuccessful at splitting some of the data onto its own server. But it's a long and hard slog.
>Including things like a bunch of random scripts checked into who knows where that are apparently mission critical and reading/writing across the entire database.
This hits pretty hard right now, after reading this whole discussion.
When there is a galaxy with countless star systems of data, it's good to have local owners of the data who publish it for others' usage as domain leaders, and to build a system that makes subscriptions and access grants frictionless.
100% agreed, and that's what I've been trying to promote within the company. It's simply hard to build up the momentum to really effect this change. Nobody likes the idea that things have to get a little slower (because you add a new layer between the data) before they can get faster.
fwiw hacking hundreds of apps literally making them worse by fragmenting their source of record doesn't sound like a good plan. it's no surprise you have saboteurs, your company probably wants to survive and your plan is to shatter its brain.
outside view: you should be trying to debottleneck your sql server if that's the plan the whole org can get behind. when they all want you to succeed you'll find a way.
> fwiw hacking hundreds of apps literally making them worse by fragmenting their source of record doesn't sound like a good plan. it's no surprise you have saboteurs, your company probably wants to survive and your plan is to shatter its brain.
The brain is already shattered. This wouldn't "literally make them worse", instead it would say that "now instead of everyone in the world hitting the users table directly and adding or removing data from that table, we have one service in charge of managing users".
Far too often we have queries like
SELECT b.*, u.username FROM Bar b
JOIN users u ON b.userId = u.id
And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.
> you should be trying to debottleneck your sql server if that's the plan the whole org can get behind.
Did you read my post? We absolutely HAVE been working, for years now, at "debottlenecking our sql server". We have a fairly large team of DBAs (about 30) whose whole job is "debottlenecking our sql server". What I'm saying is that we are, and have been, at the edge (and more often than not over the edge) of tipping over. We CAN'T buy our way out of this with new hardware because we already have the best available hardware. We already have read-only replicas. We have already tried (and failed at) sharding the data.
The problem is data doesn't have stewards. As a result, we've spent years developing application code where nobody got in the way of saying "Maybe you shouldn't join these two domains together? Maybe there's another way to do this?"
assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated. unless that beast of a server is sitting pegged with a hardware bottleneck it can probably be debottlenecked by vertically partitioning the users table. ie: split the table into two (or more) to isolate the columns that change frequently from the ones that don't, replace the table with a view that joins it back together w/instead-of triggers conditionally updating the appropriate tables, etc. etc. then when this happens:
SELECT b.*, u.username FROM Bar b JOIN users u ON b.userId = u.id
sql server sees that you're only selecting username from the users view and eliminates the joins for the more contentious tables and breathes easy peasy
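roughly what that looks like in t-sql, assuming a made-up users table split into a cold half and a hot half (illustrative names, not your actual schema):

    -- Cold columns: rarely change
    CREATE TABLE users_static (id INT PRIMARY KEY, username NVARCHAR(100));
    -- Hot columns: updated constantly
    CREATE TABLE users_activity (id INT PRIMARY KEY, last_seen DATETIME2, balance DECIMAL(18,2));
    GO

    -- The old table name becomes a view stitching the halves back together
    CREATE VIEW users AS
    SELECT s.id, s.username, a.last_seen, a.balance
    FROM users_static s
    JOIN users_activity a ON a.id = s.id;
    GO

    -- Writes through the view get routed to the appropriate underlying table
    CREATE TRIGGER users_update ON users
    INSTEAD OF UPDATE AS
    BEGIN
        UPDATE a SET last_seen = i.last_seen, balance = i.balance
        FROM users_activity a JOIN inserted i ON i.id = a.id;
        UPDATE s SET username = i.username
        FROM users_static s JOIN inserted i ON i.id = s.id;
    END;
    GO

a query that only selects username can then be served from the static half without waiting on locks held by the write-heavy one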
> And why is this query doing that? To get a human readable username that isn't needed but at one point years ago made it nicer to debug the application.
imo users should be able to do this and whatever else they want and it's not even unreasonable to want usernames for debugging purposes forever. I'd expect the db team to support the requirements of the apps teams and wouldn't want to have to get data from different sources
> assuming some beastly server with terabytes of ram, hundreds of fast cores, and an exotic io subsystem capable of ridiculous amounts of low latency iops, I'd guess the perf issue with that example is not sql server struggling with load but rather lock contention from the users table being heavily updated.
You'd guess wrong. The example above is not the only query our server runs. It's an example of some of the queries that can be run. We have a VERY complex relationship graph, far more than what you'll typically find. This is finance, after all.
I used the user example for something relatable without getting into the weeds of the domain.
We are particularly read-heavy and write-light. The issue is quite literally that we have too many applications doing too many reads. We are literally running into problems where our tempDb can't keep up with the requests because there are too many of them doing too complex work.
You are assuming we can just partition a table here or there and everything will just work swimmingly, that's simply not the case. Our tables do not so easily partition. (perhaps our users table would, but again, that was for illustrative purposes and by no means the most complex example).
Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?
> Do you think that such a simple solution hasn't been explored by a team of 50 DBAs? Or that this sort of obvious problem wouldn't have been immediately fixed?
based on what you've shared, yeah. I also wouldn't expect a million DBAs to replace a single DBE
One nice compromise is to migrate to using read-only database connections for read tasks from the moment you upgrade from medium sized DB hardware to big hardware. Keep talking to the one big DB with both connections.
Then when you are looking at the cost of upgrading from big DB hardware to huge DB hardware, you've got another option available to compare cost-wise: a RW main instance and one or more read-only replicas, where your monolith talks to both: read/write to the master and read-only to the replicas via a load balancer.
I've basically been building CRUD backends for websites and later apps since about 1996.
I've fortunately/unfortunately never yet been involved in a project that we couldn't comfortably host using one big write master and a handful of read slaves.
Maybe one day a project I'm involved with will approach "FAANG scale" where that stops working, but you can 100% run 10s of millions of dollars a month in revenue with that setup, at least in a bunch of typical web/app business models.
Early on I did hit the "OMG, we're cooking our database" point where we needed to add read caching. When I first did that, memcached was still written in Perl. So that joined my toolbox very early on (sometime in the late 90s).
Once read caching started to not keep up, it was easy enough to make the read cache/memcached layer understand and distribute reads across read slaves. I remember talking to Monty Widenius at The Open Source Conference, I think in San Jose around 2001 or so, about getting MySQL replication to use SSL so I could safely replicate to read slaves in Sydney and London from our write master in PAIX.
I have twice committed the sin of premature optimisation and sharded databases "because this one was _for sure_ going to get too big for our usual database setup". It only ever brought unneeded grief and never actually proved necessary.
Many databases can be distributed horizontally if you put in the extra work, would that not solve the problems you're describing? MariaDB supports at least two forms of replication (one master/replica and one multi-master), for example, and if you're willing to shell out for a MaxScale license it's a breeze to load balance it and have automatic failover.
I worked at a mobile game company for years and years, and our #1 biggest scaling concern was DB write throughput. We used Percona's MySQL fork/patch/whatever, we tuned as best we could, but when it comes down to it, gaming is a write-heavy application rather than the read-heavy applications I'm used to from ecommerce etc.
Sharding things out and replicating worked for us, but only because we were microservices-y and we were able to split our schemas up between different services. Still, there was one service that required the most disk space, the most write throughput, the most everything.
(IIRC it was the 'property' service, which recorded everything anyone owned in our games and was updated every time someone gained, lost, or used any item, building, ally, etc).
We did have two read replicas and the service didn't do reads from the primary so that it could focus on writes, but it was still a heavy load that was only solved by adding hardware, improving disks, adding RAM, and so on.
Not without big compromises and a lot of extra work. If you want a truly horizontally scaling database, and not just multi-master for the purpose of availability, a good example solution is Spanner. You have to lay your data out differently, you're very restricted in what kinds of queries you can make, etc.
Clarification, you can make unoptimized queries on Spanner with a great degree of freedom when you're doing offline analysis, but even then it's easy to hit something that's too slow to work at all, whereas in Postgres I know it'd not be a problem.
For what it's worth, I think distributing horizontally is also much easier if you've already limited your database to specific concerns by splitting it up in different ways. Sharding a very large database with lots of deeply linked data sounds like much more of a pain than sharding something with a limited scope that isn't too deeply linked, because the rest of the data is already in other databases.
To some degree, sharding brings in a lot of the same complexities as different microservices with their own data store, in that you sometimes have to query across multiple sources and combine in the client.
Shouldn't your company have started to split things out and planned for hitting the limits of hardware a couple of box sizes back? I feel there is a happy middle ground between “spend months making everything a service for our 10 users” and “welp, looks like we can't upsize the DB anymore, guess we should split things off now?”
That is, one huge table keyed by (for instance) alphabet and when the load gets too big you split it into a-m and n-z tables, each on either their own disk or their own machine.
Then just keep splitting it like that. All of your application logic stays the same … everything stays very flat and simple … you just point different queries to different shards.
I like this because the shards can evolve from their own disk IO to their own machines… and later you can reassemble them if you acquire faster hardware, etc.
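A minimal sketch of that layout (purely illustrative names; the routing by key range lives in the application or a thin data-access layer):

    -- Same schema on every shard; only the key range (and eventually the host) differs
    CREATE TABLE customers_a_m (
        name TEXT PRIMARY KEY CHECK (name >= 'a' AND name < 'n'),
        data JSONB
    );
    CREATE TABLE customers_n_z (
        name TEXT PRIMARY KEY CHECK (name >= 'n'),
        data JSONB
    );

    -- Application routing is just "pick the shard by first letter":
    -- 'miller' -> customers_a_m, 'nguyen' -> customers_n_z
    SELECT data FROM customers_a_m WHERE name = 'miller';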
> Once you get to that point, it becomes SUPER hard to start splitting things out.
Maybe, but if you split it from the start you die by a thousand cuts, and likely pay the cost up front, even if you’d never get to the volumes that’d require a split.
>Once you get to that point, it becomes SUPER hard to start splitting things out. All the sudden you have 10000 "just a one off" queries against several domains that are broken by trying carve out a domain into a single owner.
But that's survivorship bias and looking back at things from current problems perspective.
You know what the least future-proof and scalable project is? The one that gets canceled because the team failed to deliver any value in a reasonable time in the early phase. Once you reach "huge project status" you can afford a glacial pace. Most of the time you can't afford that early on, so even if by some miracle you knew what scaling issues you were going to have long term and invested in fixing them early, it's rarely been a good tradeoff in my experience.
I've seen more projects fail because they tangle themselves up in unnecessary complexity early on and fail to execute on the core value proposition than I've seen fail from being unable to manage the tech debt 10 years in. Developers like to complain about the second kind, but they get fired over the first kind. Unfortunately, in today's job market they just resume-pad their failures as "relevant experience" and move on to the next project, so there is no correcting feedback.
I'd be curious to know what your company does which generates this volume of data (if you can disclose), what database you are using and how you are planning to solve this issue.
There are multiple plans on how to fix this problem but they all end up boiling down to carving out domains and their owners and trying to pull apart the data from the database.
What's been keeping the lights on is "Always On" and read only replicas. New projects aren't adding load to the db and it's simply been a slow going getting stuff split apart.
What we've tried (and failed at) is sharding the data. The main issue we have is a bunch of systems reading directly from the db for common records rather than hitting other services. That means any change in structure requires a bunch of system wide updates.
You can get a machine with multiple terabytes of ram and hundreds of CPU cores easily. If you can afford that, you can afford a live replica to switch to during maintenance.
FastComments runs on one big DB in each region, with a hot backup... no issues yet.
Before you go to microservices you can also shard, as others have mentioned.
This is absolutely true - when I was at Bitbucket (ages ago at this point) and we were having issues with our DB server (mostly due to scaling), almost everyone we talked to said "buy a bigger box until you can't any more" because of how complex (and indirectly expensive) the alternatives are - sharding and microservices both have a ton more failure points than a single large box.
I'm sure they eventually moved off that single primary box, but for many years Bitbucket was run off 1 primary in each datacenter (with a failover), and a few read-only copies. If you're getting to the point where one database isn't enough, you're either doing something pretty weird, are working on a specific problem which needs a more complicated setup, or have grown to the point where investing in a microservice architecture starts to make sense.
One issue I've seen with this is that if you have a single, very large database, it can take a very, very long time to restore from backups. Or for that matter just taking backups.
I'd be interested to know if anyone has a good solution for that.
- you rsync or zfs send the database files from machine A to machine B. You would like the database to be off during this process, which will make it consistent. The big advantage of ZFS is that you can stop PG, snapshot the filesystem, and turn PG on again immediately, then send the snapshot. Machine B is now a cold backup replica of A. Your loss potential is limited to the time between backups.
- after the previous step is completed, you arrange for machine A to send WAL files to machine B. It's well documented. You could use rsync or scp here. It happens automatically and frequently. Machine B is now a warm replica of A -- if you need to turn it on in an emergency, you will only have lost one WAL file's worth of changes.
- after that step is completed, you give machine B credentials to login to A for live replication. Machine B is now a live, very slightly delayed read-only replica of A. Anything that A processes will be updated on B as soon as it is received.
You can go further and arrange to load balance requests between read-only replicas, while sending the write requests to the primary; you can look at Citus (now open source) to add multi-primary clustering.
This isn't really a backup, it's redundancy, which is a good thing but not the same as a backup solution. You can't get out of a "drop table production" type of event this way.
It was first released around 2010 and has gained robustness with every release, hence not everyone is aware of it.
For instance, I don't think it's required anymore to shut down the database to do the initial sync if you use the proper tooling (pg_basebackup, if I remember correctly).
Going back 20 years with Oracle DB it was common to use "triple mirror" on storage to make a block level copy of the database. Lock the DB for changes, flush the logs, break the mirror. You now have a point in time copy of the database that could be mounted by a second system to create a tape backup, or as a recovery point to restore.
It takes exactly the time that it takes, bottlenecked by:
* your disk read speed on one end and write speed on the other, modulo compression
* the network bandwidth between points A and B, modulo compression
* the size of the data you are sending
So, if you have a 10GB database that you send over a 10Gb/s link to the other side of the datacenter, it might be as little as 10 seconds. If you have a 10TB database that you send over a nominally 1GB/s link but actually there's a lot of congestion from other users, to a datacenter on the other side of the world, that might take a hundred hours or so.
rsync can help a lot here, or the ZFS differential snapshot send.
So say the disk fails on your main DB, or for some reason a customer needs data from 6 months ago which is no longer in your local snapshots. In order to restore the data, you have to transfer the data for the full database back over.
With multiple databases, you only have to transfer a single database, not all of your data.
Do you even have to stop Postgres if using ZFS snapshots? ZFS snapshots are atomic, so I’d expect that to be fine. If it wasn’t fine, that would also mean Postgres couldn’t handle power failure or other sudden failures.
* use pg_dump. Perfect consistency at the cost of a longer transaction. Gain portability for major version upgrades.
* Don't shut down PG: here's what the manual says:
However, a backup created in this way saves the database files in a state as if the database server was not properly shut down; therefore, when you start the database server on the backed-up data, it will think the previous server instance crashed and will replay the WAL log. This is not a problem; just be aware of it (and be sure to include the WAL files in your backup). You can perform a CHECKPOINT before taking the snapshot to reduce recovery time.
* Midway: use SELECT pg_start_backup('label', false, false); and SELECT * FROM pg_stop_backup(false, true); to generate WAL files while you are running the backup, and add those to your backup.
Presumably it doesn't matter if you break your DB up into smaller DBs, you still have the same amount of data to back up no matter what. However, now you also have the problem of snapshot consistency to worry about.
If you need to backup/restore just one set of tables, you can do that with a single DB server without taking the rest offline.
> you still have the same amount of data to back up no matter what
But you can restore/back up the databases in parallel.
> If you need to backup/restore just one set of tables, you can do that with a single DB server without taking the rest offline.
I'm not aware of a good way to restore just a few tables from a full DB backup, at least not one that doesn't require copying over all the data (because the backup is stored over the network, not on a local disk). And that may be desirable to recover from, say, a bug corrupting or deleting a customer's data.
Try out pg_probackup. It works on database files directly. Restore is as fast as you can write on your ssd.
I've set up a pgsql server with timescaledb recently. Continuous backup based on WAL takes seconds each hour, and a complete restore takes 15 minutes for almost 300 GB of data because the 1 GBit connection to the backup server is the bottleneck.
On mariadb you can tell the replica to enter a snapshottable state[1], take a simple LVM snapshot, tell the database it's over, back up your snapshot somewhere else, and finally delete the snapshot.
That's fair - I added "are working on a specific problem which needs a more complicated setup" to my original comment as a nicer way of referring to edge cases like search engines. I still believe that 99% of applications would function perfectly fine with a single primary DB.
Depends what you mean by a database I guess. I take it to mean an RDBMS.
RDBMSs provide guarantees that web searching doesn't need. You can afford to lose a few pieces of data and provide not-quite-perfect results for web stuff. That's just wrong for an RDBMS.
What if you are using the database as a system of record to index into a real search engine like Elasticsearch? For a product where you have tons of data to search from (ie text from web pages)
In regards to Elasticsearch, you basically opt-in to which behavior you want/need. You end up in the same place: potentially losing some data points or introducing some "fuzziness" to the results in exchange for speed. When you ask Elasticsearch to behave in a guaranteed atomic manner across all records, performing locks on data, you end up with similar constraints as in a RDBMS.
Elasticsearch is for search.
If you're asking about "what if you use an RDBMS as a pointer to Elasticsearch" then I guess I would ask: why would you do this? Elasticsearch can be used as a system of record. You could use an RDBMS over top of Elasticsearch without configuring Elasticsearch as a system of record, but then you would be lying when you refer to your RDBMS as a "system of record." It's not a "system of record" for your actual data, just a record of where pointers to actual data were at one point in time.
I feel like I must be missing what you're suggesting here.
Having just an Elasticsearch index without also having the data in a primary store like an RDBMS is an anti-pattern and not recommended by almost all experts. Whether you want to call it a “system of record”, I won't argue semantics. But the point is, it's recommended to have your data in a primary store from which you index into Elasticsearch.
This is not typically going to be stored in an ACID-compliant RDBMS, which is where the most common scaling problem occurs. Search engines, document stores, adtech, eventing, etc. are likely going to have a different storage mechanism where consistency isn't as important.
I'm glad this is becoming conventional wisdom. I used to argue this in these pages a few years ago and would get downvoted below the posts telling people to split everything into microservices separated by queues (although I suppose it's making me lose my competitive advantage when everyone else is building lean and mean infrastructure too).
But it is also about pushing the limits of what is physically possible in computing. As Admiral Grace Hopper would point out (https://www.youtube.com/watch?v=9eyFDBPk4Yw), covering distance over network wires involves hard latency constraints, not to mention dealing with congestion on those wires.
Physical efficiency is about keeping data close to where it's processed. Monoliths can make much better use of L1, L2, L3, and ram caches than distributed systems for speedups often in the order of 100X to 1000X.
Sure it's easier to throw more hardware at the problem with distributed systems but the downsides are significant so be sure you really need it.
Now there is a corollary to using monoliths. Since you only have one db, that db should be treated as somewhat sacred, you want to avoid wasting resources inside it. This means being a bit more careful about how you are storing things, using the smallest data structures, normalizing when you can etc. This is not to save disk, disk is cheap. This is to make efficient use of L1,L2,L3 and ram.
I've seen boolean true or false values saved as large JSON documents: {"usersetting1": true, "usersetting2": false, "setting1name": "name", ...} with 10 bits of data ending up as a 1 KB JSON document. Avoid this! Storing documents means the keys (effectively the full table schema) are repeated in every row. It has its uses, but if you can predefine your schema and use the smallest types needed, you gain a lot of performance, mostly through much higher cache efficiency!
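As a made-up contrast, the same handful of settings stored both ways:

    -- Repeats every key name in every row; roughly 1 KB per row for ~10 bits of data
    CREATE TABLE user_settings_doc (
        user_id  BIGINT PRIMARY KEY,
        settings JSONB   -- {"usersetting1": true, "usersetting2": false, ...}
    );

    -- Schema declared once; a handful of bytes per row, far friendlier to
    -- L1/L2/L3 and page caches
    CREATE TABLE user_settings (
        user_id      BIGINT PRIMARY KEY,
        usersetting1 BOOLEAN NOT NULL DEFAULT false,
        usersetting2 BOOLEAN NOT NULL DEFAULT false,
        setting1name TEXT
    );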
It's not though. You're just seeing the most popular opinion on HN.
In reality it is nuanced like most real-world tech decisions are. Some use cases necessitate a distributed or sharded database, some work better with a single server and some are simply going to outsource the problem to some vendor.
My hunch is that computers caught up. Back in the early 2000's horizontal scaling was the only way. You simply couldn't handle even reasonably mediocre loads on a single machine.
As computing becomes cheaper, horizontal scaling is starting to look more and more like unnecessary complexity for even surprisingly large/popular apps.
I mean you can buy a consumer off-the-shelf machine with 1.5TB of memory these days. 20 years ago, when microservices started gaining popularity, 1.5TB RAM in a single machine was basically unimaginable.
Honestly from my perspective it feels like microservices arose strongly in popularity precisely when it was becoming less necessary. In particular the mass adoption of SSD storage massively changed the nature of the game, but awareness of that among regular developers seemed not as pervasive as it should have been.
'over the wire' is less obvious than it used to be.
If you're in a k8s pod, those calls are really just kernel calls. Sure, you're serializing and process switching where you could be just making a method call, but we had to do something.
I'm seeing fewer 'balls of mud' with microservices. That's not zero balls of mud, but it's not a given for almost every code base I wander into.
To clarify, I think stateless microservices are good. It's when you have too many DBs (and sometimes too many queues) that you run into problems.
A single instance of PostgreSQL is, in most situations, almost miraculously effective at coordinating concurrent and parallel state mutations. To me that's one of the most important characteristic of an RDBMS. Storing data is a simpler secondary problem. Managing concurrency is the hard problem that I need most help with from my DB and having a monolithic DB enables the coordination of everything else including stateless peripheral services without resulting in race conditions, conflicts or data corruption.
SQL is the most popular mostly functional language. This might be because managing persistent state and keeping data organized and low entropy, is where you get the most benefit from using a functional approach that doesn't add more state. This adds to the effectiveness of using a single transactional DB.
I must admit that even distributed DBs, like Cockroach and Yugabyte, have recognized this and use the PostgreSQL syntax and protocol. This is a good thing, though: it means that if you really need to scale beyond PostgreSQL, you have PostgreSQL-compatible options.
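The coordination I mean is the boring, built-in kind. A hedged example with a hypothetical jobs table: many concurrent workers can safely claim work from one table with nothing but row locks:

    CREATE TABLE jobs (
        id         BIGSERIAL PRIMARY KEY,
        payload    JSONB,
        status     TEXT NOT NULL DEFAULT 'pending',
        claimed_at TIMESTAMPTZ
    );

    -- Each worker runs this; FOR UPDATE SKIP LOCKED makes concurrent workers
    -- skip rows that another transaction has already claimed
    UPDATE jobs
    SET status = 'running', claimed_at = now()
    WHERE id = (
        SELECT id FROM jobs
        WHERE status = 'pending'
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id, payload;

No external queue or locking service needed until the write volume genuinely outgrows the box.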
> I'm seeing less 'balls of mud' with microservices.
The parallel to "balls of mud" with microservices is tiny services that seem almost devoid of any business logic and all the actual business logic is encapsulated in the calls between different services, lambda functions, and so on.
That's quite nightmarish from a maintenance perspective too, because now it's almost impossible to look at the system from the outside and understand what it's doing. It also means that conventional tooling can't help you anymore: you don't get compiler errors if your lambda function calls an endpoint that doesn't exist anymore.
Big balls of mud are horrible (I'm currently working with a big-ball-of-mud monolith, I know what I'm talking about), but you can create a different kind of mess with microservices too. Then there are all the other problems, such as operational complexity, or “I now need to update log4j across 30 services”.
In the end, a well-engineered system needs discipline and architectural skill, as well as a healthy engineering culture where tech debt can be paid off, regardless of whether it's a monolith, a microservice architecture, or something in between.
>"I'm glad this is becoming conventional wisdom. "
Yup, this is what I've always done and it works wonders. Since I do not have bosses, just clients, I do not give a flying fuck about the latest fashion and do what actually makes sense for me and said clients.
I've never understood this logic for webapps. If you're building a web application, congratulations, you're building a distributed system, you don't get a choice. You can't actually use transactional integrity or ACID compliance because you've got to send everything to and from your users via HTTP request/response. So you end up paying all the performance, scalability, flexibility, and especially reliability costs of an RDBMS, being careful about how much data you're storing, and getting zilch for it, because you end up building a system that's still last-write-wins and still loses user data whenever two users do anything at the same time (or you build your own transactional logic to solve that - exactly the same way as you would if you were using a distributed datastore).
Distributed systems can also make efficient use of cache, in fact they can do more of it because they have more of it by having more nodes. If you get your dataflow right then you'll have performance that's as good as a monolith on a tiny dataset but keep that performance as you scale up. Not only that, but you can perform a lot better than an ACID system ever could, because you can do things like asynchronously updating secondary indices after the data is committed. But most importantly you have easy failover from day 1, you have easy scaling from day 1, and you can just not worry about that and focus on your actual business problem.
Relational databases are largely a solution in search of a problem, at least for web systems. (They make sense as a reporting datastore to support ad-hoc exploratory queries, but there's never a good reason to use them for your live/"OLTP" data).
I really don't understand how anything of what you wrote follows from the fact that you're building a web-app. Why do you lose user data when two users do anything at the same time? That has never happened to me with any RDBMS.
And why would HTTP requests prevent me from using transactional logic? If a user issues a command such as "copy this data (a forum thread, or a Confluence page, or whatever) to a different place" and that copy operation might actually involve a number of different tables, I can use a transaction and make sure that the action either succeeds fully or is rolled back in case of an error; no extra logic required.
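In code this is about as boring as it gets - just a sketch, with psycopg2 and made-up thread/post tables; either both inserts commit or neither does:

```python
def copy_thread(conn, thread_id, target_forum_id):
    with conn:  # one transaction: commit on success, rollback on any error
        with conn.cursor() as cur:
            cur.execute(
                """INSERT INTO threads (forum_id, title)
                   SELECT %s, title FROM threads WHERE id = %s
                   RETURNING id""",
                (target_forum_id, thread_id),
            )
            new_thread_id = cur.fetchone()[0]
            cur.execute(
                """INSERT INTO posts (thread_id, author_id, body, created_at)
                   SELECT %s, author_id, body, created_at
                   FROM posts WHERE thread_id = %s""",
                (new_thread_id, thread_id),
            )
            return new_thread_id
```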
I couldn't disagree more with your conclusion even if I wanted to. Relational databases are great. We should use more of them.
> I really don't understand how anything of what you wrote follows from the fact that you're building a web-app. Why do you lose user data when two users do anything at the same time? That has never happened to me with any RDBMS.
> And why would HTTP requests prevent me from using transactional logic? If a user issues a command such as "copy this data (a forum thread, or a Confluence page, or whatever) to a different place" and that copy operation might actually involve a number of different tables, I can use a transaction and make sure that the action either succeeds fully or is rolled back in case of an error; no extra logic required.
Sure, if you can represent what the user wants to do as a "command" like that, that doesn't rely on a particular state of the world, then you're fine. Note that this is also exactly the case that an eventually consistent event-sourcing style system will handle fine.
The case where transactions would actually be useful is the case where a user wants to read something and modify something based on what they read. But you can't possibly do that over the web, because they read the data in one request and write it in another request that may never come. If two people try to edit the same wiki page at the same time, either one of them loses their data, or you implement some kind of "userspace" reconciliation logic - but database transactions can't help you with that. If one user tries to make a new post in a forum thread at the same time as another user deletes that thread, probably they get an error that throws away all their data, because storing it would break referential integrity.
> Sure, if you can represent what the user wants to do as a "command" like that, that doesn't rely on a particular state of the world, then you're fine. Note that this is also exactly the case that an eventually consistent event-sourcing style system will handle fine.
Yes, but the event-sourcing system (or similar variants, such as CRDTs) is much more complex. It's true that it buys you some things (like the ability to roll back to specific versions), but you have to ask yourself whether you really need that for a specific piece of data.
(And even if you use event sourcing, if you have many events, you probably won't want to replay all of them, so you'll maybe want to store the result in a database, in which case you can choose a relational one.)
> If two people try to edit the same wiki page at the same time, either one of them loses their data, or you implement some kind of "userspace" reconciliation logic - but database transactions can't help you with that.
Yes, but
a) that's simply not a problem in all situations. People will generally not update their user profile concurrently with other users, for example. So it only applies to situations where data is truly shared across multiple users, and it doesn't make sense to build a complex system only for these use cases,
b) the problem of users overwriting other users' data is inherent to the problem domain; you will, in the end, have to decide which version is the most recent regardless of which technology you use. The one thing that events etc. buy you is a version history (which btw can also be implemented with an RDBMS), but if you want to expose that in the UI so the user can go back, you have to do additional work anyway - it doesn't come for free.
c) Meanwhile, the RDBMS will at least guarantee that the data is always in a consistent state. Users overwriting other users' data is unfortunate, but corrupted data is worse.
d) You can solve the "concurrent modification" issue in a variety of ways, depending on the frequency of the problem, without having to implement a complex event-sourced system. For example, a lock mechanism is fairly easy to implement and useful in many cases. You could also, for example, hash the contents of what the user is seeing and reject the change if there is a mismatch with the current state (I've never tried it, but it should work in theory).
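For (d), the hash-check idea is essentially optimistic concurrency and fits in one statement - a sketch assuming a hypothetical wiki_pages table with a body_hash column (psycopg2):

```python
import hashlib

def save_page(conn, page_id, seen_body, new_body):
    # the client submits the hash of the version it was editing; the UPDATE
    # only applies if the stored row still has that hash (compare-and-swap)
    seen_hash = hashlib.sha256(seen_body.encode()).hexdigest()
    new_hash = hashlib.sha256(new_body.encode()).hexdigest()
    with conn, conn.cursor() as cur:
        cur.execute(
            """UPDATE wiki_pages
               SET body = %s, body_hash = %s
               WHERE id = %s AND body_hash = %s""",
            (new_body, new_hash, page_id, seen_hash),
        )
        return cur.rowcount == 1  # False => someone else edited; ask the user to merge
```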
I don't wish to claim that a relational database solves all transactionality (and consistency) problems, but they certainly solve some of them - so throwing them out because of that is a bit like "tests don't find all bugs, so we don't write them anymore".
> Yes, but the event-sourcing system (or similar variants, such as CRDTs) is much more complex.
It's really not. An RDBMS usually contains all of the same stuff underneath the hood (MVCC etc.), it just tries to paper over it and present the illusion of a single consistent state of the world, and unfortunately that ends up being leaky.
> a) that's simply not a problem in all situations. People will generally not update their user profile concurrently with other users, for example. So it only applies to situations where data is truly shared across multiple users,
Sure - but those situations are ipso facto situations where you have no need for transactions.
> b) the problem of users overwriting other users' data is inherent to the problem domain; you will, in the end, have to decide which version is the most recent regardless of which technology you use. The one thing that evens etc. buy you is a version history (which btw can also be implemented with a RDBMS), but if you want to expose that in the UI so the user can go back, you have to do additional work anyway - it doesn't come for free.
True, but what does come for free is thinking about it when you're designing your dataflow. Using an event sourcing style forces you to confront the idea that you're going to have concurrent updates going on, early enough in the process that you naturally design your data model to handle it, rather than imagining that you can always see "the" current state of the world.
> c) Meanwhile, the RDBMS will at least guarantee that the data is always in a consistent state. Users overwriting other users' data is unfortunate, but corrupted data is worse.
I'm not convinced, because the way it accomplishes that is by dropping "corrupt" data on the floor. If user A tries to save new post B in thread C, but at the same time user D has deleted that thread, then in a RDBMS where you're using a foreign key the only thing you can do is error and never save the content of post B. In an event sourcing system you still have to deal with the fact that the post belongs in a nonexistent thread eventually, but you don't start by losing the user's data, and it's very natural to do something like mark it as an orphaned post that the user can still see in their own post history, which is probably what you want. (Of course you can achieve that in the RDBMS approach, but it tends to involve more complex logic, giving up on foreign keys and accepting that you have to solve the same data integrity problems as a non-ACID system, or both).
> d) You can solve the "concurrent modification" issue in a variety of ways, depending on the frequency of the problem, without having to implement a complex event-sourced system. For example, a lock mechanism is fairly easy to implement and useful in many cases. You could also, for example, hash the contents of what the user is seeing and reject the change if there is a mismatch with the current state (I've never tried it, but it should work in theory).
That sounds a whole lot more complex than just sticking it in an event sourcing system. Especially when the problem is rare, it's much better to find a solution where the correct behaviour naturally arises in that case, than to implement some kind of ad-hoc special-case workaround that will never be tested as rigorously as your "happy path" case.
> It's really not. An RDBMS usually contains all of the same stuff underneath the hood (MVCC etc.), it just tries to paper over it and present the illusion of a single consistent state of the world, and unfortunately that ends up being leaky.
There's nothing leaky about it. Relational algebra is a well-understood mathematical abstraction. Meanwhile, I can just set up postgres and an ORM (or something more lightweight, if I prefer) and I'm good to go - there's thousands of examples of how to do that. Event-sourced architectures have decidedly more pitfalls. If my event handling isn't commutative, associative and idempotent I'm either losing out on concurrency benefits (because I'm asking my queue to synchronise messages) or I'll get undefined behaviour.
There's really probably no scenario in which implementing a CRUD app with a relational database isn't going to take significantly less time than some event sourced architecture.
> Sure - but those situations are ipso facto situations where you have no need for transactions.
> Using an event sourcing style forces you to confront the idea that you're going to have concurrent updates going on
There are tons of examples like backoffice tools (where people might work in shifts or on different data sets), delivery services, language learning apps, flashcard apps, government forms, todo list and note taking apps, price comparison services, fitness trackers, banking apps, and so on, where some or even most of the data is not usually concurrently edited by multiple users, but where you still probably want consistency guarantees across multiple tables.
Yes, if you're building Twitter, by all means use event sourcing or CRDTs or something. But we're not all building Twitter.
> If user A tries to save new post B in thread C, but at the same time user D has deleted that thread, then in a RDBMS where you're using a foreign key the only thing you can do is error and never save the content of post B.
I don't think I've ever seen a forum app that doesn't just "throw away" the user comment in such a case, in the sense that it will not be stored in the database. Sure, you might have some event somewhere, but how is that going to help the user? Should they write a nice email and hope that some engineer with too much time is going to find that event somewhere buried deep in the production infrastructure and then ... do what exactly with it?
This is a solution in search of a problem. Instead, you should design your UI such that the comment field is not cleared upon a failed submission, like any reasonable forum software. Then the user who really wants to save their ramblings can still do so, without the need of any complicated event-sourcing mechanism. And in most forums, threads are rarely deleted, only locked (unless it's outright spam/illegal content/etc.)
(Also, there are a lot of different ways how things can be designed when you're using an RDBMS. You can also implement soft deletes (which many applications do) and then you won't get any foreign key errors. In that way, you can still display "orphaned" comments that belong to deleted threads, if you so wish (have never seen a forum do that, though). Recovering a soft deleted thread is probably also an order of magnitude easier than trying to replay it from some events. Yes, soft deletes involve other tradeoffs - but so does every architecture choice.)
> That sounds a whole lot more complex than just sticking it in an event sourcing system. Especially when the problem is rare, it's much better to find a solution where the correct behaviour naturally arises in that case.
I really disagree that a locking mechanism is more difficult than an event sourced system. The mechanism doesn't have to be perfect. If a user loses the lock because they haven't done anything in half an hour, then in many cases that's completely acceptable. Such a system is not hard to implement (I could just use a redis store with expiring entries) and it will also be much easier to understand, since you now don't have to track the flow of your business logic across multiple services.
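Roughly, with redis-py and a 30-minute TTL (the release is best-effort, which is fine for this use case):

```python
import uuid
import redis

r = redis.Redis()

def acquire_edit_lock(page_id, ttl_seconds=1800):
    token = str(uuid.uuid4())
    # SET NX EX is atomic: only one editor gets the lock, and an abandoned
    # lock simply expires after the TTL
    if r.set(f"edit-lock:{page_id}", token, nx=True, ex=ttl_seconds):
        return token
    return None  # someone else is editing

def release_edit_lock(page_id, token):
    key = f"edit-lock:{page_id}"
    if r.get(key) == token.encode():  # only release a lock we still own
        r.delete(key)
```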
I also don't know why you think that your event-sourced system will be better tested. Are you going to test for the network being unreliable, messages getting lost or being delivered out of order, and so on? If so, you can also afford to properly test a locking mechanism (which can be readily done in a monolith, maybe with an additional redis dependency, and is therefore more easily testable than some event-based logic that spans multiple services).
And in engineering, there are rarely "natural" solutions to problems. There are specific problems and they require specific solutions. Distributed systems, event sourcing etc. are great where they're called for. In many cases, they're simply not.
HTTP requests work great with relational DBs. This is not UDP. If the TCP connection is broken, an operation will either have finished or stopped and rolled back atomically, and unless you've placed unneeded queues in there, you will know of success immediately.
When you get the HTTP response, you know the data is fully committed; data that uses it can be refreshed immediately and is accessible to all other systems immediately, so you can perform the next steps relying on those hard guarantees. Behind the HTTP request, a transaction can be opened to do a bunch of stuff, including API calls to other systems if needed, and commit the results atomically. There are tons of benefits to using it with HTTP.
But you can't do interaction between the two ends of an HTTP request. The caller makes an inert request; whatever processing happens downstream of that might as well be offline, because it's not and can never be interactive within a single transaction.
Now you're shifting the goalposts. You started out by claiming that web apps can't be transactional, now you've switched to saying they can't be transactional if they're "interactive" (by which you presumably mean transactions that span multiple HTTP requests).
Of course, that's a very particular demand, one that doesn't necessarily apply to many applications.
And even then, depending on the use case, there are relatively straightforward ways of implementing that too: For example, if you build up all the data on the client (potentially by querying the server, with some of the partial data, for the next form page, or whatever) and submit it all in one single final request.
>As Admiral Grace Hopper would point out (https://www.youtube.com/watch?v=9eyFDBPk4Yw ) doing distance over network wires involves hard latency constraints, not to mention dealing with congestions over these wires.
Even accounting for CDNs, a distributed system is inherently more capable of bringing data closer to geographically distributed end users, thus lowering latency.
I think a strong test a lot of "let's use Google scale architecture for our MVP" advocates fail is: can your architecture support a performant paginated list with dynamic sort, filter and search where eventual consistency isn't acceptable?
Pretty much every CRUD app needs this at some point and if every join needs a network call your app is going to suck to use and suck to develop.
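When everything lives in one database that's a single query. A sketch (psycopg2-style placeholders; the orders table, its columns and the sort whitelist are made up; keyset pagination so it stays fast at depth):

```python
ALLOWED_SORTS = {"created_at", "total"}  # whitelisted because it is interpolated into SQL

def orders_page(cur, customer_id, search, sort, after_value, after_id, limit=50):
    if sort not in ALLOWED_SORTS:
        sort = "created_at"
    cur.execute(
        f"""SELECT id, created_at, total, description
            FROM orders
            WHERE customer_id = %s
              AND description ILIKE %s
              AND ({sort}, id) > (%s, %s)   -- keyset: resume after the last row the client saw
            ORDER BY {sort}, id
            LIMIT %s""",
        (customer_id, f"%{search}%", after_value, after_id, limit),
    )
    return cur.fetchall()
```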
I’ve found the following resource invaluable for designing and creating “cloud native” APIs where I can tackle that kind of thing from the very start without a huge amount of hassle https://google.aip.dev/general
I don't believe you. Eventual consistency is how the real world works, what possible use case is there where it wouldn't be acceptable? Even if you somehow made the display widget part of the database, you can't make the reader's eyeballs ACID-compliant.
Yeah, I can attest that even banks are really using best-effort eventual consistency. However, I think systems that try to use eventual consistency as an abstraction are very difficult to reason about. It's a lot easier to think about it explicitly, when you have one data source/event that propagates outwards through systems with stronger individual guarantees than eventual consistency.
IMO having event streams as first class is the best way to think about things. Then you don't need particularly strong guarantees downstream - think something like Kafka where the only guarantee is that events for the same key will always be processed in order, and it turns out that that's enough to build a system with clear, reliable behaviour that you can reason about quite easily.
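Roughly what that looks like with the kafka-python client (broker address and topic are made up); keying on the account id is all it takes to get per-account ordering, because messages with the same key land on the same partition and partitions are consumed in order:

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",               # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode(),
)

def record_event(account_id, event):
    # same key => same partition => downstream consumers see this
    # account's events in the order they were produced
    producer.send("account-events", key=account_id.encode(), value=event)
```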
> if every join needs a network call your app is going to suck to use and suck to develop.
And yet developers do this every single day without any issue.
It is bad practice to have your authentication database be the same as your app database. Or you have data coming from SaaS products, third party APIs or a cloud service. Or even simply another service in your stack. And with complex schemas often it's far easier to do that join in your application layer.
> It is bad practice to have your authentication database be the same as your app database.
No, this is resume-driven-development, Google-scale-wannabe FUD. Understand your requirements. Multiple databases is non-trivial overhead. The only reason to add multiple databases is if you need scale that can't be handled via simple caching.
Of course it's hard to anticipate what level of scale you'll have later, but I can tell you this: for every tiny startup that successfully anticipated their scaling requirements and built a brilliant microservices architecture that proactively paved the way to their success, there are a hundred burnt-out husks of companies that never found product-market fit because the engineering team was too busy fantasizing about "web-scale" and padding their resumes by overengineering every tiny and unused feature they built.
If you want to get a job at FAANG and suckle at the teat of megacorporations whose trajectory was all based on work done in the early 2000s, by all means study up on "best practices" to recite at your system design interview. On the other hand, if you want to build the next great startup, you need to lose the big-co mentality and start thinking critically, from first principles, about power-to-weight ratio and YAGNI.
Most of our cloud hosted request/responses are within the realm of 1-10ms, and that's with the actual request being processed on the other side. Unless there's a poorly performing O(N) stinker in the works, most requests can be served with most latency being recorded user->datacenter, not machine to machine overhead. This article is a lot bonkers.
I've seen this evolve into tightly coupled microservices that could be deployed independently in theory, but required exquisite coordination to work.
If you want them to be on a single server, that's fine, but having multiple databases or schemas will help enforce separation.
And, if you need one single place for analytics, push changes to that space asynchronously.
Having said that, I've seen silly optimizations being employed that make sense when you are Twitter, and to nobody else. Slice services up to the point they still do something meaningful in terms of the solution and avoid going any further.
I have done both models. At my previous job we had a monolith on top of a 1200-table database. Now I work in an ecosystem of 400 microservices, most with their own database.
What it fundamentally boils down to is that your org chart determines your architecture. We had a single team in charge of the monolith, and it was ok, and then we wanted to add teams and it broke down. On the microservices architecture, we have many teams, which can work independently quite well, until there is a big project that needs coordinated changes, and then the fun starts.
Like always there is no advice that is absolutely right. Monoliths, microservices, function stores. One big server vs kubernetes. Any of those things become the right answer in the right context.
Although I’m still in favor of starting with a modular monolith and splitting off services when it becomes apparent they need to change at a different pace from the main body. That is right in most contexts I think.
> splitting off services when it becomes apparent they need to change at a different pace from the main body
yes - this seems to get lost, but the microservice argument is no different to the bigger picture software design in general. When things change independently, separate and decouple them. It works in code and so there is no reason it shouldn't apply at the infrastructure layer.
If I am responsible for the FooBar and need to update it once a week and know I am not going to break the FroggleBot or the Bazlibee which are run by separate teams who don't care about my needs and update their code once a year, hell yeah I want to develop and deploy it as a separate service.
To clarify the advice, at least how I believe it should be done…
Use One Big Database Server…
… and on it, use one logical database per application.
For example, one Postgres server can host many databases that are mostly* independent from each other. Each application or service should have its own database and be unaware of the others, communicating with them via the services if necessary. This makes splitting up into multiple database servers fairly straightforward if needed later. In reality most businesses will have a long tail of tiny databases that can all be on the same server, with only bigger databases needing dedicated resources.
*you can have interdependencies when you’re using deep features sometimes, but in an application-first development model I’d advise against this.
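A sketch of that setup (Postgres via psycopg2; app names are made up; CREATE DATABASE can't run inside a transaction, hence autocommit):

```python
import psycopg2

admin = psycopg2.connect("dbname=postgres user=postgres")
admin.autocommit = True

with admin.cursor() as cur:
    for app in ("billing", "catalog", "auth"):
        # one login role and one database per application, all on the same server
        cur.execute(f"CREATE ROLE {app}_svc LOGIN PASSWORD 'change-me'")
        cur.execute(f"CREATE DATABASE {app} OWNER {app}_svc")

# Each service connects only to its own database; splitting one out onto
# dedicated hardware later is a dump/restore plus a connection-string change.
```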
Not suggesting it, but for the sake of knowledge: you can join tables living in different databases as long as they are on the same server (e.g. MySQL, PostgreSQL and SQL Server support it - it doesn't necessarily come for free).
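In Postgres specifically that goes through postgres_fdw (MySQL and SQL Server can use qualified names directly). A rough sketch, with made-up database and table names, run once by an admin of the "shop" database:

```python
import psycopg2

with psycopg2.connect("dbname=shop user=postgres") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("""CREATE SERVER billing_srv FOREIGN DATA WRAPPER postgres_fdw
                   OPTIONS (host 'localhost', dbname 'billing')""")
    cur.execute("""CREATE USER MAPPING FOR CURRENT_USER SERVER billing_srv
                   OPTIONS (user 'report', password 'change-me')""")
    cur.execute("CREATE SCHEMA billing_ro")
    cur.execute("IMPORT FOREIGN SCHEMA public FROM SERVER billing_srv INTO billing_ro")
    # the "cross-database" join; under the hood it goes over a local connection
    cur.execute("""SELECT o.id, i.paid_at
                   FROM orders o
                   JOIN billing_ro.invoices i ON i.order_id = o.id""")
```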
I’d start with a monolith, that’s a single app, single database, single point of ownership of the data model, and a ton of joins.
Then as services are added after the monolith they can still use the main database for ease of infra development, simpler backups and replication, etc. but those wouldn’t be able to be joined because they’re cross-service.
There's no need for "microservices" in the first place then. That's just logical groupings of functionality that can be separate as classes, namespaces or other modules without being entirely separate processes with a network boundary.
I have to say I disagree with this ... you can only separate them if they are really, truly independent. Trying to separate things that are actually coupled will quickly take you on a path to hell.
The problem here is that most of the microservice architecture divisions are going to be driven by Conway's law, not what makes any technical sense. So if you insist on separate databases per microservice, you're at high risk of ending up with massive amounts of duplicated and incoherent state models and half the work of the team devoted to synchronizing between them.
I quite like an architecture where services are split except the database, which is considered a service of its own.
Well, I stand by what I said. And you are also correct, you can only separate them if they are really truly independent. Those two are correct at the same time.
Microservices do more than encapsulation and workspace segmentation. They also distribute data locality and coherence. If you have an organizational need to break something apart, but not into independent parts, it's better to use some abstraction that preserves the data properties.
(In other words, microservices are almost never the answer. There are plenty of ways to organize your code, default to those other ones. And on the few cases that microservices are the answer, rest assured that you won't fail to notice it.)
>> If you are creating microservices, you must segment them all the way through.
> I have to say I disagree with this ... you can only separate them if they are really, truly independent. Trying to separate things that are actually coupled will quickly take you on a path to hell.
I could be misinterpreting both you and GP, but sounds like you agree with GP - if you can't segment them all the way through, maybe they shouldn't be microservices?
Perhaps - but I think they are underestimating the organisational reasons to separate services from each other. If you are really going to say "we can't separate any two things that have any shared persistent data" then you may just end up with a monolith and all the problems that come from that (gridlock because every team needs to agree before it can be updated / released etc).
Breaking apart a stateless microservice and then basing it around a giant single monolithic database is pretty pointless - at that stage you might as well just build a monolith and get on with it as every microservice is tightly coupled to the db.
Note that quite a few of the performance problems come from writes. You can get away with A LOT if you accept that 1. the current service doesn't do (much) writing and 2. it can live with slightly old data. Which I think covers 90% of use cases.
So you can end up with those services living on separate machines and connecting to read only db replicas, for virtually limitless scalability. And when it realizes it needs to do an update, it either switches the db connection to a master, or it forwards the whole request to another instance connected to a master db.
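A sketch of that routing (psycopg2, made-up DSNs): reads go to a replica that may lag slightly, and anything that writes goes to the primary:

```python
import psycopg2

PRIMARY = "host=db-primary dbname=app user=app"
REPLICA = "host=db-replica dbname=app user=app"  # read-only, may be slightly behind

def run(sql, params=(), write=False):
    dsn = PRIMARY if write else REPLICA
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall() if cur.description else None

# reads scale out across replicas; the occasional write goes to the primary
products = run("SELECT id, name FROM products WHERE category = %s", ("bikes",))
run("UPDATE products SET stock = stock - 1 WHERE id = %s", (42,), write=True)
```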
(1) Different programming languages, e.g. you've written your app in Java but now you need to do something for which the perfect Python library is available.
(2) Different parts of your software need different types of hardware. Maybe one part needs a huge amount of RAM for a cache, but other parts are just a web server. It'd be a shame to have to buy huge amounts of RAM for every server. Splitting the software up and deploying the different parts on different machines can be a win here.
I reckon the average startup doesn't need any of that, not suggesting that monoliths aren't the way to go 90% of the time. But if you do need these things, you can still go the microservices route, but it still makes sense to stick to a single database if at all possible, for consistency and easier JOINs for ad-hoc queries, etc.
These are both true - but neither requires service-oriented-architecture.
You can split up your application into chunks that are deployed on separate hardware, and use different languages, without composing your whole architecture into microservices.
A monolith can still have a separate database server and a web server, or even many different functions split across different servers which are horizontally scalable, and be written in both Java and Python.
Monoliths have had separate database servers since the 80s (and probably before that!). In fact, part of these applications' defining characteristics at the enterprise level is that they often shared one big central database, as they were often composed of lots of small applications that would all make changes to the central database, which would often end up in a right mess of software that was incredibly hard to unpick! (And all the software writing to that database would, as you described, be written in lots of different languages.) People would then come along and cake these central databases full of stored procedures to make magic changes implementing functionality that wasn't available in the legacy applications they couldn't change because of the risk, and then you have even more of a mess!
Agree. Nothing worse than having different programs changing data in the same database. The database should not be an integration point between services.
In this example, it's the job of the "database access layer service" to manage those processes and prevent issues.
But, terrible service name aside, this is a big reason why two services accessing the same database is a capital-H Huge anti-pattern, and really screams "using this project to learn how to do microservices."
I guess I just don't see the value in having a monolith made up of microservices - you might as well just build a monolith if you are going down that route.
And if your application fits the microservices pattern better, then you might as well go down the microservices pattern properly and not give them a big central DB.
The one advantage of the microservices-on-a-single-database model is that it lets you test the independent components much more easily while avoiding the complexity of database sharding.
Where I work we are looking at it because we are starting to exceed the capabilities of one big database. Several tables are reaching the billions of rows mark and just plain inserts are starting to become too much.
Yeah, at the billions-of-rows mark it definitely makes sense to start looking at splitting things up. On the other hand, the company I worked for split things up from the start, and when I joined - 4 years down the line - their biggest table had something like 50k rows, but their query performance was awful (tens of seconds in cases) because the data was so spread out.
I disagree. Suppose you have an enormous DB that's mainly written to by workers inside a company, but has to be widely read by the public outside. You want your internal services on machines with extra layers of security, perhaps only accessible by VPN. Your external facing microservices have other things like e.g. user authentication (which may be tied to a different monolithic database), and you want to put them closer to users, spread out in various data centers or on the edge. Even if they're all bound to one database, there's a lot to recommend keeping them on separate, light cheap servers that are built for http traffic and occasional DB reads. And even more so if those services do a lot of processing on the data that's accessed, such as building up reports, etc.
You've not really built microservices then in the purest sense though - i.e. all the microservices aren't independently deployable components.
I'm not saying what you are proposing isn't a perfectly valid architectural approach - it's just usually considered an anti-pattern with microservices (because if all the services depend on a single monolith, and a change to a microservice functionality also mandates a change to the shared monolith which then can impact/break the other services, we have lost the 'independence' benefit that microservices supposedly gives us where changes to one microservice does not impact another).
Monoliths can still have layers to support business logic that are separate from the database anyway.
yah, this is something i learned when designing my first server stack (using sun machines) for a real business back during the dot-com boom/bust era. our single database server was the beefiest machine by far in the stack, 5U in the rack (we also had a hot backup), while the other servers were 1U or 2U in size. most of that girth was for memory and disk space, with decent but not the fastest processors.
one big db server with a hot backup was our best tradeoff for price, performance, and reliability. part of the mitigation was that the other servers could be scaled horizontally to compensate for a decent amount of growth without needing to scale the db horizontally.
Definitely use a big database, until you can't. My advice to anyone starting with a relational data store is to use a proxy from day 1 (or some point before adding something like that becomes scary).
When you need to start sharding your database, having a proxy is like having a super power.
We see both use cases: single large database vs multiple small, decoupled ones. I agree with the sentiment that a large database offers simplicity, until access patterns change.
We focus on distributing database data to the edge using caching. Typically this eliminates read-replicas and a lot of the headache that goes with app logic rewrites or scaling "One Big Database".
Yep, with a passive replica or online (log) backup.
Keeping things centralized can reduce your hardware requirement by multiple orders of magnitude. The one huge exception is a traditional web service, those scale very well, so you may not even want to get big servers for them (until you need them).
If you do this then you'll have the hardest possible migration when the time comes to split it up. It will take you literally years, perhaps even a decade.
Shard your datastore from day 1, get your dataflow right so that you don't need atomicity, and it'll be painless and scale effortlessly. More importantly, you won't be able to paper over crappy dataflow. It's like using proper types in your code: yes, it takes a bit more effort up-front compared to just YOLOing everything, but it pays dividends pretty quickly.
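For what it's worth, the day-1 version of this can be tiny: a stable hash of the tenant/user key picks the shard, and every query for that key goes there. Shard count and DSNs below are made up:

```python
import hashlib

SHARDS = [
    "host=db0 dbname=app",
    "host=db1 dbname=app",
    "host=db2 dbname=app",
    "host=db3 dbname=app",
]

def shard_for(user_id):
    # stable hash => the same user always lands on the same shard
    digest = hashlib.sha256(user_id.encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# queries that span shards have to be fanned out and merged in the application
```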
This is true IFF you get to the point where you have to split up.
I know we're all hot and bothered about getting our apps to scale up to be the next unicorn, but most apps never need to scale past the limit of a single very high-performance database. For most people, this single huge DB is sufficient.
Also, for many (maybe even most) applications, designated outages for maintenance are not only acceptable, but industry standard. Banks have had, and continue to have designated outages all the time, usually on weekends when the impact is reduced.
Sure, what I just wrote is bad advice for mega-scale SaaS offerings with millions of concurrent users, but most of us aren't building those, as much as we would like to pretend that we are.
I will say that TWO of those servers, with some form of synchronous replication, and point in time snapshots, are probably a better choice, but that's hair-splitting.
(and I am a dyed in the wool microservices, scale-out Amazon WS fanboi).
> I know we're all hot and bothered about getting our apps to scale up to be the next unicorn, but most apps never need to scale past the limit of a single very high-performance database. For most people, this single huge DB is sufficient.
True if the reliability is good enough. I agree that many organisations will never get to the scale where they need it as a performance/data size measure, but you often will grow past the reliability level that's possible to achieve on a single node. And it's worth saying that the various things that people do to mitigate these problems - read replicas, WAL shipping, and all that - can have a pretty high operational cost. Whereas if you just slap in a horizontal autoscaling datastore with true master-master HA from day 1, you bypass all of that trouble and just never worry about it.
> Also, for many (maybe even most) applications, designated outages for maintenance are not only acceptable, but industry standard. Banks have had, and continue to have designated outages all the time, usually on weekends when the impact is reduced.
IME those are a minority of applications. Anything consumer-facing, you absolutely do lose out (and even if it's not a serious issue in itself, it makes you look bush-league) if someone can't log into your system at 5AM on Sunday. Even if you're B2B, if your clients are serving customers then they want you to be online whenever their customers are.
> I agree that many organisations will never get to the scale where they need it as a performance/data size measure, but you often will grow past the reliability level that's possible to achieve on a single node.
Many organisations have had, for decades, exceptionally good reliability numbers using a backed-up/failed-over OneBigServer. Great reliability numbers did not suddenly appear only after 2012 when cloudiness took off.
I think you may be underestimating the reliability of OneBigServer.
> If you do this then you'll have the hardest possible migration when the time comes to split it up. It will take you literally years, perhaps even a decade.
At which point a new OneBigServer will be 100x as powerful, and all your upfront work will be for nothing.
I don't know the characteristics of bikesheddb's upstream in detail (if there's ever a production-quality release of bikesheddb I'll take another look), but in general using something that can scale horizontally (like Cassandra or Riak, or even - for all its downsides - MongoDB) is a great approach - I guess it's a question of terminology whether you call that "sharding" or not. Personally I prefer that kind of datastore over an SQL database.
It’s never one big database. Inevitably there are backups, replicas, testing environments, staging, development. In an ideal, unchanging world where nothing ever fails and the workload is predictable, the one big database is also ideal.
What happens in the real world is that the one big database becomes such a roadblock to change and growth that organisations often throw away the whole thing and start from scratch.
> It’s never one big database. Inevitably there are backups, replicas, testing environments, staging, development. In an ideal, unchanging world where nothing ever fails and the workload is predictable, the one big database is also ideal.
But if you have many small databases, you need
> backups, replicas, testing environments, staging, development
all times `n`. Which doesn't sound like an improvement.
> What happens in the real world is that the one big database becomes such a roadblock to change and growth that organisations often throw away the whole thing and start from scratch.
Bad engineering orgs will snatch defeat from the jaws of victory no matter what the early architectural decisions were. The one-vs-many databases/services question is almost entirely moot.
Just FYI, you can have one big database, without running it on one big server. As an example, databases like Cassandra are designed to be scaled horizontally (i.e. scale out, instead of scale up).
There are trade-offs when you scale horizontally even if a database is designed for it. For example, DataStax's Storage Attached Indexes or Cassandra's hidden-table secondary indexing allow for indexing on columns that aren't part of the clustering/partitioning, but when you're reading you're going to have to ask all the nodes to look for something if you aren't including a clustering/partitioning criteria to narrow it down.
You've now scaled out, but you now have to ask each node when searching by secondary index. If you're asking every node for your queries, you haven't really scaled horizontally. You've just increased complexity.
Now, maybe 95% of your queries can be handled with a clustering key and you just need secondary indexes to handle 5% of your stuff. In that case, Cassandra does offer an easy way to handle that last 5%. However, it can be problematic if people take shortcuts too much and you end up putting too much load on the cluster. You're also putting your latency for reads at the highest latency of all the machines in your cluster. For example, if you have 100 machines in your cluster with a mean response time of 2ms and a 99th percentile response time of 150ms, you're potentially going to be providing a bad experience to users waiting on that last box on secondary index queries.
This isn't to say that Cassandra isn't useful - Cassandra has been making some good decisions to balance the problems engineers face. However, it does come with trade-offs when you distribute the data. When you have a well-defined problem, it's a lot easier to design your data for efficient querying and partitioning. When you're trying to figure things out, the flexibility of a single machine and much cheaper secondary index queries can be important - and if you hit a massive scale, you figure out how you want to partition it then.
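To make the trade-off concrete, a sketch with the DataStax Python driver (keyspace, table and columns are made up): the first query is routed to the replicas that own the partition, while the second has no partition key and has to fan out across the cluster:

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("shop")

# partition-key lookup: cheap, touches a bounded set of nodes
rows = session.execute(
    "SELECT * FROM orders WHERE customer_id = %s", ("c-123",)
)

# no partition key: every node may have to participate in the search
rows = session.execute(
    "SELECT * FROM orders WHERE status = %s ALLOW FILTERING", ("pending",)
)
```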
Cassandra was just an example, but most databases can be scaled either vertically or horizontally via sharding. You are right that, if misconfigured, performance can be hindered, but this is also true for a database that is being scaled vertically. Generally speaking, with a large dataset you will get better performance by growing horizontally than by growing vertically.
Cassandra may be great when you have to scale a database that you're no longer developing significantly. The problem with this DB system is that you have to know all your queries before you can define the schema.
A relative worked for a hedge fund that used this idea. They were a C#/MSSQL shop, so they just bought whatever was the biggest MSSQL server at the time, updating frequently. They said it was a huge advantage, where the limit in scale was more than offset by productivity.
I think it's an underrated idea. There's a lot of people out there building a lot of complexity for datasets that in the end are less than 100 TB.
But it also has limits. Infamously Twitter delayed going to a sharded architecture a bit too long, making it more of an ugly migration.
I do, it is running on the same big (relatively) server as my native C++ backend talking to the database. The performance smokes your standard cloudy setup big time. Serving a thousand requests per second on 16 cores without breaking a sweat. I am all for monoliths running on real, non-cloud hardware. As long as the business scale is reasonable and does not approach FAANG (as is the case for 90% of businesses), this solution is superior to everything else money-, maintenance- and development-time-wise.
I agree with this sentiment but it is often misunderstood as a means to force everything into a single database schema. More people need to learn about logically separating schemas within their database servers!
Another area for consolidation is auth. Use one giant Keycloak, with an individual realm for each of the apps you are running. Your Keycloak is backed by your one giant database.
I agree that 1BDB is a good idea, but having one ginormous schema has its own costs. So I still think data should be logically partitioned between applications/microservices - in PG terms, one “cluster” but multiple “databases”.
We solved the problem of collecting data from the various databases for end users by having a GraphQL layer which could integrate all the data sources. This turned out to be absolutely awesome. You could also do something similar using FDW. The effort was not significant relative to the size of the application.
The benefits of this architecture were manifold but one of the main ones is that it reduces the complexity of each individual database, which dramatically improved performance, and we knew that if we needed more performance we could pull those individual databases out into their own machine.
I'd say, one big database per service. Often times there are natural places to separate concerns and end up with multiple databases. If you ever want to join things for offline analysis, it's not hard to make a mapreduce pipeline of some kind that reads from all of them and gives you that boundless flexibility.
Then if/when it comes time for sharding, you probably only have to worry about one of those databases first, and you possibly shard it in a higher-level logical way that works for that kind of service (e.g. one smaller database per physical region of customers) instead of something at a lower level with a distributed database. Horizontally scaling DBs sound a lot nicer than they really are.
>>(they don't know how your distributed databases look, and oftentimes they really do not care)
Nor should they, it's the engineer's/team's job to provide the database layer to them with high levels of service without them having to know the details
I'm pretty happy to pay a cloud provider to deal with managing databases and hosts. It doesn't seem to cause me much grief, and maybe I could do it better but my time is worth more than our RDS bill. I can always come back and Do It Myself if I run out of more valuable things to work on.
Similarly, paying for EKS or GKE or the higher-level container offerings seems like a much better place to spend my resources than figuring out how to run infrastructure on bare VMs.
Every time I've seen a normal-sized firm running on VMs, they have one team who is responsible for managing the VMs, and either that team is expecting a Docker image artifact or they're expecting to manage the environment in which the application runs (making sure all of the application dependencies are installed in the environment, etc) which typically implies a lot of coordination between the ops team and the application teams (especially regarding deployment). I've never seen that work as smoothly as deploying to ECS/EKS/whatever and letting the ops team work on automating things at a higher level of abstraction (automatic certificate rotation, automatic DNS, etc).
That said, I've never tried the "one big server" approach, although I wouldn't want to run fewer than 3 replicas, and I would want reproducibility so I know I can stand up the exact same thing if one of the replicas go down as well as for higher-fidelity testing in lower environments. And since we have that kind of reproducibility, there's no significant difference in operational work between running fewer larger servers and more smaller servers.
"Your product asks will consistently want to combine these data sources (they don't know how your distributed databases look, and oftentimes they really do not care)."
This isn't a problem if state is properly divided along business domain boundaries and the people who need to access the data have access to it. In fact many use cases require it - publicly traded companies can't let anyone in the organization access financial info, and healthcare companies can't let anyone access patient data. And of course there are performance concerns as well if anyone in the organization can arbitrarily execute queries on any of the organization's data.
I would say YAGNI applies to data segregation as well and separations shouldn't be introduced until they are necessary.
"combine these data sources" doesn't necessarily mean data analytics. Just as an example, it could be something like "show a badge if it's the user's birthday", which if you had a separate microservice for birthdays would be much harder than joining a new table.
Replace "people" with "features" and my comment still holds. As software, features, and organizations become more complex the core feature data becomes a smaller and smaller proportion of the overall state and that's when microservices and separate data stores become necessary.
At my current job we have four different databases, so I concur with this assessment. I think it's okay to have some data in different DBs if it's significantly different - say, user login data in its own database. But anything we do that is a combination of e-commerce and testing/certification should, I think, be in one big database so I can run reasonable queries for the information we need. This doesn't include two other databases we have on-prem, one of which is a Salesforce setup and the other an internal application system that essentially marries Salesforce to it. It's a weird, wild environment to navigate when adding features.
> Your product asks will consistently want to combine these data sources (they don't know how your distributed databases look, and oftentimes they really do not care).
I'm not sure how to parse this. What should "asks" be?
Mostly agree, but you have to be very strict with the DB architecture. Have very reasonable schema. Punish long running queries. If some dev group starts hammering the DB cut them off early on, don't let them get away with it and then refuse to fix their query design.
The biggest nemesis of big DB approach are dev teams who don't care about the impact of their queries.
Also move all the read-only stuff that can be a few minutes behind to a separate (smaller) server with custom views updated in batches (e.g. product listings). And run analytics outside peak hours and, if possible, on a separate server.
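The "views updated in batches" part maps nicely onto a materialized view refreshed on a schedule - a Postgres sketch with a made-up view definition:

```python
import psycopg2

LISTING_DDL = """
CREATE MATERIALIZED VIEW product_listing AS
SELECT p.id, p.name, p.price, count(r.id) AS review_count
FROM products p
LEFT JOIN reviews r ON r.product_id = p.id
GROUP BY p.id;
CREATE UNIQUE INDEX ON product_listing (id);  -- needed for REFRESH ... CONCURRENTLY
"""

def refresh_listing():
    # run every few minutes from cron or a scheduler; CONCURRENTLY keeps
    # readers unblocked; autocommit keeps it out of an explicit transaction
    conn = psycopg2.connect("dbname=shop user=report")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY product_listing")
    conn.close()
```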
The rule is: keep related data together. Exceptions are: different customers (who usually don't need each other's data) can be isolated, and if the database becomes the bottleneck you can separate unrelated services.
Surely having separate DBs all sit on the One Big Server is preferable in many cases. For cases where you really need to extract large amounts of data derived from multiple DBs, there's no real harm in having some cross-DB joins defined in views somewhere. If there are sensible logical ways to break a monolithic service into component stand-alone services, and good business reasons to do so (or it's already been designed that way), then having each talk to its own DB on a shared server should scale pretty well.
If you get your services right, there is little or no communication between them, since a microservice should have all the data it needs in its own store.