Ask HN: Has anybody shipped a web app at scale with 1 DB per account?

sky_rw · on May 26, 2020

My startup currently does just this 'at scale', which is for us ~150 b2b customers with a total database footprint of ~500 GB. We are using Rails and the Apartment gem to do mutli-tenancy via unique databases per account with a single master database holding some top-level tables.

This architecture decisions is one of my biggest regrets, and we are currently in the process of rebuilding into a single database model.

FWIW, this process has worked well for what it was originally intended to do. Data-security has a nice db level stopgap and we can keep customer data nicely isolated. It's nice for extracting all data from a single customer if we have extended debugging work or unique data modeling work. It saves a lot of application layer logic and code. I'm sure for the most part it makes the system slightly faster.

However as we have grown this has become a huge headache. It is blocking major feature refactors and improvements. It restricts our data flexibility a lot. Operationally there are some killers. Data migrations take a long time, and if they fail you are left with multiple databases in different states and no clear sense of where the break occurred.

Lastly, if you use the Apartment gem, you are at the mercy of a poorly supported library that has deep ties into ActiveRecord. The company behind it abandoned this approach as described here: https://influitive.io/our-multi-tenancy-journey-with-postgre...

Happy to expand on this if anybody is interested. It's currently a cause of major frustration in my life.

ryanbrunner · on May 26, 2020

Echoing this as well, I worked for Influitive and was one of the original authours of apartment (sorry!)

There are a lot of headaches involved with the "tenant per schema" approach. Certainly it was nice to never have to worry about the "customer is seeing data from another customer" bug (a death knell if you're in enterprisish B2B software), but it added so many problems:

- Migrations become a very expensive and time-consuming process, and potentially fraught with errors. Doing continious-deployment style development that involves database schema changes is close to impossible without putting a LOT of effort into having super-safe migrations.

- You'll run into weird edge cases due to the fact that you have an absolutely massive schema (since every table you have is multiplied by your number of tenants). We had to patch Rails to get around some column caching it was doing.

- Cloud DB hosting often doesn't play nice with this solution. We continually saw weird performance issues on Heroku Postgres, particularly with backup / restores (Heroku now has warnings against this approach in their docs)

- It doesn't get you any closer to horizontal scalability, since connecting to a different server is significantly different than connecting to another schema.

- It will probably push the need for a dedicated BI / DW environment earlier than you would otherwise need it, due to the inability to analyze data cross-schema.

I still think there's maybe an interesting approach using partioning rather than schemas that eliminates a lot of these problems, but apartment probably isn't the library to do it (for starters, migrations would be entirely different if partioning is used over schemas)

sitkack · on May 26, 2020

This leads me to believe that everything you mentioned is already subtly broken, it is the new DB/account model that just exposes it.

Is there something between the two solutions or pieces that could be modified that collapse the problem?

What about going with DB account per app account and using views to limit exposure to data. If user level views are applied before the business logic has access, then the death knell scenario should be avoided.

Migrations seem hard regardless, they have to be automated anyway and have a verified rollback/undo mechanism. Code should be multiversion aware if you are doing live upgrades.

I am personally a fan of read-only mode while doing the last phase of an ETL upgrade on the backend. Snapshot databases, run a bunch of Hive jobs and throw it back down on a new set of DBs.

In the cases you outlined where it was dangerous or fragile, I think those systems were already that way, and that fragility is a system of a different problem. Fix that and lots of other stuff disappears.

ryanbrunner · on May 26, 2020

I agree that migrations are painful at the best of times, but dealing with the complexity of migrating a single database is far simpler than dealing with migrating hundreds of schemas:

- Migrations will first of all just take longer - you're multiplying the number of schema changes by the number of tenants you have.

- While in an ideal world migrations should be purely run within a transaction, occasionally performance considerations mandate that you run without DDL transactions - when some tenants fail and your migrations are in a partially completed state for some of your tenants and not others, it can be scary and painful.

- In my experience, almost no one approaches data migrations in a way that is purely backwards compatible 100% of the time without exception. You certainly can, but there's a significant tax associated with this, and if you're in a traditional scoped environment, you can often get away with the potential for errors in the minuscule time that a schema change is operating (of course, some schema changes aren't run in minuscule times, but those are the ones you're more likely to plan for)

Going read only during migrations is an interesting approach, but there's real business costs associated with that (particularly if your migration speed is multiplied by running it across tenants).

I don't want to say that you should never isolate data on a schema level, but I do think it's something that shouldn't be a standard tool to reach for. For the vast majority of companies, the costs outweigh the benefits in my mind.

bdcravens · on May 26, 2020

A model I haven't fully fleshed out but which looked promising was a single source of truth, and then copying out to Docker instances of each tenant's database. Obviously the nature of data access may not make this practical, but would for our use case (end user is mostly read only, and most writes occur in background that we control)

pgt · on May 26, 2020

Can confirm, here be dragons. I did a DB per tenant for a local franchise retailer and it was the worst design mistake I ever made, which of course seemed justified at the time (different tax rules, what not), but we never managed to get off it and I spent a significant amount of time working around it, building ETL sync processes to suck everything into one big DB, and so on.

Instead of a DB per tenant, or a table per tenant, just add a TenantId column on every table from day 1.

developuh · on May 26, 2020

I do both.

Have a tenant_id column in every table.

This gives me flexibility to either host each client separately or club them together.

gomox · on May 26, 2020

How does the architecture block major refactors or improvements? Are you running a single codebase for all your tenants, albeit with separate schemas for each?

Edit: on reading the link you included, it seems like a lot of the problems are on the Rails implementation of the architecture with ActiveRecord and Apartment rather than with the architecture itself.

krainboltgreene · on May 26, 2020

Here I'll give you one: If you want to change a property of the database that will give your specific use improved performance, you have no way to transactionally apply that change. Rolling back becomes a problem of operational scale, rolling out as well.

What if you need to release some feature, but that feature requires a database feature enabled? Normally you enable it once, in a transaction hopefully, and then roll out your application. With this style you have to wait for N database servers to connect, enable, validate, then go live before you can even attempt the application being deployed, much less if you get it wrong.

jcims · on May 26, 2020

For this architecture to work you'd likely have to be able to operate more than one release at a time and move customers along with their database. If you're able to do that then this becomes much less of an issue.

mathieuh · on May 26, 2020

I’m not a database architect, but can’t you just add the column or whatever as a nullable field in one migration, update the application to write to the column, and then make the field non-nullable in a further migration?

As I say I’m no expert but this is what I’ve done previously.

gomox · on May 26, 2020

You can, but a transactional database migration is much nicer, predictable and less prone to bugs.

lonelappde · on May 26, 2020

You roll the feature out incrementally to the users who are ready. You build backwards compatible features. Basic software engineering.

orisho · on May 26, 2020

Sure, this problem can be solved as can any other -- but there's a cost to it in development time. For every feature, the dev working on it had to do it in 3 steps to ensure backwards compatibility and handle partial migration failures gracefully. Imagine doing this for a small feature - the ops effort dwarfs the actual feature! Many small features would probably be dropped.

gomox · on May 26, 2020

Multi-single-tenant actually makes it easier to do transactional schema upgrades because these operations only impact one customer at a time. No fiddling with middle-of-the-migration state.

411111111111111 · on May 26, 2020

Depends on your uptime guarantees.

It becomes a non-issue if you're able to go down for a few moments. Not necessarily viable for some software though, it just depends on what you're running.

(Especially because you can apply the migration on slave and subsequently switch master, making this downtime as quick as the time your system needs to do a database switch.)

This doesn't apply to big services with millions of users, however.

sah2ed · on May 26, 2020

> You roll the feature out incrementally to the users who are ready. You build backwards compatible features. Basic software engineering.

The parent mentioned having to support ~150 B2B customers, so the effort is amplified x100 — more than 100 individual customers databases have to be coddled as you’ve described, albeit they are stuck with poor tooling to manage changes across their unusual architecture.

magicalhippo · on May 26, 2020

While not a web-app, we too have ~200 B2B customers running our application and each one have their own DB. Some self-host, most others are hosted by us.

We have written our own tool to upgrade the DB schema, it's nothing fancy, just takes an XML description, compares with the current DB and makes changes. However it ensures the upgrade process is automated.

New versions of our application come with a new DB schema, and we also have an auto-updater for our software. We have a small launcher application which allows the users to select a few older versions of our software, so upgrades are almost always safe in the sense that if the user encounters a bug with the new version, they can try in the previous one.

If the DB schema upgrade tool fails for some reason, the application upgrade will be held back and we'll get a notification.

Combined this allows us to be quite aggressive with pushing out new versions, and to do so without manual intervention.

A limitation with this setup is that DB changes have to be backwards compatible. So we do have a few cases of "if this table exists do this, else do that" type logic and similar, but mostly it's not a big deal.

For example I recently had to refactor some old table where they had used five columns for some additional references (ref_1, ref_2 etc), into a new table of additional references. To handle this I just made a new view which would return either the refs from the new table, or the data from the columns in case the new table didn't have any data for that record.

I then changed the application to use a grid instead of five separate edit controls, and to use this new view. If the user had to use an older version, that version would just write to the five fields and not populate a row in the new table, and the view would then return this data if the user later viewed it in the new version of our application.

So the "overhead" of backwards compatibility in this case was just ensuring this would be sufficient and writing that view, so just a few minutes.

rbanffy · on May 26, 2020

Doing the same thing to 100 databases is usually easy to automate.

I managed a group that had a dozen customers on a single multi-tenant database and some migrations took more than a day - we needed to set maintenance windows on weekends (and once we had to roll back after an error).

Having a dozen databases we could roll out updates to on a client per client basis would have saved us from some enormous headaches.

OTOH, no two companies are alike. You do you.

sky_rw · on May 26, 2020

For me it falls into the category of decisions that are easy to make and difficult to un-make. If for whatever reason you decide this was the wrong choice for you, be in tech needs (e.g. rails) or business needs, merging your data and going back into a codebase to add this level of filtering is a massive undertaking.

gomox · on May 26, 2020

Indeed, but if you are making a B2B enterprise/SMB SaaS, I think you are most likely to regret the opposite choice [1][2]. A lot of companies run a single instance, multitenant application and have to develop custom sharding functionality down the road when they realize the inevitable: that most joins and caches are only needed on strict subsets of the data that are tenant specific.

If you get successful enough in this type of application space, you reach a mature state in which you need to be able to:

* Run large dedicated instances for your largest customers, because either their performance or security requirements mandate it.

* Share resources among a large number of smaller customers, for efficiency and fault tolerance reasons.

You can get there in two ways:

* You start with a massive multi tenant application, and you figure out a way to shard it and separate in pieces later.

* You start with multiple small applications, and you develop the ability to orchestrate the group, and scale the largest of them.

I would argue the latter is more flexible and cost efficient, and requires less technical prowess.

[1] https://news.ycombinator.com/item?id=23307172

[2] https://news.ycombinator.com/item?id=23307139

cornishpixels · on May 26, 2020

You can always take a multi-tenant system and convert it into a single-tenant system a lot more easily. First and foremost, you can simply run the full multi-tenant system with only a single tenant, which if nothing else enables progressive development (you can slowly remove those now-unnecessary WHERE clauses, etc).

krainboltgreene · on May 26, 2020

> You can always take a multi-tenant system and convert it into a single-tenant system a lot more easily.

This is not true if your primary keys are int or bigint.

It's also not true if you have any sort of unique indexes that are scoped to a table.

cornishpixels · on May 26, 2020

How is it not? Uniqueness would cease to matter between different tenants. They're unique by virtue of using a different database.

gomox · on May 26, 2020

See my sibling comment with the Hubspot example. Even though the system might work internally, other things will break if you start having duplicate account IDs because other systems don't think of the account ID as a cluster-local identifier, but as a global one.

ryanbrunner · on May 26, 2020

Just thinking through this, but if it's an entirely separate environment, just host it on a separate subdomain and the account id becomes irrelevant. If you have a functioning staging environment, you already have the ability to run an isolated environment without issue, this is just a different application of the same idea.

gomox · on May 26, 2020

You can probably run the environment itself, but other systems (release management, monitoring, billing, etc) probably rely on the account_id being unique.

ehsanu1 · on May 26, 2020

I think you are talking about problems going from multiple single-tenant systems to a single multi-tenant system. You parent is talking about the opposite.

gomox · on May 26, 2020

No.

For example, HubSpot runs a multitenant system. URLs look like:

https://app.hubspot.com/section/$ACCOUNT_ID/etc/etc

In the simple, YAGNI implementation of this, when you create a new HubSpot account, most likely that will insert a new row into the accounts table, and the auto generated ID of that row will be your account ID. Therefore you need uniqueness to be enforced at that level.

If you want to start running a separate copy of the system, you need to refactor the system to move that sequence out of the database so that two different customers running on different clusters don't end up with the same account ID. This is just an example, but there are many problems like this that are caused by the assumption that the production system is a single unique system.

mrunkel · on May 26, 2020

There are many ways to solve this that don't require uniqueness across all systems.

https://${customer}.hubspot.com/... https://app.hubspot.com/${customer}/...

You'd do this at the proxy/forwarder level.

gomox · on May 26, 2020

Everything has a solution, but want to bet that at least 20 different internal systems at HubSpot assume that the account ID in that URL is globally unique?

gomox · on May 26, 2020

True, but:

In my experience by the time you reach this point you have a lot of operational complexity because you and your team are used to your production cluster being a single behemoth, so chances are it's not easy to stand up a new one or the overhead for doing so is massive (i.e. your production system grew very complex because there is rarely if ever a need to stand up a new one).

Additionally, a multi tenant behemoth might be full of assumptions that it's the only system in town therefore making it hard to run a separate instance (i.e. uniqueness constraints on names, IDs, etc).

egamirorrim · on May 26, 2020

Some of the issues I see in one of my projects is high interactivity between accounts. E.g. if account 1 'sends' something to account 2 both of the shared/separate db instances need to be up or there'll need to be some kind of queueing mechanism.

That's hard enough and then add to it that most clients want to BYOK to those instances

gomox · on May 26, 2020

High interactivity between accounts is a good reason to not adopt the proposed multi-single-tenant architecture. The scenarios discussed are B2B enterprisey apps in which the environments are essentially independent from each other.

sky_rw · on May 26, 2020

Yeah, it's far from a black and white issue. I am by no means against single tenant database structures, and for our initial use case it was likely the correct decision. I have not run into anything that stopped us in our tracks.

My pushback focuses on two things:

1: The Rails implementation of this, specifically with off the shelf gems and integration with ActiveRecord. Presents a lot of unknown framework complexity down the road. We are currently running on a fork of the Apartment gem as it breaks under newer versions of Rails.

2: Long term support for evolving business case. Now, this is very much a problem unique to our company and how our business/product has evolved. We started out with a very clear separation of account concerns, and as we've grown and pivoted things have changed. You are unlikely to experience the same evolution we have. HOWEVER, all startups face these unknowns, and giving yourself as much flexibility early in the game will pay off down the road. There are no right answers here, you just have to think it through and weigh the costs/benefits. Maybe for you that flexibility is found in single tenant. For us it appears in hindsight that multi-tenant would have been the best long term choice.

qatanah · on May 26, 2020

Can confirm the experience, I had to work on a company before with similar tech using Django.

https://github.com/bernardopires/django-tenant-schemas

All I can say, that it was a nightmare to do deployment and migration and some backward incompatibility features restrained us in some ways.

The company valued data isolation as a priority rather than ease of developing it. Hard to work with but great for data isolation.

bluejekyll · on May 26, 2020

The link failed to medium for me. Would you mind describing in a little more detail some things?

What DB are you using?

How mature are your automation tools for managing these DB instances?

Do you use a “single” DB server instance with a database defined per customer?

Do you require all your customers to have identical schema, and do upgrade them all irrespective of individual customer concerns?

Thanks in advance!

rclayton · on May 26, 2020

I totally get why you are doing this, but I would think DB management and migrations would be really troublesome. Do you have a migration framework to manage migrations across all catalogs?

elialbert · on May 26, 2020

want to chime in that this matches my experience doing something similar in B2B in django 10 years ago

boulos · on May 26, 2020

Disclosure: I work on Google Cloud.

tl;dr: Wait until you need it, but there are good reasons for it!

Since I didn’t see anyone mention it, the term I’ve seen a lot of people use for this pattern is “multi single tenant”.

Part of the reason we have Tenant Projects [1] is precisely so you can do a 1:1 mapping of “Customer A can have different controls, settings, and blast radii from Customer B”.

Many of our first-party services do this, but then again, many of the largest-scale ones do “true multitenancy” instead. There’s no great way to offer a scale-to-zero database like Datastore or BigQuery without “internalizing” the multitenancy. But then it’s on you as the implementor to ensure isolation (both security and performance).

In your scenario, if you could make each database small enough (or use a database that handles the multitenancy for you) you gain a lot in terms of “capability” for enterprises. Customer X wants a feature that you know how to do but aren’t sure it should be for everyone? Their instance is separate. Customer Y has a drastically different read pattern and should have a high-IO instance? Great.

The easiest advice though is: wait until after you need it. A single replicated pgsql or MySQL can go insanely far. Let your business needs push you to the more complicated architecture, later. Prefer whatever gets you to a success disaster. Just don’t forget backups, PITR, and actually testing that :). (Another thing that’s harder with a shared database!).

Edit: Adding a newline and a tl;dr.

[1] https://cloud.google.com/service-infrastructure/docs/manage-...

neeleshs · on May 26, 2020

Somewhat tangetial: If we use something like bigquery which handles multitenancy well, there still doesn't appear to be a good way to expose it to a customer directly (say for a BI tool). Like with a simple username/pwd.

Any pointers?

jpgvm · on May 26, 2020

In GCP you can grant permissions to entities outside of your org, either to users or to an external service account. This is how things like Cloud Build and other services that require Google to have access to resources inside your projects work.

neeleshs · on May 26, 2020

Doesn't it still need the end users to have some kind of google account? At the very least a google email account?

jpgvm · on May 26, 2020

If you want to expose BQ directly yes.

delta1 · on May 26, 2020

> PITR

"Point-in-time recovery" for those of us that don't know the acronym

caogecym · on May 26, 2020

Thanks for shraring! It’s great advice! Mind elaborate on the single replicated db approach? Curious what the replica’s role here that makes a project scale.

gomox · on May 26, 2020

If an "account" is an "enterprise" customer (SMB or large, anything with multiple user accounts in it), then yes, I know at least a few successful companies, and I would argue in a lot of scenarios, it's actually advantageous over conventional multitenancy.

The biggest advantage is flexibility to handle customers requirements (e.g. change management might have restrictions on versioning updates) and reduced impact of any failures during upgrade processes. It's easier to roll out upgrades progressively with proven conventional tools (git branches instead of shoddy feature flags). Increased isolation is also great from a security standpoint - you're not a where clause away from leaking customer data to other customers.

I would go as far as saying this should be the default architecture for enterprise applications. Cloud infrastructure has eliminated most of the advantages of conventional multitenancy.

If an account is a single user then no.

PS: I have a quite a lot of experience with this so if you would like more details just ask.

randlet · on May 26, 2020

I run a very small SaaS and do this. Each customer gets its own isolated AWS account all organized under a root account for consolidated billing. It works great and has allowed me to develop client specific features where required (using git branches). Customers like the fact that their data is completely isolated from everyone else.

Rolling out updates is simple, I just run the same deploy script for each client (after scheduling with the client if need be) and if a deploy goes wrong, you'll usually notice it after it affects one customer rather than all of them.

The main downside for me is cost. The cost per customer scales linearly which hurts as you don't get the same economies of scale like you would in single tenant solutions.

gomox · on May 26, 2020

I don't know your application enough to comment, but that seems like an extreme version of what I'm proposing, which would be to share application and DB servers between customers, but not application code and databases (i.e. you can still run 3 customers on the same EC2 and RDS instances, they just have separate application code and separate database schemas).

One of the main benefits, like you describe, is lowered costs. Running an AWS account for each customer is pretty significant overhead and would only be worth it to me if each customer was quite substantial (50K/yr or above?).

randlet · on May 26, 2020

The way you are thinking would be much cheaper than what I'm doing with the downside that clients data exists on the same server (but maybe they won't care) and the fact that a performance issue for one client can impact other clients.

I haven't found an AWS account per customer to be to bad to manage yet and makes keeping track of your AWS spend per client simple. It's also trivial to run your resources in say Sydney for your Australian clients and London for your European clients.

gomox · on May 26, 2020

If it works, then it works :)

Company B I mentioned above still runs location specific infrastructure, but each one of these clusters can host several customers. Of course there is some overhead (i.e. very new regions sometimes operate at a loss if there is not a critical mass of customer in that cluster, but they are subsidized by more profitable regions until they reach a critical mass of customers to pay for the whole cluster costs).

sitkack · on May 26, 2020

I love that you went up an encapsulation level compared to what even the OP was asking. The ability to scale and customize on a single customer level is amazing. Do you find yourself capitalizing on this capability fairly often?

randlet · on May 26, 2020

80% of my clients run the same "master" version while 20% have minor customization's specific to their organization.

I don't want to end up maintaining 20 different applications, but being able to do minor modifications to the UI or data model on a per customer basis has been a nice selling point.

robot · on May 26, 2020

Do you not have to have a large dev ops team to handle upgrades or changes for each customer? I'd be against it to do more with fewer people, especially for an unproven < 5 year old startup.

gomox · on May 26, 2020

Actually, I think conventional multitenant (shared database with tenant_id on database rows) is a bad idea (as well as premature optimization) in the previously described context.

One of the companies I am thinking of had no dedicated ops team until 300+ customers, running on 300+ instances of the setup. Probably 40+ employees at the time the first dedicated sysadmin was hired.

The trick is that since the production environment is extremely similar to the developer environment, all the tools to upgrade versions, migrate databases, etc, exist and are constantly used by all the developers.

When the customer base grows, horizontal expansion is baked into the architecture. When your largest customer grows beyond what you can host, you scale up with trivial mechanisms (database replication, multiple application servers). All you need is orchestration of extremely common, well understood and mature tools.

Orchestration is home baked anyway at every company that I've had experience with (there is no off the shelf "multitenant application management system"), but by doing it this way you rely on extremely mature components for all the underlying tools, because they have been used to scale websites since 2000.

bearjaws · on May 26, 2020

For context, we implemented your exact model at my job prior to switching to "conventional multi-tenant". I agree with your premise that there are a lot of off the shelf tools to help this, but in enterprise I feel like we hit dozens of other issues and our customers didn't even care if they shared the same database as someone else.

If the production environment is extremely similar to dev, the problems are numerous, how do you ensure that a bug in one app server doesn't let users access other request data? We solved this via multiple app servers, but then how do you solve deploying to hundreds of individual app servers? We have 12 services per customer, replicated to 4 physical locations, that means 124500(customers) 24,000 containers to roll out any upgrade completely. (99.99% SLA)

Also some customers get different services depending on their tier, so now we need to bake in some form of Salesforce integration into our deployment so we know which services to start up(we didn't do this unfortunately, so all services started up and some didn't do anything).

If you split into many physical database servers vs many "logical" databases, how do you monitor that an individual customers performance is adequate? What about APM? One NewRelic instance per customer? One ELK cluster per customer? How do your developers / customer support engineers get access to a specific customer instances both in the software, in the database and APM? I cannot fathom having to manage 500 ELK clusters, even with AWS managed Elastic search.

How do you enable ETL and reporting solutions that support all these databases. What if you have data aggregation rights and need to combine different customer's data? What happens when your product team needs to report across all customers to determine what features to work on (gotta be data driven)?

Maybe this is only related to healthcare (where I work), but in our industry I cannot imagine our health systems and manufacturers not wanting answers to these. These answers are infinitely simpler in our new multi-tenant model vs our old multiple host model. Writing the processes and documentation around guaranteeing segmentation took less than 8 hours and is only 3 pages of our entire security model. Hi-trust specifically calls out many of these points, and I am sure SOC-2 does as well.

In my experience, not a single one of our health systems or auditors has had an opinion either way on this. They only care when you aren't doing APM properly, or taking the necessary steps to prevent attacks both from external parties and by internal parties. We do in-depth third party audits that are shared with our customers. Ensuring good processes for managing risk is always more important than wishful thinking that a design pattern will make your enterprise model work.

Our total lines of code for logical segmentation takes up less than 300 lines in all of our services combined.

Finally, given that Postgres10+ now natively supports row level security, I fail to see why anyone starting a new project would choose the multiple database option when you can bake in authorization straight into the DB at the lowest level.

Sorry this is really long, but I wanted to share my experience & thoughts.

gomox · on May 26, 2020

Both of the setups I'm familiar with run multiple application servers on the same "iron" (EC2 clusters), with shared services (database servers, caches, routing, load balancing, etc).

Monitoring, backup, ELK, New Relic etc are global, or shared among sets of clusters depending on the specifics.

I agree that 12 containerized services per customer seems like a headache. The applications I have experience with run fewer components, and they are mostly shared among clusters.

rrrix1 · on May 27, 2020

Thank you for this. Working on a new project using postgres and we were planning on doing single tenant. You've given me lots to think about!

bearjaws · on May 26, 2020

Your biggest advantage is actually a disadvantage, you have literally enabled your individual customers to boss you around and fork your codebase into a mess of different functionality per customer. Are you a contract shop where you get SOWs to expand your software? This seems like a terrible idea for both product team and the development team, who now need to know what bastardized version of the software each customer is running. What happens when a customer goes "I don't want that update"? Unless you can guarantee that each customer ends up on the same software quickly, this will get messy.

gomox · on May 26, 2020

Flexible and disciplined is a good alternative to rigid, and for young software business, probably the safer one!

juped · on May 26, 2020

I think the standard term is "tenant" for that organizational level, no?

polote · on May 26, 2020

What would be the best, different tables per customer or different db?

gomox · on May 26, 2020

What I have experience with is separate DB, and separate applications. Then basically you orchestrate a bunch of applications, a task for which a myriad of tools already exist.

brianwawok · on May 26, 2020

There’s overhead both ways. Multi-tenant is more efficient.

What is the most possible users you could have in 5 years? Can your choice be supported in that technology?

gomox · on May 26, 2020

Enterprise systems have little to no cross-customer functionality, and customer isolation is paramount. I think you are assuming that they will have a database per individual user, but the original question pointed to "tenants" (i.e. companies using the system).

brigandish · on May 27, 2020

There's always schema to consider too, you can have thousands of them too. An often overlook tool.

barrkel · on May 25, 2020

Yes, for multi-tenancy. Database per tenant works alright if you have enterprise customers - i.e. in the hundreds, not millions - and it does help in security. With the right idioms in the codebase, it pretty much guarantees you don't accidentally hand one tenant data belonging to a different tenant.

MySQL connections can be reused with database per tenant. Rack middleware (apartment gem) helps with managing applying migrations across all databases, and with the mechanics of configuring connections to use a tenant based on Host header as requests come in.

For async jobs in Java, allocating a connection can idiomatically only be done via a callback - e.g. `executeInTenantContext(Tenant tenant, Runnable block)` - which ensures that all connections handed out have the right tenant selected, and everything is reset to the default tenant (an empty database) when exiting. Per-tenant jobs either iterate through all tenants, or have a tenant parameter under which they get executed, and the rest of the code can be more or less tenant unaware.

It gives you the "freedom" to move tenants to separate servers, or consolidate them into single servers, if tenants are much larger or much smaller than expected. In reality this is sufficiently painful it's mostly illusory. We're looking at Vitess to help scale out, and move away from a mix of multi-tenant servers and single-tenant servers.

thayne · on May 26, 2020

It also makes sharding/ horizontal scalability a non-issue, at least until you get a big enough customer.

krainboltgreene · on May 26, 2020

But like, every database is easy to shard/horizontally scale until you get a big enough customer. Why do the extra leg work?

gomox · on May 26, 2020

If you have 500 medium customers and you need to add sharding to a multitenant production behemoth, it's not trivial.

sergiotapia · on May 26, 2020

It's mostly a contractual decision, not because the engineering team wants to implement it for giggles.

thomascgalvin · on May 25, 2020

The inability to reuse database connections would be a huge performance hit.

In a traditional webapp backend, you have a pool of connections to the database. User01 hits your service, and grabs a connection off the pool. User02 does the same, and so on. These connections get put back in the pool for reuse once a user is done with them.

In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.

If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.

Like you said, DB administration will be a huge pain in the ass. Rather than having Flyway or Liquidbase or whatever run a migration on one database, you'll have to run it on thousands of databases. There will be significant downtime when your databases are not in a consistent state with one another. There will also be bloat from the migration tool's record keeping, which will be duplicated for every user, rather than every database.

A lot of the tools a database gives you for free will also need to be implemented in application logic, instead. For example, you might want to run a query that says "show me every user using more than 1GB of storage," but under your schema, you'll have to log into every user's database individually, determine storage used, and add it to an in-memory list.

If you ever want to allow users to collaborate, you will end up replicating the owner_id field type metadata anyway, and the entire benefit of this schema will evaporate.

Most frameworks are not set up to handle this style of database access, either. I don't use Rails, but Spring Boot would fight you every step of the way if you tried to do this.

toast0 · on May 26, 2020

> In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.

If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).

> If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.

You could always just only keep open the most recently used 1 Million databases. It's pretty easy to tune the FD limit, FreeBSD lets you set it to one FD per 16k of ram without modifying the kernel, but it's a challenge to be so memory and cpu efficient that that's a limit.

All that said, it really depends on the application. If this is consumer facing, and each account is an end user, one database per user is probably excessive overhead; one database (or sharded, if you've got the users) or one (flatish) file per user makes a lot more sense.

If it's business facing, than one database per account could make sense. You would have isolation between customers and could put big customers on dedicated boxes and let the customer drive upgrades and migrations etc. Just please please please consider that corporations merge and divide all the time, don't be like G Suite and not offer a way to merge and divide accounts to reflect their corporate ownership.

llimllib · on May 26, 2020

> connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).

I try to get all my endpoints under 10ms! 5ms per call would be huge.

(Obviously I can't succeed all the time, but 5ms is big numbers imo)

toast0 · on May 26, 2020

Ok, you and I can be friends. A lot of people are using 'lightweight' frameworks where hello world is 30 ms, and then they call slow services and run slow queries, etc.

If your target is 10 ms, then you probably should worry about db connection time.

freedomben · on May 26, 2020

With Elixir/Phoenix I sometimes see response times measured in µs rather than ms!

Thristle · on May 26, 2020

>connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far)

is that with creating a new connection in the pool? we work with microservices so we have a "db gateway" so any request to the DB goes through that and it routes to the correct db server for that tenent. our latency for an already "hot" connection is about 40-50 on average but i belive the lowest number i got (for query by primary key in a kinda small table) was no less than 20-30 and opening a new connection added a couple of 10's atleast to that number

toast0 · on May 26, 2020

Yeah, that's a totally new connection (no pool), time from memory (could be off).

On my home system with a database I happen to be running anyway, I see:

    $ time mysql -u mythtv -h 192.168.0.12 mythconverg \
     -e 'select * from credits where person = 147628 limit 1' > /dev/null

    real    0m0.023s

Server is running a Celeron(R) CPU 1007U @ 1.50GHz, client is Celeron(R) 2955U @ 1.40GHz, networking is 1GBps. I don't have a super easy way to measure just the connect + login time, so this is connect + login + indexed query. The server is lightly loaded, and I warmed up the table, but it's also a laptop chip on a desktop oriented board with a lowend NIC.

Thristle · on May 27, 2020

my guess is the service http call chain can add up. common call chains are 3+ services

krainboltgreene · on May 26, 2020

> If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).

One of the major ways I helped Mastodon was improving their connection pooling situation. If you haven't encountered connection size issues with your database count yourself lucky.

toast0 · on May 26, 2020

My experience was with Apache + mod_php, so there was no option to pool connections between workers, and you would set the Apache connection limits such that they summed up to less than the MySQL connection limits (unless you had a lot of traffic that didn't hit the database... then sizing would be tricky)

geofft · on May 26, 2020

I think (but it's not totally clear) that the proposal is about B2B services and it's one database per paying account, not one database per end user.

For instance, if you're Slack, the proposal would be to have one database per Slack workspace, not one database per person who has a login to any workspace. You absolutely need to have data relationships between multiple users in a workspace. You don't necessarily need relationships between multiple workspaces (as it happens, Slack only added this feature very recently), and having different databases means that if company 1 triggers pathological behavior on their Slack, company 2 is unaffected.

Or, if you're Wikia^WFandom, you'd have one database per site. You could certainly run one big database for the entire site, but you could also have one database for Wookieepedia, one for Memory Alpha, one for the Harry Potter Wiki, etc.

In these situations, you wouldn't have the problem about the performance hit or about making it work with frameworks - you'd run separate web application servers per workspace/site/customer, too. Some of your machines would be running Rails/Spring Boot/Django/PHP/whatever with an environment file for Wookieepedia. Some would run Memory Alpha. Ideally you'd throw this in some sort of autoscaler (anything from Apache-managed FastCGI processes to Kubernetes would work fine). But User02, when they visit Wookieepedia, would hit a Wookieepedia application server that User01 has previously used and already has a connection to the Wookieepedia DB.

Yes, you would need to deal with doing a DB migration per site/customer instead of one big migration - but on the other hand, you get to to a DB migration per customer instead of one big migration. Each one is unaffected by the others. You'd spend a bit of time automating the process of migrations and you'd hit your test sites first, and then migration is much more reliable for your customers. If you really need downtime, you can set separate downtime per customer, and if one customer has a bunch of data that takes forever to migrate, no other customer cares. It's a tradeoff, and it may be useful.

If you want to allow customers to collaborate, you need to build it as an API, as if they were two different products interacting at arms' length - but you can certainly do that.

barrkel · on May 25, 2020

Not a problem with MySQL, "use `tenant`" switches a connection's schema.

Rails migrations work reasonably well with apartment gem. Never had a problem with inconsistent database migrations. Sometimes a migration will fail for a tenant, but ActiveRecord migrations records that, you fix the migration, and reapply, a no-op where it's already done.

We don't use a single mysqld for every tenant mind, it's not like migrating tenants is completely serialized.

sp332 · on May 26, 2020

> USE `tenant`

But if the idea is to isolate accounts form each other, the different schemas would be available to different DB users. You would have to re-authenticate to get access to the other DB.

ryanbrunner · on May 26, 2020

Using schemas gives you imperfect but still improved isolation. It's still possible for a database connection to cross into another tenant, but if your schema search path only includes the tenant in question, it significantly reduces the chance that cross-customer data is accidentally shared.

sitkack · on May 26, 2020

I think numeric ids should be allocated out of the same key space, other identifiers should be hierarchical and scoped to the tenant in the database.

The same query run across all databases should either return 1 query (for the valid tenant) and empty set for all other databases, OR it should return the same result set regardless.

I just realized what I am proposing, a hidden out of band column that is effectively the "database id" for that row.

ryanbrunner · on May 26, 2020

If you built a tenanting library that used partioning rather than schemas, you'd probably end up with something that looked pretty close to what you're describing.

With schemas, it's definitely possible to use the same generator for ids across schemas (at least, I'm 90% sure it is in Postgres), but you'll probably end up fighting against ORM libraries to get it to work properly (Rails for instance makes a LOT of assumptions about how the id column works), and you aren't technically guaranteed uniqueness since you'll still have distinct PK columns.

barrkel · on May 26, 2020

The idea is prevent a simple SQL mistake from exposing information across tenants.

latch · on May 26, 2020

RLS seems like a simpler solution.

barrkel · on May 26, 2020

How well does that work with mysql 5.5 in 2012?

Exactly.

(It's not actually simpler when query execution over 100s of millions of rows is a perf bottleneck, and each tenant has several billion rows in the main tables. Then you're grateful you can schlep them around, and keep small tenants fast, etc. Even now, Postgres would still be a dubious choice due to the unpredictability of its query planner, though I use it for all my hobby projects.)

jbverschoor · on May 25, 2020

You can use the same connection across multiple databases without any problem.

WatchDog · on May 25, 2020

Not on postgres.

koolba · on May 25, 2020

You can use schemas.

Also it’s still a terrible idea.

andrewflnr · on May 26, 2020

Why is it a terrible idea?

gomox · on May 26, 2020

It's a terrible idea in the same way that using PHP instead of Rust to build a production large scale application is a terrible idea (i.e. it's actually a great idea but it's not "cool").

koolba · on May 26, 2020

It’s not a cool factor issue. It’s an issue of bloating the system catalogs, inability to use the buffer pool, and having to run database migrations for each and every separate schema or maintaining concurrent versions of application code to deal with different schema versions.

It’s be just as dumb to do it in Rust as it would be in PHP.

gomox · on May 26, 2020

As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, so it's likely not as dumb as you assume.

koolba · on May 26, 2020

> As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, ...

Skimming through the updated comments I do not see many claiming it was a good idea or successful at scale. It may work fine for 10s or even 100s of customers, but it quickly grows out of control. Trying to maintain 100,000 customer schemas and running database migrations across all of them is a serious headache.

> ...so it's likely not as dumb as you assume.

I'm not just assuming, I've tried out some of the ideas proposed in this thread and know first hand they do not work at scale. Index page caching in particular is a killer as you lose most benefits of a centralized BTREE structure when each customer has their own top level pages. Also, writing dynamic SQL to perform 100K "... UNION ALL SELECT * FROM customer_12345.widget" is both incredibly annoying and painfully slow.

gomox · on May 26, 2020

I don't think we share the definition of "scale".

Extremely few companies that sell B2B SaaS software for enterprises have 10K customers, let alone 100K (that's the kind of customer base that pays for a Sauron-looking tower in downtown SF). Service Now, Workday, etc, are publicly traded and have less than 5000 customers each.

All of them also (a) don't run a single multitenant cluster for all their customers and (b) are a massive pain in the ass to run in every possible way (an assumption, but a safe one at that!).

neeleshs · on May 26, 2020

Yep. Too easy and not cool. But works really well and no headaches

Aweorih · on May 26, 2020

Or you have a connection pool for every user, which is basically the same except that you must do some mapping by yourself and you have more connection pools and open connections.

JMTQp8lwXL · on May 26, 2020

Serverless lambdas only re-use connections in certain situations, IIRC. That might not be the "traditional webapp backend", but it's a growing concept.

greatjack613 · on May 25, 2020

This - golden

redmattred · on May 25, 2020

In the past I worked at a company that managed thousands of individual MSSQL databases for individual customers due to data security concerns. Effectively what happened is the schema became locked in place since running migrations across so many databases became hard to manage.

I currently work at a company where customers have similar concerns around data privacy, but we've been to continue using a single multitenant DB instance by using PostgreSQL's row level security capabilities where rows in a table are only accessible by a given client's database user:

https://www.postgresql.org/docs/9.5/ddl-rowsecurity.html

We customized both ActiveRecord and Hibernate to accommodate this requirement.

onefuncman · on May 26, 2020

Also have thousands of MSSQL databases, but with significant investment in tooling it really is transparent from a feature development standpoint, and our feature toggles are dirt simple.

Another comment suggested doing queries across so many databases is challenging, and it's just not, we have both adhoc query capabilities across the fleet and a traditional data warehouse...

Now, trying to make additional infrastructure changes is challenging, but the architecture is robust enough to solve all the immediate business needs.

patio11 · on May 26, 2020

I am aware of at least one company which does this from my consulting days, and would caution you that what you get in perceived security benefits from making sure that tenants can't interact with each others' data you'll give back many times over with engineering complexity, operational issues, and substantial pain to resolve ~trivial questions.

I also tend to think that the security benefit is more theatre than reality. If an adversary compromises an employee laptop or gets RCE on the web tier (etc, etc), they'll get all the databases regardless of whose account (if any) they started with.

(The way I generally deal with this in a cross-tenant application is to ban, in Rails parlance, Model.find(...) unless the model is whitelisted (non-customer-specific). All access to customer-specific data is through @current_account.models.find(...) or Model.dangerously_find_across_accounts(...) for e.g. internal admin dashboards. One can audit new uses of dangerously_ methods, restrict them to particular parts of the codebase via testing or metaprogramming magic, etc.

djd20 · on May 26, 2020

This is true if your application is running on a shared servers - however if you have fully isolated application and database deploys then you really do benefit from a security and scalability perspective- and by being able to run closer to your clients. I'd also say that it works better when you have 100s, rather than thousands of clients, most probably larger organisations at this point.

abhishektwr · on May 25, 2020

For Postgress you can use and scale one schema per customer (B2B). Even then, depending on the instance size you will be able to accommodate 2000-5000 customers at max on a Postgres database instance. We have scaled one schema per customer model quite well so far (https://axioms.io/product/multi-tenant/).

That said, there are some interesting challenges with this model like schema migration and DB backups etc. some of which can be easily overcome by smartly using workers and queuing. We run migration per schema using a queue to track progress and handle failures. We also avoid migrations by using Postgres JSON fields as much as possible. For instance, creating two placeholder fields in every table like metadata and data. To validate data in JSON fields we use JSONSchema extensively and it works really well.

Probably you also need to consider application caching scenarios. Even you managed to do one database per customer running Redis instance per customer will be a challenge. Probably you can run Redis as a docker container for each customer.

Keats · on May 26, 2020

there's a typo in the title of that page: Mulit-tenant

abhishektwr · on May 26, 2020

Thank you. Saved me.

geofft · on May 26, 2020

Slack does (or did, as of a few years ago) something like this - they shard teams onto a fleet of MySQL servers. See https://www.infoq.com/presentations/slack-infrastructure/ starting around the 10-minute mark and https://www.infoq.com/podcasts/slack-keith-adams/ starting around 7m15.

If I'm understanding this right, every team gets its own database, which is active-active replicated across two MySQL servers. There's also a separate mapping from teams to MySQL servers (which is itself a separate pair of MySQL servers), and of those two servers, one is primary for each team, distributed roughly evenly. Each MySQL server instance in the fleet is hosting many Slack workspaces, and they can scale that up/down as needed (i.e., presumably the tiny Slack I created years ago and forgot about is hosted on the same pair of MySQL servers as lots of other Slack instances in the same state and my employer's active Slack instance with thousands of people is on a less-crowded pair of MySQL server).

One user-visible effect is that it's possible to have outages that affect some fraction of workspaces. I also suspect this plays into things like rolling out changes over the course of many months - they don't need to do a database migration for everyone at one.

I don't think they use this for security - while yes, a buggy query can only affect one workspace at once, it doesn't sound to me like they do sharding at the PHP layer, it sounds like they're running a fleet of generic PHP instances that have the ability (and authorization) to talk to any workspace's database, not that they're running separate web/PHP instances per customer. But it definitely sounds like they rely on this for scalability/performance.

dmlittle · on May 26, 2020

I'm not sure if they were using it at the time Keith gave his presentation but today Slack is using Vitess[1] to handle MySQL clustering. There are a few KubeCon talks about it you're more interested in it. If I remember correctly they're on the tail-end of the Vitess migration and should finish sometime this year.

[1] https://vitess.io/

swlkr · on May 26, 2020

I worked for a company that did this, we had hundreds of database instances, one per customer (which was then used by each of those customers' employees).

It worked out pretty well. The only downside was that analytics/cross customer stats were kind of a pain.

The customers all seemed to like that their data was separate from everyone else's. This never happened, but if one database was compromised, everyone else's would have been fine.

If I were starting a B2B SaaS today where no customers shared data (each customer = a whole other company) I would use this approach.

Polylactic_acid · on May 26, 2020

It has been an actual requirement from our customers that they don't share an instance or database with other customers. It also seriously limits the scope of bugs in permissions checks. Sometimes I will find a bit of code that should be doing a permissions check but isnt which would be a much bigger problem if it was shared with other companies.

troydavis · on May 25, 2020

I’ve seen this done for performance. At one point, it was more common than you might think. These days, most data stores have a way to indicate how to store the data on disk, or one is using SSDs so random access/seeks are less expensive than they were with spinning disks.

As one example, New Relic had a table per (hour, customer) pair for a long time. From http://highscalability.com/blog/2011/7/18/new-relic-architec... (2011):

> Within each server we have individual tables per customer to keep the customer data close together on disk and to keep the total number of rows per table down.

In a situation like that, all queries and table operations are customer- and time-specific anyway. At the time, dropping an entire table took less I/O than deleting specific rows in a multi-customer table (for MyISAM tables, this may still be true: https://mariadb.com/kb/en/big-deletes/). Also, there was no risk from locking the table.

https://www.slideshare.net/newrelic/how-to-build-a-saas-app-... has a bit more. I think Lew Cirne gave that presentation in 2011 but I can’t find a video of it.

In the example you gave, if the goal is to support customer-defined fields, I don't think most people would map the customer-defined fields directly to SQL columns. Consider something like Postgres hstore (with indices as needed) or the many similar implementations in other data stores.

eigilsagafos · on May 26, 2020

Early stage B2B startup ShiftX here. We are successfully doing this with FaunaDB. In Fauna databases are as lightweight as tables and are nested in a hierarchy. This enables us to do management in the “root” database and keep all customer data separated in child databases. So when a user signs in to our app he gets a session that is tied to the specific tenant database. This model would also allow us to do completely separate database deployments for customers with special requirements.

conductr · on May 26, 2020

I’ve done this. But the service was suited for it in a couple ways;

1. Each tenant typically only has <10 users, never >20. And load is irregular, maybe only ever dealing with 2-3 connections simultaneously. Maybe <1000 queries per hour max. No concerns with connection bloat/inefficiency.

2. Tenants creates and archives a large number of rows on some tables. Mutable but in practice generally doesn’t change much. But >100M row count not unusual after couple years on service. Not big data by any means, limited fields with smallish data types, but...

I didn’t want to deal with sharding a single database. Also given row count would be billions or trillions at a point the indexing and performance tuning was beyond what I wanted to manage. Also, this was at a time before most cloud services/CDNs and I could easily deploy close to my clients office if needed. It worked well and I didn’t really have to hire a DBM or try to become one.

Should be noted, this was a >$1000/month service so I had some decent infrastructure budget to work with.

scosman · on May 26, 2020

I’ve managed a system with millions of users and tens of billions of rows, and I always dreamed of DB per user. Generally, ~1% of users were active at a given time, but a lot of resources were used for the 99% who were offline (eg, indexes in memory where 99% of the data wouldn’t be needed). Learned a few tricks. If this is the problem you're trying to solve, some tips below.

Start by optimizing your indexes. Ensure customer_id is in every index and is the first item. This allows the query to immediately filter to only rows for this customer. Ensure all queries include a customer_id (should be doing this anyway in multi-tenant environment). Even single row lookups by primary key can be really sped up this way; once the index becomes bigger than memory it has to be paged in/out. However with this approach the entire sub-tree of the index for “hot” users can remain in memory without paging, increasing cache hit rate, speeding up queries, and reducing IO.

The above is generally enough. However, some consumer apps have a ton of data per user and relatively low revenue per user. In this case there’s one more big trick to keep DB costs down: cluster the whole table by customer_id. Like the index problem, the row data can be inefficient. If your disk layout randomly places rows into pages, chances are there’s only one row for a given client on a given page. If you need to evaluate 1000 rows to do a a query, you’ll have to read close to 1000 pages, and IO is slow/expensive. You’ll use a ton of memory caching pages where very few of the rows are for hot/active users Note: this problem only really matters if your rows are small and you can fit many rows per page. To fix, cluster the table by customer_id. On MySQL+InnoDB this is easy (https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.h...). On Postgres there’s a CLUSTER command but it’s one time, locking and it will take forever; MySQL is the easier solution (and I really prefer Postgres). Lots of the NoSQL DBs allow you to control the physical layout. Once setup, your cache hit rates will go way up, total IO will go way down, and you'll see a pretty good speedup.

As always, benchmark a lot for your use case before implementing (but don't forget to benchmark the hot/cold use cases).

PS — I remember a post a few days ago about a company running a distributed SQLite (many DBs). Looked really interesting but can’t find the link. For smaller scales than millions of users, look into partitions.

ec109685 · on May 26, 2020

It was Expensify: https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...

ramraj07 · on May 26, 2020

Thanks for the insight into clustering. We do slightly similar things by using snowflake or redshift, but that's because we can afford few hundred millisecond timescales for the analytical queries. Good to know MySQL can cluster!

I have dreamed of personal projects where I'd have to handle a ton of data for an individual user (eg. Service where I'll put all the users office documents in their Dropbox under git on the cloud and give them fine-grained revision history), and the most cost effective solution I was able to imagine involved one sqlite per user, stored on something like EFS (or a very large EBS). Never took too many steps there though, mainly because it wasn't clear if the product was of any value.

mailbag · on May 25, 2020

I've maintained an enterprise saas product for ~1500 customers that used this strategy. Cross account analytics were definitely a problem, but the gaping SQL injection vulnerabilities left by the contractors that built the initial product were less of a concern.

Snapshotting / restoring entire accounts to a previous state was easy, and debugging data issues was also much easier when you could spin up an entire account's DB from a certain point in time locally.

We also could run multiple versions of the product on different schema versions. Useful when certain customers only wanted their "software" updated once every 6 months.

squiggleblaz · on May 26, 2020

We do that where I am. I think it's been in place for about twenty years - certainly more than a decade. We're on MySQL/PHP without persistent connections. There have been many questionable architectural decisions in the codebase, but this isn't one of them. It seems quite natural that separate data should be separated and it regularly comes up as a question from potential clients.

Each client's db is a little different due to some more questionable decisions (e.g. different features can result in different databases turning on, and most tables will have variable names). But it's not really any harder to write code that says "`select identifier from features_enabled` -> derive table name -> `alter (derived table name) add column baz`" (actually, nowadays we have an inhouse database migrator that does that for us, we just say "the table abstractly known as `foo` should have columns bar, baz, quux" and the migrator says "for this client, they have the feature uses_foo enabled with the identifiers A and B, so A_foo and B_foo should both have the columns baz, but do not; I will add it after `bar`".)

Perhaps it has discouraged us from seeking to use persistent connections. But that is not the biggest inefficiency in our system at this stage.

If I was designing the application from scratch, I think I would want a static schema, but database-per-tenant seems fine.

Mooty · on May 26, 2020

Actually it might have some simple benefits : clients are willing to pay to have a separated database and a separated server from anything else for security purpose. All of our clients at https://www.bluelearning.fr have their own separated DB when they choose it. So far it has been a huge hit, as most of them paid more just for this.

codecurve · on May 26, 2020

We set off building a new product using this approach a few years ago and wrote about our motivations here: https://medium.com/hackernoon/exploring-single-tenant-archit...

I can't remember all of the reasons that we ditched the idea, but it quickly became clear that we would be writing a lot of the core tooling ourselves, and we kept running into problems with AWS around dynamically provisioning the load balancers and certs for N boxes.

I wouldn't dream of trying to manage different schemas across all the databases either, that sounds like a serious headache that could only be mitigated by having separate codebases and separate development teams managing them.

If a customer needs custom fields, I would make them a first class citizen and the required `fields_definitions / field_values` tables to your database and let them manage those fields themselves.

I'm glad we ended up defaulting to a multi-tenant architecture even though we lost some of the proposed benefits (isolation, independent scaling etc).

ericol · on May 26, 2020

Not quite there... but almost.

At some point in the life of the product my company has, we were having a real bottleneck in the DB.

The solution was to "split" the data in different DBs, that would hold several clients. So, we could say we have the worst of both worlds (As code still has to deal with the "security" part).

It is even more complicated, as some table exist only in the "main" db, and we fetch data from there constantly.

So far (Probably 10+ years since we implemented this).

We haven't had any real issue because of this, and we have developed small in house tools to keep things tidy (like schema updates or data inspection).

Server is MySQL.

hardwaresofton · on May 26, 2020

Schemas[0] are the scalable way to do this, not databases, at least in Postgres.

If you're going to go this route you might also want to consider creating a role-per-user and taking advantage of the role-based security features[1].

That said, this is not how people usually handle multi-tenancy, for good reason, the complexity often outweighs the security benefit, there are good articles on it, and here's one by CitusData[2] (pre-acquisition).

[0]: https://www.postgresql.org/docs/current/ddl-schemas.html

[1]: https://www.postgresql.org/docs/current/ddl-rowsecurity.html

[2]: https://www.citusdata.com/blog/2016/10/03/designing-your-saa...

krainboltgreene · on May 26, 2020

Schemas should be approached with extreme caution. They still suffer from the same issues as multiple databases, but now also you have to juggle search path and when I used schemas our databases backups were insane. This is because two same name tables with same name indexes can't be compressed together, as I understand it.

I would also be very careful about using roles this way as it makes connection pool formulas a lot more magical and less deterministic.

kiloreven · on May 26, 2020

We're a small company (~50 customers) delivering SaaS using Django/Postgres/uWSGI for a niche B2B market where privacy and data confidentiality is paramount.

Currently we deploy one DB + unique uWSGI instances for each customer. This has some drawbacks which has made us look a bit into multi-tenancy as well. Everything is served on dedicated hardware, using common codebase, and each customer is served on a unique sub-domain.

The two primary drawbacks of running unique instances for each customer are ease of deployment and utilization of resources.

When a new customer is deployed we need to set up the database, run migrations, set up DNS, deploy the application, deploy the task runner, set up DNS and configure the HTTP vhost. Most of this is painfully manual right now, but we're looking into automating at least parts of the deployment.

In the future, we aim to offer an online solution for signup and onboarding, where (potential) customers can trigger the provisioning of a new instance, even for a limited demo. If we were doing multi-tenancy that would just require a new row in the database + some seed data, which would make the deployment process exceptionally simpler.

The other issue is the utilization of resources. Running a few instances of the application with a big worker pool would be much easier to scale than running 50+ instances with their own isolated worker pool.

We're considering maybe going for a hybrid multi-tenant architecture, where each customer has their own isolated DB, but with a DB router in the application. That would give us a compromise between security (isolated databases - SQL queries don't cross into another customer's data) and utilization (shared workers across customers). But this would add another level of complexity and new challenges for deployment.

Do anyone have a similar case as this?

Fiahil · on May 26, 2020

We're deploying unique kubernetes cluster per client with their own application, database, task runner and, yes, DNS. Unlike your situation, most of it is completely automated ( :) ) on Azure and AWS using terraform, good old bash scripts and a custom go CLI we maintain.

Each client is billed for their own resource usage and we can have version disparities between clusters.

On the downsides, maintenance, upgrades and deployments take more time, but we are thinking about potential solutions for managing a fleet of k8s clusters.

This approach makes a lot of sense for B2B customers, and I would add that it's better to separate everything down to the infrastructure level, rather than stopping at the database schema. I would probably do it again in a similar situation !

machiaweliczny · on May 26, 2020

In such case if server gets exposed then one can have creds to all DBs. Can this be avoided somehow?

harel · on May 26, 2020

I considered it a few years ago for a project. My options were one db per account, one schema per account or good old fashioned multitenant. The first two options dropped after realising what a maintenance nightmare that can be. In my scenario I would have thousands and thousands of accounts. So a migration becomes a crazy notion you'll be trying to avoid. The same applies for multi schema - there is a limit of how many schemas you can run reasonably before performance is affected.

What I ended up doing is going for a multi tenant architecture, but I built it in such a way where the data is transient from one db to another. All account data lives in a separate 'master db', and the data in another. All ids are UUIDs. I build database routing into the application so each "account" can have it's data on a different database, and if needed I can move one account's data from a loaded db to a more available one.

paulmendoza · on May 26, 2020

I have used this architecture at 2 companies and it is by far the best for B2B scenarios where there could be large amounts of data for a single customer.

It is great for data isolation, scaling data across servers, deleting customers when they leave easily.

The only trick are schema migrations. Just make sure you apply migration scripts to databases in an automated way. We use a tool called DbUp. Do not try to use something like a schema compare tool for releases.

I have managed more than 1500 databases and it is very simple.

mariocesar · on May 25, 2020

The now closed Ubuntu One FileSync service (a Dropbox like service) had a 1 database per user approach. And they were actually using it in top of SQLITE I think. The project is opensource now. And is based on U1DB https://launchpad.net/u1db but I know it didn't get lot of traction.

SkyMarshal · on May 26, 2020

In Postgres, Schemas are intended to solve this problem - enable user account isolation without creating multiple databases:

https://www.postgresql.org/docs/current/ddl-schemas.html

Given how popular Postgres is, I’m sure there are lots of teams using this architecture.

a-wu · on May 26, 2020

CloudKit actually does this.

https://www.vldb.org/pvldb/vol11/p540-shraer.pdf

> within each container CloudKit uniquely divides the data- space into many private databases that manage user-scoped data

> Each private database belongs to exactly one container and one user

oldmanhorton · on May 26, 2020

Azure has a product built specifically for this, so it must be at least vaguely common. The rationale given in docs is: "A common application pattern is to provision a single database for each customer. But different customers often have varying and unpredictable usage patterns, and it's difficult to predict the resource requirements of each individual database user."

https://docs.microsoft.com/en-us/azure/sql-database/sql-data...

vyrotek · on May 26, 2020

We used this successfully at my company many years ago when it was first released. Even automated updates through DevOps. Have there been any recent big changes that have improved it? Been looking at it again.

_hl_ · on May 25, 2020

This will be very inefficient due to the way DBMS commonly lay out data in pages. And if you want to do any kind of aggregate queries (e.g. analytics) you're probably in for some royal pain.

If you want to do this for security, why not layer the DB behind some system that requires and verifies the users access tokens for each request?

The only situation where such a setup might make sense is when you actually need per-user migrations to cater to specific customer's needs, but then you'll make it very hard to work with all customer's data through a generic interface.

gomox · on May 26, 2020

Most enterprise systems either don't require or deliberately forbid mixing different customers' data in aggregate queries. So it's a bad use case to optimize for.

edoceo · on May 26, 2020

I have similar. One PG database per tenant (640). Getting the current DSN is part of auth process (central auth DB), connect through PGBouncer.

Schema migrations are kind of a pain, we roll out changes, so on auth there is this blue/green decision.

Custom fields in EAV data-tables or jsonb data.

Backups are great, small(er) and easier to work with/restore.

Easier to move client data between PG nodes. Each DB is faster than one large one. EG: inventory table is only your 1M records, not everyone's 600M records so even sequential scan queries are pretty fast.

crazygringo · on May 26, 2020

I guess the question is, why do you want to?

The only real reason you mention is security, but to me this sounds like the worst tool for the job. Badly written queries accidentally returning other users' data, that makes it into production, isn't usually a common problem. If for some reason you have unique reasons that it might be, then traditional testing + production checks at a separate level (e.g. when data is sent to a view, double-check only permitted user ID's) would probably be your answer.

If you're running any kind of "traditional" webapp (millions of users, relatively comparable amounts of data per user) then separate databases per user sounds like crazytown.

If you have massive individual users who you think will be using storage/CPU that is a significant percentage of a commodity database server's capacity (e.g. 1 to 20 users per server), who need the performance of having all their data on the same server, but also whose storage/CPU requirements may vary widely and unpredictably (and possibly require performance guarantees), then yes this seems like it could be an option to "shard". Also, if there are very special configurations per-user that require this flexibility, e.g. stored on a server in a particular country, with an overall different encryption level, a different version of client software, etc.

But unless you're dealing with a very unique situation like that, it's hard to imagine why you'd go with it instead of just traditional sharding techniques.

wim · on May 26, 2020

We do this for our wiki/intranet SaaS app. In our case we don't use it for all data in the app though, but rather for a specific feature. One of the blocks customers can add to their wiki pages are form fields so they can build custom forms. The app then automatically generates a corresponding database in the backend to store records for their form, using a separate SQLite db.

Our original idea was to allow customers access to their form db directly so they could even run custom SQL queries on it and so on. In the end we actually never used that part (users just view and filter their records in the UI), so unless we still need this in the future, one could argue it's.. just slightly overengineered. It works well otherwise though, and migrations aren't really a problem for us, because they're done dynamically by the app (when people make changes to their form). The scaling is also no issue as one db will always be small enough to fit on one server, so we can "shard" easily by file. So I think the only obvious lesson here was make sure you really really need it (and if migrations/scaling would have meant a lot of overhead, I don't think we would have even considered it).

amanzi · on May 26, 2020

WordPress Multisite gives each blog a set of tables within a single database, with each set of tables getting the standard WordPress prefix ("wp_") followed by the blog ID and another underscore before the table name. Then with the hyperdb plugin you can create rules that let you shard the tables into different databases based on your requirements. That seems like a good model that gives you the best of both worlds.

gomox · on May 26, 2020

My hunch is that they did this because that was a reasonable way to operate in the old shared hosting days (where you got a single application and a single database server). A lot of people would also run multiple "blogs" from one account.

This requires application level code to manage the different sets of tables and the corresponding access permissions. Unless your use case requires this flexibility, I think one database per tenant is a better default architecture.

xmodem · on May 26, 2020

I have a bit of experience with this. A SaaS company I used to work with did this while I worked there, primarily due to our legacy architecture (not originally being a SaaS company)

We already had experience writing DB migrations that were reliable, and we had a pretty solid test suite of weird edge cases that caught most failures before we deployed them. Still, some problems would inevitably fall through the cracks. We had in-house tools that would take a DB snapshot before upgrading each customer, and our platform provided the functionality to leave a customer on an old version of our app while we investigated. We also had tools to do progressive rollouts if we suspected a change was risky.

Even with the best tooling in the world I would strongly advise against this approach. Cost is one huge factor - the cheapest RDS instance is about $12/month, so you have to charge more than that to break even (if you're using AWS- we weren't at the time). But the biggest problems come from keeping track of scaling for hundreds or thousands of small databases, and paying performance overhead costs thousands of times.

gomox · on May 26, 2020

You don't need an RDS per customer. In MySQL jargon, an RDS instance is like a MySQL "server". You just need a "database" per customer.

On the other hand, if a tenant pays you less than $12 a month or any amount where the hosting costs are significant in the grand scheme, then I agree that this is likely not a good architecture.

sk5t · on May 26, 2020

If you make one database per tenant, then some issues to consider might nevertheless include maintaining many versions of the schema, separate backup scripts (other than backing up the entire cluster/instance), busting query cache, busting connection pooling, baking into the app layer (uncommon) logic to get a connection to the proper database, some complexity to collect usage-based info for billing or anomaly detection; others?

gomox · on May 26, 2020

Many versions of the schema: indeed, that's one of the main tradeoffs. I consider this to be a feature, not a bug, in an enterprise application context. The correct version of the schema that corresponds to a given version of the code is unique, which dramatically simplifies debugging and migration.

Separate backup scripts: only if you want, you can always backup the whole database server/instance at once (i.e. an RDS snapshot). On the other hand, you can do more fine grained backup controls if you want to (and with enterprise customers, you likely will). Need short data retention for an EU based healthcare provider? Good luck purging that data out of your consolidated cluster-wide backups.

Busting query cache: I don't think this setup is detrimental to query cache functionality, unless you have many queries shared between tenants, which you generally don't in this type of setup (i.e. frequent queries like "SELECT * FROM post_categories WHERE tenant_id = XX" is not a cache entry that can be shared between tenants anyway).

Busting connection pooling: not necessarily, but if you want separate DB credentials for each tenant, yes. If that is not an acceptable tradeoff, you can reuse connections, at the expense of lessened data isolation between tenants.

Connection to the proper database: I would argue there's nothing uncommon about connecting to a specific database, but I'm not sure I see your point. You just issue a "use XX" statement before you start querying, or connect with different credentials if not reusing connections.

Collection of data across tenants: yes, cross tenant analytics gain overhead and complexity, but those are typically not part of the application code anyway because most of the time you need to manage separate clusters anyway (region specific, fault tolerance, dedicated, etc).

Anomaly detection at the database level becomes more complex, but (a) in my opinion, it's the kind of custom problem that you will custom solve anyway (so no huge cost savings) and (b) a lot of it happens at higher layers (New Relic, etc) which are not impacted by this architecture choice.

sk5t · on May 26, 2020

I should have included the context that my notes pertain somewhat more to Postgres than other database systems; "USE" may work for MSSQL (and others) where the database is treated as more of a namespace concept, whereas Postgres requires a new connection per database. If you've got 1,000 databases on a Postgres cluster and are limited to 1,000 connections, there's going to be a tradeoff in connection setup latency vs. connection concurrency vs. efficient resource use. Or, spend time writing a dynamically weighted set of pools? Oof.

A parameterized query like "SELECT * FROM post_categories WHERE tenant_id = ?" would very much be cacheable by the query planner. Stuff like "does tenant_id have an index? is it worth using the index based on its histogram? are those index pages already in memory?" etc.

gomox · on May 26, 2020

I can see how that can be a problem. I run MySQL but other commenters said that Postgres has "schemas" which appear to be more similar to MySQL's "databases".

In the same way I thought you meant the query result cache, not the query plan cache. In my experience query planning hasn't been a significant bottleneck, but that's got to be workload dependent.

alberth · on May 26, 2020

Virtual Private Databases.

What a lot of enterprise SaaS vendors do is have one single database for all customer data (single tenant). They then use features like Virtual Private Database to hide customer A data from customer B. So that if customer A did a “select *” they only see their own data and not all of the other customers data. This creates faux multi-tenancy and all done using a single db account.

sk5t · on May 26, 2020

AKA row-level security - my gut says this feature is not too frequently used in SaaS, to the extent it may bust connection pooling, requires synchronization of database accounts and application identities, etc.--or else pushing some kind of non-db-user session/security context into each query. Similar issues one would see with MSSQL and Kerberos impersonation of the user principal from the app server to the database.

tedeh · on May 26, 2020

This sounds very much like Row Level Security, but I've never heard the term "Virtual Private Database" to describe it.

alberth · on May 26, 2020

It’s the same. Sometimes also referred to as Label Security. Conceptual all accomplishing that same goal.

Thristle · on May 26, 2020

What we do at my current job is server per multiple accounts each server holds 500-1000 "normal sized" customers and the huge or intensive customers get their own server with another 10-50 customers Currently moving from EC2 + mysql 5.7 to RDS, mainly for ease of managing.

However, we dont use a tenent id in all tables to differentiate customers we use (confusingly named) DB named prefix + tenent id for programatically making the connection.

Have a single server + db for shared data of tenents like product wide statistics, user/tenent data and mappings and such things. In the tenent table just have column for the name of the DB server for that tenent and that's pretty much it. Migrations are handled by an internal tool that executes the migrations on each tenent DB and 99% of the time everything works just fine if you are careful on what kind of migration you do and how you write your code

Some pitfalls concern column type changes + the read replicas going out of sync but that was a single incident that only hurt the replica.

gregjor · on May 25, 2020

Seems impractical and slow at scale to manage even a few hundred separate databases. You lose all the advantages of the relational model — asking simple questions like “Which customers ordered more than $100 last month” require more application code. You might as well store the customer info in separate files on disk, each with a different possible format and version.

vyrotek · on May 26, 2020

Those queries are definitely convenient early on but eventually you shouldn't be making those against that system and instead aggregate the data into warehouse.

Technically, there are DBs that let you do cross shard queries. See Azure Elastic DB.

mozey · on May 26, 2020

When using a single db I'd highly recommend adding `account_id` to every single table that contains data for multiple accounts. It's much easier to check every query contains `account_id`, as opposed to checking multiple foreign keys etc. Depending on the db you can then also easily export all data for a specific account using filters on the dump tool

otar · on May 26, 2020

In my case this worked out pretty well. Other than data separation and ease of scaling database per-customer (they might have different behavior of read/write operations), they other benefit was that we could place customer's database in any jurisdiction, which for some enterprise customers appeared an important point, regulations wise...

mildavw · on May 26, 2020

The apartment gem enables multi-tenant Rails apps using the Postgres schemas approach described by others here.

It’s slightly clunky in that the public, shared schema tables, say, the one that holds the list of tenants, exists in every schema — they’re just empty.

I rolled my own based on apartment that has one shared public schema, and a schema for each tenant. Works well.

jacquesm · on May 26, 2020

As usual, it depends. If you have a few large customers and a large number of smaller ones I'd isolate the large ones in their own DBs and put the rest in a single DB. If any of the smaller ones become dominant then you move them out as well.

If all your customers are individuals or small accounts then I'd put them all in a single DB, but I'd still build in the option to redirect to another DB if the software has applicability to enterprise level customers.

Note that it is perfectly OK to upsell the capability to isolate your customers data from each other, segregation options could be offered at multiple levels (all in one bucket, different DB, different cluster), each with their own pricepoint. Some customers will simply demand this and move elsewhere if you don't offer it (and they are typically quite insensitive to price as long as the right boxes get checked for their auditors).

zupa-hu · on May 26, 2020

I did something like this early on for a website builder using MySQL but it very quickly hit bottlenecks.

I also wanted one-filesystem-per-customer support and full isolation as in Docker, so I ended up writing a new OS to support all of this natively. It runs in production on boomla.com, a hosted website builder that is also a programmable platform.

WARNING: it gets even weirder, databases can be nested, a database entry is also a file and a directory at the same time, or a nested database if you want. There is no way understood this sentence. :) For example, you can store files in files. As in, style.css/background.png, etc.. Every change also creates a new filesystem snapshot, which may also be nested. All right I stop here.

This is not as much an advice for OP (to write a new OS) but more like a plug for the intellectually curious to explore a parallel universe. :)

wheelerwj · on May 25, 2020

Seems pretty odd. The closest example I can think of would be maybe salesforce? Which basically, as far as I can tell, launches a whole new instance of the application (hosted by heroku?) for each client. I'm not a 100% sure about this, but i think this is how it works.

snuxoll · on May 25, 2020

Not at all how Salesforce works, they take a lot of pride in their multi-tenant setup (for better or worse). Every org on a given instance shares the same application servers and Oracle cluster.

If I were to make a Salesforce competitor that’s one thing I would do differently, with tools like Kubernetes it’s a lot easier to just give every customer their own instances. Yes, it can take up more resources - but I cannot imagine the security nightmare involved with letting multiple customers execute code (even if it’s theoretically sandboxed) in the same process, plus the headache that is their database schema.

nitwit005 · on May 26, 2020

Salesforce has a lot of trial or developer orgs, which would be quite expensive if you tried to host them all as some sort of VM in a cloud hosting companies.

I assume most of them end up as a few rows in some set of DBs, which is far cheaper.

snuxoll · on May 26, 2020

Salesforce charges and arm and a leg for extra sandbox instances anyway, and at that scale you aren’t paying Amazon for compute either.

A full-copy sandbox past the included one for Unlimited Edition orgs is something like 30% your annual spend.

nitwit005 · on May 26, 2020

You can fill out a form and cause a free org to exist on their main site, so hundreds or thousands probably get created a day. It looks like they expire in 30 days, so if you spun up an app and DB for each it would get expensive quick.

cdcarter · on May 26, 2020

As snuxoll writes, Salesforce does use a shared database with tenant_id (org_id) as a column on every table. You can read a lot about our multi-tenancy mechanisms in a whitepaper published a while back [https://developer.salesforce.com/page/Multi_Tenant_Architect...].

shakna · on May 25, 2020

There aren't a lot of benefits to doing it. If you have frequent migrations, then it probably isn't something you ever want to do.

For a site I run, I have one large shared read-only database everyone can access, and then one database per user.

The per-user DB isn't the most performant way of doing things, but it made it easier to:

+ Encrypt an entire user's data at rest using a key I can't reverse engineer. (The user's DB can only be accessed by the user whilst they're logged in.)

+ Securely delete a user's data once they delete their account. (A backup of their account is maintained for sixty days... But I can't decrypt it during that time. I can restore the account by request, but they still have to login to access it).

There are other, better, ways of doing the above.

programmarchy · on May 26, 2020

Can you elaborate on how you achieved encryption at rest with a key you can’t access? I’m assuming the key is sent in an authorization header, then lives in memory for the duration of the session, but wondering what your tool chain looks like.

shakna · on May 27, 2020

Pretty much.

The encoded key is sent, once, in a header, and then stored in a secure session cookie, that has a reasonable timeout on it, and is user-revokable, and is encrypted in memory server-side, unless it is being accessed.

(Setting up session cookies to only decrypt when being accessed sort of required reinventing the wheel, as that's apparently not something anyone goes to the effort of usually, and sends you down some optimisation paths around timing that will have you pulling out your hair).

User-revokable session cookies are simple enough - each user gets their own session cookie key, and they can roll that over from a settings page.

Worth noting: This is a great way to decimate your server performance, because most websites aren't constantly handling decryption.

The prototype was written for Flask [0], then rewritten for Bottle [1] when it was clear I wasn't using 90% of the Flask stack, and monkeypatching most of what I was using. Nowadays it's a strange mix of Hug [3] and Bottle.

But there's nothing there that's unique to Python or even the framework. It's easily doable in just about any language. I made three prototypes when I was coming up with this batty idea, the Flask prototype, one for vibe.d (D), and one for Go. I settled on Python for no particular reason. They all had similar performance, because encryption became the bottleneck.

[0] https://flask.palletsprojects.com/en/1.1.x/

[1] http://bottlepy.org/

[2] https://hugapi.github.io/hug/

programmarchy · on May 27, 2020

Thanks for sharing, that’s an interesting approach. Does seem very hard to scale. Do they just set their key from a settings page and then off to the races? i.e. no login credentials?

shakna · on May 27, 2020

Certificate file. Generated on registration and handed over as a download and shredded server-side. Not as trust-fulfilling as a user supplying one, but less of a learning curve. (Still need validation on it either way, which can be painful).

Which, of course, means "forgot my password" doesn't work.

withinboredom · on May 26, 2020

Haha, oh, for a second there I thought you meant tables. But leaving the below..

How about dozens per account? :) I didn’t ship this, but I work for Automattic and WordPress.com is basically highly modified WordPress MU. This means every time you spin up a site (free or otherwise) a bunch of tables are generated just for that site. There’s at least hundreds of millions of tables. Migrating schema changes isn’t something I personally deal with, but it’s all meticulously maintained. It’s nothing special on the surface.

You can look up how WordPress MU maintains schema versions and migrations and get an idea of how it works if you’re really curious. If you don’t have homogeneous migrations, it might get pretty dicey, so I’d recommend not doing that.

tilolebo · on May 26, 2020

I have no experience in that field, but the isolation brought by a 1 customer <-> 1 DB relationship sounds VERY appealing to me. I'm talking about schemas, no dedicated database servers.

Creating dedicated database app users with the proper permissions pretty much guarantees that you'll never serve data to the wrong customer.

It also probably makes bug and performance troubleshooting much easier.

The biggest downside is probably the maintenance overhead, but I suppose automation can mitigate it quite well.

It maybe makes internal reporting a bit harder, as you can't get the whole data from a single SQL query. You'd have to handle this in the application layer.

brigandish · on May 25, 2020

This talk may be helpful[1]. It's given by Jeremy Evans, the maintainer of Sequel, and it's about how he's made Roda (a Rack-based web framework) more secure than your average web framework does by using database security and some of the features of databases that all too often are overlooked by app developers. You could possibly use Roda for the authentication phase of a Rails app (among other things) but the insights will be helpful regardless.

[1] https://www.youtube.com/watch?v=z3HZZHXXo3I

theshrike79 · on May 26, 2020

We did this in a company long long time ago, each customer had their own Access database running an ASP website. Some larger migrations were a pain, but all upgrades were billed from the customers, so it didn't affect anything.

If you can bill the extra computing and devops work from your customers, I'd go with separate environments alltogether. You can do this easily with AWS.

On the plus side you can roll out changes gradually, upgrade the environments one user at a time.

Also if Customer X pays you to make a custom feature for them, you can sell the same to all other users if it's generic enough.

JamesBarney · on May 26, 2020

We weren't able to find any silver bullets. We used an orm to split applications for non enterprise clients. And then just installed a duplicate application+db in the cloud for enterprise clients and made sure to charge them for it.(30k+ annual subscription fee)

So to upgrade the application we'd upgrade the everyone application and then do the same with each enterprise client. We only had a few.

I'd recommend trying to avoid it if you can, and if you can't charging the appropriate amount of money which is an arm and a leg.

andrea_sdl · on May 26, 2020

We did it many years (11) ago for a B2C Product, around 200K users (free, 5k paid). We both had a common Postgres where we stored generic info and the customer database in Derby (product was in java).

At the time we weren't that much experiences so we also implemented migrations (whereas today I would have chosen flyway or something similar).

Customer data was text based so we didn't have that many custom fields but at the sime time we had many migrations during the years.

So, would we still do it? Yes and no. It was amazing because it let us isolate each user. With Filesystem backups we could restore just in time the data from each user without having to reset the whole database and with little to know system knowledge.

At the same time it restricted our choices when improving the product. If we had to do a whole refactor of the project we always thought "How do I migrate all the users together?" We had to think about versioning, upgrade in different times and so on.

From a tech standpoint it wasn't much of a problem, but it burdened the project a little bit. Bigger refactorings are harder and you need much more depth when you think about big new features.

I still think it was a nice solution, but this solution came also because we didn't knew a thing about partitioning in databases and we weren't expert in clustering. All in all it allowed us to create a product sustaining a decent customer base with low knowledge for that time.

With today technology I'd test if we can do it other ways.

joshuanapoli · on May 26, 2020

Database per tenant makes a lot of sense for enterprise applications where a hard partition is desired. This does let you run different versions of your service for different customers... but managing the replicated infrastructure will be much easier if you can keep them all on the same version. Enterprise customers may want to control when upgrades are released to them... I think that it's much easier to continuously upgrade the server but hold back features via flags.

winrid · on May 25, 2020

One company in SF using MeteorJS deploys a whole container with DB and everything per customer.

I think their primary reasoning is that Meteor doesn't scale super easy, so it was easier to just "shard" the whole stack per customer.

Personally, it's a lot of work. It depends on what you're doing to make the tradeoffs worthwhile.

I see this as an optimization. Build everything so you can deploy once for all your customers. If you need to shard by customer later, it's just an infrastructure problem.

formition · on May 26, 2020

I did this with MS SQL and .NET CORE for members.org.au ,they share a common app infrastructure with is scalable and based on the url the database connection string changes.

vyrotek · on May 26, 2020

How did you roll out DB schema changes to everyone? I've used Azure Elastic DB do something similar through DevOps in the past.

DecoPerson · on May 26, 2020

I worked for one of the biggest boarding school software companies. The only option was full-service, but clients could chose between hosted by us or hosted by them. We didn’t just do 1 database per school, we did entirely separate hardware/VMs per school. Some regions have very strict data regulations and the school’s compliance advisors tended to be overly cautious; they interpreted the regulations and translated them to even stricter requirements. These requirements were often impossible to satisfy. (How can the emergency roll call app both work offline AND comply with “no student PII saved to non-approved storage devices”? Does swap memory count as saving to a storage device?? Is RAM a “storage device”??? Can 7 red lines be parallel!?!?)

Shared DB instances would have been completely off the table. Thankfully, most boarding schools have massive IT budgets, so cost minimization was not as important as adding additional features that justified more spend. Also the market was quite green when I was there. Strangely, the software seemed to market itself; the number of out-of-the-blue demo requests was very high, so first impressions and conversion to paying clients was the primary focus.

michaelt · on May 26, 2020

> How can the emergency roll call app both work offline AND comply with “no student PII saved to non-approved storage devices”?

By having its offline data store approved as a storage device?

planxty · on May 26, 2020

I worked for a company that did this, and our scale was quite large. It took a lot of work to get AWS to give us more and more databases on RDS. We had some unique challenges with scaling databases to appropriately meet the needs of each account. Specifically, it was difficult to automatically right-size a DB instance to the amount of data and performance a given customer would need. On the other hand, we did have the flexibility to manually bump an account's database to a much larger node size if we needed to help someone who was running into performance issues.

I think the biggest problems had to do with migrations and backups. We maintained multiple distinct versions of the application, and each had a unique DB schema, so there was frequent drift in the actual schemas across accounts. This was painful both from a maintenance POV, and for doing things like change data capture or ETLs into the data warehouse for data science/analysis.

Another big problem was dealing with backup/restore situations.

I suspect this decision was made early in the company's history because it was easier than figuring out how to scale an application originally designed to be an on-prem solution to become something that could be sold as a SaaS product.

Anyway, I think choosing a solution that nets your business fewer, larger database nodes will probably avoid a lot of maintenance hurdles. If you can think ahead and design your application to support things like feature flags to allow customers to gradually opt in to new versions without breaking backwards compatibility in your codebase, I think this is probably the better choice, but consider the safety and security requirements in your product, because there may be reasons you still want to isolate each tenant's data in its own logical database.

planxty · on May 26, 2020

Forgot to mention: we were not a Rails shop, but I think the kind of code or framework being used isn't the most important challenge here.

tobyhede · on May 26, 2020

Years ago I worked for a startup that provided CMS and ecommerce software for small business. Each of our 3000+ customers had their own MySQL database.

We had a long tail of customers with negligible usage and would run several thousand MySQL databases on a single server. As customers scaled we could migrate the database to balance capacity. We could also optionally offer "premium" and "enterprise" services that guaranteed isolation and higher durability.

Scaling was never a real issue, but the nature of our clients was steady incremental growth. I don't think we ever had a case of real "overnight success" where a shared host customer suddenly melted the infrastructure for everyone.

However, managing and migrating the databases could be a real issue. We had a few ways of handling it, but often would need to handle it in the code, `if schemaVersion == 1 else`. Over time this added up and required discipline to ensure migration, deprecation and cleanuop. As a startup, we mostly didn't have that discipline and we did have a fair bit of drift in versions and old code lying around.

superdex · on May 25, 2020

What’s ‘scale?’

We do it, but everyone gets the same schema and same app. .net/IIS/sql server and when we update the schema, we apply to all dbs via in-house scripts. It provides security warm fuzzies to our clients that their data is separated from other clients. If you want to try and version the app/schema for different accounts, that’s where your headaches are, regardless of db model

madmax108 · on May 26, 2020

B2B CRM space startup. We have somewhat of a middle-ground approach. Our level of isolation for customers is at a schema-level.

What this means is each customer has her own schema. Now, large customers want to be single tenant, so they have a single schema on the entire DB. Smaller (SMB) customers are a bit more price conscious so they can choose to be multitenant i.e multiple schemas on same DB.

Managing this is pushed out to a separate metadata manager component which is just a DB that maps customer to the DB/schema they reside on. Connection pooling is at the DB level (so if you are multitenant then you may have lower perf because some other customer in the DB is hogging the connections)... But this has not happened to us yet.

Large customers are more conscious in terms of data so want things like disc level encryption with their own keys etc, which we can provide since we are encrypting the whole DB for them (KMS is the fave here).

We are not really large scale yet, so dunno what they major gotchas will be once we scale, but this approach has served us well so far.

Nican · on May 26, 2020

Stackoverflow's DBA had just posted about this: https://twitter.com/tarynpivots/status/1260680179195629568

He has 564,295 tables in one SQL Server. Apparently this is for "Stack Overflow For Teams"

skissane · on May 26, 2020

One model I have seen used successfully is a hybrid model in which the product is designed to be multi-tenant, but then it is deployed in a mix of single tenant and multi-tenant instances. If you have a big mix of customer sizes (small businesses through to large enterprises) – single-tenant instances for the large enterprise customers gives them maximum flexibility, while multi-tenant for the small business customers (and even individual teams/departments within a large enterprise) keeps it cost-effective at the low end. (One complexity you can have is when a customer starts small but grows big – sometimes you might start out with just a small team at a large enterprise and then grow the account to enterprise scale – it can become necessary to design a mechanism to migrate a tenant from a multi-tenant instance into their own single-tenant instance.)

taylorcooney · on May 26, 2020

There are definitely downsides to scaling out thousands of tenants - I've been told Heroku supports this, and at a glance I found this doc that says it may cause issues, https://devcenter.heroku.com/articles/heroku-postgresql#mult... but it really doesn't change whether you're on Heroku or not. At the end of the day it's just about your application structure, how much data you have, how many tables you have etc. Unfortunately the Apartment gem even has these problems, and even its creators have expressed some concern (https://zeph.co/multitenancy-without-subdomains-rails-5-acts...) about scalability with multiple schemas.

The acts_as_tenant gem might be what you’re looking for:

> This gem was born out of our own need for a fail-safe and out-of-the-way manner to add multi-tenancy to our Rails app through a shared database strategy, that integrates (near) seamless with Rails.

My recommended configuration to achieve this is to simply add a `tenant_id` column (or `customer_id` column, etc) on every object that belongs to a tenant, and backfilling your existing data to have this column set correctly. When a new account signs up, not a lot happens under-the-hood; you can create a row in the main table with the new account, do some initial provisioning for billing and such, and not much else. Being a multi-tenant platform you want to keep the cost really low of signing up new accounts. The easiest way to run a typical SQL query in a distributed system without restrictions is to always access data scoped by the tenant. You can specify both the tenant_id and an object’s own ID for queries in your controller, so the coordinator can locate your data quickly. The tenant_id should always be included, even when you can locate an object using its own object_id.

carapace · on May 26, 2020

It's not really the same thing, but the question reminds me that we almost had SQLite in the browser[1], but Mozilla spiked it in favor of IndexedDB[2] (yet "Firefox saves Web storage objects in a SQLite file"[3] so I dunno what to conclude from all that. SQLite is good enough for FF devs but not users?)

Anyway, if you have a web app you already have a DB-per-user, FWIW.

[1] https://en.wikipedia.org/wiki/Web_SQL_Database

[2] https://en.wikipedia.org/wiki/Indexed_Database_API

[3] https://en.wikipedia.org/wiki/Web_storage

0xbadcafebee · on May 26, 2020

At scale? It's not only usually unnecessary, it's a bad application<->data model, and will lead to fractured code, deploy processes, and bit rot. Depending on design.

Remember that a database is supposed to be shared. It's designed that way for performance reasons. If your big issue is you're constantly doing flyway and it's unmanageable, go schemaless, or refactor where/when/how you place your data. Rethink the architecture/design, not the tool. If it's a microservice-based application, remember that you are supposed to have independent state per service, so there shouldn't be one giant database anyway.

But for like 12 customers, sure, go for it. It's very common for "whale" accounts to get dedicated infra/databases while the rest get a common sharded layer.

rpg3 · on May 25, 2020

Seems like a ton of extra work with no real upside. For one, if your migrations fail to complete across all databases for whatever reason then you could hit a point where you have databases with differing schema.

Additionally, like someone else pointed out, trying to run any reporting data across multiple customers will become difficult code wise and less performant.

Realistically, if you are handling the sort of scale that would require more interesting scaling solutions for typical db software, you are most certainly making enough money to implement better approaches.

FWIW, I worked for a company that was handling a few hundred thousand customers with millions of order records on a relatively small AWS RDS server. Set up a database cluster and you're rolling for a while.

LeonB · on May 26, 2020

I believe that FogBugz used this approach, back in the day (with a SQL Server backend).

The reasoning was that customers data couldn't ever leak into each other, and moving a customer to a different server was easier. I vaguely recall Joel Spolsky speaking or writing about it.

tomaszwro · on May 26, 2020

I worked on a project with PostgreSQL schemas per tenant (almost like databases). Also worked on another one with "normal" account_id field separation.

I documented how they compare in a blogpost: https://blog.arkency.com/comparison-of-approaches-to-multite... - funnily it was waiting unpublished for some time, stumbling on your post made me finally publish it.

Looking forward to go through this comments question and see what others have experienced.

I have another draft in the making - about some of the pitfalls of PostgreSQL schemas approach, should be released soon.

rudyrigot · on May 27, 2020

Yes, we did it at Kenna Security. About 300 paying customers, but over 1000 with trials, and overall about 6B vulnerabilities being tracked (the largest table in aggregate). Some of the tables were business intelligence data accessible to all customers, so they were on a “master” DB that all could access; and some of the tables were fully multi-tenant data, so each customer had their MySQL DB for it.

The motivation was that we were on RDS’s highest instance and growing, with jobs mutating the data taking a less and less excusable amount of time.

The initial setup was using just the Octopus gem and a bunch of Ruby magic. That got real complicated really fast (Ruby is not meant to do systems programming stuff, and Octopus turned out very poorly maintained), and the project turned into a crazy rabbit hole with tons of debt we never could quite fix later. Over time, we replaced as many Ruby bits as we could with lower-level stuff, leveraging proxySQL as we could; the architecture should have been as low-level as possible from the get-go... I think Rails 6’s multi-DB mode was going to eventually help out too.

One fun piece of debt: after we had migrated all our major clients to their own shards, we started to work in parallel on making sure new clients would get their own shard too. We meant to just create the new shard on signup, but that’s when we found out, when you modify Octopus’s in-memory config of DBs, it replaces that config with a bulldozer, and interrupts all DB connections in flight. So, if you were doing stuff right when someone else signs up, your stuff would fail. We solved this by pre-allocating shards manually every month or so, triggering a manual blue-green deploy at the end of the process to gracefully refresh the config. It was tedious but worked great.

And of course, since it was a bunch of Active Record hacks, there’s a number of data-related features we couldn’t do because of the challenging architecture, and it was a constant effort to just keep it going through the constant bottlenecks we were meeting. Ha, scale.

Did we regret doing it? No, we needed to solve that scale problem one way or another. But it was definitely not solved the best way. It’s not an easy problem to solve.

enigma20 · on May 26, 2020

This question reminds me of some legacy system which I've seen in the past :D :D :D

In summary it was working in the following way:

There was table client(id, name).

And then dozens of other tables. Don't remember exactly the structure, so I will just use some sample names: - order_X - order_item_X - customer_X - newsletter_X

"X" being ID from the client table mentioned earlier.

Now imagine dozens of "template" tables become hundreds, once you start adding new clients. And then in the code, that beautiful logic to fetch data for given client :D

And to make things worse, sets of tables didn't have same DB schema. So imagine those conditions building selects depending on the client ID :D