> SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.
This is the #1 reason my exuberant technical mind likes that we use SQLite for all the things. Latency is the exact reason you would have a problem scaling any large system in the first place. Forcing it all into one cache-coherent domain is a really good way to begin eliminating entire universes of bugs.
Do we all appreciate just how much more throughput you can get in the case described above? A 100x latency improvement doesn't translate directly into the same # of transactions per second, but it's pretty damn close if your I/O subsystem is up to the task.
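If you want to sanity-check the latency claim yourself, here's a rough sketch with Python's built-in sqlite3 module. Numbers will vary wildly by hardware, driver overhead, and cache warmth — an in-memory database stands in for a warm, locally-attached file:

```python
import sqlite3
import time

# In-memory DB as a stand-in for a warm, locally-attached SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i}",) for i in range(10_000)])
conn.commit()

N = 10_000
start = time.perf_counter()
for i in range(N):
    conn.execute("SELECT name FROM users WHERE id = ?",
                 ((i % 10_000) + 1,)).fetchone()
elapsed = time.perf_counter() - start
print(f"{elapsed / N * 1e6:.1f} µs per query")  # typically single-digit µs
```

Most of the per-query cost here is Python itself; a compiled client drops it further. Compare that against a single network round trip to a database server.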
How do any writes end up on other horizontally scaled machines though? To me the whole point of a database on another machine is that it is the single point of truth that many horizontally scaled servers can write to and read each others' updates from. If you don't need that, you might as well read the entire dataset into memory and be done with it.
I know TFA says that you can "soon" automagically replicate your sqlite db to another server, but it only allows writes on a single server and all others will be readers. Now you need to think about how to move all write traffic to a single app server. All writes to that server will still take several milliseconds (possibly more, since S3 is eventually consistent) to propagate around all replicas.
In short, 100x latency improvement for reads is great but a bit of a red herring since if you have read-only traffic you don't need sqlite replication. If you do have write traffic, then routing it through S3 will definitely not give you a 100x latency improvement over Postgres or MySQL anymore. Litestream is definitely on my radar, but as a continuous backup system for small apps ("small" meaning it runs and will always run on a single box) rather than a wholesale replacement of traditional client-server databases.
Litestream does a couple of things. It started as a way to continuously back sqlite files up to s3. Then Ben added read replicas – you can configure Litestream to replicate from a "primary" litestream server. It's still limited to a single writer, but there's no s3 in play. You get async replication to other VMs: https://github.com/fly-apps/litestream-base
We have a feature for redirecting HTTP requests that perform writes to a single VM. This makes Litestream + replicas workable for most fullstack apps: https://fly.io/blog/globally-distributed-postgres/
It's not a perfect setup, though. You have to take the writer down to do a deploy. The next big Litestream release should solve that, and is part of what's teased in the post.
> We have a feature for redirecting HTTP requests that perform writes to a single VM. This makes Litestream + replicas workable for most fullstack apps: https://fly.io/blog/globally-distributed-postgres/
Thereby making it a constraint and (without failover) a single point of failure? What’s the upper limit here?
This constraint is common to most n-tier architectures (with Postgres or MySQL) as well. Obviously, part of what's interesting about Litestream is that it simplifies fail-over with SQLite.
Once you have that constraint, it means you will either have the same network latency when writing to SQLite (if it is fronted by some lightweight proxy), or have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).
I suppose if someone decides to deploy Postgres/MySQL replicas as a sidecar, then it will be the same as what you will end up with?
Yes: nobody is claiming otherwise. SQLite drastically speeds up reads, and it speeds up writes in single-server settings. In a multi-server setting, writes have comparable (probably marginally poorer, because of database-level locking, in a naive configuration) performance to Postgres. The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.
> The lay-up wins of SQLite in a multi-server environment are operational simplicity (compared to running, say, a Postgres cluster) and read acceleration.
What's the operational simplicity? You still have to do backups and replication and SSL. Maybe you don't have to worry about connectivity between the app and the database? Maybe auth?
I mean, there are managed SQL services too. Comparing managed SQLite to DIY Postgres seems disingenuous.
EDIT: I didn’t expect this to be controversial, but I’d like to know where I’ve erred. If you need Litestream to make SQLite operationally simple (beyond single servers, anyway), that seems pretty analogous to RDS to make Postgres operationally simple, right?
>I didn’t expect this to be controversial, but I’d like to know where I’ve erred
I don't think what you are saying is controversial but it feels like you are being contentious for no reason. Your rebuttal doesn't even make sense - telling me I can pay someone to manage it for me, is not the same as it not needing management at all.
Whether I manage it, or someone else does; there is still an operational or financial cost.
I didn't downvote you. Postgres as a database server is operationally more complex when compared to Sqlite. Since Postgres is a network service, you have to deal with networking and security. Upgrading Postgres is a big task in and of itself. Backups have to happen over the network.
Number of network connections is another sore point. One of Postgres' biggest pain points is the low number of connections it supports. It is not uncommon to have to run a proxy in front of Postgres to increase the number of connections.
Sqlite gives you so much for free as long as you can work within its constraints, the main one being a single writer (for the most part).
You have a more complex network setup actually. You have north-south traffic between your client->LB->servers, and you have east-west traffic between your servers for sqlite replication. Both happening on the same nodes and no isolation whatsoever. More things can go wrong and will require more tooling to disambiguate between different potential failures.
W.r.t security, you have the same challenges to secure east/west vs north/south traffic. W.r.t # of connections, Postgres has a limit on number of connections for a reason – if you are running a multi-process or multi-threaded app framework that's talking to sqlite, you have just traded a connection limit for a concurrent process/thread access limit on sqlite. I don't know if one is better than the other – it all depends on your tooling to debug things when things inevitably fail at redline stress conditions.
You're technically right, which is to say: Not at all.
You don't replicate east-west with Litestream, you replicate north -> south where the "North" is your writer and your "South" is some S3-like service. Basically every application has "Some S3 Like Service" as a dependency now, so that's not a new network connection.
You make a really good point about trading Sqlite write limits to the limited number of Postgres connections. My comment is mostly about having to run a proxy, which is another service that ends up being a liability. Regardless, if you need multi-writers, then the obvious solution is Postgres.
"Both happening on the same nodes and no isolation whatsoever. More things can go wrong and will require more tooling to disambiguate between different potential failures."
As a Postgres fan, the extra Postgres service to me is a far bigger liability than Sqlite. The east-west networking is true for Postgres replication as well, so it's a wash to me.
> Upgrading Postgres is a big task in and of itself.
Learnt it the hard way when I first upgraded the major version, only to realize that the data needs to be migrated first. pg_upgrade requires binaries of the older version, so we need copies of the data as well as binaries of both the old and new versions of Postgres[1] if not manually dumped. Fortunately it was just my home server.
It still has to run in a server process on a server host, and virtually all of the challenges of managing a database server are just moved up to the application layer. There are fewer actual hosts, but for the likes of fly.io or a cloud provider a difference of a handful of hosts is negligible because hosts are cattle rather than pets.
There might be advantages to SQLite (e.g., maybe Litestream's replication/backup/etc story is simpler than Postgres's), but "number of hosts" or "number of server processes" doesn't seem compelling to me.
Of course, it has nothing to do with the number of hosts or server processes, and I didn't see anyone claim that it does.
It's difficult-to-impossible to truly run Postgres as a proper "herd of cattle", due to the need to failover replication roles between reader and reader-writer, ensure replication is in sync on startup before allowing reads, handle upgrades carefully, etc. If you're using something like RDS or another managed Postgres, this is still the case, it's just being handled for you by someone else.
So it's not that you're just reducing the number of hosts; you're eliminating an entire class of hosts that have complex state and maintenance requirements. Your application processes are hopefully a herd of identical cattle and can be killed off and relaunched at will, and that property remains the same after introducing Litestream.
No; there's no such thing as a sqlite3 server. The database is the file(s). Litestream runs alongside everything else using sqlite3 and ensures that it's replicating. If Litestream crashes, reads from the database keep working fine (though, of course, they'll start to go stale if it doesn't come back up).
This is why we called out in the post that Litestream is "just sqlite3". It's not sitting between apps and the database.
That seems disingenuous. If sqlite3 isn't a server, then neither is apache2. But in reality they're both binaries 'serving' 'files' over an interface. You're just hosting them on the same machine, reverting to a monolith-style deployment. Which is fine, but then let's call it what it is.
That’s my point though, if you want to use SQLite in production / with Litestream, you’re comparing two databases that need servers to function just implemented quite differently.
In the same vein as your article, it seems very fair to say that if the Litestream server went down in production you’d have a broken app for most use cases within a few minutes. So in practical effect, the Litestream server is about as essential.
> Litestream crashes, reads from the database keep working fine.
fly-apps' litestream-base Dockerfile suggests that the litestream process supervises the app process... I guess then that's a limitation specific to fly.io's deployment model and not litestream?
> have a lot more frequent failover of SQLite (if it is running embedded within the app, thus following the app's deployment schedule).
That does sound like it's going to be difficult to get right. But if Litestream eventually implements a robust solution for this problem, then I think some added complexity in the deployment process will be a reasonable price to pay for increased app performance the rest of the time.
For what it's worth, I think this problem (the complexity that bleeds into the app for handling leaders) is mostly orthogonal to the underlying database. You have the same complexity with multi-reader single-writer Postgres. But the code that makes multi-reader SQLite work is a lot easier to reason about.
Unless I'm misunderstanding something, I do think using SQLite makes a significant difference in the complexity of app deployment. When using multi-region Postgres, it's true that you only want the Postgres leader to be accessed by app instances in the same region, so the app instances all have to know which region is running the leader. But multiple app instances in that region can connect to that Postgres leader, so it's easy to do a typical rolling deploy. With SQLite, only one app instance at a time can write to the database, so IIUC, there will have to be a reliable way of doing failover with every app deploy. I suppose the same thing has to happen in the Postgres scenario when updating Postgres itself, but that's way less frequent than deploying new versions of the app.
You can have two app instances writing to the database concurrently, as long as they are running on the same machine. Then it’s possible to deploy a new version without downtime by starting the new binary, switching traffic to the new instance, draining connections to the old instance, and shutting it down. That’s kind of how nginx, for example, upgrades itself without downtime. That’s not the usual way nowadays with containers and VMs, but it was very common before. It’s definitely doable with SQLite, and not very complex.
No. The number of processed requests is unchanged. They are just dispatched between 2 instances: the new one and the old one. Actually, all new requests are going to the new instance, and only the pending requests are processed by the old instance.
Agreed, RAM usage may be slightly elevated during the switch. But it doesn’t seem to be a big issue in practice. nginx for example is updating itself like this and it works fine.
On the other hand, your application code is probably less reliable than Postgres or MySQL and now it can bring down your master, so failover is a more likely scenario. Probably not "worse", just "different".
I don't think so. Single-master, multi-reader is the most common clustering pattern for all these databases. If you lose the app server that's handling writes in any of those systems, you have the same problem. Meanwhile, when your app server is down, the sqlite database is just fine, and so is Litestream.
I'm not thinking very carefully in answering this, but I think there's a subtlety I'm not capturing well here, and that it's not the case that this pattern has poorer fail-over than clustered Postgres. I think there are more things that can go wrong with Postgres.
IMO the benefit to n-tier is that you can have multiple instances on your app tier with a single-master multi-reader database tier, and that makes the system resilient to app server crashes.
For a real-world example, some time ago I wrote an admin tool as part of a production monolith, for a rare (but frequent-enough) internal use case where a user could put in a URL, the app server would fetch the remote HTML, process it, and save the results to a database. A few months later, we start getting weird error reports that the server is crashing. Sure enough, a malformed document could cause enough recursion to exceed the Python maximum recursion depth and crash the process.
If this had been the single process running server-side SQLite, even if multithreaded and/or concurrent (with e.g. something like gevent), the crash would have taken down our entire site for however long it would take to restart the process. But since there were other app server processes eager and ready to take incoming requests, the only disrupted user was the one trying to use that admin tool, not all the other customers using the product at the time. Said user was confused, but was very glad they didn't disrupt the critical work our customers were doing at the time!
Of course, one size doesn't fit all, and there are many services that are limited enough in scope, working with known and battle-tested components, that this wouldn't be a problem. But if I make the conservative assumption that any process that is running code that I myself wrote, no matter how simple-seeming, has a much higher chance of segfaulting or otherwise crashing than a standalone professional database... then that late-night ping is much more likely to be something that can be handled the following morning.
It's worth noting that "single-writer" refers to a single machine, not a single process. Multiple programs can write to the same SQLite DB at once (although contention can be a problem, etc etc). So, if that admin tool was running on the same machine plugged into that same SQLite file, it could crash and leave the main customer-facing services running fine.
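To make the "multiple programs, one machine" point concrete, here's a minimal sketch with two separate connections to the same database file — standing in for two processes. WAL mode plus a busy timeout is what makes this tolerable in practice:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

def open_conn(path):
    # Each process/tool opens its own connection to the same file.
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5s on a locked DB
    conn.execute("PRAGMA journal_mode=WAL")    # readers don't block the writer
    return conn

main_app = open_conn(path)
admin_tool = open_conn(path)

main_app.execute(
    "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, src TEXT)")
main_app.commit()

# Both connections can interleave writes; SQLite serializes them internally.
main_app.execute("INSERT INTO events (src) VALUES ('web')")
main_app.commit()
admin_tool.execute("INSERT INTO events (src) VALUES ('admin')")
admin_tool.commit()

print(main_app.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

If the admin tool's process crashes mid-request, the file and the main app's connection are untouched — which is exactly the scenario described above.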
There's nothing that's quite ready to look at for easy failover, but this is a big priority. We absolutely have to figure out failover during deploys. It's doable we just want to reduce the operational complexity as much as we can.
> if you have read-only traffic you don't need sqlite replication.
I agree with you that the main use-case here is backup and data durability for small apps. Which is a pretty big deal, as a database server is often the most expensive part of running a small app. That said, there are definitely systems where the latency of returning a snapshot of the data is important, but which snapshot isn't (if updates take a while to percolate, that's fine).
I’d argue that persistence also includes data portability. It’s very handy to be able to just copy your data around in a SQLite file. That’s not really a feature that’s terribly useful in a remote deployed application, but very handy if you have multiple applications all reading the same data.
I do understand the point of running SQLite in-process to speed up reads.
I do not understand why SQLite must also handle intense write load with HA, failover, etc.
I would rather have the best of both worlds: a proper DB server (say, Postgres) replicated to super-fast and simple read replicas in SQLite on every node.
(My ideal case would be some kind of natural sharding where each node keeps its own updates, or just a highly available data browsing app, with data in SQLite files updated as entire files, like a deployment.)
I've not done this but it's intriguing; potentially a best-of-all-worlds solution.
I think "proper" automatic replication is not possible given the mismatch between Postgres and SQLite - not everything in Postgres maps to a thing that is possible in SQLite.
That said, there are a variety of ways to get data out of Postgres, and a variety of ways to get things into SQLite.
You could periodically export CSVs or whatever from Postgres and periodically import them into SQLite.
Or you could do a more realtime sync by using a Postgres foreign data wrapper like this one: https://github.com/pgspider/sqlite_fdw which would let you write directly to SQLite. Combine that with database triggers on the Postgres side of things and you've got something pretty close to realtime replication.
Those sorts of solutions wouldn't be as robust as "real" replication (specifically, what happens when one of the SQLite replicas is unavailable? do you catch things up later?) but could be very useful for a lot of scenarios. You could have Postgres importing gobs of data, and "replicating" it over to your read-only reporting server which uses SQLite as a data source.
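The "periodically export CSVs" option is less exotic than it sounds. Here's a hedged sketch of just the SQLite-import half, using only the standard library — the Postgres side is assumed to have produced the CSV already (e.g. via `COPY ... TO ... CSV HEADER`), and the table name is made up for illustration:

```python
import csv
import io
import sqlite3

# Stand-in for a CSV dump produced on the Postgres side.
csv_dump = io.StringIO("id,name,price\n1,widget,9.99\n2,gadget,19.99\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

reader = csv.DictReader(csv_dump)
with conn:  # one transaction per sync batch: readers never see a half-import
    conn.execute("DELETE FROM items")  # full refresh; incremental also works
    conn.executemany(
        "INSERT INTO items (id, name, price) VALUES (:id, :name, :price)",
        reader,
    )

print(conn.execute("SELECT name FROM items WHERE id = 2").fetchone()[0])
```

Wrapping each batch in one transaction means a replica that misses a sync just serves the previous snapshot — which directly answers the "what happens when a replica is unavailable" question for read-mostly reporting data.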
My idea was to either use triggers or a process that reads Postgres's WAL and replay transactions to SQLite, by sending updates, rebuilding files entirely, or anything else.
Such replicator processes can be horizontally scaled, and made into an HA configuration.
Of course you should be careful to use on the Postgres side only the SQL features that will map well on SQLite. The range of such features is wide enough for practical applications though.
My real-life example would be a realty site, where changes are infrequent, but the desire to slice and dice the current large(ish) dataset in interesting ways is very high. The latter could be done using a per-node SQLite DB.
I've worked on an 'eventually consistent' system with read/write SQLite dbs on each host with a background worker that replayed the log into a central source of truth db and workers that made updates across each host instance of SQLite.
It could have been made a lot faster; I think the replication SLA for the service was 10 minutes, usually done in seconds. But our specific workflow only progressed in one direction, so replaying a step now and again wasn't a huge issue, though that was quite rare. If you were to put a little more effort than we did into the replication layer and tuning your master db, it could be a really effective setup.
One of the best parts is that when instances were stopped or isolated, they were also mostly isolated from everything that used the service, so if you go into a black box with your clients, you work as normal, and when the connection or other hosts are brought back up, they replay the db before accepting connections. We could take entire availability zones offline and the workers and clients would keep humming and update neighbors later.
One of the ideas I tried to sell at my last company was to bake SQLite into the lambda images used to run the app. It wouldn’t have been for transactional data, but for the mess of other data you wind up needing (country codes, currency codes, customer name to ID, etc.). It was all stuff that changed infrequently if ever. Unfortunately nobody wanted to do it. I think they wanted to have DynamoDB on their resume.
What if, due to ridiculous latency reductions, your business no longer requires more than 1 machine to function at scale?
I'm talking more about sqlite itself than any given product around it at this point, but I still think it's an interesting thought experiment in this context.
I have to imagine having your service highly available (i.e. you need a failover machine) is far more likely to be the reason to need multiple machines than exhausting the resources on some commodity tier machine.
I'll point out that the ridiculous latency reductions don't apply to replicating the writes to S3 and/or any replica servers; that still takes as long as it would to any other server across a network. The latency reductions are only for pure read traffic. Also, every company I ever worked at had a policy to run at least two instances of a service in case of hardware failure. (Is it reasonable to extrapolate this policy to a company which might want to run on a single sqlite instance? I don't know, but just as a datapoint I don't think any business should strive to run on a single instance.)
This write latency might be fine, although more than one backend app I know renewed the expiry time of a user session on every hit and would thus do at least one DB write per HTTP call. I don't think this is optimal, but it does happen and simply going "well don't do write traffic then" does not always line up with how apps are actually built. Replicated sqlite over litestream is very cool, but definitely you need to build your app around and also definitely something that costs you one of your innovation tokens.
There's no magic here (that there is no magic is part of the point). You have the same phenomenon in n-tier Postgres deployments: to be highly available, you need multiple instances; you're going to have a write leader, because you don't realistically want to run Raft consensus for every write; etc.
The point of the post is just that if you can get rid of most of the big operational problems with using server-side SQLite in a distributed application --- most notably, failing over and snapshotting --- then SQLite can occupy a much more interesting role in your stack than it's conventionally been assigned. SQLite has some very attractive properties that have been largely ignored because people assume they won't be able to scale it out and manage it. Well, you can scale it out and manage it. Now you've got an extremely simple database layer that's easy to reason about, doesn't require you to run a database server (or even a cache server) next to all your app instances, and happens to be extraordinarily fast.
Maybe it doesn't make sense for your app? There are probably lots of apps that really want Postgres and not SQLite. But the architecture we're proposing is one people historically haven't even considered. Now, they should.
I'm not sure "litestream replicate <file>" really costs a whole innovation token. It's just SQLite. You should get an innovation rebate for using it. :)
> But the architecture we're proposing is one people historically haven't even considered. Now, they should.
I think this offering, and this idea, are absolutely fantastic, and if not the future, at least a big part of it, for the reason outlined in the post: namely, that for a lot of apps and use cases, sqlite is more than enough.
But I also suspect this is probably already the case, and we don't know about it because people don't talk about it.
Amusingly, I was recently scolded here on HN for suggesting to use sqlite, by someone who said HN was a place for "professionals":
Directing specific queries to a write connection (dbserver) vs directing requests to specific application servers (potentially mid-request) does seem operationally “harder” though.
I’m coming at this from the perspective of a traditional django app that has .using(“write”) when wanting to write data. Otherwise you’re replaying requests at a completely different app server.
This may or may not be that hard, depending on your server. You could proxy all "mutation" HTTP verbs to your one writer instance, and probably do similar if you are using GraphQL.
If you are using something like gRPC I feel this might be more complicated because it's not obvious which actions are read/write.
I'm in the same boat as you though overall - I'm not sure what the ideal strategy is, or if one even exists, since this seems to create a problem that does not normally exist.
If you are greenfield, maybe you create one service that only does writes - this may be CQRS-like.
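For the "proxy all mutation verbs" idea, a minimal sketch of what that middleware could look like. This assumes the Fly-style pattern from the linked post, where a replica answers a write request with a replay header so the edge proxy re-runs it against the writer region; the function names are made up, and the header convention should be adjusted for whatever proxy you actually run:

```python
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

def make_write_router(app, primary_region, current_region):
    """WSGI middleware: on replicas, answer write-verb requests with a
    replay header instead of handling them locally. Header name follows
    Fly's fly-replay convention; adapt for your own proxy."""
    def middleware(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method in WRITE_METHODS and current_region != primary_region:
            start_response("409 Conflict",
                           [("fly-replay", f"region={primary_region}")])
            return [b""]
        return app(environ, start_response)  # reads served locally
    return middleware
```

The obvious caveat from the thread applies: this only works when the HTTP verb is an honest signal of read vs. write, which GraphQL and gRPC don't give you for free.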
This is great and I'm definitely going to be using it this week in a client project.
That being said, you don't get an innovation rebate for using a new tool, even if, as here, it's a parsimony-enabler. It's still a new tool.
A description from TFA reads "The most important thing you should understand about Litestream is that it's just SQLite." (This reminds me an awful lot of the tagline for CoffeeScript: "It's just JavaScript" -- where did that leave us?) But that info box is just under a description of how the new tool is implemented in a way that makes it sound (to someone who's never looked at the SQLite codebase) like it's breaking some assumptions that SQLite is making. That's the sound of an innovation token being spent.
CoffeeScript was not in fact just JavaScript. Litestream literally is just sqlite3; it's not an app dependency, and doesn't interpose itself between your app and sqlite3. You could add it to an existing single-process SQLite-backed app without changing any code or configuration, other than running the Litestream process.
It's brilliant that a person can ship their WAL from an app that doesn't know anything about Litestream. That's cool. But it is not in fact just SQLite. If it were, there wouldn't be a blog post, or an additional binary to download, or a backup of my database in S3, or...
I think saying it is “just SQLite” is (unintentionally) misleading. Your app may not know it’s anything else but operationally it’s a sidecar, so another process to manage.
I actually had to go look that up because it was a little unclear from the blog post and this comments section.
If we're designing a system that relies on an unconventional and otherwise quite rare use-case of a dependency in order to make critical long-term stability guarantees, I would rather that dependency be SQLite, for sure.
> The latency reductions are only for pure read traffic.
Well, no, because every insert will still be fast (until there are too many). It does not block until it's written to e.g. S3.
So there's a window, let's say 1 second, of potential data loss.
I assume syncing the WAL to S3 is much faster than inserting to sqlite, so it will never fall behind, but I have not tested.
> Also, every company I ever worked at had a policy to run at least two instances of a service in case of hardware failure.
Yeah, but the goal is not to have X instances, the goal is to limit downtime. In my experience the complicated setups have downtime as well, often related to how complicated they are.
In my mind a setup like this would only be used where some downtime is OK. But that's quite common.
With Postgres, you might have one server, or one cluster of servers that are coordinated, and then inside there you have tables with users and the users' data with foreign keys tying them together.
With SQLite, you would instead have one database (one file) per user as close to the user as possible that has all of the user's data and you would just read/write to that database. If your application needs to aggregate multiple user's data, then you use something like Litestream to routinely back it up to S3, then when you need to aggregate data you can just access it all there and use a distributed system to do the aggregation on the SQLite database files.
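A sketch of what the database-per-user layout looks like in code — the directory layout and table are hypothetical, and a temp directory stands in for wherever you'd actually keep the files:

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: one SQLite file per user (tempdir for this sketch).
DB_DIR = Path(tempfile.mkdtemp())

def user_db(user_id: int) -> sqlite3.Connection:
    conn = sqlite3.connect(DB_DIR / f"{user_id}.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS notes (
        id INTEGER PRIMARY KEY, body TEXT)""")
    return conn

# All reads/writes for a user touch only that user's file; Litestream (or a
# cron job) can then ship each file to S3 independently for aggregation.
with user_db(42) as conn:
    conn.execute("INSERT INTO notes (body) VALUES ('hello')")

print(user_db(42).execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 1
```

The trade-off discussed below falls straight out of this: each file is fully ACID on its own, but there's no transaction spanning `42.db` and `43.db`.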
This sounds a lot like ActorDB [1], which is a horizontally replicated and sharded SQLite.
With ActorDB, each shard (what it calls actors) is its own dedicated SQLite database (but efficiently colocated in the same block storage as all the others, so essentially a virtual database). The intention with the sharding is to allow data that logically belongs together to be stored and replicated together as a shard; for example, all of a single user's data. When you want to run an SQL query you tell it which shard to access (as part of the query), and it routes the request to the right server.
It has some pretty clever stuff to allow querying across multiple such "actors", so you can still get a complete view of all your data.
Sadly, I don't think it's maintained anymore. Last commit was in 2019.
For a project I currently have, I am partitioning not only the database but also the software for each "customer" (let's call them that for now). So basically I have my single-threaded Python application that uses SQLite, put that in a container with Litestream, and run one of those containers for each customer on one or more Kubernetes clusters or on anything else that runs a container (or you can run the software directly). Then you can take the data from one to N of those customers and aggregate and process it for monitoring, central configuration control, and backup with another bit of software that does a distributed query on the S3 data that Litestream is saving. I can also control several of the systems in a coordinated way, if needed, by linking them together in a configuration file and letting them elect a leader that will be the central point of contact for a source of "work" (let's call it that for now); that leader can then notify the other instances of the work to be done and help dole it out in a reasonable manner based on the resources and capabilities of the aggregation.
A few similar projects I worked on that were centralized were a huge mess in both the database and the codebase due to trying to do the same thing for thousands of customers with several dozen horizontal-scaling services that had to be coordinated. In my system, I just write one single-threaded bit of software that does the whole thing, but just for one customer at a time, so the whole system is horizontally scalable instead of just each service within the system. I can still do the same things to aggregate data from the individual customer software units, and as described above I can still add coordination features for groups of individual customer units, so there is no loss in features, but the whole system is simple to reason and think about with none of the usual synchronization pitfalls you get in the usual distributed system.
Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees? You can’t do cross-database transactions (to my knowledge), which means you can end up with corrupted data during aggregations. What am I missing?
One database per tenant only makes sense in multi-tenant applications that don't have any cross-tenant actions. I imagine there are many B2B applications that fall into this category.
If you have a use case with data that is extremely partitionable (like controlling individual industrial devices and collecting data to improve their performance or monitor the processes), then SQLite and Litestream could be a great option, but if you can't reasonably partition the data then it's probably better to use a centralized database server.
There can also be shared, mostly read-only databases, with no transactions crossing database boundaries.
For example, one database per retail store with sharded customers, orders and inventory (most transactions involve one order of one customer fulfilled from that store) and a shared one with item descriptions and prices (modified, say, daily).
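In SQLite that split maps nicely onto ATTACH: a per-store file for orders plus a shared, read-mostly catalog file, joined in one query while transactions stay within a single file. A minimal sketch (file names and schema are made up):

```python
import os
import sqlite3
import tempfile

cat_path = os.path.join(tempfile.gettempdir(), "catalog.db")      # shared, updated daily
store_path = os.path.join(tempfile.gettempdir(), "store_042.db")  # per-store shard

# Shared catalog: item descriptions and prices.
cat = sqlite3.connect(cat_path)
cat.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cat.execute("INSERT OR REPLACE INTO items VALUES (1, 'widget', 9.99)")
cat.commit()
cat.close()

# Per-store database: this store's orders.
con = sqlite3.connect(store_path)
con.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item_id INT, qty INT)")
con.execute("INSERT OR REPLACE INTO orders VALUES (1, 1, 3)")
con.commit()  # ATTACH is not allowed inside an open transaction

# Queries can join across the two files; writes never span them.
con.execute("ATTACH DATABASE ? AS catalog", (cat_path,))
row = con.execute(
    "SELECT c.name, c.price * o.qty FROM orders o "
    "JOIN catalog.items c ON c.id = o.item_id"
).fetchone()
```

Because no transaction ever crosses the attached boundary, the shared catalog can be replaced wholesale (say, nightly) without touching the store shards.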
>Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees?
No it doesn't. What gave you that idea? You still have all "ACID guarantees" within each database.
>You can’t do cross-database transactions (to my knowledge),
That's true of most databases. If you have two apps and they use two different databases you won't have transactions across those two apps.
>which means you can end up with corrupted data during aggregations.
No, aggregations within each database work as you would expect.
>What am I missing?
As others have said, you use this pattern only if you don't intend to cross databases. By the way, in NoSQL databases like MongoDB, every document is its own little database so having a per user database is a massive upgrade over NoSQL.
- I was talking about ACID guarantees across databases (ie across users)
- I was talking about aggregations across databases (ie across users)
Of course working inside one database works as you would expect it to. My point was that this pattern of database-per-user seems to be a totally different design than people have used with traditional n-tier designs.
Good point about NoSQL! But, wasn’t part of the reason MongoDB fell out of favor because it was lacking consistency?
Yeah, I think MongoDB went through a set of steps
- atomically consistent at document level (which is fine for many apps as you have most related data in the single document anyways)
- atomically consistent within a collection
- and now, with MongoDB 4.0 and higher (4.0 shipped in 2018; transactions across shards arrived with 4.2 in 2019), MongoDB supports full transactional consistency across documents, collections, and shards.
It took them awhile, but they got there.
A lot depends on your consistency requirements and data model here.
I use SQLite heavily, and have evaluated litestream and rqlite but not deployed them, so bear that in mind.
If the application is set up so that it serves a user for a session, so a given session ID is reading and writing from the same SQLite database, there are many opportunities to replicate that data optimistically, so that you won't lose it if a meteor hits the server, but it might not live in all the replicas right away, since applying patchsets off the gossip network happens in downtime.
If concerns can't be isolated like this then yes, dedicated swarms of database servers are the way to go. Frequently they can be, and using SQLite punches way above its weight here.
There are many systems with much higher read than write traffic, so writes only need to scale logarithmically, or perhaps with the square root of the system size. Waiting for faster hardware worked for these systems for a long time and, to a small extent, still does.
The dirty secret is that a lot of systems that require very high write traffic are essentially systems built for narcissists. "Social websites" have higher write traffic than simpler consumption based systems, but we've gone beyond those initial steps into very aggressive systems that are based on recording every interaction with the user and providing them instant gratification for many of those.
These applications don't scale in a way that others do, easily. And maybe it's a feature, not a bug, if the tools I use discourage me from jumping into the maelstrom by making it difficult to even consider doing so. Constraints are where creativity comes from, not possibility.
What a ridiculous marketing term. This is a RYW (Read your writes) level of consistency which is a far cry from Strong consistency (see https://jepsen.io/consistency). Seems like eventual consistency with some affinity bolted on.
That page does not give a hard definition for strong consistency, it says that it uses them informally as relative terms. AWS is not claiming serializability, they call it "strong read-after-write consistency." I don't see the problem here? S3 famously wasn't guaranteed to read data you had just written for a long time, and now it is. That's significant.
I have always understood strong consistency to refer to linearizability or sequential consistency - i.e. all clients have the same view of the global order but with sequential consistency permitting slightly more flexibility in how operations of different clients can be reordered wrt each other.
Hey, having actually looked at the link you provided, in fact both examples they give are linearizations so they could plausibly be providing linearizability (with respect to a single key). It's hard to say whether there are corner cases in which different clients could observe different orderings but if not then I stand corrected!
There was another thread where somebody claimed it was causally consistent. I’m sort of surprised Amazon hasn’t been clearer about this, but my feeling is that they would say it was linearizable if they were sure it was linearizable. Would love to read a real deep dive on this, I checked to see if Kyle Kingsbury had looked into it yet but he hasn’t.
> > Any read (GET or LIST request) that is initiated following the receipt of a successful PUT response will return the data written by the PUT request.
> So this is stronger than RYW.
I'm not sure that it is? The examples listed below that description only specify making an update and then immediately reading it back from the same process.
Look at the graphics in the section "Concurrent applications," specifically the first one.
At T0 Client 1 writes 'color = red.' Write 1 completes at T1.
At T2 Client 2 writes 'color = ruby.' Write 2 completes at T3.
At T4 Client 1 reads 'color = ruby,' the result of Write 2 from Client 2.
The explanation above says "Because S3 is strongly consistent, R1 and R2 both return color = ruby." There are clearly some subtleties (as explained further down the page) but I don't think Amazon are really being deceptive here.
Maybe it's just my suspicious-of-everything-distributed brain, but that diagram seems to assume a single universal time scale without any discussion of the implications.
You successfully nerdsniped me and I'm having a lot of trouble finding discussion of the formal implications of what they call "strong consistency" here, other than reading that they did in fact formally verify it to some extent. The best that I could find is this other HN thread where people claim it is causally consistent in a discussion about a Deep Dive (frustratingly shallow, as it happens): https://news.ycombinator.com/item?id=26968627
I have never heard strong consistency used to describe such a weak guarantee before - i.e. it's marketing bs. Usually strong consistency refers to linearizability (or at the least sequential consistency). The diagram a few pages in to this paper gives a nice overview: https://arxiv.org/abs/1512.00168
Yes I actually read that paper while I was digging around but it didn’t seem to help in this case because Amazon don’t specify whether reads made after a concurrent write is made are guaranteed to return the same value as each other. If they are I think the system would be linearizable, yes? Either way they don’t say linearizable anywhere and they describe it specifically as “read-after-write” so I think it would be wrong to assume linearizability. What’s missing from this model for linearizability? S3 doesn’t have transactions after all.
I'm not convinced that issuing requests from multiple clients for the same key actually matters. My speculation is that they map a key to their backend via some type of (consistent/rendezvous) hash and then ensure that all requests for said key land on the same process/server* that contains the state for the key.
This means that for a specific key, you end up on 1 specific process. If you can ensure this, you basically get monotonic reads/writes along with RYW and Writes-Follow-Reads. All this maps to causal consistency so it is believable.
* The request could probably be sent to a read-only replica first but it could then forward it to the leader replica handling writes by examining some logical timestamp.
> Latency is the exact reason you would have a problem scaling any large system in the first place.
Let's not forget why we started using a separate database server in the first place…
A web server does quite a lot of things: parsing/formatting HTTP/JSON/HTML, restructuring data, calculating stuff. This is typically very separate from the data-loading aspect, and as you get more requests you'll have to add more CPU to keep up (regardless of the language).
By separating the web server from the database server you introduce more latency in favor of enabling scalability. Now you can spin up hundreds of web servers which all talk to a single database server. This is a typical strategy for scalability: decouple the logic and scale up individually.
If you couple them together it's more difficult to scale. First of all, in order to spin up a server you need a full version of the database. Good luck autoscaling on-demand! Also, now every write will have to be replicated to all the readers. That's a lot more bandwidth.
There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack. I can see it being useful as a lower-level component: You can use Litestream to build your "own" database server with customized logic which you can talk to using an internal protocol (gRPC?) from your web servers.
I don't think anyone's seriously arguing that the n-tier database architecture is, like, intrinsically bankrupt. Most applications are going to continue to be built with Postgres. We like Postgres; we have a Postgres offering; we're friends with Postgres-providing services; our product uses Postgres.
The point the post is making is that we think people would be surprised how far SQLite can get a typical application. There's a clear win for it in the early phases of an application: managing a database server is operationally (and capitally) expensive, and, importantly, it tends to pin you to a centralized model where it really only makes sense for your application to run in Ashburn --- every request is getting backhauled there anyway.
As the post notes, there's a whole ecosystem of bandaids --- err, tiers --- that mitigate this problem; it's one reason you might sink a lot of engineering work into a horizontally-scaling sharded cache tier, for instance.
The alternative the post proposes is: just use SQLite. Almost all of that complexity melts away, to the point where even your database access code in your app gets simpler (N+1 isn't a game-over problem when each query takes microseconds). Use Litestream and read-only replicas to scale read out horizontally; scale the write leader vertically.
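On the N+1 point: with an in-process database, a naive per-row lookup loop that would be disastrous over network round-trips is often fine. A rough, illustrative micro-benchmark (numbers in the comment are ballpark, not guarantees):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)",
                [(i, f"user{i}") for i in range(1000)])

# Classic N+1: one query per row instead of a single join.
start = time.perf_counter()
names = [con.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
         for i in range(1000)]
elapsed = time.perf_counter() - start
# In-process this is typically a few milliseconds for all 1000 queries;
# the same pattern over a 1 ms network link would take a full second.
```

That doesn't make N+1 good style, but it stops being the thing that takes your app down.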
Eventually you'll need to make a decision: scale "out" of SQLite into Postgres (or CockroachDB or whatever), or start investing engineering dollars into making SQLite scale (for instance: by using multiple databases, which is a SQLite feature people sleep on). But the bet this post is making is that the actual value of "eventually" is "surprisingly far into the future", "far enough that it might not make sense to prematurely optimize for it", especially early on when all your resources, cognitively and financially and temporally, are scarce.
We might be very wrong about this! There isn't an interesting blog post (or technical bet) to make about "I'm all in on the n-tier architecture of app servers and database servers". We're just asking people to think about the approach, not saying you're crazy if you don't adopt it.
I just wanna quibble about a single phrase: scale out.
I have a few years of experience with SQLite as a backend for dev/test purposes, and the biggest lesson has been in reinforcing best practices... the kind Postgres demands, and that you're lucky if MySQL even reminds you about.
So my newb two cents is that Litestream makes some unique and significant contributions: it's not "scale out" but "pivot out," reflecting that you've got great backup/replica solutions with impressive agnosticism about what the future solution is.
That's a lot of leeway to prove that "X's brainchild is a viable product" while minimizing "Y's devops team and infrastructure" without compromising on durability essentials, especially where the solution to so many things is "dump/serve a(n) [encrypted] copy to/from S3" already.
Does Fly give some magic around splitting reads vs. writes to instances? In a typical Django/Rails app I’m not sure how I’d mark some API endpoints as routed to the single node which is allowed to write to the DB. (I know you guys have some sauce around how you route requests between regions, maybe this falls out of your existing infra here?)
I’m just not seeing how I can operate my traditional Django app in this model, other than just having one instance. I’m probably missing something though!
> There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack
If you're using a language like Node.js then horizontal scaling makes a lot of sense, but I've been working with Rust a lot recently. And Rust is so efficient that you typically end up in a place where a single application server can easily saturate the database. At that point, moving them both onto the same box can start to make sense.
This is especially true for low-traffic apps. I could probably run most of my Rust apps on a VM with 128MB RAM (or even less) and not even a whole CPU core and still get excellent performance. In that context, sticking a SQLite database that backs up to object storage on the same box becomes very attractive from a cost perspective.
This is "vertical scaling," and that is indeed a very valid approach! You just have to be aware that vertical scaling has some fundamental limits, and it's going to suck big time if they come as a surprise to you.
Alternatively, instead of just betting on it, you could do a benchmark, figure out the limits of your system and check if your current implementation is capable of handling the future needs.
Not only is it possible that your user base grows, it's also quite possible that your application complexity does. If you suddenly need to run expensive queries joining multiple datasets or do some heavy computation, your app's resource usage might jump quite a bit. Of course, you can always throw money at it, but the point where it gets seriously expensive can come quite fast.
Not sure about that. It would be smarter to just failure test your apps. Once you cross some threshold, you scale. Lots of companies build formulas costing out their cloud spend based on infra needs and failure tests.
This is a large part of what Rich Hickey emphasizes about Datomic, too. We're so used to the database being "over there" but it's actually very nice to have it locally. Datomic solves this in the context of a distributed database by having the read-only replicas local to client applications while the transaction-running parts are remote.
Only trouble with that particular implementation is that the Datomic Transactor is a single threaded single process that serializes every transaction going through it. As long as you don't need to scale writes it works like a charm. However, the workloads I somehow always end up working with are write heavy or at best 50/50 between read and write.
If you're pushing the database up into the application layer, do you have to route all write operations through a single "master" application instance? If not, is there some multi-master scheme, and if so, is it cheaper to propagate state all the time than it is to have the application write to a master database instance over a network? Moreover, how does it affect the operations of your application? Are you still as comfortable bouncing an application instance as you would otherwise be?
The answer is: yes, you do have to write through a single primary application instance.
So far.
The two important things here are:
1. Fly.io makes it really easy to write through a single primary application instance
2. There are ways to solve this problem so your application doesn't have to worry about it.
Right now, you have to be a little careful bouncing app instances. If you bounce the writer, you can't perform writes for 15s or whatever. This is a big problem during deploys.
There are a tremendous number of Fly.io users that are fine with this limitation, though. It's pretty valuable for some segment of our customers right now.
This is exactly the reason I am so skeptical of the cloud. I don't care how easy it is to stand up VMs, containers, k8s, etc. What I need to know is how hard it is to lug my data to my application and vice versa. My feelings on this are strong because I work mostly on database read-heavy applications.
What confuses me about this architecture I guess is: why have a SQL database at all? This sounds like a local cache. Which sure, of course those are super fast. But why does it need to be relational if all the data fits on the edge?
You get SQL and ACID. If you don't need those then you pay a performance price for having them. If you do need them, then you pay a price for not having them.
The best solution depends on the unit economics of the problem you are trying to solve. If you have a small number of high value users, then these approaches are premature optimisation, just use Postgres. If your business model is ad eyeballs then squeezing every last drop begins to seem very attractive because you can multiply your profitability (potentially).
You usually want ACID, but with N+1 queries not being a problem you no longer need SQL. The database can now be a really low-level key-value store with your favorite query language on top.
Litestream author here. I wrote an embedded key/value database called BoltDB a while back and I experimented with using it for application development. It had its pros and cons. The pros were that it was crazy fast and a lot of data access is just basic iterators. Some of the biggest cons were having to implement indexes on my own and that my schema was tied to my application language. Migrations can be more difficult if everything is tied to structs in your app and it's also difficult to inspect your database since there's no CLI REPL.
So the answer to my question might be "because SQLite already does some stuff really well, like indexes and its CLI, and it just happens to be relational and use SQL as its query language, but that's not the important part". I buy that.
Yeah, but what if my favorite query language is SQL? And what if I want to grab things not by the key (all items in area X, between times Y and Z)? Key-value isn't really great at that.
Objects / documents / "key-value". If the data is small enough to fit on a single application server node, it seems to me that the relational model is likely overkill. But I don't doubt that there are good counter-examples to this, it's just my intuition.
Key-value pretty trivially fits into the relationship model if need be. But yeah, if you’re very sure of the key-valueness of your data, something like RocksDB could be a more direct fit. It’s just a little harder to work with in my experience.
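As a trivial illustration of key-value fitting into the relational model: a two-column table is already a key-value store, and you keep indexed range queries ("all items under this prefix") for free. Key names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

# put/get, exactly like a key-value store
con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("sensor/1/t0", "20.5"))
con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("sensor/1/t1", "21.0"))
con.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("sensor/2/t0", "19.8"))

# ...but a prefix/range scan is one declarative query over the primary key
rows = con.execute(
    "SELECT key, value FROM kv WHERE key LIKE 'sensor/1/%' ORDER BY key"
).fetchall()
```

Going the other direction (relational on top of RocksDB) is exactly the harder-to-work-with part mentioned above.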
if you can tolerate eventual consistency and have the disk/ram on the application vms, then sure, keeping the data and the indices close to the code has the added benefit of keeping request latency down.
downside of course is the complexity added in synchronization, which is what they're tackling here.
personally i like the idea of per-tenant databases with something like this to scale out for each tenant. it encourages architectures that are more conducive for e2ee or procedures that allow for better guarantees around customer privacy than big central databases with a customer id column.
> personally i like the idea of per-tenant databases with something like this to scale out for each tenant.
So do I. And that type of architecture has come up a few times now in this comment thread. Given that Fly has the lead developer of the Phoenix web framework on staff, maybe it would make sense for him to work on integrating this type of architecture, with Litestream-based replication and the ability to have different master regions for different tenants, into Phoenix.
> SQLite isn't just on the same machine as your application, but actually built into your application process.
How is that different from what's commonly happening? Android and iOS apps do this... right? ...but it's still accessing the filesystem to use it.
Am I missing something, or is what they are describing completely commonplace, and only interesting to people who use microservices and never knew what was normal?
This is how client apps use sqlite, yes. Single instance client apps. Litestream is one method of making sqlite work for server side apps. The hard part on the server is solving for multiple processes/vms/containers writing to one sqlite db.
> the hard part on the server is solving for multiple processes/vms/containers writing to one sqlite db.
I feel like if you have multiple apps writing to the database then you shouldn't be using SQLite. That's where Postgres etc completely earn their place in the stack. Where litestream is really valuable is when you have a single writer, but you want point-in-time backups like you can get with postgres.
We disagree, a little. I feel that even if you have multiple apps writing to the DB, Litestream is pretty close to making sqlite viable for a lot more apps.
It's normal (and HN does something similar, working from in-process data) for systems that don't have to scale beyond one server. If you need multiple servers you have to do something, such as Litestream.
A few years back I was working on a Java project. We used H2 instead of Postgres, embedded in the application with in-memory access. It sped up queries tremendously. There is just no beating an in-application db.
Just wait until (some) devs realize they don't even need sqlite, and can serialize their data directly to binary flat files with simple locking synchronization for backups.
I'm half joking but I've witnessed many devs use databases when a binary file will do. I've done this personally for years for most of my 'fits-in-RAM', non-transactional, denormalized datasets, which is almost all of them.
Better yet, use both if you have both types of data. The performance benefits are enormous and well worth the complexity tradeoff in my experience.
That seems exactly opposite to the growing trend of "sqlite-as-application-file-format". There's a lot of nice features you get "for free" doing this, primarily way better consistency, than you do rolling your own binary format.
I don't want to have to deal with locks if at all possible. Binary works fine if each file is atomic, but that does not sound like the case you are advocating.
Who said anything about rolling your own? I'm talking about writing your data models directly to disk, the serialization and deserialization are done for you with a simple annotation.
And in most applications you don't ever have to deal with locks as most applications don't need multiple threads writing and reading, that can be done with a single thread and a lockless queue that multiple threads write to. You would need a lock for making backups of the files themselves but this is trivial and takes the form of error handling. The OS itself handles the lock, you just need to handle the case where you can't open the file for writing.
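A minimal sketch of the single-writer pattern described above (file location and record shape are illustrative; `queue.Queue` stands in for the lockless queue mentioned — it locks internally, but the pattern is the same): many threads enqueue, one thread owns the file, so no reader/writer file locks are needed.

```python
import os
import pickle
import queue
import tempfile
import threading

writes: queue.Queue = queue.Queue()
PATH = os.path.join(tempfile.gettempdir(), "data.bin")  # illustrative location

def writer(n_expected: int) -> None:
    # The only code that ever touches the file: one thread drains the
    # queue and owns all disk I/O, so concurrent writers never collide.
    records = [writes.get() for _ in range(n_expected)]
    with open(PATH, "wb") as f:
        pickle.dump(records, f)  # (de)serialization is done for us

# Producers on any thread just enqueue their data models:
for i in range(3):
    writes.put({"id": i, "payload": "hello"})

t = threading.Thread(target=writer, args=(3,))
t.start()
t.join()

with open(PATH, "rb") as f:
    restored = pickle.load(f)
```

Backups then reduce to copying the file while the writer isn't holding it open, with the failed-open case handled as ordinary error handling.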
This approach is not all that rare; it's lightweight and very useful for minimizing latency. Why would you ever use a database if you don't actually need the features? It is much simpler to have immediate access to all your data in your application's data models.
The usual use case for a database is that it has multiple users in different places which would be difficult with SQLite. But for other use cases I don't see a problem. I don't know how it scales ad infinitum, but you can manage a lot of data with it and the usual SQL server has limits too. Could be a good choice even before you care about latency.
> Latency is the exact reason you would have a problem scaling any large system in the first place.
Not always. It depends on the architecture and your hosting strategy. I think it’s more likely for an instance of a web app to receive more requests than it can handle, causing the app to not service any requests.
Just the latency is really important to me! I even built an ERP system that has a response time below 100 ms for all operations, it's a design goal.
My thought is that if you can see consumer changes depending on latency (for example on amazon or google) it is equally important for internal tools. Employee time is expensive.
Throughput for a single service/app improves, but does it really scale? Across a cluster, you will have to have data replication and sync routines, which are a whole mess themselves.
The latency is not reduced, it is shifted elsewhere.