Store SQLite in Cloudflare Durable Objects (rkusa.st)
244 points by jgrahamc on Jan 25, 2022 | 74 comments



I read through this entire post going "oh, this is cursed", slowly growing in volume the entire time. As is proper for something this clever.

I've been doing a bit of prodding at Durable Objects lately, in part inspired by the same note that the OP quoted about implementing datastores atop the framework, and this is the sort of environment that I could see being really, really useful at scale in the future. I'm anxious about using it without a public spec and some portability, but the concept is just tremendously cool.



What do you mean by "cursed"?


Substitute "this is a terrible idea and I love it" if you want. ;)


My guess in this context: "A hacky solution that probably is unstable"


This is super exciting!

I see there is concern about Asyncify. I believe in the browser there is a way around this using Atomics and a shared buffer (Absurd SQL[0] uses it), but I suspect this isn't possible in Cloudflare Workers as it works across the worker/main thread boundary? Is it possible to launch another worker from a Cloudflare Worker, in the same process, like a web worker?

I am somewhat hoping that Cloudflare have something up their sleeve to make these sorts of projects easier, maybe even some sort of distributed block store.

0: https://github.com/jlongster/absurd-sql
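(For anyone curious what that Atomics trick looks like: a rough, hypothetical sketch follows -- the buffer layout and function names are made up, but the Atomics calls are the real API. One thread blocks synchronously with Atomics.wait while a second thread does the async work and wakes it up.)

    // Shared between the two threads.
    const shared = new SharedArrayBuffer(8);
    const flag = new Int32Array(shared, 0, 2); // [0] = signal, [1] = result

    // Thread A (inside a synchronous VFS read): block until signaled.
    // Atomics.wait is only allowed off the main thread, which is why this
    // lives in a worker.
    function readBlockSync(): number {
      Atomics.store(flag, 0, 0);
      postMessage({ kind: "read", buffer: shared }); // ask the IO thread
      Atomics.wait(flag, 0, 0);                      // sleep until notified
      return Atomics.load(flag, 1);                  // result from thread B
    }

    // Thread B (the IO thread): do the async work, then wake thread A.
    declare function fetchBlockFromStorage(): Promise<number>; // hypothetical
    async function handleRead(buffer: SharedArrayBuffer) {
      const view = new Int32Array(buffer, 0, 2);
      view[1] = await fetchBlockFromStorage();
      Atomics.store(view, 0, 1);
      Atomics.notify(view, 0);
    }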


Indeed Cloudflare Workers does not and likely will not support threads with shared memory, due to Spectre concerns. Asyncify is the way to go here.


That's a pity. Is that due to the architecture of sharing a V8 runtime with other customers' workers? How is this different from V8 in Chrome and shared buffers?


Chrome is all-in on strict process isolation as their Spectre defense. Given the web platform that they need to support, they don't really have another choice -- precise timers and threads, as well as many other ad hoc timing mechanisms, were already part of the platform before Spectre hit. Using processes for everything is pretty costly, though, to the point that they have to make some compromises which are open to attack: https://www.spookjs.com/

In Cloudflare Workers, process isolation for every worker isn't practical -- if we had to do that then we'd only be able to offer real "edge compute" to a small number of big enterprises with deep pockets, rather than at prices affordable to everyone. Our fundamental difficulty is that we need a single machine to be able to support thousands of running applications where each app may only get a small trickle of traffic, so that we can deploy every application to every one of our edge locations. It's like if you always had 10,000 tabs open in Chrome.

Instead our strategy is to stack a lot of different mitigations that make attacks slower, until the point where they are too slow to be remotely useful[0]. One part of the strategy is a novel defense we designed with researchers at TU Graz called Dynamic Process Isolation[1], in which we use hardware performance counters to detect execution patterns indicative of an attack and move those workers into private processes. For that strategy to work, though, we first need to slow down attacks enough to give ourselves a chance to detect the patterns -- and that requires disallowing precise timers or anything that could be used as a precise timer, such as multiple threads with shared memory. Luckily, we were thinking about timing attacks from the very start of the project (even before Spectre was known), so we were able to avoid ever putting these into the platform in the first place.

In general I think threads are not as important on servers because distributing work across machines is more powerful anyway. We're trying to create a platform where that is really easy.

[0] https://blog.cloudflare.com/mitigating-spectre-and-other-sec...

[1] https://blog.cloudflare.com/spectre-research-with-tu-graz/
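(To make the timer concern above concrete, here is an illustrative sketch of the well-known construction: with threads and shared memory, one thread can act as a high-resolution clock for another by incrementing a counter in a tight loop. Nothing here is Workers API -- it's just why SharedArrayBuffer effectively reintroduces a precise timer.)

    const sab = new SharedArrayBuffer(4);
    const counter = new Int32Array(sab);

    // "Clock" thread: spin forever, bumping the shared counter.
    function clockThread(): never {
      for (;;) Atomics.add(counter, 0, 1);
    }

    // Another thread can now time an operation far more precisely than a
    // coarsened Date.now() allows, by diffing the counter around it.
    function timeOperation(op: () => void): number {
      const start = Atomics.load(counter, 0);
      op();
      return Atomics.load(counter, 0) - start; // elapsed "ticks"
    }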


Thanks for the detailed explanation, makes complete sense.

I suppose that means that projects such as python-wasm [0], which are porting other language runtimes to WASM, will have to go the Asyncify route for CF Workers, with all the overhead that entails?

From the looks of it they don't think asyncifying Python will be possible in the near to medium term [1].

0: https://github.com/ethanhs/python-wasm

1: https://github.com/ethanhs/python-wasm/issues/12


If this becomes a big enough problem, we might be able to build a work-around into the Cloudflare Workers Runtime itself so that it can support synchronous waits. It'll take some hacks but I think it's doable, without actually supporting threads.


Essentially doing what Asyncify is doing as part of the platform?


I'd probably use fibers, like node-fibers used to do. The problem is V8 stopped supporting fibers a few versions ago. We'd have to hack support back in. That's something we probably could get done but I'm not exactly excited about it. :)
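(For readers wondering what Asyncify actually does: it instruments the WASM so the stack can be unwound at an import call, control returns to JS to await the async work, and the stack is then rewound. In Emscripten it surfaces roughly like this -- a hedged sketch, where fetchBlock is a hypothetical async function:)

    // Emscripten JS library function (sketch): the compiled code calls
    // read_block_async() as if it were synchronous; Asyncify pauses the
    // WASM until wakeUp() fires, with no threads involved.
    mergeInto(LibraryManager.library, {
      read_block_async: function (blockId) {
        return Asyncify.handleSleep(function (wakeUp) {
          fetchBlock(blockId).then(wakeUp); // fetchBlock is hypothetical
        });
      },
    });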


> we'd only be able to offer real "edge compute" to a small number of big enterprises with deep pockets, rather than at prices affordable to everyone

What a nice way to formulate the trade-offs between cost and security. When Workers came out with huge headlines about performance and cost, I was very disappointed to find out there was no special technical wizardry behind that, just conscious trade-offs made at the expense of customer data security. Workers simply skips all the sandboxing steps other providers take to implement a multi-tenant application runtime in a secure manner. So far the mitigations in place seem to make the possibility of Cloudflare customers getting bitten by a V8 vulnerability unlikely rather than impossible.

Sure, the platform has interesting ideas and I'm looking forward to trying it out as a full-stack serverless platform. I just cannot foresee running anything serious on it before they come up with a more convincing security story.


Other providers run attacker-provided native code directly on hardware, deeply relying on bug-free silicon for their whole security model to work. I honestly think that's far more precarious than what Workers is doing.


I built something very similar for using DynamoDB as a VFS backing store for sqlite: https://github.com/psanford/donutdb


Whenever SQLite comes up, always have to link to this post about using SQLite in production: https://blog.wesleyac.com/posts/consider-sqlite :)

I'm running https://extensionpay.com off SQLite and a $5/month DigitalOcean box and it's serving around 3 million requests a month with absolutely no issues and seriously low CPU load. I'm kind of astonished, frankly.


> 3 million requests a month with absolutely no issues and seriously low CPU load

That's only around 1 request per second on average, so I can't say I'm surprised. Even if traffic is spiky, that leaves you with a lot of headroom!


Yeah, thanks for putting that in perspective! ExtensionPay is still relatively new, too, so I'm glad there's plenty of headroom :)


A more useful metric for the parent comment to include would be peak throughput (ops/sec) versus CPU or IO load.


When I develop server software I always include support for both SQLite and PostgreSQL so I can run the server entirely stand-alone if I want to. Choosing between the embedded database and connecting to PostgreSQL is a command-line switch (see the sketch below).

I have been using SQLite in an IoT application where the measurement data of a long-running test instance amounts to 65 million rows in the main data table and a 3.6 GB file... and the response time running queries against the database isn't bad. I'm starting to question if I actually need PostgreSQL for a lot of situations.
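(A minimal sketch of that command-line switch, assuming Node with the better-sqlite3 and pg packages; the flag, table, and function names are illustrative, not the parent's actual code:)

    import Database from "better-sqlite3";
    import { Client } from "pg";

    interface Store {
      getUser(id: number): Promise<unknown>;
    }

    async function openStore(target: string): Promise<Store> {
      if (target.startsWith("postgres://")) {
        const client = new Client({ connectionString: target });
        await client.connect();
        return {
          getUser: async (id) =>
            (await client.query("SELECT * FROM users WHERE id = $1", [id])).rows[0],
        };
      }
      // Anything else is treated as a path to an embedded SQLite file.
      const db = new Database(target);
      return {
        getUser: async (id) => db.prepare("SELECT * FROM users WHERE id = ?").get(id),
      };
    }

    // e.g. `node server.js --db app.db` or `node server.js --db postgres://localhost/app`
    // (ESM entry point, so top-level await is available)
    const store = await openStore(process.argv[process.argv.indexOf("--db") + 1]);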


We've been using it for https://internetdb.shodan.io and it's been doing a few thousand lookups/second without problems.


Do you have any tips for load-balancing or horizontal scaling of SQLite if you need a 2nd server for whatever reason?


https://github.com/rqlite/rqlite provides a form of horizontal scaling for reads, but it's not a drop-in replacement for SQLite.

https://github.com/rqlite/rqlite/blob/master/DOC/READ_ONLY_N...

(I'm the author of rqlite)


I don't.

My current understanding is that even though there are sharding / replication projects out there for SQLite, it's usually a better idea to just go with a typical database server when you need many machines. But a lot of sites never get to that point, especially given how fast servers and SSDs are these days, so SQLite seems like it can get you further than you might realize.


> But a lot of sites never get to that point, especially given how fast servers and SSDs are these days, so SQLite seems like it can get you further than you might realize.

SQLite on top of NVMe with WAL mode enabled is about as fast as it gets. We've been doing this in production for years now.

We don't have a scenario where we would find ourselves writing more than 2 gigabytes per second to the DB/disk, nor do we think we will ever encounter one with our product, so we have committed ourselves to a single-instance SQLite architecture. It saves a lot of time and frustration when you can accept your circumstances and go all-in on simpler solutions.
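(For anyone wanting to replicate that setup, the WAL switch is a one-line pragma. A sketch using the better-sqlite3 package -- an assumption; any SQLite driver exposes the same PRAGMAs:)

    import Database from "better-sqlite3";

    const db = new Database("app.db");
    db.pragma("journal_mode = WAL");   // persists in the file; readers stop blocking the writer
    db.pragma("synchronous = NORMAL"); // common pairing with WAL: fewer fsyncs, still durable enough for most apps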


I asked OP this and would appreciate your input as well.

What tool do you use to access production database remotely? Say you need to browse some data, perform queries. Is there something like pgAdmin?

I'm interested in SQLite and I'd like to know more.


We do not permit direct access to any application databases across any network. If something in one of our SQLite databases needs to be adjusted in an operational context, it needs to be done by way of some administrative interface that we have integrated into the product.

For offline troubleshooting/QA/Dev, we would simply grab a copy of the database from wherever and we would analyze it using DB Browser for SQLite. We actually use this tool so often that I felt it necessary to contribute to the author's Patreon account. It really is an amazing tool. Try reading/editing a binary field containing an image or other file in SSMS...

Having extreme discipline with your schema, queries, indexes, etc. is critical for being able to operate like this. If you are uncertain of what should live in what table, you probably just want to work against a hosted database engine until the whole team can agree on one standard path that you can burn into your codebase.

We did NOT start with SQLite for all the things. We had to make a lot of mistakes with hosted SQL providers before getting to this level of confidence.


> We actually use this tool so often that I felt it necessary to contribute to the author's Patreon account.

Thanks heaps for that btw!

Because of our Patrons, we've recently been able to buy an M1 Mac Mini. We should be able to start releasing ARM-based macOS builds in the near-ish future. :)


My application Datasette can serve this need: https://datasette.io/

Lots of live demos - two of my favourites are https://github-to-sqlite.dogsheep.net/ and https://global-power-plants.datasettes.com/


I just scp the whole database to my local machine and use "DB Browser for SQLite" (a Mac app) :)

Would love to hear if there are other server-based db browsers for SQLite, though.


Cool, what's your setup / workload look like?


I just learned about ZeeSQL yesterday, which is SQLite + Redis. It's proprietary, but if I understand it correctly (the marketing is terribly unspecific about their primary goal), it lets you scale SQLite across all your Redis nodes.

https://zeesql.com/


I'll be honest, I would consider running SQLite in production, at least for my personal stuff, but the part that scares me is that there's no "managed SQLite" offering and I would have to manage it myself. Litestream seems like a great option for people like me, but I'd like to hear some experience reports on it, especially on more complex cases, and on how it fails.

Another problem (which Litestream might solve) is that SQLite is not compatible with Heroku, which I use since I don't have much ops/sysadmin experience. I feel like people saying that SQLite makes things easier often already have some experience in sysadmin/ops, which isn't my case.

This is not a criticism of SQLite in any way, just an expression of my own limitations and why it makes it harder for me to consider SQLite. I'll gladly take any learning resources about that.


Litestream author here. I'm working on a serverless option that'll run as a managed service later this year. I think there's a huge opportunity to offer a simple option for folks running on ephemeral platforms like Heroku, Google Cloud Run, & Cloudflare Workers.


Ooh hi Ben, thanks so much for Litestream! Can you say more about the service you're describing and how it will work?


That's nice to know, I'll keep an eye out for when it comes out.


The conservative approach — I like it! Yeah it's not for everyone but it's a real delight if your needs align with what SQLite offers.

> there's no "managed SQLite" offering

It's just a file :) You can copy it to a remote server or use Litestream or whatever you want to do with a file.

> SQLite is not compatible with Heroku

Yeah, that's pretty annoying. Apparently fly.io is similar to Heroku but lets you use persistent disks — I've never used it though.

I'd love to see more "managed app server + SQLite" options, since a lot of the time the "managed database server" option from cloud hosts is kind of expensive. Send emails to Heroku and DigitalOcean and Google Cloud Platform. Tell the world!


I see how SQLite being just a file makes it easy to do lots of things from a developer's point of view. However, I'm looking at this from a "business owner" point of view, at which point it's not just a file, it's my clients' data. That's a bit scary for me, so I have a bias for "reassuring stuff" like managed databases with automated backups. This is why I'm looking for more info on "the hard parts" of managing SQLite. I'm conscious that this bias might not be based on reality, especially since I don't know MySQL or PostgreSQL well, especially how they store data in files. But as a relatively young developer, I've heard a lot that "data should go in the database" and that databases are big scary things that should be managed by people with lots of experience and knowledge in that role.

I'll take a look at fly.io, thanks.


Just to add to this: offering managed SQLite services isn't really much of a thing because essentially, as it is just a file, there isn't much to manage.

With databases like Postgres you have daemons, config files and more that need an experienced DB admin to configure and run. All of that complexity grows considerably when you add scaling across multiple machines.

With SQLite you essentially need backups. That's about it. A managed SQLite service would just be something along the lines of:

"Tell us which file is your database; we back it up, and if you need it restored, press this button."


That's a good point. I think this is my lack of experience showing here, mixed with lots of marketing aimed at people like me. After all, you can sell a managed Postgres for a lot more than you can sell a managed SQLite.


Well, maybe to you a "managed database server" feels more reassuring, which is fine, but I run my business off SQLite, which I've found insanely simple to set up and maintain, so that is more reassuring to me.

And just an aside: I think if you looked into it a bit, you'd probably find that while there is a lot of nuance to database fine-tuning, you can get most of the benefit without needing to be an expert.


If you want an even-earlier-published article to point to as well, this has been mine: https://crawshaw.io/blog/one-process-programming-notes

There's also the venerable "Taco Bell Programming": http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra....

It all boils down to Unix principles, I guess: use the simplest (but not simplistic) tools for the job.


"always have to link" to a post that was written less than a month ago?


Yes indeed! Even though it's only been a month I keep posting about it in HN threads about SQLite and people keep going "wow I never thought about that!"


If you set up SQLite properly (Remote Clients <-> RESTful API <-> DB) and use WAL mode, you can run a DB-backed website fairly cheaply. It will be performant, reliable, and simple to maintain as well.
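(A minimal sketch of that shape -- clients hitting a small REST layer that owns the SQLite file -- assuming the express and better-sqlite3 packages; the route and schema are illustrative:)

    import express from "express";
    import Database from "better-sqlite3";

    const db = new Database("app.db");
    db.pragma("journal_mode = WAL"); // concurrent readers while one writer proceeds

    const app = express();
    app.get("/posts/:id", (req, res) => {
      const row = db.prepare("SELECT * FROM posts WHERE id = ?").get(req.params.id);
      row ? res.json(row) : res.sendStatus(404);
    });
    app.listen(8080); // the API process is the only thing touching the file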


Hi. What tool do you use to access production database remotely? Say you need to browse some data, perform queries. Is there something like pgAdmin?

I'm interested in SQLite and I'd like to know more.


Copying and pasting my answer from another subthread:

I just scp the whole database to my local machine and use "DB Browser for SQLite" (a Mac app) :)

Would love to hear if there are other server-based db browsers for SQLite, though.


scp is the way I implemented querying a remote SQLite database under the hood in my data IDE. Now that I think of it, though, I should probably add some clarification to the docs or the UI that you're querying a local copy of the database and that your edits won't affect the original.

https://github.com/multiprocessio/datastation


CloudBeaver works great for me, though I'm not doing anything intense.

https://github.com/dbeaver/cloudbeaver/


It's SQLite, so you would just copy the file and download it to a local machine for analysis with any number of tools. Presumably your setup would already be doing this as a form of backing up the data periodically.


You can use the sqlite3 command line on the server. There are lots of GUI tools if you copy the file locally. Here's one I use: https://sqlitestudio.pl


> Is there something like pgAdmin?

A fairly well-used GUI is https://sqlitebrowser.org (a project I help out with personally).


You can do that with adminer https://www.adminer.org/


How do you get replication?


I don't. I have the one server with streaming DB backups. Although apparently Litestream might include a way to do live SQLite replication soon.


How do you do encryption, authentication and authorization?


I'm not sure what that has to do with my choice of database. I just do it the way every web app does those things.


How do you manage backups of your database and how often?


I do streaming updates with Litestream: https://litestream.io/

I recently set it up after putting it off for a while, and it took like 5 minutes. It was great. And the backups are currently costing me 0 cents per month on Backblaze.
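(For a sense of what that 5-minute setup involves, a hedged sketch of a Litestream config -- the path and bucket are hypothetical; see litestream.io for the authoritative format:)

    # /etc/litestream.yml
    dbs:
      - path: /var/lib/app/app.db
        replicas:
          - url: s3://my-bucket/app.db

With that in place, running `litestream replicate` tails the WAL and continuously streams changes to the bucket.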


Along these lines: if you're looking for a way to sync your SQLite DB to an S3-like object store, there is https://litestream.io/. It creates a snapshot of your DB on S3 and then handles checkpointing of the WAL.


Yes, love Litestream! It felt like the missing piece for replacing Postgres or MySQL with SQLite.


Does anyone know if it's possible to use Litestream with SQLite running on Cloudflare Durable Objects?

I suppose you'd have to compile Litestream to WASM, or have Litestream running on a different machine and reading the SQLite file from the Durable Object?


Litestream author here. I haven’t looked at Durable Objects yet so I don’t think Litestream would currently work with it. I may add support in the future but I’m adding separate serverless functionality to be released later this year. That’ll allow Litestream to run on Cloudflare Workers or GCR or other ephemeral platforms like those.


Haha check the original post, my friend


That is something I was looking for.


I really think I want a "JSON" format for SQLite: a strict subset of the SQLite file format that can be generated fairly trivially by other applications and read by SQLite. There are apps that produce JSON that would be better served by something I can run search queries on; jq is nice, but it's not SQLite.

Is there a protobuf implementation of the SQLite file format out there somewhere?


I am having trouble with the mapping. I presume this touches on the ORM impedance problem? Unless SQLite dictates the JSON format. But then it probably won't be the JSON you'd like to deal with?


PostgreSQL has excellent JSON support today, including advanced queries, indexing, constraints. There are many cheap/free managed offerings that could rival SQLite in operational simplicity.

If you want a SQL interface to JSON with room to grow, that's your best bet IMO.
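(A sketch of what that looks like in practice, via the pg package; the table and JSON shape are illustrative. jsonb columns support containment queries, and a GIN index makes them fast:)

    import { Client } from "pg";

    async function main() {
      const client = new Client({ connectionString: process.env.DATABASE_URL });
      await client.connect();
      await client.query(
        "CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb NOT NULL)"
      );
      // GIN index accelerates @> containment and ? key-existence queries.
      await client.query(
        "CREATE INDEX IF NOT EXISTS docs_body_idx ON docs USING gin (body)"
      );
      const { rows } = await client.query(
        "SELECT body FROM docs WHERE body @> $1::jsonb",
        [JSON.stringify({ status: "active" })]
      );
      console.log(rows);
      await client.end();
    }
    main();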


> There are apps that produce JSON that would be better served by something I can run search queries upon...

https://duckdb.org/ ?


I actually think this is onto something that I'm finding in a different way. Instead of one massive database, what if we had a key-value store mapping keys to tiny databases?

This is, to some degree, what I'm building over at http://www.adama-lang.org/ without a full SQL engine. Each document has tables, and the tables can be indexed. I have yet to find a use case (in my domain) which requires joins. HOWEVER, I've had a ton of fun building it and I'm getting ready to start making games.

I do believe it would be amazing to have a key-to-log service where a reducer like SQLite/Adama could come in to collapse the log into a single file.

The closest I see is from the Boki paper ( https://www.cs.utexas.edu/~zjia/boki-sosp21.pdf ) which was presented at SOSP21.


I see the bundle size after `npm install -S @rkusa/wasm-sqlite` is a little over 4 MB. To my current knowledge, the limit is 1 MB for Workers. A few months ago, at one of the companies I work with that uses CF extensively, I asked for a limit increase; it took almost a month and we were given 2 MB. So my question is: what limit increase should one ask for? What was your experience with that, and what do I need to ask for if I want to run this? Other than that, great job, and thanks a lot for sharing with the community. Until now I was trying to get DuckDB to run in a Durable Object (they also have a WASM bundle); now I have an excellent alternative. Thanks again!


SQLite VFS on browser IndexedDB: https://jlongster.com/future-sql-web


WebSQL could have been...



