> Our new engine is based on intercepting SQLite's disk writes and doing clever stuff with them. It was so easy because the file format is not just well designed but amazingly well-documented.
Quite the hobby project for a lot of people too! Other folks doing this:
I'm sure there are a lot of really cool local-first databases out there, but SQLite has the benefit of being incredibly widely battle-tested, with literally billions of installations worldwide. It has received thorough security research and fuzzing (it's part of Chrome's attack surface, after all). And there are tons of resources online to help people understand how to use it. Although I'm sure there are alternatives that serve certain use cases better, it's hard to imagine anything coming close for ours.
That said, the storage engine we've built is not that heavily dependent on SQLite specifically. It should be possible in the future to adapt it to any database that uses a write-ahead log the way SQLite does. So maybe we'll eventually open it up to a variety of choices, or even let you bring your own as a Wasm module.
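To make the write-interception idea a bit more concrete, here's a minimal sketch of what hooking a WAL-style write stream could look like. Everything here (the WalFrame and WriteInterceptor names, the callback shapes) is hypothetical and is not Cloudflare's actual implementation.

```ts
// Hypothetical sketch: intercepting WAL-style writes from an embedded database.
// None of these names come from Cloudflare's real code; they only illustrate the idea.
interface WalFrame {
  pageNumber: number; // which database page this frame rewrites
  data: Uint8Array;   // raw page contents
  commit: boolean;    // true if this frame ends a transaction
}

interface WriteInterceptor {
  // Called for every frame the database appends to its write-ahead log.
  onFrame(frame: WalFrame): Promise<void>;
  // Called when a committed transaction is safe to replicate or checkpoint.
  onCommit(frames: WalFrame[]): Promise<void>;
}

// A trivial interceptor that forwards committed transactions somewhere else.
class ReplicatingInterceptor implements WriteInterceptor {
  constructor(private send: (frames: WalFrame[]) => Promise<void>) {}

  async onFrame(_frame: WalFrame): Promise<void> {
    // Individual frames could be buffered or durably logged here.
  }

  async onCommit(frames: WalFrame[]): Promise<void> {
    await this.send(frames); // e.g. ship the whole transaction to replicas
  }
}
```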
Oh, I've been informed that DuckDB uses SQLite under the hood, so maybe compiling DuckDB to Wasm and running it on top of this will be possible; we'll see.
> DuckDB is indeed a free columnar database system, but it is not entirely built on top of SQLite. It exposes the same front-end and uses components of SQLite (the shell and testing infrastructure), but the execution engine/storage code is new.
Correct. DuckDB is really interesting technology, but it's not a direct successor to SQLite for transactional workloads. It's also very new: there's a LOT of new code in DuckDB on top of the (heavily fuzzed) SQLite parts.
(I use it personally, but it's not the same thing as what we're building with D1)
Also, for OLAP workloads there's Workers Analytics Engine. Analytics Engine is arbitrarily scalable as opposed to a D1-like solution - I can almost guarantee I've inserted more data into its internal variant (which is built on extremely similar underlying infrastructure) than just about any customer would think of doing, and it handles it like a champ.
Basically, we already have a product for doing DuckDB-like things, and the D1 architecture isn't great for high volume OLAP workloads.
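For context, ingesting events into Workers Analytics Engine from a Worker looks roughly like this. The binding name (AE) and the particular fields are placeholders of my own; the ambient types come from @cloudflare/workers-types.

```ts
// Rough sketch of writing data points to Workers Analytics Engine from a Worker.
// Binding name and field choices are placeholders for illustration.
export interface Env {
  AE: AnalyticsEngineDataset; // configured via a wrangler binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    env.AE.writeDataPoint({
      indexes: [url.hostname],               // sampling/index key (max one entry)
      blobs: [url.pathname, request.method], // string dimensions
      doubles: [Date.now()],                 // numeric measures
    });
    return new Response("logged");
  },
};
```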
Disclaimer: I'm an engineer at Cloudflare, but not on a developer platform team. I'm not speaking for our developer platform strategy or anything, I'm just commenting on how it looks from where I sit.
I enjoy Cloudflare and have been using Workers + D1 for a few months, but I'll warn that Cloudflare's definition of beta is far more beta than what other companies mean. The API surface may change repeatedly without corresponding documentation, and their Discord can be a bit sparse on help.
Yeah, I'd imagine this will be cheaper than AWS RDS for low-storage use cases (lack of fixed monthly compute costs wins out), but more expensive for high storage. Like quite cheap for a 10-100 GB DB, quite expensive for a 1-10 TB DB.
Though they do say:
> when we enable global read replication, you won’t have to pay extra for it, nor will replication multiply your storage consumption
With AWS you'll have at least one read replica for failover, so $0.23/GB. And if you really want global read replicas, with AWS you might end up with something like a primary in North America and read replicas in South America, Europe and Asia. That would work out to $0.46/GB, which closes the gap a bit.
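To spell out the arithmetic behind those figures: the per-GB-month base rate below is inferred from the comment's $0.23 number, not taken from official pricing.

```ts
// Back-of-the-envelope RDS storage cost using the per-GB figures implied by
// the comment above (illustrative only; real pricing varies by region/tier).
const rdsStoragePerGbMonth = 0.115; // assumed base rate, $/GB-month

// Each read replica stores its own full copy of the data.
function storageCostPerGb(replicas: number): number {
  return rdsStoragePerGbMonth * (1 + replicas);
}

console.log(storageCostPerGb(1)); // primary + 1 failover replica => ~$0.23/GB-month
console.log(storageCostPerGb(3)); // primary + 3 global replicas  => ~$0.46/GB-month
```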
Depends on whether you do more compute than you store data. I've been on a lot of analytical workloads running different types of reports besides your normal CRUD.
The costs as a whole have me worried. I'm not sure it'll be better than just using https://turso.tech/pricing, as that already has a free tier of 8 GB, and the paid tier might cost less overall.
So serverless meaning you don't manage the DB servers, they may or may not put your stuff on the edge, and Cloudflare maybe takes care of load balancing too?
Yes, Cloudflare handles load balancing and neat tricks like measuring the latency between your users, your code and databases etc. and moving your code around the network to make it run fast: https://blog.cloudflare.com/announcing-workers-smart-placeme...
IMO serverless is about the pricing model more than about whether a service is managed. Both Amazon RDS and Aurora are managed services, but Aurora is the one with the serverless pricing model.
You'd use it whenever you're building an application on Cloudflare Workers and you need to store data. D1 provides you with a SQL database. It's based on SQLite, so it's designed for relatively small datasets but can serve them very quickly from the edge.
Note there are several alternatives here, too. Workers Durable Objects[0] provide a lower-level primitive for building advanced distributed systems. But D1 is easier to use for typical use cases. For blob storage you might use R2[1]. And for large databases Workers can easily integrate with several serverless database providers.[2]
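For a sense of what that looks like in practice, here's a minimal sketch of querying D1 from a Worker. The DB binding name and the posts table are placeholders; the binding would be configured in wrangler.toml.

```ts
// Minimal D1 usage from a Worker. Binding name, table, and columns are placeholders.
export interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = new URL(request.url).searchParams.get("user") ?? "";
    const { results } = await env.DB
      .prepare("SELECT id, title FROM posts WHERE author = ?")
      .bind(userId)
      .all();
    return Response.json(results);
  },
};
```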
Is it a non-goal to be usable long term for larger databases? That would force the use of something like Turso (your closest direct comparison) as a possible migration strategy, or relying on "Smart Placement" (which from my point of view reduces the benefit of the global edge) for other serverless, non-global DBs.
Personally, I'm a firm believer that most "web app" use cases are better served by many small databases (e.g. per-user or per-document) rather than a single monolithic database. This is especially true when serving users all around the world -- per-user databases can be located near each user (both for speed and to comply with data locality laws).
What I'd like to enable here is a progression where you start out prototyping your app with a single D1 database, which is easy to use and reasonably fast. Then as you grow we provide tools to let you transition to many D1 databases sharded in a way that makes sense (e.g. per-user). Apps that want even more control can move to using full-on Durable Objects (which will soon support a SQLite database per-object).
That said, there are certainly many use cases out there where simple monoliths make sense, especially non-interactive data crunching. I'm not sure yet if D1 will ever be the right choice for those, but the Workers platform aims to provide many options.
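As an illustration of the middle step in that progression, manual sharding across several D1 databases might look something like the sketch below. The per-shard bindings and the hash routing are my own assumptions for the example, not a built-in D1 feature.

```ts
// Hypothetical manual sharding across several D1 databases, one binding per shard.
// D1 does not provide this routing itself; the scheme here is an assumption.
export interface Env {
  SHARD_0: D1Database;
  SHARD_1: D1Database;
  SHARD_2: D1Database;
}

function shardFor(env: Env, userId: string): D1Database {
  const shards = [env.SHARD_0, env.SHARD_1, env.SHARD_2];
  // A stable hash of the user id picks the shard; all of that user's rows
  // live in the same database, so queries never need to cross shards.
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return shards[hash % shards.length];
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = new URL(request.url).searchParams.get("user") ?? "anon";
    const db = shardFor(env, userId);
    const { results } = await db
      .prepare("SELECT * FROM documents WHERE owner = ?")
      .bind(userId)
      .all();
    return Response.json(results);
  },
};
```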
Thanks for the insight, I greatly appreciate it! This definitely is a reasonable idea for many things and I'm looking forward to seeing something similar to the sharding mechanism in the future.
I've only started to think about this, and I suspect the hardest part will be dealing with cross-cutting concerns (in a non-auto-sharded world, manually creating multiple databases) and trying to find a way to keep each database isolated without extra burden compared to using a hosted Postgres.
As an aside, that LAN-optimized house was a gaming dream. Hope your new house is as awesome.
I've been working on a little project that uses D1 and have been hoping that's how D1 will evolve: a DB per customer in my case. Is the colocation in a Durable Object so 'business logic' and SQLite can live in the same isolate for performance and security? Would that be a post-'out of beta' feature? I've been building as a monolith, thinking that the number of databases in the alpha was indicative of the future.
Sorry, I don't quite understand what you're asking.
In the future each DO will have a private SQLite database. The key/value store will actually be redirected to store into a special table in this database, but probably new apps will just use the database and not the KV store.
Separately from that, I would like to develop tools that make sharding Durable Objects (and D1 databases) easier. Today it's a pain to do manually. This is independent from the underlying storage model, though.
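For what it's worth, the manual version today usually leans on deterministic object IDs. A rough sketch, where the namespace binding and the object class behind it are placeholders:

```ts
// Rough sketch of "manual sharding" with Durable Objects: each user id maps
// deterministically to its own object. Binding and class names are placeholders.
export interface Env {
  USER_DO: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const userId = new URL(request.url).searchParams.get("user") ?? "anon";
    // idFromName gives a stable Durable Object per user; its private storage
    // (and, per the parent comment, eventually a private SQLite database)
    // holds that user's data.
    const id = env.USER_DO.idFromName(userId);
    const stub = env.USER_DO.get(id);
    return stub.fetch(request);
  },
};
```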
I mean, our object storage is called R2, our first database offering is called D1, and if we were to offer a fully Distributed Database then D2 seems like a good name.
So I hate Cloudflare with a burning passion. At some point Cloudflare decided my home IP was bad and started flagging it all over the place, which leads to vast swathes of the internet now being inaccessible to me.
Have you tried using Privacy Pass? It is a browser extension which helps prove that your traffic is legitimate, which should reduce the incidence of challenges from Cloudflare even if your IP has bad rep. It uses advanced cryptography to do this without creating any new tracking vectors.
(Disclosure/disclaimer: I work for Cloudflare but in a different department; I'm not an expert on Privacy Pass.)
So this is even more insidious, and an interesting way for Apple, Google, and MS to pay Cloudflare to use its dominant market position to help safeguard them from other incumbents: you either work to get your devices approved in this scheme, or your users get more Cloudflare CAPTCHAs preventing them from having a nice web experience.
All in the name of "privacy" and "security". Though of course, the only security being served here is the ad networks' revenue stream, and they are the only ones who really care whether a real device or a bot is accessing their services. For most other cases, a DDoS is a DDoS regardless of whether it's initiated by actual users (e.g. an HN hug of death) or by a bot network.
When I started using Firefox with the Temporary Containers extension I would get absolutely bombarded by Cloudflare's captchas. To the point where if I clicked on a search result that sent me to a captcha, I would just close the tab and click the next result (which often resulted in another captcha, rinse repeat). That's when I realized just how big Cloudflare had gotten.
I still use Temporary Containers, and lately I've noticed a sharp decline in these Cloudflare captchas. I don't know if it's because people are moving away from them, or Cloudflare just found a better way to fingerprint me.
The issue I have here is why Cloudflare takes the blame. There are two parties (well, could be more): your ISP and Cloudflare. Who says it's not your ISP that is the problematic one?
Nah, D1 is in a different league.[1] One key feature of D1 is having free read replicas around the globe. That's CF's DNA, and I expect this to happen this year or in Q1/24 at the latest, and again, for free (so, without any surcharge). Re storage: you of course get more than 1 GB on the paid plan.
Neon doesn't have read replicas, and even if they're already working on it, I wouldn't expect it before 2025, if at all, and never at CF's pricing (Neon still charges for egress).
[1] I would rather compare D1 to Turso, LiteFS from Fly, or PlanetScale with many read replicas.
I am pretty excited about D1 and am in general a CF fanboy, so congrats on this huge milestone. But we need more -- the following is just a wish list:
- Development since the start has taken time (with good reason); I hope we soon see read replication and a higher pace
- So, some roadmap _with_ ETAs would be great, e.g., will replication come this or next year?
- I don't care about read performance, because I know that SQLite paired with CF's infrastructure will perform, but I would love to have more info on write performance; this is really SQLite's weakest point and I haven't seen any implementation that could compete anywhere with other DBs; SQLite slows down very quickly when doing a couple of small writes/sec (see the batching sketch below)
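One common client-side mitigation for lots of small writes is to coalesce them into a single round trip. A hedged sketch using D1's batch() API; the table and column names are placeholders.

```ts
// Sketch: coalescing many small writes into a single D1 batch, which sends the
// statements in one round trip. Table and columns are placeholders.
export async function recordEvents(
  db: D1Database,
  events: { user: string; kind: string }[],
): Promise<void> {
  const stmt = db.prepare("INSERT INTO events (user, kind) VALUES (?, ?)");
  await db.batch(events.map((e) => stmt.bind(e.user, e.kind)));
}
```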
Then, I'd like to see the close competition here on HN -- DB providers with many read replicas (CEOs and/or devs from Turso and Fly/LiteFS) -- commenting on D1, how they compare against it, and what they plan in order to compete. This is too exciting a space and time to be laissez-faire.
LiteFS author here. I'd love to compare and contrast D1 with how LiteFS works, but there aren't enough technical details at the moment. I'm looking forward to future technical blog posts from Cloudflare, or perhaps they could open source their implementation.
There are a lot of ways to approach SQLite replication, and one isn't necessarily better than another. They simply have different trade-offs.
Hard to say without knowing what specific problem you've had, but we are improving all the time. Several improvements were announced just this week; take a look at the blog.
Among other things, we made Wrangler (Workers CLI tool) use the open source workerd by default for local development, so local dev should produce a much more precise simulation now (since it's literally running the same code).
I ended up moving the project over to Netlify + Edge functions. I had it all working in like 5-10 mins as it should. Took me two hours to figure out why Workers weren't working in my Pages project, and could never get Workers working properly with my Astro project.
I think you're working exclusively on the engine of Workers which is really top notch, but Cloudflare really needs to improve the outer layer which affects DX considerably.
This is my first time using it and I've been very pleased with it so far. It keeps it simple, has solid typing & schema building, and reminds me of LINQ. I'm also a thin models kind of person and the fact this is just an object without needing to build ORM classes is even better.
Still way too slow (cold start, throughput and general latency); I was referring to the latest version, which still has a convoluted, nonsensical stack. I'd stay away from Prisma; there's enough info online.