Show HN: WunderBase – Serverless OSS database on top of SQLite, Firecracker (wundergraph.com)
179 points by jensneuse on Sept 15, 2022 | 47 comments



This just runs a request proxy that shuts the database VM down after 10 seconds of no activity and starts it back up (with a roughly half-second delay) when a new request arrives. It runs SQLite with Prisma. Prisma is an API server that puts a GraphQL API in front of a DB.

It's a nice blog post about gluing technology together, and I can see how this could be a really nice way to run some lower-cost databases in a non-demanding development environment. However, it is not a reliable way of operating a database. For me it isn't really serverless, since it only scales between 0 and 1 instances. A serverless DB would ideally scale out, or at least have some ability to handle greater load in response to demand, along with higher reliability and availability and backups to object storage.


Hey there, I'm Nikolas from the Prisma team. Just came here to quickly clarify this notion:

> Prisma is an API server that puts a GraphQL API in front of a DB.

Prisma is an ORM which generates a JavaScript/TypeScript client library for your database.

Your description is accurate for Prisma 1 (which has been in maintenance mode for several years and is officially deprecated by now [1]), but the latest versions of Prisma (v2+) don't expose a GraphQL API any more. Prisma 1 also used GraphQL SDL for data modeling; the Prisma ORM, on the other hand, has its own custom modeling language for describing database schemas in a declarative way and also comes with a flexible migration system.
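
For illustration, using the generated client looks roughly like this (a minimal sketch; the `User` model and its fields are just an example here, in practice they come from your own Prisma schema):

  // Generated client, produced by `prisma generate` from your Prisma schema.
  import { PrismaClient } from '@prisma/client'

  const prisma = new PrismaClient()

  async function main() {
    // Plain TypeScript method calls -- no GraphQL is written by the developer.
    const user = await prisma.user.create({
      data: { email: 'ada@example.com', name: 'Ada' },
    })

    const users = await prisma.user.findMany({
      where: { email: { endsWith: '@example.com' } },
    })
    console.log(user, users)
  }

  main().finally(() => prisma.$disconnect())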

That being said (and as Jens also mentioned elsewhere), the Prisma ORM does use GraphQL _internally_ as a wire protocol. However, as a developer, you _never_ touch this internal GraphQL layer and are not even supposed to be aware of it (you actually have to jump through a lot of hoops to even "find" it). It's also very likely that we'll replace GraphQL as a wire protocol in the future, so "GraphQL" really isn't something you should be thinking about as a developer who is using Prisma.

Hope that clarifies the situation a bit, let me know if you have any further questions around this topic.

[1] https://github.com/prisma/prisma1/issues/5208


Is there any post that goes into the evolution of Prisma from the initial days of exposing a GraphQL API directly to using it as a wire protocol in the query engine?

And now the likely removal of GraphQL as the wire protocol. What are the key reasons to remove the use of the GraphQL query language?


> Is there any post that goes into the evolution of Prisma from the initial days of exposing a GraphQL API directly to using it as a wire protocol in the query engine?

To be honest, I think the most comprehensive resource for this is a Twitter thread [1] that I posted from my personal account, which explains the different turns we took in the product direction :D

Otherwise, there's the blog post "How Prisma & GraphQL fit together" [2] that also touches on the same topics but might be a bit dated since it was published before we released the Prisma ORM for production use.

> And now the likely removal of GraphQL as the wire protocol. What are the key reasons to remove the use of the GraphQL query language?

The short answer is that it would allow us to make more optimizations to the bridge between JS- and Rust-land in Prisma Client. However, this is not an urgent issue for us at the moment, so it very likely won't become a priority in the near/mid-term future.

[1] https://twitter.com/nikolasburk/status/1384908813069869058

[2] https://www.prisma.io/blog/prisma-and-graphql-mfl5y2r7t49c


You can, of course, scale to >1 with the Fly Machines API. I don't know enough about how they're managing the SQLite part of this to say more about how this scales, except that I think scaling out SQLite is about to get a lot more interesting.

But I mostly agree that we need a better term than "serverless" for this kind of stuff. The big things people seem to want from "serverless" solutions are "not managing long-running server instances" and "true usage-metered billing".

Whether or not there are servers, like, at all has not all that much to do with things.


For many, Fly.io is not a viable alternative, especially for those who are already on the AWS train.

I think AWS already offers Aurora Serverless v2, which pretty much accomplishes what a lot of these me-too serverless services do, and they won't integrate as well as something that is offered out of AWS.

Even if you insisted on a cloud-agnostic mandate (which is not really logical, since there are at best three public clouds to choose from, all of which are vulnerable to targeted cyberattacks and fault lines), it would be hard to convince a large organization to switch to using SQLite on Fly.io.


> I think AWS already offers Aurora Serverless v2, which pretty much accomplishes what a lot of these me-too serverless services do, and they won't integrate as well as something that is offered out of AWS.

People should just be aware that Aurora Serverless v2 won't scale to zero, and you'll pay for it even if you never use it.

https://www.lastweekinaws.com/blog/no-aws-aurora-serverless-...


Ah, thanks for pointing that out. Since this is new, it's bound to change; I'll be surprised if this doesn't eventually scale to zero, and if I had to bet I would back AWS here going forward. This is way too critical for its serverless stack to get the Cognito treatment.


Forget the Fly.io part; SQLite is what's interesting here. I agree in advance that it's unlikely anyone's going to convert a large app from Postgres to SQLite; if full-stack SQLite succeeds as a trend, it'll be with new apps that grow up using it.


Interested to know what you see in SQLite here? Why is there so much interest in this all of a sudden? Am I missing something?


It's a database that in full-stack culture has been relegated to "unit test database mock" for about 15 years that is (1) surprisingly capable as a SQL engine, (2) the simplest SQL database to get your head around and manage, and (3) can embed directly in literally every application stack, which is especially interesting in latency-sensitive and globally-distributed applications.

Reason (3) is clearly our ulterior motive here, so we're not disinterested: our model user deploys a full-stack app (Rails, Elixir, Express, whatever) in a bunch of regions around the world, hoping for sub-100ms responses for users in most places around the world. Even within a single data center, repeated queries to SQL servers can blow that budget. Running an in-process SQL server neatly addresses it. Conveniently, most applications are read-heavy, and most performance-sensitive app requests are reads.
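
To make "reads served out of the app process" concrete, here's a minimal sketch (not our actual stack; `better-sqlite3` and the `users` table are just assumptions for illustration):

  // The database is a local file opened inside the app process,
  // so a read is a function call, not a network round trip.
  import Database from 'better-sqlite3'

  const db = new Database('/data/app.db', { readonly: true })

  // Hypothetical table; the statement is prepared once and reused per request.
  const getUser = db.prepare('SELECT id, name FROM users WHERE id = ?')

  export function handleRequest(userId: number) {
    // Microseconds of latency instead of a cross-network SQL query.
    return getUser.get(userId)
  }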


Hmm, but how would the replication and sync be handled if you have many SQLite instances in edge locations around the world? If someone inserts a row with id 234 and somebody on the other side of the world does the same, wouldn't this type of logic involve reaching into a central source of truth to compare the diff?

Trying to wrap my head around this architecture. It is quite interesting, but concerning that it is now sharding into close-to-local SQLite instances located near the user.


Yes: the model topology you should have in your head is "single writer, multiple readers" --- exactly the same way it would work with a conventional Postgres setup. What you're getting with SQLite here is that the reads themselves are served out of the app process rather than round-tripping over the network; otherwise, it's the same architecture.

(You're not generally "reaching back to the central source of truth to compare" things, so much as "satisfying the write centrally and shipping out the new database pages back to the read replicas at the edges").
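
A rough sketch of that single-writer, multi-reader split at the application level (the primary URL, table, and helper names are hypothetical, not how Fly or WunderBase actually wire this up):

  import Database from 'better-sqlite3'

  // Reads come from the local replica file that the replication layer keeps fresh.
  const replica = new Database('/data/replica.db', { readonly: true })

  // Writes are forwarded to the single writer; replication then ships the
  // changed database pages back out to every edge replica.
  const PRIMARY_URL = 'https://primary.internal' // hypothetical

  export function readPost(id: number) {
    return replica.prepare('SELECT * FROM posts WHERE id = ?').get(id)
  }

  export async function createPost(title: string) {
    const res = await fetch(`${PRIMARY_URL}/posts`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ title }),
    })
    return res.json()
  }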

More on this model: https://fly.io/blog/globally-distributed-postgres/


Interesting, do you have plans to support GPUs as well? I can see this is a bottom-up approach: put a low-load instance close to the user for reads and have a globally synced write path that should handle race conditions, etc.

Are there cold start delays? From the moment I type domain.com is it going to spin up a fly instance closest to me and serve the SQLite database reads?

I'm gonna give this a go this weekend to see what it can do


This is getting into Fly.io stuff and not WunderBase or SQLite stuff. GPU is a ways off for us: the programming interface for GPUs is tricky to implement with full isolation between VMs. The post we're commenting on talks a bit about cold start delays (a couple hundred milliseconds).


This is SQLite, so how would you scale it to > 1? Certainly you can put an app tier in front of this DB tier and scale the app tier to infinity.


By replicating the SQLite transactions to other SQLite databases.


Hey, I like SQLite and I also like fly.io, but "distributed DB built with sqlite as storage" is really a very different beast from just "sqlite".


I'm mostly interested in the SQLite part of this, and we don't have an SQLite offering, only Postgres, just to be clear. So you can't hurt my feelings here.

When does SQLite become "a distributed DB built with sqlite as storage"? Did Postgres stop being Postgres when someone plugged log shipping into it? That's basically what we're talking about here --- not stuff like rqlite, which I'm also pretty interested in, but which really is a new database built on SQLite.


Users generally don't plug ad-hoc log shipping solutions into Postgres. They generally use the built-in, battle-tested Postgres replication features, and they can set up synchronous replication to avoid data loss. Shipping a log is trivial, but synchronous replication and failover are quite difficult to get right (see jepsen.io), and setting up failover for Postgres is still quite difficult. Newer DBs have been built from the ground up (CRDB, TiDB, etc.) in part because of the difficulty of attempting to operate traditional DBs as reliable distributed systems.


They do now, but that wasn't always the case, and people didn't say that you weren't running Postgres when you did that.

Cockroach is not the same thing we're talking about here; it's a much more ambitious design, just like rqlite is much more ambitious than shipping SQLite transactions. What we're talking about here is the tooling needed to generate a single-writer multi-reader cluster the way you would for Postgres, but for SQLite instead. I don't know if single-writer multi-reader clusters for Postgres qualify as "easy", but they're not science projects.

If it's not obvious: we love Cockroach. Our commercial bias is that we built a platform that is especially useful for distributed services and clusters, and Cockroach is very much that.


Okay, I read the fly.io documentation. What is being done now to globally distribute PG results is asynchronous replication and thus stale reads and presumably the potential for data loss during failover. So I can see why it would make sense to use SQLite to achieve a similar setup.

Still, I would expect the windows of read staleness and data loss to be much wider with an approach of just shipping logs rather than re-engineering things like RQLite, etc. Trying to ship a copy of SQLite to every app will mostly achieve lower latency at the cost of staler reads, although I can see how ideally it could cut the latency in half by streaming data ahead of time. But of course it increases storage costs dramatically as things scale out.


Sure, that seems very likely to be true! The flip side of the coin though is that rqlite really is a different database than SQLite. I'm not making a case against things like rqlite and Cockroach; I think they're great, and also my bread is buttered on the same side as theirs. I'm saying that if you're not going to use rqlite --- and most people aren't --- you can get SQLite to the same multi-reader single-writer cluster setup that most people do with Postgres.

And there's a distinct advantage to doing this with SQLite: if you can viably deploy SQLite (because your tooling is asymptotically approaching the state of the art for Postgres), your reads are ultra fast, because you're not hitting the network for an extra round trip every time you query your database.


I was pretty interested to see how this worked as well. I think you are right, this is a toy. It will be interesting to see if they can solve scaling.

It would be cool if you could ditch the GraphQL layer. It seems like there are other alternatives that go the VFS route, so you still get to use a standard SQLite client.


As discussed in the post, the next steps are to add read replicas. Regarding backups, that's possible with Litestream: https://github.com/benbjohnson/litestream


Are you going to use LiteFS then for replication [1]? LiteFS replication is asynchronous, meaning failover can lose the latest data. Will LiteFS scale down to 0? Does scaling down to zero mean electing a leader when scaling back up to 1 and will that have a delay? Will the read replicas scale down to 0 along with LiteFS when the primary scales down to 0?

[1] https://github.com/superfly/litefs


LiteFS/Litestream author here. LiteFS will scale to zero with a persistent volume attached. For short-lived instances (aka serverless), we still have a few more features to complete on the road map (e.g. S3 replication, synchronous replication) to make that work well. Pure serverless (e.g. Lambda, Vercel) is also something we plan to support but we want to get LiteFS working well on more traditional deployments (e.g. longer running instances, Kubernetes, etc) first.


Couldn't find anything in the Prisma docs about it exposing a GraphQL API.



SQLite, KVM, Firecracker, GraphQL, Serverless... if this was written in Rust it would hit the holy trinity of all the HN buzzwords that pull a post to the frontpage ;)


The Prisma Query Engine is indeed written in Rust :D


500ms seems high for a cold start for a proxy layer. Is there anything I'm missing? The JVM is pretty awful at cold starts, but with GraalVM I got a pretty big Micronaut project starting in ~80ms.


Another way to do sqlite over S3 is:

- https://github.com/uktrade/sqlite-s3vfs (read/write)

- https://github.com/michalc/sqlite-s3-query (read only)


Slightly off-topic: according to what I read, it is a lightweight proxy written in Go that is capable of starting a VM when it receives network traffic, and 10 seconds after the last request it turns off the Fly Machine.
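
The core idea is small enough to sketch. A rough, hypothetical TypeScript version (the real WunderBase proxy is written in Go and starts a Fly Machine rather than a Docker container; names, ports, and commands below are placeholders, and concurrent-connection bookkeeping is omitted):

  import net from 'node:net'
  import { execFile } from 'node:child_process'

  const IDLE_MS = 10_000          // shut the backend down after 10s of silence
  const BACKEND = { host: '127.0.0.1', port: 5432 } // placeholder backend address

  let idleTimer: NodeJS.Timeout | undefined
  let running = false

  function startBackend(cb: () => void) {
    if (running) return cb()
    // Placeholder: WunderBase starts a Fly Machine here instead of a container.
    execFile('docker', ['start', 'my-db'], () => { running = true; cb() })
  }

  function stopBackend() {
    execFile('docker', ['stop', 'my-db'], () => { running = false })
  }

  function resetIdleTimer() {
    if (idleTimer) clearTimeout(idleTimer)
    idleTimer = setTimeout(stopBackend, IDLE_MS)
  }

  net.createServer((client) => {
    if (idleTimer) clearTimeout(idleTimer)   // activity: cancel any pending shutdown
    startBackend(() => {
      const upstream = net.connect(BACKEND, () => {
        client.pipe(upstream).pipe(client)   // plain TCP pass-through
      })
      upstream.on('close', resetIdleTimer)   // connection done: arm the idle timer
      upstream.on('error', () => client.destroy())
      client.on('error', () => upstream.destroy())
    })
  }).listen(4000)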

I've been looking for something similar for some time to use in my development Docker instances (specifically with Dokku). I have many services that consume little CPU time but have a high overall consumption of RAM, even though they are actually only used for a few minutes each day.

I don't want to use kubernetes for this as it adds too much complexity for the benefit I would get.

Do you know of any similar solution to turn Docker containers on/off when network traffic comes in?


I'd love to say that the most interesting thing here is Fly Machines (I am biased), but really it's SQLite, which, no matter what platform you're using, makes it architecturally simpler to scale up and down, since you don't have to scale a database server up and down alongside your workload; the database is embedded in the app, which already has the scaling logic.

People have been sleeping on SQLite and are starting to wake up and I'm kind of psyched to see what else they come up with (another very cool example of a software tool that really plays to SQLite's strengths is Datasette: https://datasette.io/).


As soon as you start scaling the app beyond 1 replica, you have to handle data replication again


Not necessarily. For instance, Firebase's Realtime Database doesn't scale vertically or horizontally. The only option is to create additional, entirely independent databases. It's up to the application developer to implement some type of sharding scheme.


Of course.


PocketBase is great; I've used it in several personal projects, and I really like it.

https://github.com/pocketbase/pocketbase

PocketBase is an open source Go backend, consisting of:

  * embedded database (SQLite) with realtime subscriptions

  * built-in files and users management

  * convenient Admin dashboard UI

  * and simple REST-ish API


How are they handling real-time subscriptions? Like, how do they figure out when a row is updated? Postgres has replication functionality, but how are they achieving the same with SQLite?


The documentation for PocketBase's realtime API is: https://pocketbase.io/docs/api-realtime/

The Realtime API is implemented via Server-Sent Events (SSE). Generally, it consists of 2 operations:

  * establish SSE connection

  * set client subscriptions

SSE events are sent for create, update, and delete record operations.

PocketBase is mainly a single application with SQLite embedded, so whenever you update records through PocketBase's API, the main application always knows about it.

This is different from Postgres, which may be connected to remotely by several different applications/servers.
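
From the client side, subscribing looks roughly like this with the PocketBase JavaScript SDK (a minimal sketch; the `posts` collection is just an example):

  import PocketBase from 'pocketbase'

  const pb = new PocketBase('http://127.0.0.1:8090')

  // Opens the SSE connection (if not already open) and registers a subscription.
  // The callback fires when records in the collection are created/updated/deleted.
  await pb.collection('posts').subscribe('*', (e) => {
    console.log(e.action) // 'create' | 'update' | 'delete'
    console.log(e.record)
  })

  // Later: close the subscription.
  // await pb.collection('posts').unsubscribe('*')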


> Like, how do they figure out when a row is updated?

AFAICT because you can only update a row via the UI, it's easy to tell when a row is updated.

But if you're talking about raw SQLite, there's [1].

[1] https://www.sqlite.org/c3ref/update_hook.html


Serverless database. Loving it! Good luck!


Welcome to the serverless era: https://aws.amazon.com/rds/aurora/serverless/


Wasn’t familiar with serverless Aurora, thanks!


>I do not recommend to expose WunderBase to the public internet. The intended use case is to run it on a private network and expose it to your frontend via an API Gateway, like WunderGraph!

My SEC team felt a disturbance in the force from me even considering this on our internal network. Security should not be a secondary consideration for a DB!


Seeing how many MongoDB instances were running wide open to the internet despite calls to not do so, your concern is certainly valid.



