Things that surprised me while running SQLite in production (joseferben.com)
232 points by joseferben on July 3, 2023 | 108 comments



One thing missing from this:

There are massive performance gains to be had by using transactions. In other RDBMSs transactions are about atomicity/consistency. But in SQLite, transactions are about batching inserts for awesome speed gains. Use them!
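To make it concrete, here's a rough sketch of the difference with Python's built-in sqlite3 module (the table, file name and row count are made up):

    import sqlite3

    conn = sqlite3.connect("demo.db", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE IF NOT EXISTS items (n INTEGER)")

    # Slow: every INSERT is its own implicit transaction, each paying a sync to disk.
    for n in range(1000):
        conn.execute("INSERT INTO items (n) VALUES (?)", (n,))

    # Fast: one explicit transaction pays that cost once for the whole batch.
    conn.execute("BEGIN")
    for n in range(1000):
        conn.execute("INSERT INTO items (n) VALUES (?)", (n,))
    conn.execute("COMMIT")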


Cool!

This example from the better-sqlite3 docs makes more sense now:

    const insert = db.prepare('INSERT INTO cats (name, age) VALUES (@name, @age)');

    const insertMany = db.transaction((cats) => { for (const cat of cats) insert.run(cat); });

    insertMany([{ name: 'Joey', age: 2 }, { name: 'Sally', age: 4 }, { name: 'Junior', age: 1 }]);


Along the same lines, SQLite is unlike other RDBMS in that you typically do NOT want to create a new connection per logical unit of work, because that means opening an actual file on disk each time. The more you can re-use a SQLite connection instance the better.

SQLite handles the locking for you internally under most build configurations[0]. Any sort of external attempt at the same will just make things go slower. The only thing that could really beat the internal SQLite mutex is to put a very high-performance MPSC queue in front of your single SQLite connection and fold the resulting micro-batches of logical requests into transactions as noted above, or some more consolidated form of representation (rough sketch below).

[0]: https://www.sqlite.org/compile.html#threadsafe
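A rough sketch of that single-connection-behind-a-queue pattern in Python (the table, file name and batch size are arbitrary):

    import queue
    import sqlite3
    import threading

    write_queue = queue.Queue()

    def writer_loop(db_path):
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS events (kind TEXT)")
        while True:
            batch = [write_queue.get()]                  # block for the first write
            while not write_queue.empty() and len(batch) < 256:
                batch.append(write_queue.get_nowait())   # then drain a micro-batch
            with conn:                                   # one transaction per micro-batch
                for sql, params in batch:
                    conn.execute(sql, params)

    threading.Thread(target=writer_loop, args=("app.db",), daemon=True).start()

    # Producers on any thread just enqueue their statements:
    write_queue.put(("INSERT INTO events (kind) VALUES (?)", ("login",)))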


> SQLite is unlike other RDBMS in that you typically do NOT want to create a new connection per logical unit of work

I would've imagined that most RDBMSs would have a pool of connections that get reused, rather than creating a new connection every time.


In Java-land it's usually up to the application/libraries to manage this. I see Hikari used a lot.


Inserting batches of items in one db transaction is faster in Postgres as well, in my experience. I expect that to be the case for all RDBMSs. Each transaction has a cost of "flushing", and every non-transaction query is implicitly a transaction. By doing multiple operations in one explicit transaction, the overhead of a transaction is amortized over more operations.


It's the same for mssql.

A bunch of inserts in a single transaction is generally fast and unproblematic. Though if you have weird clustered indexes and a lot of concurrency you might run the risk of frequent deadlocks.


Another useful trick is to take advantage of buffering combined with BEGIN TRANSACTION / COMMIT to push a massive set of data at once.

What I've done so far is collect non-sensitive data (nothing like passwords), save it in an array (in my case in PHP), and once the array reaches a size you know is enough to commit, write it to the database in a single transaction (rough sketch below). In my case that was 25K rows, and it worked flawlessly without users noticing anything at all.

I have even done the same thing with 100K rows and didn't even break a sweat!
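Roughly the same pattern sketched in Python (the original was PHP; the table name and threshold are just illustrative):

    import sqlite3

    FLUSH_THRESHOLD = 25_000
    buffer = []

    def record(conn, name, age):
        buffer.append((name, age))
        if len(buffer) >= FLUSH_THRESHOLD:
            with conn:  # a single BEGIN ... COMMIT around the whole batch
                conn.executemany("INSERT INTO cats (name, age) VALUES (?, ?)", buffer)
            buffer.clear()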


That's a fantastic tip, thanks! I've been on the fence about using SQLite; I love the simplicity but have always been concerned about write performance.


If you're starting a Rust project then make sure to check that your SQLite dependency supports txs. Without txs, inserts can take ~15ms. The `rusqlite` crate supports txs; with `sqlite` I didn't find an ergonomic solution.


I'm curious how true that is under WAL mode, since a big part of that is implicit batching, if I understand correctly.


still true, in my experience


>The first thing I've come to appreciate about SQLite is its documentation

I've told this story before, but the first time I used sqlite, years ago, I needed to store a floating point value, so I read the documentation: https://www.sqlite.org/datatype3.html

>REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.

I parsed this as "an 8-bit floating point number". Wow, I thought. That's weird as hell. Oh well, I guess it's sql lite, ha ha. I went on to implement storing numeric values as text strings, which worked exactly as well as you'd expect. Only months later did I find out that it's a normal regular IEEE double, 64 bits long.


Is the point of the story that no matter how good the documentation is, we don't read well anyway? Otherwise, I don't get the point.


Just a funny anecdote. There's not always a point.


I was confused as well. The way you phrased it, it seemed like your funny story was about how cool the docs were and how they saved your ass, or something like that :)


It wasn't his story, you replied to someone else.

It's funny that under "maybe it was an example of nobody reading shit at all" there is a comment where someone didn't read the username and thought they were replying to the anecdote poster :D


Right now? Almost half past 12.


Well, I thought it was funny, but I read and understood all of the words.


That's a very salient point. And reading well all the time is super hard. I think that's one of the reasons why people don't write documentation. First of all, who's gonna read all that? Second of all, who's gonna read all of that correctly?


My dad knows very little about computers but he likes joking with me whenever I'm struggling with some technical problem: "8 bits is a byte! That's all you need to know! How hard could it be?"


More advice for a kid contemplating engineering:

An EE only needs to know that you don't stick your finger in a socket. A mechanical engineer only needs to know that you can't push a rope. And a civil engineer only needs to know that sh*t rolls downhill.


TI C62xx and C64xx DSPs don't understand 8-bit bytes at all, and everything is at least 16 bits. There's no uint8_t there, char is 16 bits, etc.


That's awesome haha. I aspire to be a dad like that one day xd.


You should tell him this story!


It might interest you both that not all bytes are equal.


Finding out that there are just 5 datatypes and that BIGINT, VARCHAR, BLOB, etc. are just mapped onto those was actually a breath of fresh air.


Which I think is fine (though you could make a case for a proper datetime type). What I'm a lot less of a fan of is the dynamic typing, i.e. that if you have an int column, SQLite will happily store a string in there. I'm not sure why they do that, but to me it lets some bugs go unnoticed.


There are STRICT tables in newer versions of Sqlite
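For illustration, assuming a build with SQLite 3.37+ (where STRICT is available), the difference looks roughly like this from Python:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    conn.execute("CREATE TABLE loose (n INTEGER)")
    conn.execute("INSERT INTO loose (n) VALUES ('oops')")   # accepted, stored as TEXT

    conn.execute("CREATE TABLE strict_t (n INTEGER) STRICT")
    try:
        conn.execute("INSERT INTO strict_t (n) VALUES ('oops')")
    except sqlite3.Error as e:
        print(e)  # rejected: cannot store TEXT value in INTEGER column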


I saw that complaint, but for all practical purposes it's a moot issue.

If your code allows it you're fucking doomed anyway and will have multiple other type-related problems all over the codebase.


There are a variety of ways that bugs like that could sneak through a codebase.

I was taught to think of an RDBMS as the last line of defense against data integrity issues. SQLite, on the other hand, is more than happy to let you shoot yourself in the foot.


I mean if you write database code in PHP or JS fair enough but you're fucking doomed if you do anyway.


> I went on to implement storing numeric values as text strings, which worked exactly as well as you'd expect.

I imagine it went OK!



>I went on to implement storing numeric values as text strings, which worked exactly as well as you'd expect.

It can be made to work. I would've gone with binary-coded decimal if the IEEE 754 standard weren't precise enough.


The comment about concurrency was strange. They have 24 × 5 + 2 × 2 × 60 = 360 writes per minute, or about 6 writes per second. They're not seeing concurrency issues because the amount of time spent doing writes is something like 1%. Naturally, retrying a couple of milliseconds later should succeed almost all the time.


Why do you need to manually retry at all? Doesn't it end up in a queue and get executed as soon as it can? I thought this stuff was handled by the database automatically.


SQLite has a gotcha where, if your transaction is upgraded to a write transaction halfway through, it is impossible to safely restart it automatically in the event of a conflict. That is why SQLite provides a "BEGIN IMMEDIATE" statement to work around it if you know you are making a write transaction.
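A minimal sketch of that workaround with Python's sqlite3 module (the table names are made up); BEGIN IMMEDIATE takes the write lock up front, so the transaction never has to upgrade midway:

    import sqlite3

    conn = sqlite3.connect("app.db", isolation_level=None)  # manage transactions manually
    conn.execute("PRAGMA busy_timeout = 5000")

    conn.execute("BEGIN IMMEDIATE")  # grab the write lock now, not at the first write
    try:
        (count,) = conn.execute("SELECT COUNT(*) FROM cats").fetchone()
        conn.execute("INSERT INTO audit (note) VALUES (?)", ("count was %d" % count,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise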


If it blocked you until it was finished, then you wouldn't be able to handle a timeout because you would be blocked waiting for it to finish, so you'd have no capacity to realise the task needed to timeout (unless you wrote multithreaded code to detect the timeout "out of band"). If it doesn't block you, then you won't know whether it's been committed or not and so now you have to write some kind of loop to continuously test whether the query you just executed has been committed, abandoned or is still in progress.

In principle, you could have some kind of configuration that says "try to commit this for so long, but if you can't commit it by then just abort". But you haven't really solved the manual retry problem, just made it somewhat less likely to occur.

As far as I know, given the costs of a request, most database servers tend to use the latter approach. But in SQLite requests are very much cheaper, and at least part of SQLite's work happens in your thread, so it goes for the former approach. But from an end programmer's perspective it's only a quantitative difference, not a qualitative one.

Also, "the database" is a highly ambiguous term, but in SQLite it can really only refer to a file on disk. There's no database server, it's just a library that you call to manipulate the file. So who even is the database server? Maybe it's the code you're writing. Maybe it's you who should be handling this stuff. And that's SQLite's decision, and fair enough too. SQLite isn't trying to be a database server, it never offered to be a database server, and you shouldn't think it's a database server. You should think of it as a library for executing SQL queries against a file that holds a relational database. If you think that's tantamount to being a database server, then consider what you normally mean by "server". Is a library for manipulating zip files tantamount to an FTP server?


Yeah I would imagine that too. Specifically that the later process would hang on acquiring the lock.


True, on their own those numbers are not that impressive. To give some more context: that load is the average, and there are bursts. I use ATOMIC_REQUESTS=True, which ties a transaction to each Django POST request. Given that it's quite easy to run into N+1 queries using the ORM, I somehow expected to see the "database is locked" issue more often.
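For reference, that corresponds to roughly this in settings.py (a sketch; the values are just an example):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": BASE_DIR / "db.sqlite3",   # BASE_DIR as in the default Django settings.py
            "ATOMIC_REQUESTS": True,           # wrap each request's view in a transaction
            "OPTIONS": {"timeout": 20},        # SQLite busy timeout, in seconds
        }
    }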


Wouldn't "database is locked" only happen on concurrent writes, where one is locking a row and the timeout is exceeded?

If there are N+1 reads, I don't think it would lead to "database is locked".


If there is a write in a read transaction (with an N+1 read), the transaction becomes a write transaction.


Fly.io has inspired me for one-human-scale production. There's a ton you can do at 100% uptime... when the user is looking. Which is not often! You've got a ton of time for maintenance and reshuffling your infrastructure around. You've got tons of downtime if you do it in five-minute intervals in the evening, and plenty of concurrency if you've got just one user. Serving from one file at a time is great. You can snapshot and replicate every five seconds and, if you have to fail over, never miss more than one or two actions, when you're doing it for a single user's data and a single user.

That's a tradeoff you make if you're a user, and it's pretty worthwhile. There's something to play with


Can you link more concrete details about this? It kinda sounds like you're referring to the design of sharding databases by user, which I'd almost forgotten about (relevant to me now, so I appreciate it). Though I'm super curious how synchronizing data between users would work, i.e. notifications, single-producer multi-consumer data, etc.

Time for some researching :)


https://fly.io/blog/all-in-on-sqlite-litestream/ is top of the list for reading.

There's two classic pieces of wisdom that aren't so much wisdom anymore but pieces of the landscape: The first is, don't write your own datastore, use a database. This is pretty much taken for granted today, but in the year 2000, there weren't a plethora of web frameworks. At the time, the unix filesystem didn't seem like such a bad interface to store things. Storing each user as their own file made sense. Simple enough. One (just one of several) way it runs into problems is when:

The other landscape concept - Run multiple copies of your app. This one is kinda controversial to this day, but it's core to the "Twelve-Factor App", which was a somewhat prescient memo for its day, if looking a little long in the tooth to my eye[1]. It's why people keep using Kubernetes. And it's correct advice - if you need to be running a ton of concurrency, you are better off scaling by process and keeping heavy-lifting logic out of your database. If you keep the database and application on separate hosts, you can scale to one big database server for writes, and many application servers and read-only database shards. This works really well for most CRUD and render-heavy applications you see on the web today; social networks in particular absolutely thrive on this model.

This brings up another application architecture anomaly that's mostly disappeared - The "App in Database". There's a mildly successful model whereby almost all application logic lives in the database itself as stored procedures. You can do an entire app update as an atomic DDL commit! The database's permissions handle everything, and users just connect as users. In the old MVC model, the Database tables are M, the stored procedures C, and you write a V on top. I've encountered a few instances of this paradigm, most notably an accounting software that worked pretty great.

Fly suggests throwing all of this out the window. Most applications don't need a lot of write-scaling. You don't even need read concurrency. Run a single shard, make that shard fast, performant, and replicable - When it's shut down, you can make many copies of it, and resume from any of them. When a new one starts up, it becomes the primary, all of the others are marked as stale, and you resume making copies. If you're using Litestream like Fly does, this makes for a drop-dead simple app environment. Alternatives, like I'm using at home, are ZFS replication - SQLite is snapshot safe, so I just snapshot frequently, replicate those snapshots frequently, and reverse direction when necessary.

[1] https://gist.github.com/GauntletWizard/1fd1298304e811529d4b9...


It's great that people are realising that SQLite is a solid tool to use everywhere. But it's amusing that this article reads like: I tried a 25-year-old database developed by an elite team in the fastest and best-supported systems-level programming language, and it was good!


Yeah, I feel like there's this ingrained assumption people have that because SQLite can run in-memory that it's "not a real database". I've never really understood it (since I _love_ using it in my personal projects), but I've also never really had the chance to use SQLite in production.


I think the name gives it a poor initial reputation as well. Who wants lite SQL? That's diet SQL. Inferior.

SQLight has connotations of speed and energy and all those marketing keyword association goodies.


For some reason there's a lot of people on Reddit who post "Sqlite is only for toys".

In my experience Sqlite is great for everything, with a number of exceptions: https://dev.to/johntellsall/re-more-on-sqlite-vs-mysqlpostgr...


Fun article. It would be interesting to see any benchmarks folks have on a variety of devices and computer sizes. I'd expect most developer machines to be quite beefy compared to, say, my kids' Chromebooks. Those are still far stronger than anything I grew up with, of course - in ways that make any intuition I have about performance basically worthless.


The author tested only 100k rows. That almost certainly fits in your kids' Chromebooks' memory just as much as it would a beefy computer's. It would probably fit in memory even on a Raspberry Pi or a computer from the 90s.


You reminded me of walking home from school one day in the late 90s and talking with a friend about how RAM had hit £1/MB, and how amazing that was.

Today I can buy it at much faster speeds and better timings for £0.03/MB. Damn capitalism.


2114 static RAM in 1985 was around £1/kB [1].

That's 7 orders of magnitude in 35 years.

[1] https://stardot.org.uk/forums/download/file.php?id=29208&sid...


I came to the same conclusion. I find it hard to extrapolate from benchmarks that I run locally on my dev machine.


It's pretty sad given how most cloud providers price out IOPS.

I've had to explain numerous times to our engineers that their laptop is running around 10x the IOPS of our database (~40,000). Table scans are actually more performant on a dev laptop than on a replica.


Have you considered shipping your dev laptops to the cloud? That would make deploying easier too.


Now I'm imagining an alternate reality where a Docker Container is like a caddy for laptops lining up the thunderbolt ports for hotswapping. And the delivery manager is the courier driver.


That sounds like a more functional version of blockchain tbh


I'd like to see a fork of SQLite that "supports the PostgreSQL wire protocol and the majority of PostgreSQL syntax" (to quote another popular db).


I don't get why you would want this. Most of SQLite's unique benefits come from its unique design choice of being in-process. If you did queries over the wire you'd just get the worst of both worlds.


They're not suggesting going over the wire, I don't think. They're suggesting making SQLite support Postgres lingo.


I don't understand how to interpret "support the Postgres wire protocol" any other way than going over a network (even if only localhost).


This blog post might be of interest to you -

https://blog.turso.tech/sqlite-based-databases-on-the-postgr...


I once implemented the MySQL network protocol (in TypeScript) to talk to a local SQLite DB. We used that for our CI environment, since our apps only did very vanilla SQL operations. It worked like a charm. So I can totally see some uses for such a beast -- but probably all of them are pre-production.


Postgres has foreign data wrappers which can kinda achieve this (see also: https://github.com/pgspider/sqlite_fdw ), but I've been surprised by how well SQLite is already supported by many tools: sometimes it comes for free as it's used in testing - hooking up stuff like Litestream can work really well too.


FDWs still require an instance of Postgres to be running. By implementing the Postgres wire protocol, you could theoretically use any Postgres driver to talk to a SQLite database with no client changes and no instance of Postgres.


DuckDb supports reading/writing to a SQLite database, while using the improved DuckDB syntax. Sounds too good to be true, but I keep meaning to explore if there are any obvious gotchas.


DuckDB's concurrency model is not ideal for every use case. It's either a single read/write process or multiple read processes (no writes).

Other than that it's awesome. I've stuck to its native file format though, as scanning SQLite is slower.


But isn't this the point of being able to use both duckdb databases and sqlite databases from the duckdb runtime? You can mix and match read/write use-cases within the same duckdb db runtime?

Perform OLTP actions against sqlite db from within duckdb.

Perform OLAP actions against duckdb db from within duckdb.

Sometimes your OLTP content drives the OLAP query so read from both sqlite db and duckdb db at the same time.



AFAIK SQLite has weird locks; half a year ago they weren't fair and writers could time out easily. As a workaround we implemented an external lock, and it has worked quite smoothly since.

SQLite's locking code looks like:

    while lock_is_still_held_by_somebody():
        sleep(3)  # 3 seconds


Did you use WAL?

But in general, funnelling writes into one thread is the way to get around it.


Yes. It doesn't matter, WAL or the default behavior.


Shouldn’t there be a random jitter added?


It's SQLite's locking implementation. In real code it includes some backoff, but on long timeouts it reaches 3s. Jitter doesn't help, BTW; it's still unfair. Under high contention each worker has only a slim chance to get the lock.


Is there a Node.js SQLite driver wrapper that hides the one-writer concurrency issue for you, without having to think about it? I've never used SQLite on a server because of it (many times on desktop and other clients), but I do have reasons to do so, especially with the databases being so easy to snapshot and move around.


Maybe https://www.npmjs.com/package/better-sqlite3? Hide concurrency issues by blocking!?


SQLite serializes writes for you, there’s no need to think about it apart from handling SQLITE_BUSY


One thing that really confused me with SQLite is how read transactions that are upgraded to write transactions halfway through tend to fail very easily under high write load and cannot be automatically restarted by SQLite. It all makes sense in their concurrency model, but it was surprising to me. The tl;dr is: if you know you are going to write data in a transaction but the first operation is a read, use BEGIN IMMEDIATE so you don't have to worry about restarting things manually.


One of the best write ups I’ve seen on using SQLite in production is Consider SQLite: https://blog.wesleyac.com/posts/consider-sqlite

I also use SQLite in production and it's been pretty great so far. It serves about 50 requests per second from a single-threaded Node app on a low-tier DigitalOcean machine. To be fair, most of those are cache hits, but it's still nice. It's saved me money by not needing a separate db server, and the operational simplicity is great. The biggest speed-up I found was turning on memory mapping (which is different from the in-memory database that the article talks about).
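Memory mapping is just a pragma; roughly like this (shown with Python's sqlite3 for the sake of an example, and the 256 MiB cap is arbitrary):

    import sqlite3

    conn = sqlite3.connect("cache.db")
    conn.execute("PRAGMA mmap_size = 268435456")  # let SQLite mmap up to 256 MiB of the db file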


> In-memory SQLite is not too exciting

Do they ever call `commit`?


Sqlite being nearly as fast as a hash table might say more about how slow hash tables are than how fast sqlite is.

Sticking your data in a self-balancing BST usually gives you log n lookups, but most of us haven't internalised that log n with a small coefficient can be better than constant time with a huge coefficient on expected data sets.


So can you just run Django with SQLite (the default) now, with no special setup? And no data loss? Years ago I tried that when I made a simple web app for a one-off data collection at work. I had maybe 10 users on it and started to encounter concurrency problems. This was about 5 years ago. What's changed since then?


Works for me, so far no data loss after about 150 migrations across 3 projects. These are the settings:

    from django.db.backends.signals import connection_created
    from django.dispatch import receiver

    @receiver(connection_created)
    def configure_sqlite(sender, connection, **kwargs):
        if connection.vendor == "sqlite":
            cursor = connection.cursor()
            cursor.execute("PRAGMA journal_mode = WAL;")
            cursor.execute("PRAGMA busy_timeout = 5000;")
            cursor.execute("PRAGMA synchronous = NORMAL;")

Initially, I was worried that Django might screw up migrations since it's a bit awkward to make certain schema changes in SQLite. Also almost no one seemed to use SQLite with Django in production, so I was a bit worried about how battle tested the SQLite part of the ORM was.

So far so good.


I wrote this about WAL mode with Django and SQLite last year: https://simonwillison.net/2022/Oct/23/datasette-gunicorn/

Your configuration looks right to me.


Thanks for the link.

> I asked Django Fellow Carlton Gibson what it would take to update that advice for 2022.

Glad to see that SQLite is being presented as a valid option in the current version of the docs.

> Even without WAL mode, bumping the SQLite “timeout” option up to 20s solved most of the errors.

Can you say anything about the tradeoffs when increasing the timeout, maybe in the context of Django?


I've not explored it in any more depth than I wrote about in that article and my issue comments.

My hunch is that if you bump it up to 20s while receiving a huge amount of concurrent traffic you may find yourself running out of available Python workers to serve requests.


Here are some notes I recently found online on this by simonw: https://simonwillison.net/2022/Oct/23/datasette-gunicorn/#be....

It seems to me like they removed the warning on SQLite, so I guess they tested it and it works fine. But I didn't research this any further so far.


maybe they found proper SQLite options

enabling WAL helps a lot as it allows concurrent reads and a write

https://www.sqlite.org/wal.html


I'm a bit bummed out that in-memory didn't increase performance. Any idea why?


I've got a hunch that the data was too small to make a difference: https://stackoverflow.com/a/764743


A rough guess, without knowing any better, would be that their SQLite configuration isn't flushing to disk after every write, or something similar.

Maybe they ran their entire benchmark in a single transaction or something?


The "on-disk" test was really in-memory too, thanks to various layers of caching etc.


Sounds like they were testing it with a small enough amount of data that the disk version could just work from cache and didn't actually have to go to the disk


Can SQLite have multiple processes writing to the same database? Is this safe?


Yes, just not at the same time. If you can, you want to enable WAL (it requires some shared-memory primitives, so generally the same machine), because otherwise writes block reads.

https://www.sqlite.org/wal.html


> Yes, just not at the same time.

Does this mean that my processes need some synchronization primitive of their own to avoid concurrent writes?


AFAIK SQLite in WAL mode takes your writes but still serializes the commits, so no SQLite-specific synchronization primitives are needed. You might run into SQLITE_BUSY, in which case you can play with the timeout: https://www.sqlite.org/c3ref/busy_timeout.html
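A small sketch of that setup with Python's sqlite3 (the file name and values are arbitrary):

    import sqlite3

    conn = sqlite3.connect("shared.db", timeout=5.0)  # Python-side busy timeout, in seconds
    conn.execute("PRAGMA journal_mode = WAL")         # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout = 5000")        # same idea at the SQLite level, in ms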


Can anyone point me to a good introduction on SQLite and (Vanilla) PHP?

I only tried for half an hour but had problems everywhere, sticking to the php.net docs, and wondered if it was me, the docs, or the webhost...


can you really test the speed of a db with django?


If its end goal is to be used by Django, isn't that the only thing that matters? Django may very well be the bottleneck, though.


Absolutely, yes, as one DB might be used by multiple instances.

You could theoretically get a 128-core CPU and use all of the cores for Django processes.


I do wonder if you can have enough uwsgi workers on a single machine to max out SQLite using Django.


TL;DR Using a slow thing is fine if you use it in conjunction with an even slower thing.


Can't tell if you're missing the point on purpose or just did no research, maybe due to the lack of evidence you cite


That losers write posts with clickbait titles


Another programmer who's not a tester is trying to come up with a benchmark and has no clue how to do that (and doesn't even know they need to know how to do that). Not surprisingly, it's a Python / JavaScript programmer...

That test of memory vs disk I/O performance is ridiculous. Nobody should pay attention to that.



