SQLite the only database you will ever need in most cases (unixsheikh.com)
550 points by todsacerdoti on April 15, 2021 | 370 comments



We've been using SQLite as our principal data store for 6 years. Our application serves potentially hundreds of simultaneous users, each pushing 1-15 megabytes of business state to/from disk 1-2 times per second.

We have not had a single incident involving performance or data integrity issues throughout this time. The trick to this success is as follows:

- Use a single SqliteConnection instance per physical database file and share it responsibly within your application. I have seen some incorrect comments in this thread already regarding the best way to extract performance from SQLite using multiple connections. SQLite (by default for most distributions) is built with serialized mode enabled, so it would be very counterproductive to throw a Parallel.ForEach against one of these.

- Use WAL. Make sure you copy all 3 files if you are grabbing a snapshot of a running system, or moving databases around after an unclean shutdown.

- Batch operations if feasible. Leverage application-level primitives for this. Investigate techniques like LMAX Disruptor and other fancy ring-buffer-like abstractions if you are worried about millions of things per second on a single machine. You can insert many orders of magnitude faster if you have an array of contiguous items you want to put to disk (see the sketch after this list).

- Just snapshot the whole VM if you need a backup. This is dead simple. We've never had a snapshot that wouldn't restore to a perfectly-functional application, and we test it all the time. This is a huge advantage of going all-in with SQLite. One app, one machine, one snapshot, etc...


- Don't run on network-attached or distributed file systems because you will have corruption with WAL
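Regarding the batching bullet above, a minimal sketch of the one-transaction-per-batch idea using Python's built-in sqlite3 module (the app.db filename and business_state table are made up for illustration):

    import sqlite3

    # One connection per database file, shared deliberately by the application.
    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")  # enable WAL once; the setting persists in the file

    def flush_batch(items):
        """Write a batch of (id, blob) pairs inside a single transaction.

        One transaction per batch, rather than per row, is where the
        orders-of-magnitude insert speedup comes from.
        """
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT OR REPLACE INTO business_state (id, blob) VALUES (?, ?)",
                items,
            )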


This is only true if you have more than one writing connection to the database. But that said, it is still a good idea to not have the database housed on a network attached store.


Opening connections to SQLite over the network defeats the entire purpose of using this library.


Apparently you didn't read the last sentence.


I definitely did. I was agreeing with you and restating the argument in a more general sense.


This might sound obvious, but it's not always apparent which applications use SQLite.

I had lots of weird issues with Plex until I found out that it uses SQLite, and moved the config directory from a shared NFS directory to a shared iSCSI volume.


You can keep sqlite on NFS if you disable WAL.


If you write the application yourself (or have easy means to modify it to disable WAL), cool

If it’s written by someone else you’d have to maintain a modified fork (if that’s even possible)


What are the disadvantages of disabling WAL? I would love to store a bunch of docker filesystems on NFS but since a lot use SQLite, they all complain.



> Make sure you copy all 3 files if you are grabbing a snapshot of a running system

Or use the ".backup" command in the CLI to take a clean single file snapshot if that's what you need. Or, you can call the checkpoint C API and if it can move all the transactions from WAL to DB, it will remove the WAL file when it's finished.
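For example, Python's sqlite3 module (3.7+) exposes the same online backup facility as Connection.backup; a minimal sketch with placeholder filenames:

    import sqlite3

    src = sqlite3.connect("live.db")
    dst = sqlite3.connect("snapshot.db")
    with dst:
        src.backup(dst)  # copies the database page by page while it stays in use
    dst.close()
    src.close()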


> Just snapshot the whole VM if you need a backup.

How do you deal with things like updates? Upgrades to your service?

Do you just accept that you have scheduled downtime when your service won't be available?


Yes - We have allowed maintenance windows each day.

That said, we have prototypes of architectures in which we have multiple instances running in the same production environment simultaneously, each with an independent SQLite database and some light-weight replication logic in the application itself. Simple DNS RR or customer-managed LB would be responsible for routing client traffic. There aren't a whole lot of entities that we actually need to synchronously replicate between application servers, so this is far more accessible for us to iterate on than throwing our hands up and jumping to some fully-managed always-on clustered database service and throwing away all of the lessons we've learned with SQLite.


Postgres is the database I use for everything.

I find databases of all sorts extremely interesting and I've tried many of them, of all flavors.

In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.

Why not use sqlite? Here's one example - I like to access my database from my IDE remotely - my understanding is that remote access is not possible with sqlite.

Another example - multitenancy is dead easy in postgres, makes things more secure and makes coding easier.

Another is that Postgres has an extremely rich set of data types, including arrays and json.

Also Postgres is designed for symmetric multiprocessing which matters in today's multicore world.


It's important to note that the author of SQLite isn't "competing" with RDBMS engines -- it's competing with flat-file access.

To that end, I think it's wildly successful; instead of writing flat files I almost always reach for SQLite first.

When systems share a database, I, like you, reach straight for postgresql. Most people reach for mysql, which I do not prefer.


Firefox uses SQLite for localStorage. When Firefox is force quit, the database becomes corrupted. https://bugzilla.mozilla.org/show_bug.cgi?id=1341070

This bug has occurred to me personally multiple times.

Does it really have advantages over a flat file?


Sqlite is certified for use in planes. This requires a high level of testing.

I think it's likely that Firefox is doing something unexpected.

https://www.sqlite.org/testing.html

- Four independently developed test harnesses

- 100% branch test coverage in an as-deployed configuration

- Millions and millions of test cases

- Out-of-memory tests

- I/O error tests

- Crash and power loss tests

- Fuzz tests

- Boundary value tests

- Disabled optimization tests

- Regression tests

- Malformed database tests

- Extensive use of assert() and run-time checks

- Valgrind analysis

- Undefined behavior checks

- Checklists


"Valgrind is perhaps the most amazing and useful developer tool in the world."

It seems that about once a month I read something that makes me like SQLite even more. I guess the reason this month is that they like Valgrind as much as I do.


Valgrind is fine if you can live with the false positives, limited stack validation and execution slowdown. In a development environment, AddressSanitizer is a far better choice.


I am trolling the person to whom you replied...

<<This bug has occurred to me personally multiple times.>>

Hello, I am AirBus engineer (not really, only pretend). Does my plane crash? No, it does not!

Dear Random Internet Person who complains that open source software is broken. Are you joking? We are talking about SQLite!!??

SQLite has whole pages dedicated to "why you should NOT use SQLite". How many open source projects are /so/ good they can do this? Incredible!

The test coverage feels like NASA.


I use Firefox almost exclusively and have never encountered this.

Whatever capabilities SQLite has to recover corrupted or inconsistent storage, they certainly exceed what you get with flat files (none, unless you implement it yourself).


Per that bug report, you are likely to encounter it at https://codesandbox.io/s/sentry-js-sdk-unhandled-promise-fuq....

Which suggests an application error to me. I'd give decent odds to the real problem being some bounds check or pointer writing to the wrong place. To prove it, I'd instrument SQLite to log every single query that it was asked to do, then try to run those queries outside of the application. If the queries work fine, then the copy of SQLite in the application is being corrupted somehow.

In which case it really isn't fair to blame SQLite.


What's getting corrupted is Firefox's own SQLite DB.


I'm perfectly aware of that.

The question is how it is getting corrupted. Is it that SQLite has a bug? Or the copy of SQLite in Firefox is getting corrupted first?

Given the complexity and relative levels of testing of those two products, my bet is that the problem is Firefox.


This is quite surprising. SQLite is also used in embedded systems, where you don't just have comparatively civil "force quits" but also straight power-offs. SQLite seems to do everything it can to handle this gracefully.


if you're corrupting sqlite files you're for sure going to corrupt other formats in a force quit situation.


Why would that not happen to any other form of flat file storage? When you force quit a process, unless there's some nuance I'm unaware of, you're not only bypassing any graceful cleanup but there could be an FS operation in progress. Although I'm certain there are mitigation techniques, I would never expect a process to guarantee no file corruption when it's forcefully quit.

Interestingly enough, I don't think I've ever had this issue. I've force quit Firefox many times in the past (not often, but I'm sure I've done it countless times in the 10+ years I've been using it) and I still have localStorage data intact. There's old crap in StackEdit.io, which uses localStorage, that I haven't touched in years. (no, I have neither that service or Firefox itself set up to sync with anything)

EDIT: Maybe it's OS specific? I've been on macOS, and maybe there's something about other file systems where corruption is more likely to happen with a force-quit? Just throwing darts here.


I believe for a proper implementation it would be done transactionally. So if the program crashed then that record wouldn’t be written but the DB wouldn’t be corrupted.


I force-quit Firefox regularly (easiest way to get my tabs to stay open), on both Linux and Windows.

I've never seen any error messages from Firefox suggesting that anything has been corrupted.

Are there other symptoms that I would expect to see, if this corruption is happening?


Some sites break because they don't expect the exception that gets thrown; deviantart.com won't work, for instance.


Even when systems don't share a database, if operational concerns - disaster recovery, failover migrations, metrics and monitoring, anything like that - matter to me, I'm reaching for PostgreSQL or MS SQL Server. I hear I also need to get over my 20 year old grudge against MySQL.

SQLite is really only my first choice for hobby projects and desktop apps that aren't running on the JVM.


> I hear I also need to get over my 20 year old grudge against MySQL.

I hear this too - but silently truncating my data (causing a great deal of data loss and grief for me) is likely going to continue to bias me against MySQL for the foreseeable future.


That has been configurable for ages, and the default has been strict for a long time now ...

Compatibility with old decisions vs. doing the right thing™ is always a pain ...


> That was configurable for ages and default is on strict for a long time now

I'm not saying I'm being rational in holding my grudge. I got bit by that "bug" _bad_ early in my career and the scar tissue lingers.


Yeah, I believe it's best to think of SQLite not as a database per se but more as a file format, kinda like a Word doc, but a file format that doesn't suck and is absolutely amazing at storing information.


And retrieving it, too.


If it is mostly text, then I would say git is better suited.


git is ACID?

I still believe Sqlite is the appropriate store. Unless.. does Git handle power issues elegantly? What happens if you yank power in the middle of a commit? Sqlite is designed to handle that.


Let's see:

- A - yes; updating the HEAD/branch text file is the key atomic update

- C - not sure how to apply this in the context of git, because committed files' content is kinda free-form by design: no schema, no consistency problems

- I - there should not be concurrent use of commands, so no. Practically this is mostly important for pushing to a remote repository, and there it's handled by rejecting whichever push comes second if you make two commits from the same base and push them at the same time

- D - yes


that's one of the reasons I love Fossil so much. Way better backend than Git.


one of my team members really pushed fossil, and we did like the integrated wiki/ticket system. not bad at all.

... but it was so slow compared to git! at everything. merging, pushing, pulling ...

you did not experience same?


Fossil has rebasing? I thought they felt strongly against rewriting history, which is the biggest reason I am not considering it.


oops. corrected. thanks.

my main beef with fossil was performance. but i should clarify that this was 5 years ago or so. so maybe it's ok now.


For context, before the rise of SQLite, BerkeleyDB was often used for local file databases.

I never programmed for BerkeleyDB, but I don't believe it was SQL, so I think that's what made SQLite so successful: the combination of an SQL API backed by local files.


BerkeleyDB was a key/value store. I'm not sure how much uptake it actually had vs other solutions. Prior to widespread availability of SQLite3, it was not unheard of to write your bespoke format if you needed something sophisticated.


My exposure to BDB was through sendmail, spam assassin, and subversion. All three a big chunk of my job in 2006.


BerkeleyDB is NoSQL and is internally used in popular NoSQL solutions like DynamoDB.


I actually saw a really interesting article (probably on here) a while back about using SQLite files as your applications' document storage model, and I really like it. Like if you wrote a word processor, you could include a table of edits that would let you recreate Google Doc's timeline feature, among other things. I thought that was a really neat idea.


Apple's Core Data persisted-object-graph engine (kind of like Hibernate, I guess?) uses SQLite as its default storage backend.

There is a lot of macOS/iOS software using Core Data that already has "SQLite as an application document storage model" without even realizing it.




If competing with flat files, and assuming you don't need concurrency, I don't think it's better than just using one or more json or csv files. Those are much easier to manipulate and manage (same benefits in terms of admin and backup), and you don't need to use a database driver in your code.


> In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.

Well, depends on how wide your definition is:

Filesystems are also databases. (And eg inside of Google, they are using databases to store filesystems.)

These days, we often think of relational databases. Filesystems are more like the hierarchical databases of yore that Codd (also) talked about in his original papers.


Would love to be able to have a database file system and be able to use standard Linux tools to query and make changes, back up, etc.

For example, having a folder of contacts with each file named after the person and having key/value pairs. Similar to how static site generators use YAML/TOML/JSON.


Try your nearest shell to experiment with this? In my scripts and programs I've stored data just in files many times: simple, transparent and effective. Works everywhere. On today's hardware there's so much you can just brute-force through in a search that it doesn't matter much, unless you have a huge amount of data or a lot of concurrent tasks.


The maildir format does something like this.



This reminds me a little bit of the filesystem from BeOS (RIP) and Haiku. You can attach any metadata to any file you want, and it has a system to create queries on that metadata.


Very much miss this. I love how they used this technique to build their email client. Emails were just files on the filesystem with a filesystem plugin that made their metadata queryable. So the email client was just a file browser window with some pre-saved queries that would let you find all emails by date, sender, etc.


A job for fuse. :) I'd like having a nice sysfs like view on all tables and other objects in a database. Even better if editable.

Or alternatively a midnight commander for databases. :)


OMG yes. Now this would be perfect for SQLite.


If you do this, you can add all the files to git and get historical views, changelogs... and if you distribute the "database", you even can know who made what changes!


Along the lines of what I was thinking. Would be really neat! I already do this to an extent, but with a virtual file system it could automatically sort/categorize/etc like a regular database.


This makes me wonder how efficient filesystems are with millions of file in a single directory? Do they create some sort of index? Are there limits to the number of files a directory can hold? Is that what inodes are for? I remember seeing “inode” limits on some VPS I was using a while back.


Indeed, putting too many files in a single directory is inefficient, both for read and write. When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten. When searching for a file or opening it, the entire file list for the directory needs to be read (in the worst case where the file is at the end of the list).

When you need to store lots of files on disk, it's a common pattern to spread them out in subdirectories. For instance, instead of `files/2d8af74bcb29ad84`, you would have `files/2d/8a/f74bcb29ad84`.
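A minimal sketch of that fan-out scheme in Python (the two-level split, the hash choice, and the files/ root are arbitrary):

    import hashlib
    from pathlib import Path

    def path_for(key: str, root: Path = Path("files")) -> Path:
        """Spread objects across two levels of subdirectories keyed by hash prefix,
        e.g. files/2d/8a/2d8af74bcb29ad84..., so no single directory grows huge."""
        digest = hashlib.sha256(key.encode()).hexdigest()
        return root / digest[:2] / digest[2:4] / digest

    p = path_for("some-object-name")
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_bytes(b"payload")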


> When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten.

That isn't how inode filesystems work -- if you change a file's permissions, it's just an inode update on the file -- not the containing directory.

Even in DOS-type filesystems (FAT/exFAT), it's just a record update in the corresponding dirent for that file.

If you add a new file to a directory, that causes an mtime update on the directory's inode.

The rest is accurate -- many older filesystems have lookup performance that scales poorly with directory size (for DOS filesystems and BSD UFS, you have to do a full directory scan). Also ls defaults to sorting output, which is O(N log N) and can be slow in large directories.


Large directory sizes suck on NFS and parallel file systems like Panasas, Lustre, GPFS, and the like. Python and Rust's Cargo also suck on networked file systems and would be greatly improved by pushing things into a sqlite file.


most things suck on networked file systems. I still have trauma from having to use NFS more than a decade ago, on systems that would freeze on boot because the network couldn't be reached or the NFS server was down.

I'd also never put sqlite on NFS. Locking is often broken on NFS. Unless you can guarantee a homogeneous environment. I can imagine some Excel guy in marketing is going to launch his SQLite UI on Windows and completely hose it all.


Nfs locking broken.. excel? Sqlite ui on Windows? Are you sure you're not confusing SMB with nfs? I've had SMB cache files that ducks everything up. Not nfs.


That's sometimes true but it depends a lot on the filesystem used, and sometimes on the options used when building it. ReiserFS, ext4 and FAT 32 won't have the same performance profile.

I think breaking down large collections in subdirectories is mainly done in order to ensure that it'll work even with FS that don't deal well with very large directories and also because it makes inspecting the files manually a little more convenient given that many file explorers (and especially the GUI ones) can have trouble with large directories.


Yes, though I think there's some POSIX compatibility stuff that also makes big directories a hassle.


Also, ext4 (then ext3, I think) used to look up dir entries linearly (it was a linked list of entries).


What about on the other end? Why not have each hex digit be a directory along a path? Then you have very, very few files per directory at the cost of deeper hierarchy. What's the practical downside?


Directories are just another type of file. So, if you do this, and your filenames are n characters long, you'll end up needing to do n file accesses just to find the file you're looking for. Unless the underlying file system does something to make that particular access pattern fast, well... it's going to be stupidly slow after a certain point.


> Unless the underlying file system does something to make that particular access pattern fast

I don't know exactly how Linux does it. Windows hands off whole paths to the filesystem, so this idea is possible there.

In FreeBSD, there is a generic routine (lookup(9)) that goes component by component, so at each step the filesystem is only asked to resolve a single component to a vnode. I think a clever filesystem implementation (in FreeBSD) could look at the remaining path and kick off asynchronous prefetch... but I am not aware of anything doing this.


I've played around with fuse a bit.

There are two modes you can implement for your filesystem: in the so-called 'high level' mode you get the whole path. In the 'low level' mode the filesystem asks you for one piece of the path at a time.

The low level mode seemed faster in my tests, and I think it's also closer to how the Linux kernel works internally?


You then need to do more stat calls — one per hierarchy level — and you waste more inodes.


[epistemic status: amateur speculation]

I’d expect it to take up slightly more disk space. More importantly, the more indirections, the longer it’ll take to get to the file.


More overhead for each extra layer. Kernel serialization/locking, random seeks (dirent and inode per level), maybe other concerns.


The random seeks aren't a given: your filesystem could do something clever with memory or disk layout to avoid most of them.


Anything particular in mind? You could probably get a good academic paper or patent out of this.


If this were the case - even if rewriting the inode were O(n) - you would have noticed. Try ls on a large directory, then touch a file in the same directory.


"Modern" file systems (say, ext3 and onwards) scales pretty well.

The problem is that most tools that operate on file systems don't. Things like readdir() do a linear scan and take a long time on a million files.

So in practice it's not optimal. A thousand files, no problem. A million, start looking at doing it in-memory (or use some database tool).


> Are there limits to the number of files a directory can hold?

Yes, depending on the file system.

For example, ext4 with default settings uses 32-bit hashes. Upon the first collision, you can no longer add more files to the directory (ENOSPC error).

Source: https://blog.merovius.de/2013/10/20/ext4-mysterious-no-space...


WinFS was an attempt at this. It'd be interesting to learn why it was cancelled.

https://en.wikipedia.org/wiki/WinFS


Because it tries to make people work in a way they don’t want to work. People want to organize their files in folders. They don’t want to tag them all with labels or tags and they don’t want to input all that metadata.

Perhaps they want to tag a few, and it’s useful to have some autodetected metadata but they don’t want to tag them all and they don’t want a gigantic ‘Untagged files’ list. They want folders and they want more than a flat folder list, they want nested folders.

You can implement it, it’s not hard and most modern file systems have all the features you need. But users will hate it and won’t use it the way you want.


You are right that users don't want to do that busy work.

But I am less sure users actually want folders.

Some power-users, sure. But most normal people don't want to deal with folders, either.

For evidence: look at the guy who saves everything on his overflowing desktop.

Any system that allows people to find their stuff, and perhaps make a few annotations, will be good for them.

Google Photos is almost a good example: I don't have to annotate anything, yet I can search for eg pictures of snow or by location.

(I say only 'almost', because while impressive, that system isn't good enough yet to find obscure stuff or to work on contextual cues like 'those pictures I took at home after we came back from shopping sometime in the last few months'.)


That’s all very nice but a filesystem needs to be able to deal with every type of file on the planet. Which means you can’t automatically detect the contents.

And really, a lot of users don’t want an interface that doesn’t allow them to do what they want just because someone else just dumps all their files on the desktop.

Apple tried this on iCloud and had to go back. Because, while it makes for nice presentation and usability, there’s a lot of users that it can’t cater for.


you can :) even across nfs ... maildir etc ...


But filesystems are secure (mostly, ignoring unicode issues) whilst sqlite is an insecure hack.

You'll have a hard time hardening sqlite, removing all the insecure defaults, fixing the broken and exploitable full-text search APIs, and especially its built-in hacks. As explained here https://github.com/rurban/hardsqlite or here https://research.checkpoint.com/2019/select-code_execution-f...


> inside of Google, they are using databases to store filesystems.

I'm struggling to picture this, do you have any links I could read?


Googler here, I think OP is referring to the relationship between colossus and bigtable [1]. Basically, what happens is that colossus uses bigtable for storing metadata. So, when you issue a command like `ls /some-cluster-directory/myfolder/mysubfolder` you are really querying bigtable instead of a distributed ls on a cluster fs.

It is much more complicated than that, but the idea is that using bigtable you create something that resembles a fs and feels like a fs to use, for the most part.

[1] http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Ke...


Yes, that's something like I was going for.

(I'm an ex-Googler, but I am glad that my spotty memory gives me ample protection against accidentally giving out company secrets.)

I suspect Google Drive is also backed by something that's not a traditional file system, even though Google Drive tries to look a bit like a file system to the user.


From the outside one clue is that you can have two files with the same name, another is the url to a file doesn't change when you move it. Also it's dog slow to list a directory, like 30sec for 50 files.


You can experiment with something kind of like this yourself pretty cheaply. Here's an example [0] user-space filesystem that stores its data in an in-memory sqlite file.

You could just as easily replace those in-memory calls with a networked DB (perhaps with speculative pre-fetching or something, I dunno, I probably wouldn't try to make a python filesystem too performant).

The salient detail here is that as far as your kernel is concerned a filesystem is an API for interacting with data (whether that's with a daemon process like the linked example or with raw function calls built into the kernel). Those APIs can and often do interact with structures physically stored on a local disk, but that isn't a requirement.

[0] http://www.rath.org/pyfuse3-docs/example.html#in-memory-file...


You used to be able to store your own filesystems on google's databases [0]. No idea if this still works, though.

[0] http://sr71.net/projects/gmailfs/


A filesystem is like a database, except for transactions, which are very important when multiple processes can access it.


I think the difference between a database and a filesystem is:

in a filesystem you know where the data lives; in a database you just know that it lives... but yeah, with stuff like Ceph that view gets a little bit foggy.


What do you mean by "you know where"?

For me, all I know is that my filesystem stores its data on disk somewhere; I don't really have any clue how that's organised. That's pretty much exactly the same level of knowledge I have about my db.

(And yes, I could look into both of them to learn more. And yes, we also have databases and filesystems that get accessed over the network.)

File system paths perhaps sound like a location, but they are no more and no less a location than eg table names to me. Or URLs.


> Here's one example - I like to access my database from my IDE remotely - my understanding is that remote access is not possible with sqlite

Not sure exactly what you mean by 'remote access', but if it's to debug something on a remote DB from an SQL IDE, I do this quite often. I use sshfs to mount the remote filesystem and open the SQLite file in DBeaver.

Even in the case of Postgres, you would have to connect to a remote DB via SSH or via VPN. It's just that you will be using sshfs to connect to a remote SQLite file.


Note that using SQLite over SSHFS is dangerous if you're not the only user of the DB, because SSHFS does not support file locking.

According to: https://www.fossil-scm.org/forum/forumpost/8749496886

> There's a very real possibility that you could corrupt the SQLite DB by running it over SSHFS.


If you're just using sshfs for RO queries for debugging, it should be harmless.


It will try to replay the journal on open, though, even without running a query if I'm not mistaken. (There's a read only flag for open, just have to be sure you're using it.)


I wasn't aware of this. Thanks for the information. Is this true even when you use a read-only sshfs mount?


No. Read-only cannot mutate the database. I don't know if Sqlite ensures you won't observe inconsistent results if there is a concurrent (single) writer.


The docs [1] and this [2] suggest that you'll probably observe inconsistent results.

WAL mode supports it ("reading and writing can proceed concurrently"), but "does not work over a network filesystem" like SSHFS [3] because it either requires shared memory on the same machine, or exclusive locking [4] (which SSHFS does not provide, and which again does not allow readers concurrent with a writer).

Thus overall, you can only rely on your read queries over SSHFS if you know that there is no concurrent writer.

[1]: https://www.sqlite.org/lockingv3.html [2]: https://stackoverflow.com/questions/1005206/does-sqlite-lock... [3]: https://www.sqlite.org/wal.html [4]: https://www.sqlite.org/wal.html#noshm


I just scp that son of a b down and run all the queries on the SQLite file locally.


Sounds like a pain for larger databases.


Well, it depends how large you are talking about. With today's internet speeds, anything under 10GB takes less than a few minutes.

I once accidentally locked a production MySQL database with some kind of recursive sub query.

Running queries locally on a copy avoids lock/mutation issues, and I have a snapshot to re-run queries on if I need to go back and see the source of the data.


I guess there's a lot of cases I just want to peek into a table live without pulling it all down. I've only used postgres so it's weird to me you can't talk to any MySql style DB like it does.

Anything that needs to be coded I'll pull it down though, it's nice to have options.


> In the end, I always come back to Postgres. It's unbelievably powerful, lightweight and there's not much it can't do.

How lightweight?

SQLite itself is around 600 KB. That's with all features enabled. I believe you can get it it down to about half that if you disable features you don't need. RAM usage is typically under a dozen or so KB.

It doesn't even require that you have an operating system. All you need for a minimal build is a C runtime that includes memcmp, memcpy, memmove, memset, strcmp, strlen, and strncmp, and by default also malloc, realloc, and free although it has provisions for providing different memory allocators, and you have to provide some code to access storage if you are running on a system that doesn't have something close to open, read, write, and the like.

This means it can even be used on many embedded systems far too small to run a Unix.

Heck, it is not even too unreasonable to compile SQLite to webasm and include it on a web page if for some reason you need client side SQL on the web.


Depends on what you need. It does not seem like a great idea to run Postgres in a browser, but it is usually less resource hungry than the JVM-based app using it to return HTML or JSON, so I can call it lightweight for server use.


> Heck, it is not even too unreasonable to compile SQLite to webasm and include it on a web page if for some reason you need client side SQL on the web.

With the deprecation of Web SQL, I highly doubt this is a use-case anyone cares for anymore.

Anyhow, I see your main point: SQLite is indeed "Lite", so it makes sense to use in embedded or small single-consumer cases. I just don't think those are very interesting tools to make in 2021; we expect collaboration and concurrency, so PostgreSQL wins in my imagination-space. I am a web developer though, so I am inherently biased.

I also am not the person you were just discussing with.


> With the deprecation of Web SQL, I highly doubt this is a use-case anyone cares for anymore.

WebSQL didn't die as a standard because no one wanted client SQL on the web; it died because SQLite was the only reasonable way of providing it, preventing multiple really-independent implementations.


There are people compiling sqlite to webasm, so it's a use-case some care for.


> This means it can even be used on many embedded systems far too small to run a Unix.

A good example is that SQLite is used inside many video games. You wouldn't put Postgres in your game engine.


Is using something with that small a footprint even worth it in today's age? Surely Postgres and others don't have THAT big of an overhead, at least for the features they provide.


I like everything about Postgres except how arcane it is. It feels like 1980s Unix software, and everything about it is manual. It seems like there are multiple ways to do many things and I never feel like I am sure I am doing it the "right" way (as in both correct and modern and what you should do).

I've been watching CockroachDB as a more modern alternative that can auto-scale easily, do master/master easily, etc. Those are all painful on Postgres to the point that hosted and managed pgsql is a cash cow for cloud vendors.


It can’t run embedded. Having no live parts is a real paradigm shift for so many scenarios.


What does "can't run embedded" mean exactly?

I've run Postgres in a pure RAM only configuration, never touching the disk except to start.

If you mean "it can't be compiled in to another application as a library", then yes that's true - it's not an embedded database.


"can't run embedded" means you have to have a dedicated DB server process. For anything I would use a database for this is a non issue, but if you want a DB in your mobile app (for example) it might be.


So if SQLite is typically run embedded in the client application, does this mean it's not supported to have multiple clients operating on the same SQLite DB concurrently, or is there some support for this?


It uses filesystem locking mechanisms to synchronize concurrent access from multiple processes. One writer, multiple readers.


This is covered early on in the article.


It's mentioned briefly in the article, but no instructions are given on how to properly configure busy_timeout and WAL to scale above more than a few concurrent writes per second.
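For reference, the knobs in question look roughly like this from Python's sqlite3 (the 5-second timeout is an arbitrary example, not a value taken from the article):

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)  # Python-level wait for locks
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the single writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s for a lock instead of failing with SQLITE_BUSY
    conn.execute("PRAGMA synchronous=NORMAL")  # common companion to WAL; slightly relaxed durability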


It’s sad when most people discuss things based on a one line title of something they haven’t read. Many questions here were answered in that article. I recently asked someone on a different site “didn’t you read the article?”. The obvious answer I alluded to was “No, I don’t have time.” Yet they managed to post about 15 different comments asking more questions about the article (title). I really wish there was a way to have a public forum with proper discourse with the caveat that people must read the material before they comment on it.


The technical blog of the public broadcaster of Norway has implemented exactly this: https://nrkbeta.no/2017/08/10/with-a-quiz-to-comment-readers...

The quiz module is open-source (https://nrkbeta.no/2017/03/06/our-comment-quiz-module-is-now...) and available on github https://github.com/nrkbeta/nrkbetaquiz


Thanks for the links. :-)


Kind of - your web application needs to partially handle that, and SQLite also depends on the OS filesystem to handle concurrent access.


Can it be contained within the application, like SQLite?

Or does it have to run as a separate process?


But if the metric is "need in most cases," then that's fine. You don't need an embedded database in most cases.


The other way around. For most problems you don't need a DB. Only for the subset of those that do you need a dedicated process/container/machine.

We have learnt to use full-fledged RDBMSs as a default because they proved really flexible and powerful over the last 30+ years. But they do have limitations and cost.


And Postgres can be used as an event store + document store.

It also has JavaScript V8 functions to update data for migrations.

(e.g. migrating your events to a new model, instead of keeping the different versions)


You can store files/documents within SQLite dbs as well.


There is JSON support, yeah.

But you need to do everything with the JSON1 extension for managing the JSON ( https://www.sqlite.org/json1.html ).

Postgres has built-in V8 JavaScript functions to manage this (as stated before), which makes it much more usable.

There's a good library for .NET that handles Postgres documents, which has an insane amount of useful functions ( https://martendb.io/documentation/documents/ ). I think half of those wouldn't be possible on another database.


You don't really need V8 JavaScript functions built into the database when you use SQLite, because SQLite is running in the same process as your application. The round trips between your application and SQLite cost almost nothing.

Also see "Many Small Queries Are Efficient In SQLite": https://sqlite.org/np1queryprob.html


I was literally talking about using V8 functions when managing data (e.g. updating event versions should be a clear use-case).

Round trips and small queries don't have anything to do with my comment?


Since SQLite is embedded in your program, why not use your programming language directly? SQLite can even call back into custom functions you define. For example in Python: https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne....
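For example, a minimal sketch of such a callback (the tenant_domain function and users table are invented; the deterministic flag needs Python 3.8+ and SQLite 3.8.3+):

    import sqlite3

    def tenant_domain(email: str) -> str:
        """Ordinary Python code exposed to SQL as a custom function."""
        return email.split("@")[-1].lower()

    conn = sqlite3.connect(":memory:")
    conn.create_function("tenant_domain", 1, tenant_domain, deterministic=True)

    conn.execute("CREATE TABLE users (email TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice@Example.COM')")
    print(conn.execute("SELECT tenant_domain(email) FROM users").fetchone())  # ('example.com',)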

The main reason why main client-server databases have so many built-in functions is because of the round-trips. Let's say you want to filter rows based on the result of some function, if your function is implemented client-side, then the database would have to send each row to the client.

"Relational database engines act as bandwidth-reducing data filters." (https://www.sqlite.org/whentouse.html)


I didn't consider it that way, thanks.


My initial comment was a bit cryptic to be honest :)


You resolved it perfectly.

My experience with flat-file dbs was not enough to interpret it correctly :p


It's unbelievably powerful, lightweight and there's not much it can't do.

I agree, but there is one thing that it cannot do, one very important, critical thing: synchronous multimaster replication.

Well, vanilla PostgreSQL cannot. Vertica can.


What makes multi-tenancy easier in Postgres than in any other DB?


I wrote a full multitenancy row level security system in 3 short Python functions in Django. Simple, complete, compatible with Django's architecture. All the other multitenancy solutions I researched were complex and fiddly and heavyweight and required all sorts of considerations and caveats to implement and run.


One of Postgres's most underrated features. RLS is amazing: it can go basically unseen/work silently if your programming-language-side tools are good enough, and it is documented well (like everything else):

https://www.postgresql.org/docs/current/ddl-rowsecurity.html

But the power of PG is that it doesn't stop there: if you combine this with an extension like temporal_tables you can segment by user and time:

https://github.com/arkhipov/temporal_tables

All of this mostly unknown to the thing that's accessing the DB. If that's not enough for you, why not add some auditing with pgaudit:

https://www.pgaudit.org/#section_three

All this value is just out there. There's even more if you can stomach browsing "ugly" sites like PGXN. Most of it works out of the box, though you may need to tinker for performance and some edge cases but it's there.

I think it might not actually be hyperbole to say that Postgres is the greatest RDBMS that has ever existed.


I’d love to see the code if you can share.


I'm preparing it into a minimal open source project over the next couple of weeks. I'll post to HN when it's ready.

The simple explanation is that you set a Postgres session variable prior to each query. The Postgres row level security system looks at this variable and returns nothing if the ID in the variable does not match an ID in the table.

So I wrote a function in Django which intercepts every database query and prefixes each query with a command that sets that session variable.

That's pretty much it.
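A minimal sketch of that idea as described, not the poster's actual code (Django middleware plus a hypothetical RLS policy; the table, policy, and setting names are invented, and pooled connections need extra care because the setting sticks to the connection):

    # Hypothetical policy on the Postgres side:
    #   ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
    #   CREATE POLICY tenant_isolation ON invoices
    #       USING (tenant_id = current_setting('app.tenant_id')::bigint);
    # Note: RLS does not apply to superusers or the table owner unless
    # FORCE ROW LEVEL SECURITY is set.

    from django.db import connection

    class TenantMiddleware:
        """Set the per-request tenant id as a Postgres session variable,
        so RLS policies can filter every query transparently."""

        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            tenant_id = getattr(request.user, "tenant_id", None)
            if tenant_id is not None:
                with connection.cursor() as cursor:
                    cursor.execute(
                        "SELECT set_config('app.tenant_id', %s, false)",
                        [str(tenant_id)],
                    )
            return self.get_response(request)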


if you dont mind sharing, I'm curious what the methods + method signatures are!


I've seen mysql used in multi-master ha configuration (percona-based solution) successfully, do you have anything to recommend for a true multi-master postgresql solution (ideally open source)?


Postgres-XL and Postgres-BDR are both multi-master. I don't think they're open-source.


> Why not use sqlite? Here's one example - I like to access my database from my IDE remotely - my understanding is that remote access is not possible with sqlite.

I found https://kristaps.bsd.lv/sqlbox/ to be an interesting approach.


Can you share more on how you do multitenancy? Are you using it for web/mobile? What language/Library do you use to access the database?

I'm building a multi-tenant web application on Node/Sequelize/Feathers and implementing logic in code. I've seen table row level security but haven't looked deeply.


Refer to this comment elsewhere in this thread: https://news.ycombinator.com/item?id=26817468

I use Python/Django.

I use Postgres row level security.

Read this entire comment thread for more info from other posters on row level security.


By "multi-tenancy is dead easy in Postgres", are you talking about schemas?

If so, have you run into any scaling issues with the number of tenants you can support. I looked into using schemas for multi tenancy but the various reports I found said around 50 tenants or so things started breaking down very quickly.


Refer to this comment elsewhere in this thread: https://news.ycombinator.com/item?id=26817468

No I am not using schemas - I found them to be brittle and complex as a solution for multitenancy.

I use Postgres row level security.

I use Python/Django.

Read this entire comment thread for more info from other posters on row level security.


A lot of IDEs and other graphical tools can point towards a SQLite database. Does it really have to be local though? I assumed you could hit it from any external file share? Or did you mean something else?


You can remotely access an sqlite database with sshfs or curlftpfs, or simply use the sqlite CLI over ssh.


What's wrong with MySQL? Genuine question.


There’s nothing “wrong” with MySQL/MariaDB - it powers huge portions of the web.

Postgres has a larger feature set. Whether or not you need it depends on your use case. There are ample pages out there that will give you a side by side comparison if you’re curious.


Cool article. I definitely agree that SQLite is the way to go in 99% of the cases you're making some kind of app.

I have often seen Postgres, MySQL and MongoDB in places where they were really overkill and added unnecessary network and deployment complexity. And upon asking the designers it turns out they simply didn't know about SQLite...

Small gripe in case the poster reads this: There's a malopropism in the opening bit:

> SQLite is properly the only database you will ever need in most cases.

Should be:

> SQLite is probably


I'm not sure I understand why PostgreSQL would be "overkill" for something? It's very easy to set up and operate, almost anyone in the industry either knows it or they should be willing to learn it as it's the de facto standard for almost any modern company using an SQL database.

Also it's very easy to buy PostgreSQL from various DBaaS providers. There's a docker one liner to set it up on any development machine and you need to run it just once.

Sure, SQLite is a little bit easier but it's so much less powerful as a database. Why not just go to Postgres directly and leave SQLite for what it's intended: embedded programming like mobile apps or in-car entertainment systems etc.


> I'm not sure I understand why PostgreSQL would be "overkill" for something?

Because it's a separate server process and install rather than an embedded library. On disk, in memory, in CPU, in config and admin workload, and in pretty much every other conceivable way it's heavier than SQLite. It offers a lot extra, too, but if you don't need the extra that it offers, it's overkill.

> Also it's very easy to buy PostgreSQL from various DBaaS providers.

Yes, it is. Which reduces some burdens and increases others. Nonlocality is a cost (also, paying for a DBaaS is a cost.)


> I'm not sure I understand why PostgreSQL would be "overkill" for something?

For one, it requires more expertise to set up and monitor. SQLite is way easier to use. And if the app is coded right, it can be easy to replace SQLite with a more powerful client-server DB like Postgres when necessary.


Malopropism! That has just become my word of the day. Thanks.


Then note that the correct spelling is "malapropism"


I'm just gonna pretend the misspelling was intentional :)


"Statues of Limitations," was always my favorite malopropism.


"You should of done something"

"This has no affect on me"

They're everywhere.


After cleaning up some spilled toner, someone put a very indignant notice under the photocopier:

"It would be who of you to be cautious openening a new toner cartridge."


I recently heard someone say "cow tailed" instead of "kowtowed"


One example is Ubiquiti Unifi management software using MongoDB, you end up with 4GB+ of space usage just to run some basic software because of the prealloc files.


> malopropism

Malapropism? Can just call it a typo (which is incorrect but people get it).


If you wanted to recommend a less descriptive yet more commonly known word, why not use "error" or "mistake"?

Typo literally means the opposite of what the OP wants to convey.

A typo is when you pick the right word, but type it wrong.

A malopropism is when you pick the wrong word, but type it correctly.


In 2021 a typo means an autocorrect fir free.


Always fun to see this sort of comment under an article about using the right tool for the job.


Are you imploring that 'malapropism' is too sesquipedalian and that one should generally eschew obfuscationism?


I think this is a bit of a case of "Why use many word when few word do trick?"


I use Gitea(Github clone) locally with SQLite running on ZFS where I can take atomic snapshots of both the SQLite database and git repositories.

This would not be possible if I used a remote database, as there is no way to do an atomic snapshot of both the remote database on another filesystem (even if it's ZFS) and the git repository.

Developers of multiple services, please support SQLite and don't lock yourself to a single database like PostgreSQL and MySQL. Even if the service is embedded in a Docker Container it is still painful to manage.


> I use Gitea(Github clone) locally with SQLite running on ZFS where I can take atomic snapshots of both the SQLite database and git repositories.

ZFS snapshots may not result in a consistent database backup of sqlite. You should use VACUUM INTO and then do a snapshot, or use the sqlite Backup API.

See my other comment: https://www.sqlite.org/backup.html

Essentially, while you have a proper atomic snapshot of changes on disk, the in-flight transactions won't be there, and you will be at the mercy of a successful recovery from the journal/WAL, if you have one.


When I back up a database, I expect to get a consistent snapshot of the database, which includes all committed transactions. Uncommitted transactions can't be included in this, since there is no guarantee they'd ever commit.

From your answer it sounds like I'd get such a snapshot (even if it relies on recovery from a journal internally). So I don't understand why you see a problem here. Is the recovery process unreliable?


Here's what SQLite has to say about the matter in their article "How To Corrupt An SQLite Database File" [0]:

> 1.2. Backup or restore while a transaction is active

> Systems that run automatic backups in the background might try to make a backup copy of an SQLite database file while it is in the middle of a transaction. The backup copy then might contain some old and some new content, and thus be corrupt.

> The best approach to make reliable backup copies of an SQLite database is to make use of the backup API that is part of the SQLite library. Failing that, it is safe to make a copy of an SQLite database file as long as there are no transactions in progress by any process. If the previous transaction failed, then it is important that any rollback journal (the -journal file) or write-ahead log (the -wal file) be copied together with the database file itself.

So no, it doesn't appear that it is safe in general to just copy the file whenever. If there are transactions in progress, things can go wrong. I don't quite understand why (isn't this the same as if a power failure happens, and SQLite is resistant to that?), but this is what the docs say.

It might be different with ZFS, since the backups are atomic (and this doc might assume that they are not), but I'm not 100% sure I would rely on it.

[0]: https://www.sqlite.org/howtocorrupt.html


I think this warning only applies to non-atomic copies. Basic copy tools will copy a file piece-by-piece, where different pieces can have different age if the file is modified concurrently, which results in corrupt output.


The only problem with atomic filesystem snapshots is the time required to perform recovery which can be avoided. The recovery process is of course reliable.


Does recovery take a long time? Do you have any numbers?

I'd have expected recovery to take time proportional to the size of recent/uncommitted transactions, which should be quick, even for large databases.


I don't have experience with large SQLite databases. I had PostgreSQL take over 40 minutes for recovery on a 2 TB database set to checkpoint every 30 minutes. How long it takes will depend on the WAL size, the frequency of checkpoints and the operations that were performed before a crash/snapshot.


That's not how database backup works: either you use a tool like RMAN, which knows about internal Oracle database structures, or you must shut a database down in order to do a cold backup. As there is no such thing as database shutdown in SQLite's case, if you want a clean backup before doing a ZFS snapshot (and shipping it off, else it's not a backup), you must shut down the application which is writing to the SQLite database.


Since power-loss is (hopefully) equivalent to an atomic storage snapshot, a database which doesn't produce a consistent backup via snapshots can't be safe against power loss and thus lacks the durability property of ACID.

What's unsafe is using a naive file copy tool (e.g. `cp`), which non atomically copies a running database.


Without taking the aforementioned steps I specified, power loss is not an atomic event, because most databases nowadays rely on the fsync(2) system call to tell them when the I/O has completed. If the fsync(2) call is unreliable because it is lying about completing the requested I/O, the RDBMS stands no chance of guaranteeing atomicity. I am sorry.


Are you sure? Since for PostgreSQL an atomic snapshot is enough. It is only when you do not have access to atomic snapshots that you strictly need to use the online backup api. And, yes, it will have to do a crash recovery when you first use the backup but if you cannot trust crash recovery you can't really trust your database at all.


You are correct and what I had in mind was the required recovery. I do trust recovery but performing it can be very time consuming.


It is important to mention that a "Remote" database could run on the same host ...

I run PostgreSQL on my VPS alongside gitea and some other services, and I can take snapshots of it just fine (assuming I don't take a snapshot as it is writing to disk, but that problem exists with SQLite as well).


1. You can run Postgres locally using the same ZFS for storage you'd use for the sqlite database. This should come with the same backup properties as sqlite.

2. I doubt that the git part of your setup can be snapshotted correctly while an operation is in progress, since git doesn't use transactions.


Git does use transactions, in a pretty clever way: the main storage is immutable -- so when you do a new commit, it creates all new files and, as a last step, updates HEAD/branch using the "rename" call, which is atomic on all filesystems.

So you can do pretty bad things to git and it would still work. Atomic snapshots are absolutely no problem. Copying the directory while in progress is fine.

If you try very hard, you can still make a bad backup -- say you were copying files from remote host, and backup got interrupted after HEAD but before the new data. In this case, there is reflog -- append-only file which keeps HEAD history. Using this file, you would be able to manually recover into previous state.


I've deployed Bitbucket and Postgres as a docker stack under the same subvolume and done snapshots on that. As long as you have WAL enabled in Postgres there shouldn't be an issue, and I've restored multiple times without problems.


This article says you can back up your database by just copying it, as it’s a single file (and apparently copying it while it’s in use is fine?). It also suggests enabling WAL mode. But WAL mode means you now have two files. Does the use of WAL introduce any concerns while backing up?


This is untrue and a sure way to corrupt your database[1].

From sqlite.org on how to corrupt your database:

> 1.2. Backup or restore while a transaction is active

> Systems that run automatic backups in the background might try to make a backup copy of an SQLite database file while it is in the middle of a transaction. The backup copy then might contain some old and some new content, and thus be corrupt.

> The best approach to make reliable backup copies of an SQLite database is to make use of the backup API that is part of the SQLite library. Failing that, it is safe to make a copy of an SQLite database file as long as there are no transactions in progress by any process. If the previous transaction failed, then it is important that any rollback journal (the -journal file) or write-ahead log (the -wal file) be copied together with the database file itself.

What you need is to follow the guide[2] and use the Backup API or the VACUUM INTO[3] statement to create a new database on the side.

[1] https://www.sqlite.org/howtocorrupt.html

[2] https://www.sqlite.org/backup.html

[3] https://www.sqlite.org/lang_vacuum.html#vacuuminto


If WAL is enabled it's easy to backup your sqlite db by using VACUUM INTO <file>.

More detailed docs: https://www.sqlite.org/lang_vacuum.html#vacuuminto
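For example, from Python (filenames are placeholders):

    import sqlite3

    conn = sqlite3.connect("live.db")
    # Writes a consistent, compacted copy of the whole database to a new file
    # without taking the live database offline.
    conn.execute("VACUUM INTO 'backup.db'")
    conn.close()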


The .wal file comes and goes, so occasionally things like rsync will fail if the file no longer exists by the time it tries to transfer.


SQLITE_FCNTL_PERSIST_WAL can be used to avoid the .wal file being completely deleted. Also useful for when you need clients without write access to the database directory to always be able to open it read-only.


The .wal file shouldn't be backed up though.


If the .wal file is not backed up, then any information in it will not be backed up. Depending on the settings for the WAL, this can be quite a lot. You would need to do something like the VACUUM INTO trick described above if you want to safely backup the sqlite database from a live filesystem, because there is a risk of the WAL and database being out of sync. I’m not sure that SQLite, good though it is, can get around the WAL being taken at a different time to the rest of the database.

The problem here isn’t the .wal file, but doing backups from a moving filesystem. The solution is to use snapshots. ZFS supports snapshots natively, and Linux’s device mapper supports snapshots of any block device. [0]

[0] https://www.kernel.org/doc/html/latest/admin-guide/device-ma...


Why? An atomic filesystem snapshot is very similar to power loss. Since WAL is required to handle that scenario, it's just as important when backing up through snapshots.


Not just WAL, there are other files like SHM too.


https://stackoverflow.com/a/55424784 -- it says you have to copy all 3 files to back up everything


Over the last few weeks I built a Rails app, after a ~10 yr hiatus. I love Rails.

Rails switched to SQLite for at least the development environment.

As soon as I started to test the app with multiple clients ( browsers, websockets, embedded websockets ), I was hit with concurrency errors.

So I switched to PostgreSQL ( which I love ). But I had to.


SQLite can handle concurrent access but it needs to be used correctly. Each connection should only be used by one thread.


In my experience SQLite is fine for single process concurrency (just push everything onto a worker thread). However multiprocess concurrency support is very primitive even with WAL support enabled.

Internally SQLite relies on file locks to mediate access and this seems to necessitate very coarse locking (lots of table locks).

Comparatively the client server model of other databases like PostgreSQL seems to allow better coordination between writes and much finer grained (row level) locking.


In both the worker-thread and table-locking models writing is essentially single-threaded.

What matters is the performance. Have you seen a difference between these two approaches?


My experience is that this doesn’t necessarily hold true in practice. The concurrency primitives being used are different. I’ve put a war story below :) but it’s kind of beside the point.

My main point was that other DBs like PostgreSQL implement row level locking so can handle concurrent writes much better than SQLite.

As for my war story I hit on false deadlock detection problems writing to two separate tables from two processes with an fkey between them. Fun fact if it thinks there’s a deadlock SQLite will not call your busy callback. It will just immediately fail. I know the deadlocks were false positives because my solution was to move my backoff logic out of my busy handler and instead catch SQLITE_BUSY and call sleep myself. Identical backoff values but the “problem” disappeared.


You can run into problems if you have two threads handling two requests. Each of them might have a shared read-lock as they are getting data, and when they both try to update to a write-lock to do some update, one of them will be canceled (throw an exception: SQLITE_BUSY).

This seems to me like a Rails bug, it should be serializing transactions when running against SQLite.


Can you elaborate, or share other (maybe lesser known?) best practices related to this?


SQLite has good documentation. It covers multithreading and multi-process access too:

https://www.sqlite.org/faq.html#q5

https://www.sqlite.org/faq.html#q6

https://www.sqlite.org/threadsafe.html

TL;DR:

- Multiple processes can read the database, but only one can write at any given time; this is enforced using FS locking.

- For threading, there are three modes SQLite can operate in: single-threaded (unsafe for use in multiple threads), multi-threaded (safe to use in multiple threads, as long as each thread establishes its own connection) and serialized (do whatever you like, however you like).

- The caveat for threading is that if SQLite was compiled with -DSQLITE_THREADSAFE=0 (i.e. single-threaded), then multi-threaded and serialized modes cannot be enabled at runtime, because locking code gets compiled out of the binary.
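
If you're not sure how your particular build was compiled, it's easy to check at runtime; a quick sketch from Python:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # PRAGMA compile_options lists the flags the library was built with.
    # THREADSAFE=1 is serialized (the default), 2 is multi-threaded, 0 is single-threaded.
    for (opt,) in conn.execute("PRAGMA compile_options"):
        print(opt)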


By "concurrency errors", do you mean you got SQLITE_BUSY errors on your transactions?


Uhhh, yes, I believe those were the errors ( not 100% sure ).


So I'd be interested in the HN hive-mind on this one... I'd never done DB programming before a few years ago, and have basically only done it with SQLite. So when I ran into this sort of error, I just figured it was normal, and wrote code to check for the BUSY and LOCKED errors and just retried the whole transaction in a loop. TFA also mentions `PRAGMA busy_timeout`, which looks like it would do something similar (but with a timeout).

Is that not normal for databases? Can't you get a transaction conflict which needs to be retried for Postgres as well?
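
(For reference, the retry-in-a-loop approach I described above is roughly the following -- a minimal sketch in Python's sqlite3, with made-up table names:)

    import random
    import sqlite3
    import time

    def with_retry(conn, fn, attempts=10):
        # Re-run the whole transaction whenever SQLite reports BUSY/LOCKED.
        for attempt in range(attempts):
            try:
                with conn:  # commits on success, rolls back on exception
                    return fn(conn)
            except sqlite3.OperationalError as e:
                if "locked" not in str(e) and "busy" not in str(e):
                    raise
                time.sleep(0.01 * 2 ** attempt + random.random() / 100)  # backoff + jitter
        raise RuntimeError("gave up after repeated SQLITE_BUSY")

    # connect()'s timeout sets SQLite's busy handler, roughly PRAGMA busy_timeout.
    conn = sqlite3.connect("app.db", timeout=5.0)
    with_retry(conn, lambda c: c.execute("UPDATE counters SET n = n + 1 WHERE id = 1"))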


No, databases that support multiple connections "properly" make sure that the connections are isolated, a connection only sees committed data and can run any valid transaction at any time. It's actually a large part of database complexity, dealing with concurrent access performantly and correctly.

The same thing would happen with PostgreSQL if you used a single connection in multiple threads.


As I said, I'm not a DB expert, but "any valid transaction at any time" is clearly impossible. Suppose you have two transactions T1 and T2. T1 reads records A and B and then writes record B; T2 reads records A and B and writes record A. Then suppose they're interleaved as follows:

    T1: Read A, B
    T2: Read A, B
    --- Point X ---
    ?
Once point X has been reached, there is no way to order the subsequent writes such that transactional consistency can be maintained: one way or another, one of those sets of reads must be re-issued. Maybe a good DB can detect simple cases like this and make sure they don't happen; but in the general case, the only way to make that happen is to restart the whole transaction.

So for the first post in this thread, is SQLite causing concurrency issues because it fails a second write transaction immediately, or is it exposing concurrency issues already in the code, which are papered over by Postgres managing to avoid conflicts in many situations?


No your intuition is right. At its highest levels of consistency settings Postgres will also throw an exception which application code is supposed to listen for and retry the transaction as needed. This is functionally the same as what you're doing with Sqlite.

The main difference is that this can happen more often for Sqlite because it has far coarser write-locking (where only one writer can proceed globally at a time) whereas Postgres is much finer so in practice this is less of an issue (that is B has to be the same database row in Postgres whereas B could be totally different bits of data in Sqlite). Postgres can also relax various consistency guarantees if they're not required, but that's effectively "papering over the problem" as you say.

Although I would say that the presence of these exceptions isn't necessarily a concurrency bug. Just automatically retrying the transaction is usually perfectly fine from a correctness point of view (this is optimistic concurrency vs pessimistically locking things). The only reason Postgres doesn't do that for you automatically is that done blindly you can waste a lot of performance on retries and the application may want to control what it does there to minimize performance impact (aborting after n tries, exponential backoff, etc.).


I think you are wrong about this, see my sibling comment. Postgres solves this by using MVCC.

If you were right, there would be tons of example code retrying database transactions. In popular web frameworks, in Wordpress, Joomla, etc.

And it is just not there.


> If you were right, there would be tons of example code retrying database transactions. In popular web frameworks, in Wordpress, Joomla, etc. And it is just not there.

This is one thing that seemed really strange to me when I was learning this -- all the guides talk about (say) parameterized queries, and people mention transactions, but nobody says what to do if there's a conflict. I basically had to work it out myself.

I'm not a DB person, but I am an Operating Systems person, and I am absolutely confident that there is no clever system that can work around the scenario I described above without 1) restarting one of the transactions, 2) failing one of the transactions, or 3) putting the DB in an inconsistent state.

All I can think of is that in the vast majority of cases, people are getting lucky: It just happens that people almost never modify the same data at the same time; and if one out of every million operations fail, but do so in a safe manner (not corrupting data), people just chalk it up to your website being quirky or something.

EDIT OK, I followed the link of the sibling, and read up about the SQL standard's various levels of transaction isolation. I'm not sure what my example would do in the "Read Committed" case (apparently the default for PostgreSQL); and I can't imagine programming anything important when I didn't actually know how the default consistency guarantee would behave.


From https://www.postgresql.org/docs/12/transaction-iso.html

For the Repeatable Read isolation level:

> Applications using this level must be prepared to retry transactions due to serialization failures.

For the Serializable isolation level:

> However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

I haven't looked into the codebases of Wordpress or Joomla to figure out how they do things, but I've certainly written code myself to listen and retry for these Postgres exceptions (and that code does occasionally fire).

That being said, these errors are fairly rare in practice because of how fine-grained Postgres's concurrency is, and they depend pretty heavily on your access patterns. It's possible for many production applications with certain access patterns and without too much load to go almost their entire lifetime without seeing these errors (or only seeing them when their users do something pathological).

MVCC ensures that concurrent reads aren't blocked by other reads or by writes, and that retrying is safe -- it does not make concurrent reading and writing of the same data conflict-free. If you concurrently read and write the same piece of data, MVCC can't help you without some sort of retry (because that would require basically omniscience in figuring out what the proper merge strategy is).


That is because nobody is running in these isolation levels.

The only time I have dabbled with those levels was with a work queue and concurrent workers competing for the work ( rows ).


* shrugs * I mean I know teams that have run in those isolation levels for production apps but sure you generally don't need it if your queries aren't particularly complex (and indeed they are higher than the default level set by Postgres).

But then you really are "papering over" the problem as gwd says and the Sqlite transaction is giving you higher consistency guarantees than the Postgres transaction (in particular successive SELECT statements in your Postgres transaction are not guaranteed to be consistent with one another).


> That is because nobody is running in these isolation levels.

Any isolation level below Serializable isn't fully ACID. (Particularly, it's not "I".)


Postgres always presents a consistent state and in particular does this with MVCC [0]

It is SQLite which papers over concurrency in favor of simplicity ( the Lite part of the name is no coincidence ).

[0] https://en.wikipedia.org/wiki/Multiversion_concurrency_contr...


> Can't you get a transaction conflict which needs to be retried for Postgres as well?

Yes.


Ugh a dynamically typed SQL database, no thanks. I always find I have to build quite a bit of a "data layer" on top of SQLite for example when using it in mobile apps.

Also you have to build some basic stuff yourself like Enum or JSON support. It's not hard but honestly I would feel much better using PostgreSQL in those situations.


It's not really dynamically typed in the sense that there isn't a schema.


However the schema does look like

    type schema = { field1: any, field2: any, ... }

AFAIK the column types are not really enforced and you can even leave them out.

https://stackoverflow.com/questions/2489936/sqlite3s-dynamic...


AFAIK it's even worse than that. The field types can trigger lossy transformation of the input data, but aren't enforced.

https://www.sqlite.org/datatype3.html#type_affinity
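
A quick illustration of what affinity actually does (Python sketch):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    # '123' can be losslessly converted, so INTEGER affinity stores it as the
    # integer 123; 'abc' can't be, so it is silently stored as text instead.
    conn.execute("INSERT INTO t VALUES ('123')")
    conn.execute("INSERT INTO t VALUES ('abc')")
    print(conn.execute("SELECT n, typeof(n) FROM t").fetchall())
    # [(123, 'integer'), ('abc', 'text')]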


Take a look at https://cgsql.dev/. It's supposed to be a lightweight and strict type-checked data layer on top of sqlite.


For JSON support, there seems to be an extension:

https://www.sqlite.org/json1.html

Haven't used it, was just wondering if there's something like that, as I found myself quite happy with the JSON support in mysql/aurora.
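
If your build includes it (most do these days), usage looks something like this -- a small sketch with a made-up table:

    import json
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("INSERT INTO events (payload) VALUES (?)",
                 (json.dumps({"user": "bob", "items": [1, 2, 3]}),))
    # json_extract and json_array_length come from the json1 extension.
    row = conn.execute(
        "SELECT json_extract(payload, '$.user'),"
        " json_array_length(payload, '$.items') FROM events").fetchone()
    print(row)  # ('bob', 3)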


If you want an super easy way to back up your sqlite database, check out https://litestream.io/

It will continuously backup your database to an S3 compatible service. There are very nice detailed instructions for various backends (AWS, Minio, B2, Digital Ocean) as well as instructions on how to run as a service.


Hmm, that could be a good solution for backing up a bitwardenrs install, which I think may have more than a few users as of late.


If anyone is looking for an open source Windows/MacOS GUI for SQLite, this one has worked well for me:

https://github.com/sqlitebrowser/sqlitebrowser


I'm too late to edit but this client also supports Linux and has a FreeBSD port.


I really like SQLite but I also like postgres because it supports more complex data types. Sure you can do the same in SQLite by destructuring the data over multiple tables but it's just not as easy or straightforward.

This is why developing with MongoDB is so fast and easy in the beginning. You just go, store shit like you structure your data in code which makes the mapping extremely easy in most cases.


I really agree with this article. I actually built an App with Sqlite as the backend that works locally, on kubernetes, on the mac app store, on Raspberry Pis, and almost anywhere, and it has performed fantastically for years now. The biggest thing is that you have to make sure you always wrap your SQLite queries in a transaction, otherwise you can get corruption: https://github.com/yazz/visualjavascript


SQLite is great! But if you have mainly analytical queries over your data, you might want to give DuckDB [1] a go. It's an embedded database that uses a columnar representation similar to MonetDB (it's from the same group).

[1] https://duckdb.org/


I completely agree!! We see 20-100x performance from DuckDB over SQLite for OLAP style queries.


Every other day there's a "Use SQLite for everything" article on HN, yet like all RDBMS it's good for certain things and not so much for other things.


SQLite is cool and all, but using "ever" with the caveat "most cases" really indicates that it is not "ever". Additionally, I would contest that it is even most cases. The post is correct that SQLite often is fine, though.


> SQLite is cool and all, but using "ever" with the caveat "most cases" really indicates that it is not "ever".

I think the "ever" is about "will I ever need to upgrade to a server-based database in this particular case". I.e., in cases A and B, you might do OK for a year or two with SQLite, but if your project grows, eventually you'll have to upgrade to Postgres; in these cases, SQLite is not the only database you'll ever need. But in cases C, D, E, and F, you'll never have to upgrade; in these cases, SQLite is the only database you'll ever need. The cases where SQLite is the only database you'll ever need outnumber the cases where SQLite is not the only database you'll ever need; thus, "SQLite is the only database you'll ever need in most cases."


However, in many of your cases C, D, E, and F perhaps it is not even a database you need, but a simple .csv or .json file...


If the data changes, it's nearly unconditionally a better idea to use SQLite over those things. CSV and JSON are serialization formats for data exchange between systems, not something suitable as application storage.

SQLite is also much more convenient for anyone who later wants to explore your application's data than JSON structures.


I love SQLite. Used it to set up a small PHP tool which allowed users to check if their data was present in the latest Facebook leak. Took 30 mins of coding, incl. data import. The tool got picked up by the national media and the tiny 1GB VPS got quite hammered with traffic (±50 lookups a second, 5.5M records). But it didn't break a sweat. The only snag I ran into was running out of file descriptors/sockets, which was easily solved.


I personally think that SQL and tabular data should be built into languages in a manner similar to text and regular expressions. I get this (partially) in Python by using pandas and pandasql, which (to my understanding) initiates SQLite in the background. There are a few other modules that do this as well.

Recreating relational table query operations as a set of operations unique to a specific programming language instead of integrating SQL makes, to me, as much conceptual sense as recreating a unique text pattern matching search for each language rather than integrating regular expressions.

This is only a small sliver of the topic here, but I do think that SQLite is probably a good backend for this.


I'm not sure I completely get what you're after.

So let's say a concept of SQL was "built-in," the language now understands the syntax: So what? Without access to the underlying data the SQL is about it is just as meaningless as a raw string (i.e. you cannot know if a query is valid without the underlying data to validate it against).

If your language now needs a persistent connection to some underlying SQL data-source (with all the problems that entails) building it into the language is barely better than just executing SQL during your tests.

So I'm not really sure what you want to accomplish or what value you believe this would add.


I think the link I put below probably does a better job explaining what I'm looking for. This has mainly to do with dataframes (not unstructured text). Here's the link again in case this thread gets long and you don't know what I'm referring to:

https://pandas.pydata.org/pandas-docs/stable/getting_started...

Generally, I vastly prefer the SQL operations to the pandas ones, though (and this is very important) only when pandas is essentially recreating what is in SQL's sweet spot. For example, you can use pandas operations to do joins, aggregations, filters, and so forth. I would rather write that code in sql.

I would not prefer to generate summary statistics in SQL, find correlations between columns, or do other things that are in the realm of scientific or statistical programming. There will be a grey area in there, for sure. I also find that many things that require SQL trickery (such as self-joins) often have a very very simple pandas solution, such as cumulative sums on a column. So I go back and forth between SQL and pandas quite a bit (as each operation returns a data frame).

Just to be clear again, some people just can't stand SQL and want to stay away from it as much as possible. Other people, like me, greatly prefer it, but even for us there are scenarios where we'd much rather use pandas than get into leetcode style SQL trickery.
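
In case it helps anyone, the pandasql route looks roughly like this (a sketch; as far as I know it just pushes the frames into an in-memory SQLite database behind the scenes):

    import pandas as pd
    from pandasql import sqldf

    orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]})
    customers = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Grace"]})
    # Do the relational part in SQL, then drop back into pandas for statistics.
    totals = sqldf("""
        SELECT c.name, SUM(o.amount) AS total
        FROM orders o JOIN customers c ON c.id = o.customer_id
        GROUP BY c.name
    """, locals())
    print(totals.describe())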


That's basically what linq is. It's of course not identical to sql, but one big reason for that is sql is difficult to autocomplete. Linq purposely flipped things around to help with that (ie 'from foo select something' instead of 'select something from foo')


That's basically what an ORM is. In fact Django uses SQLite out of the box and their ORM overrides lots of Python's operators so there we are!


I haven't used an ORM for some time (back when I programmed in Rails), but my experience and understanding is that an ORM largely replaces hand written SQL for a number of CRUD-style queries to manage object relational mapping for a persistence tier, and addresses a different issue than native vs SQL based operations in a tabular environment such as Pandas or R data frames.

What I'm talking about is this:

https://pandas.pydata.org/pandas-docs/stable/getting_started...

I personally am not especially interested in recreating relational set operations using pandas operations. There are things I'd much rather do in pandas (such as summary statistics), and some things I'd much rather do in SQL (such as elaborate JOINs and aggregations). I do admit there will be a grey area.

Interestingly, there are a lot of people who absolutely can't stand SQL, whereas I (and a lot of people) vastly prefer it.

As for ORMs - I actually did like them back when I did this sort of programming (managing the back and forth between objects and tables was honestly very boring), though once I was into reports, I often went straight to raw SQL. I no longer do that sort of work, and almost all code I now write is for analytical purposes, so I don't do any CRUD.


I have used pandas and the way it is used in Python is quite similar to how you would interface with the Django ORM.

Btw I've written lots of analytical types of queries using Django ORM (to power the backend for a dashboard API, for example). It is quite powerful. You can even do window functions and such directly with the ORM.


I'm having a little trouble understanding this - are you writing SQL to do the window function, or does the ORM provide a non-sql way to do the window function?


I'm using the ORM to do a window function, without writing any SQL directly. In fact I've done queries with several window functions, aggregations, and joins in a single query all from the ORM without writing any SQL. It is much easier to read and maintain than raw SQL too imo.


Ah. Well, that clearly works for a lot of people and sounds similar to how pandas works, though it's actually the opposite of what I'm describing here, which is the option to write SQL directly against tabular data frames. For now, this is possible in Python through what I'd describe as out-of-the-mainstream open source modules that are reputable and written by good programmers but may not be actively maintained. I get the impression that sqldf in R is a bit more mainstream among R programmers, though I'm not sure of this.


Site gives an "App Error: Error: read ECONNRESET". Is this site powered by SQLite ;).


This isn't an error code from SQLite. Usually ECONNRESET is related to a TCP/IP connection.


Of course it is.


Sqlite is awesome and just enough for many many uses. The thing I wonder about is column types: whether they will ever come, or whether the current behaviour is too deeply embedded in the DB design. Having everything as a string is the main difference with a standard DB for me. Maybe some optional typing could be enforced? Anyway, I'm not criticising, being more than grateful for this awesome under-the-radar but cornerstone piece of tech, just like curl


Not everything is stored as a string. If you set a column to a numeric type and pass a number, it's stored as a number.

SQLite doesn't enforce it. But people forget the major reason for DB enforcement is multiple applications on one database.

This pattern has fallen out of favor in general for all databases (we prefer one service layer in front of the DB, and then multiple apps use that service layer).

And when you have one app or one service layer, that's where the enforcement can easily come from.


> But people forget the major reason for DB enforcement is multiple application on one database.

While that may be the major reason there are other reasons that are just as valid. Such as multiple developers working on a single application, loading "wrong" values into the database for example.

If SQLite enforced typing it would be a great alternative for me, but the way it is right now it is "nice", but not nice enough to beat PostgreSQL in any way unless I need an embedded DB. At least in my opinion.


All you have to do is put a column constraint on there: CREATE TABLE t (intcol CHECK (typeof(intcol) = 'integer'));
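
To make that concrete (Python sketch) -- with the constraint in place, SQLite rejects values of the wrong storage class instead of silently storing them:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (intcol CHECK (typeof(intcol) = 'integer'))")
    conn.execute("INSERT INTO t VALUES (42)")        # accepted
    try:
        conn.execute("INSERT INTO t VALUES ('oops')")
    except sqlite3.IntegrityError as e:
        print(e)  # CHECK constraint failed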


> If SQLite enforced typing it would be a great alternative for me, but the way it is right now it is "nice", but not nice enough to beat PostgreSQL in any way unless I need an embedded DB. At least in my opinion.

You still have the option of embedding Firebird if you want enforced types AND embedded operation.


You could say the same about static typing in application code: it's only useful when there are multiple callers of your APIs. But it can be useful. More generally, the "multiple applications" can be: the one I write, the one I will refactor to, the one my colleague writes, sqlite3 when called manually for misc. operations, and/or sqlite3 when called by admin scripts.


It is actually the same in static typing. JavaScript didn't need formal classes and argument and return types, until Node, NPM, and other large scale projects made those necessary (ergo, TypeScript).

Even when using TypeScript, typing your local variables isn't THAT useful. Some do it out of sense of diligence, but the odds are you know what's the type of a local variable in a 30 line method, just by looking at the code.

But for libraries to cooperate, or even people to cooperate within a single large project, that represents objects and functions used by multiple callers who don't know that code by heart, and aren't looking at its implementation while using it. Hence, types.


In "most cases"? I take it by most cases it means, non enterprise applications with no load and basic use-cases? I can't think of a single application for SQLite at any company I've ever worked for, perhaps my first job for a small e-commerce company. Every database I work with has hundreds of GBs of data at the minimum and has to be highly available (i.e. tolerate any DB server dying with zero downtime) and in most cases, extremely low latency. I.e. respond to hundreds or usual thousands of requests per second with a response time ranging from 100ms to 1ms. Recently most of the databases I'm working with are responding to 10000+ requests per second at an average request time of less than 1ms.

Sure, for embedded databases with not much data and a single user, use SQLite, but for actual serious applications it's a toy.


> but for actual serious applications it's a toy

I tend to disagree. As do a lot of others [1].

[1]: https://www.sqlite.org/famous.html


Again, these are not distributed high throughput applications. This is a database as a back-end for single user applications or otherwise small-scale usage.


There's a huge range between "a toy" and "high throughput applications", most of which are still serious business.


If you need HA then SQLite is out of the question, but query latency is usually lower than for MySQL/Postgres - with an embedded DB you are avoiding network latency. See also https://sqlite.org/np1queryprob.html


How can you have an embedded database if you have tens or hundreds of application servers which all need to be able to access the same dataset?


Of course it is not a good fit for SQLite. I was pointing out only that its latency is not bad (at least for read-mostly workloads). There is a common misconception that SQLite is OK only if the load is very low (<1rps), but in practice you can use SQLite for much higher load if your app stays within one host.


What would the tail latency be when 1000s of requests are waiting to write?


> Even if you start out small and later need to upscale, as long as your web application can run on the same machine as the database, which it can in 99% of the time...

It's not really "upscaling" if it can still be handled by one machine though, is it? And it might be much cheaper to handle the same load on a number of weaker "machines" than a single powerful "machine" (replace "machine" with cloud VMs).

And for such "small-scale" problems, it's also not a big deal to run a "proper" DB server process on the same machine (but with the advantage that it's absolutely trivial to switch to multiple machines).


Yeah, I don't get some of these "advantages", I have a standalone server running MySQL and another running Postgres. No problems for years. Being a DBA for these things isn't even a 1hr a month job these days, I'm not sure I did any DBA work since 2019 unless you count the migrations.


I’m no expert, but isn’t having your entire DB as a single file on the same machine as the application a security risk for a web service?


That doesn't seem very different to a client-server setup. If your DB is on a different server and your app is compromised the attacker has similar access to the DB as if it was on the same disk.


If you have access to the application machine, you will likely have access to DB credentials, no?

Actually if anyone has some more to add to this discussion, I would be interested to hear, in case I am missing something. I am currently using an SQLite database for a small internal Django app.


You will need to have the database accessible for the process serving the web page and there's no way around that.

When using a database stored elsewhere, the config file will still contain the user name and password, but the user's permissions could be restricted in order to prevent access to parts of the data in the database.

In the real world though, 99% of all web sites run with a user with full permissions anyway, so if the site gets compromised, you're screwed regardless.


> 99% of all web sites run with a user with full permissions

In my professional career it's 0% for public facing apps; the simplest custom CMS-based system I maintain has separation between database users for a 100MB DB. Many of the newer ones have a sync to read-only copies for most of the actions that the app needs to do. 99% sounds horrible, I can't even imagine how you could get to such a state!

Sure, considering how badly login is handled everywhere (even in SAML), it's not that surprising.


> I can't even imagine how you could get to such a state!

1-click installers for Joomla, WordPress, Drupal, etc. Most shared hosting providers give the database user full access to the entire database by default.


> In the real world though, 99% of all web sites run with a user with full permissions anyway, so if the site gets compromised, you're screwed regardless.

Yeah, I was kind of working under that assumption.


It makes things very convenient for hackers.


But also for the ops, as it means securing one place instead of two.


Is SQLite a good choice for a user facing file format?

I.e. where the user can access the database file in Finder/Explorer and make a copy at any point in time to share, back up, etc. Would that copy be consistent?


Using SQLite databases as an application file format is encouraged by the SQLite team:

https://sqlite.org/appfileformat.html


As long as it’s not actively being written to, yes. Even enabling WAL won’t save you from inconsistent backups without use of the backup API or atomic file system snapshots when writers are active.


Well, I'm thinking of an app that lets the user edit some kind of document. At some point the changes have to be saved.

Could this happen to the "live" file or would the workaround be to keep a temporary copy somewhere away from the user and copy that back and forth?


Yes, using the backup API you could load an on-disk database into memory and then do the reverse to a temp file and atomically move it to replace the saved copy. There's even an example of using the backup API for this on the SQLite website:

https://www.sqlite.org/backup.html

This wouldn’t be a great solution for large databases, but if you’re using it as a document format up to the couple dozen MB range it’s the perfectly acceptable.

Alternatively you could just go the working copy route - make a copy of the database on open and move it back in place when closed. This would also let you recover unsaved changes if you crash or are terminated before the user hits save, as well as the obvious benefit of not needing to keep the entire database loaded in memory and needing to write the entire thing out on save.
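
Python's sqlite3 module exposes the backup API directly, so the save step could look roughly like this (a sketch; error handling and platform details omitted):

    import os
    import sqlite3

    def save_document(doc, path):
        tmp = path + ".tmp"
        dest = sqlite3.connect(tmp)
        doc.backup(dest)       # copy the whole in-memory database, page by page
        dest.close()
        os.replace(tmp, path)  # atomically swap in the new saved copy

    doc = sqlite3.connect(":memory:")
    doc.execute("CREATE TABLE notes (body TEXT)")
    doc.execute("INSERT INTO notes VALUES ('hello')")
    save_document(doc, "mydoc.sqlite")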


Thanks for the details. I'll take a closer look. The documents of my app can get quite large, up to 300 to 400 MB (lots of small images). I was thinking of using SQLite as an alternative to the ZIP-based format that I'm using at the moment, hoping to avoid the working copy route that I use today, but it looks like that would be tricky.


It's a nice wrap-up of why SQLite might be more powerful than one might think, and it actually is suitable for more use cases than testing or client application storage. But imho the title is a bit clickbaity and not very accurate. The feature set of postgres has so much convenience that I do use in most web applications, like spatial queries and indexes (via postgis), trigram similarity, queryable JSON fields etc. Wouldn't wanna miss all that niceness.


SQLite does have a spatial extension in Spatialite. Whether it's enough for you is up to you to decide. FTS5 does seem to have some trigram matching function that again may or may not be sufficient for you.


> SQLite is a relational database management system contained in a C library.

This got me thinking, has someone done a rewrite in Rust yet?



Yes, and discussed here about a week ago too: https://news.ycombinator.com/item?id=26749737


SQLite is awesome right up to the point where it corrupts everything and leaves you in an unrecoverable state. At that point you need to have working backups and a plan to migrate to a real DB server. Actually, if "working backups" is a thing that makes sense for your use case, just figure out how to use PostgreSQL now.


I've never experienced this within my 10+ years of working with SQLite. I've seen it a fair bunch with MariaDB though.


I've been using SQLite in a desktop product for around 10 years, and I guess there are around 1-2k deployments of it.

There have been a couple (literally) of occasions where customers have contacted me and it turned out their database was corrupt, though I don't know the circumstances under which it occurred. Presumably there are instances where it wasn't reported to me too.


This happened to me. I never did figure out the cause. One day customers of www.websequencediagrams.com started emailing me saying they couldn't access their files. Turns out it was corrupted and would just error when accessing certain records. Also, for mysterious reasons, there was a single open transaction that had been accepting all the data for several days, so I had to be very careful when restarting the app...

Coincidentally, the backups had stopped working a couple of months ago.

Fortunately I was able to copy the data to my machine, write some python to try to retrieve each customer's data individually, verify its consistency and merge with the older backup so most people didn't notice.

Afterwards I upgraded to the latest sqlite, as the one I had been using was six years old, and I have not had a problem since.


> SQLite is awesome right up to the point where it corrupts everything ... At that point you need to have working backups

Whatever DB you use, a backup plan is prudent for it. No DB can magically recover from some kinds of corruption. And unless the user does something stupid, out of ignorance, SQLite is extremely resilient. Do you have any experience with it that makes you suspect otherwise?


Has this happened often? Can you reproduce it?

I was under the impression that it’s very difficult to corrupt a DB file if you are using the SQLite API


call fork()


The docs are quite specific about where you can and cannot move connection pointers across threads.

You should probably keep each connection owned by a single thread.

This is a general “do not share memory” issue, you could fork any non-thread safe C code and see undefined behaviour.

Any other issues?


Mistakes happen.

But only on SQLite does the "undefined behavior" mean "all your data is gone". In Postgres, you can crash or fail or get invalid results, but you are not going to lose all your data at once.


But you cannot compare products based on the mistakes programmers make. We are all human. Any tool can be misused though.

I think "do not share memory with multiple writers" is in the same category as "do not use a default user/pass, only allow access from the LAN".

Both are programmer errors not related to the specific products they are implemented on top of.


Sure you can! One of the big selling points of Rust, for example, is the handling of programmers' mistakes -- if you look at the original announcement [1], the first bullet means "when programmers make mistakes, they do not turn into security or crashing issues".

That said, the "do not share" issue is not important for everyone. My Python code never calls fork, so that's not an issue at all. But I can easily imagine programs which do fork a lot.

[1] https://lkml.org/lkml/2021/4/14/1023


Ok you can compare products, but you still have to use/compare them without making any obvious mistakes.

Using the Rust example, you could say “Rust is not safe because I can wrap my code in unsafe{} and that corrupts my data when I fork”.

The OPs point is “when I run two threads that write to the same memory I get corrupt data”.

The first port of call is not "well it's Samsung memory, so Samsung make terrible memory modules, I shall run my incorrect programs on Sony henceforth".

It’s “why are you expecting your incorrect program to even work”.


Actually, fork does not create two threads; it creates two processes, which should "share memory" only in the "copy on write" sense. There is no undefined behavior implied here. When developers misunderstand the difference between thread and process boundaries (as in the case of the SQLite devs) things can go to total shit real fast.

If you replace "Samsung" with Unix and "Sony" with Microsoft your other statements are correct. That's the problem.


You need to be running the database on a correctly working operating system. Since the database relies on the system calls like fsync() working correctly, the filesystem must also work correctly, the stable storage must have write caching turned off and write requests must not return before they are complete. Also, the operating system must not overcommit memory.

So in order to avoid SQLite database corruption, you need:

- hardware RAID disabled or reconfigured in JBOD ("IT") mode;

- RAID controller write cache disabled;

- RAID battery back-up cache disabled;

- individual drives' write caches disabled;

- ZFS;

- if using GNU/Linux, OOM turned off.

Even with turning off OOM, GNU/Linux's fsync() will still lie about I/O having completed, when it is in fact in transit. Therefore, if you want a reliable database, you must switch to a real UNIX, like SmartOS.

Only when all of these are done exactly as I have specified will you have a system ready for a relational database management system, and only then will a database be able to actually provide transactions.


I love SQLite but it's really a pain to do concurrent writes from multiple long-lived threads. Even in WAL mode, having multiple writers leads to tons of "database locked" errors and applications fall over. The writes don't even have to be complicated -- single-row atomic updates with no table locking -- and SQLite is not able to serialize them and handle them gracefully without triggering errors; that's up to the application.

Having to manually implement inter-thread coordination to prevent hitting the DB at the same time is a lot of work that never gets mentioned. When people talk about SQLite being able to handle tens of thousands of writes per second, there's a big asterisk: that's only if the application takes care of not making them write at the same time.
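
That coordination usually ends up being a single writer thread fed by a queue -- something like this sketch (made-up schema):

    import queue
    import sqlite3
    import threading

    writes = queue.Queue()

    def writer():
        conn = sqlite3.connect("app.db")  # the only connection that ever writes
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS events (msg TEXT)")
        while True:
            sql, params = writes.get()
            with conn:  # commit on success, roll back on error
                conn.execute(sql, params)

    threading.Thread(target=writer, daemon=True).start()
    # Any thread can enqueue work; writes are applied strictly one at a time.
    writes.put(("INSERT INTO events (msg) VALUES (?)", ("hello",)))

In practice you would also drain whatever is sitting in the queue into a single transaction per loop iteration, which is where most of the throughput comes from -- but the point stands that the application, not SQLite, is doing the serializing.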


It's a bummer that DBeaver can't just open a sqlite file and do the right thing. Anyone know an app that makes it easy to just "double-click" a sqlite file and get it pulled up in a convenient db GUI?


Haven't tried it with SQLite databases yet, but you could give JetBrains DataGrip a try.


Beekeeper Studio


Wow, looks very slick! Surprised I hadn't heard of this one.

Doesn't have the functionality I was asking for, though – filed an issue here: https://github.com/beekeeper-studio/beekeeper-studio/issues/...


What’s the portability between SQLite and something like Postgres or MySQL? Can you easily start with SQLite and then go to one of the others easily? If so I don’t see why you wouldn’t always start noodling with SQLite.


How many concurrent write connections and read operations will SQLite handle?


One write, infinite read. Per file (which matters, since you can have many files).

Writes can wait for one another. So you can still handle multiple writers, but serialized.

BTW, most people vastly overestimate their concurrency and write frequency needs. For example something like a CMS is 99.999% reads and 0.001% writes (yes, I pulled this out of my behind, but it's close).


It does not handle multiple writers well even in serialized mode with WAL turned on, most systems will throw "database locked" errors. It's up to the developer to coordinate between threads to prevent concurrent writes from getting blocked.


How does it handle infinite reads?

For example: I used to work with a Couchbase cluster of 6 nodes, billions of documents (> 1TB of data) and doing roughly 25k-50k reads per second and every read was returned in < 1ms.

Are you telling me we could have used SQLite on a single instance and it would have also have an average end to end latency of < 1ms. I.e. from App server to DB and back was < 1ms? For 25k+ OPS/sec?


Obviously a tiny embedded library that reads off a file on disk won't replace a "Couchbase cluster of 6 nodes etc. etc."

But SQLite has no hard limits on the number of readers. There are no sockets or Unix pipes. It reads off the file cache in RAM, or if your data is bigger, it reads off disk. Like you would from a file. How'd you answer how many reads you can do from RAM? Kind of... depends on everything else, doesn't it?

I'd say it can have comparable performance to a single MySQL instance for simple read queries. If that helps.


50k reads per second? And you need how many servers for that? Sqlite can't do infinite reads but pretty close

https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...


A CMS would be low-write, assuming no tracking or authentication audit trails, or maybe buffered copies of all of those.


I wouldn't store web logs in an SQLite DB, but I wouldn't do that in any RDBMS typically. Easiest solution is a flat file, which you can import and analyze offsite (in SQLite, or Excel, or whatever :-).


You have a design where logs are added to a queue (something like ZMQ perhaps) and have a process/thread writing the logs to a separate SQLite database.

I'm not saying it's a good idea, especially for logging, where a flat file might be better, but it is possible.


I was gonna say a queue, but I didn't want to get too fancy, but you're absolutely right. If you can use an in-memory/socket queue, even a basic one, which serializes the writes to a dedicated SQLite file, SQLite becomes as performant as any database (you're limited by disk I/O).


Interesting, thank you!


sqlite is amazing and everything.

I still use Postgresql for most things - the tool support is genuinely a lot better. I also find the explain output much easier to understand.

There are also features which I occasionally use that PostgreSQL supports, such as partitioning, functional indexes, and partial indexes. These can be emulated in some way with sqlite I'm sure, but they are fairly nice.

I have implemented some really critical stuff in sqlite and it's been amazing though. Anyone using sqlite in production needs to really understand the concurrency limits though.


The article kind of glosses over the lack of common column data types. I love SQLite and use it on everything, but parsing and formatting dates (for instance) gets really old after a while.


I know this potentially turns into a wildfire debate about future meeting dates across timezones, but for us we still have not had any problems storing all datetimes as 64 bit unix timestamp values.

There is only 1 situation where we actually care about the timezone of something, and it is stored on the basis of the user profile, not any specific timestamps. Time is just time. The TZ info is usually part of something entirely separate. Any timestamp can be interpreted in any tz, as long as you have a consistent UTC basis to operate from and knowledge of the desired tz.

If the concern is the final formatting, then maybe consider writing some UDFs in your app to quickly format fields. Comparison and sorting of 64 bit integers is trivial, so none of this should be a problem at all...
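
In Python such a UDF is only a couple of lines (a sketch; the function name and offset handling are just an example):

    import datetime
    import sqlite3

    def fmt_ts(unix_ts, tz_offset_minutes):
        # Interpret the stored UTC timestamp in whatever zone the caller wants.
        dt = datetime.datetime.fromtimestamp(unix_ts, datetime.timezone.utc)
        dt += datetime.timedelta(minutes=tz_offset_minutes)
        return dt.strftime("%Y-%m-%d %H:%M:%S")

    conn = sqlite3.connect(":memory:")
    conn.create_function("fmt_ts", 2, fmt_ts)
    conn.execute("CREATE TABLE orders (created_at INTEGER)")
    conn.execute("INSERT INTO orders VALUES (1618500000)")
    print(conn.execute("SELECT fmt_ts(created_at, -240) FROM orders").fetchone())

(SQLite's built-in datetime(created_at, 'unixepoch') already covers the plain-UTC case, for what it's worth.)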


I think I need help understanding the use case. Like others, I'm quite happily using a PostgreSQL server when I need to query a DB using SQL. This clearly doesn't work in the embedded case, but why would I need SQL there? Why wouldn't an embedded key/value DB like Berkeley DB (which also offers ACID, transactions and write-ahead logging) be sufficient there? I for one am quite surprised to learn that a web browser would have a SQL DB built in.


I use SQLite for many of my applications, but definitely not for all of them.

SQLite is probably not the right tool if you have concurrent access to the database, nor if you want to enforce database consistency through triggers.

It is the perfect fit if you need a database that you can easily move around, access from a single application, and your use case is simply to load and dump data with no further transformation. Anything more than that requires a client-server RDBMS.


> Several processes or threads may access the same database concurrently and read accesses can be run in parallel. Because the database consists of a single file, a write access can only be satisfied if no other accesses are currently writing to the database.

Which is perfectly fine for many of my side projects. I use SQLite and am quite happy with that, but when I need to write to the DB I still prefer PostgreSQL.


Not in its current state - it becomes entirely unsuitable as soon as you need to have multiple instances (be it for scalability or availability).

Sqlite is inherently single instance. What really makes this an issue is that it doesn't work safely/reliably on distributed filesystems like GlusterFS and Ceph.

See here for example of a project that needs to be forked in order to not have data corruption on remote filesystems (it depends on the WAL mode AIUI): https://github.com/Sonarr/Sonarr/issues/1886

I really do hope that this will be addressed in a future version of Sqlite, which would at least allow running it on redundant network-attached storage.

But even if it is, you will still have issues once you want to scale horizontally for performance.

If you're building strictly in-house proprietary software, none of this really matters as you have full control. But if it's either FLOSS or otherwise to be operated by anyone else than the developer and their internal organization, SQLite is not suitable.

I die inside a bit every time I am expected to take responsibility for software built on Sqlite.


From TFA:

The only time you need to consider a client-server setup is:

- Where you have multiple physical machines accessing the same database server over a network. In this setup you have a shared database between multiple clients.

- If your machine is extremely write busy, like accepting thousands upon thousands of simultaneous write requests every second, then you also need a client-server setup, because a client-server database is specifically built to handle that.

- If you're working with very big datasets, like in the terabytes size. A client-server approach is better suited for large datasets because the database will split files up into smaller files whereas SQLite only works with a single file.

I have run SQLite as a web application database with thousands of concurrent writes every second, coming from different HTTP requests, without any delays or issues. This is because even on a very busy site, the hardware is extremely fast and fully capable of handling that.

The claim is that most of the time you don't actually need multiple instances.


Not covered: Single application server with network-attached highly available storage


I love SQLite, I do all of my projects' prototyping with it before I move to postgres, which is also awesome. One of the major problems I'm facing is that Heroku, for example, will overwrite/delete the SQLite DB on every build, since it treats it the same as every other file - which is a shame.


I wish someone would write a set of shell scripts that overcome the hardships when dealing with SQLite databases.

For example, deleting a column is a pain in the ass.

A single purpose script "rename_column.sh" would be nice, which combines all necessary steps and gives some guidance regarding edge cases.
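
For what it's worth, the documented workaround for dropping (or renaming) a column is the "new table" dance; scripted, it's roughly the following sketch (hard-coded names, assumes a users table):

    import sqlite3

    conn = sqlite3.connect("app.db")
    # The ALTER TABLE docs recommend disabling FK enforcement around the swap.
    conn.execute("PRAGMA foreign_keys=OFF")
    with conn:
        # Recreate the table without the unwanted column, copy the data, swap names.
        # Indexes, triggers and views on the old table must be recreated afterwards.
        conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")
        conn.execute("DROP TABLE users")
        conn.execute("ALTER TABLE users_new RENAME TO users")
    conn.execute("PRAGMA foreign_keys=ON")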


The latest version (3.35.4) adds `ALTER TABLE DROP COLUMN` https://sqlite.org/releaselog/3_35_4.html


That is cool.

What is a good way to use 3.35.4 on a system with a Debian version that comes with an older SQLite version?


Compiling it yourself from source is quite easy, you can find instructions here: https://www.sqlite.org/howtocompile.html

Here is an article from Julia Evans explaining it: https://jvns.ca/blog/2019/10/28/sqlite-is-really-easy-to-com...

Or if you are familiar with Docker, you can use a more recent Debian in a container and install sqlite inside it.


If you compile it yourself, how do security updates get to your system?

As for a more recent Debian version: The latest stable Debian has sqlite3 3.27.2 in the repos, so that is not an option.


Since sqlite is local to your application, you can just bundle a newer version.

Speaking generally, if the Debian version is the way to go, the easiest way is often to grab the source package from sid and build it. It's literally just one command. Sometimes new dependencies will cause trouble, in which case some manual tinkering is required.


I've gotten some good mileage out of sqlite-utils[0] lately. It and its parent project datasette[1] are a wonderful contribution to the sqlite community.

[0] https://sqlite-utils.datasette.io/en/stable/cli.html#transfo... [1] https://datasette.io/


1ms write latency means max 1000 writes per second. That seems low for a "medium sized app".


> 1ms

Where do you get this figure? It seems quite high.


From the article:

> and no lock lasts for more than a few milliseconds


That's a maximum value, not an average.


Postgres is better in the I-need-concurrency case -- I think it's the greatest RDBMS that's ever been made and would like someone to prove me wrong some day.

SQLite's amazing too though, when you don't need concurrency (and most websites don't really -- especially the ones that should be scaling vertically instead of horizontally).

Anyway here's some cool SQLite stuff:

- https://github.com/CanonicalLtd/dqlite

- https://github.com/rqlite/rqlite

- https://datasette.readthedocs.io/en/stable/

- https://www.sqlite.org/rtree.html

- https://github.com/sqlcipher/sqlcipher

- https://github.com/benbjohnson/litestream

- https://github.com/aergoio/aergolite

- https://sqlite.org/lang_with.html#rcex3

- https://github.com/sql-js/sql.js

- https://www.gaia-gis.it/fossil/libspatialite/index

- https://github.com/h3rald/litestore

- https://github.com/adamlouis/squirrelbyte

- https://github.com/chunky/sqlite3todot

- https://github.com/nalgeon/sqlite-plus/

- https://www.sqlite.org/json1.html#jsonpath


If you only ever have 1 database server and 1 application server, then SQLite is deterministically faster in the embedded single-server scenario. This is our configuration model for our application. I acknowledge this is not everyone's configuration model (today), but if you entertain the insanity for a moment you might find the performance uplift of a single-machine app to be staggering.

Query execution latency for SQLite on the same machine running on NVMe disk can be measured in microseconds. You will never see this kind of performance across the network, or even loopback. Thus, there is literally no way from an information theory perspective, that Postgres, SQL Server, Oracle, DB2, MongoDB, Dynamo, Aurora, et. al. could ever hope to beat the throughput of (properly-configured) SQLite operating embedded in the application itself.
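
Easy enough to check on your own hardware -- a rough sketch (numbers will vary wildly with hardware and query shape):

    import sqlite3
    import time

    conn = sqlite3.connect("bench.db")
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("INSERT OR IGNORE INTO kv VALUES (1, 'x')")
    conn.commit()
    n = 100_000
    start = time.perf_counter()
    for _ in range(n):
        conn.execute("SELECT v FROM kv WHERE k = 1").fetchone()
    elapsed = time.perf_counter() - start
    # Point reads served from the page cache typically land in the microsecond range.
    print(f"{elapsed / n * 1e6:.1f} microseconds per query")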


You're right -- a few hours ago I thought it was worth editing and noting the scale-up-instead-of-out route, but thought better of it, figuring no one would bring it up. Postgres almost certainly can't beat SQLite when it's on one machine and concurrent writers are not an issue, and you've got some seriously fast hardware to make use of.

Need network? Postgres. Don't need network? SQLite.


Yes, but then you have to have a DB embedded in your application, which is a crap design for serious distributed systems with HA requirements and high load.


As in all things, it depends on your specific application. To make a broad statement that SQLite is crap for serious distributed systems is extremely presumptive about all potential applications.

Distributed systems are largely a mistake and amount to a lack of understanding regarding what is actually possible to extract from a single x86 server.


I think the file system is the best database we have to date, and it's what I use for the "base" data store in my most recent applications.

However, for querying, SQLite makes a nice caching layer.


Are these consumer apps?

Are you doing this in LocalStorage?


No.

No.


> you can just upgrade the hardware to a beefier machine and keep business as usual

One of the troubles with having a single machine is that your service will become unavailable when you upgrade it.


we just slurp/dump clojure maps to disk. clojure is our query language.


As a clojure user, that's an apples-to-oranges comparison. Out of the box clojure does not provide you with out-of-order slurp/dump info against a file, does not provide you with indexes, etc. You can load everything into memory and create an efficient structure in your EDN but a) you need to implement it yourself; b) if your machine crashes, you lose it, because by default you work in memory, so you need to manually manage saving to disk.

So basically I can just say the same thing you said with C: "I just fopen/fgetc strings to disk. C is my query language."


it's not that apples to oranges. op is saying they use sqlite, i'm saying we use edn files. just offering another point of view.

if by out-of-order you mean loading parts of a file, of course we don't do that.

we don't load everything into memory (there's more than one file). just like you wouldn't load your whole sqlite db into memory.

if the machine crashes we lose what we didn't save. same with a db.

  > So basically I can just say the same thing you said with
  > C: "I just fopen/fgetc strings to disk. C is my query
  > language."
as a clojure user you know this is not a good analogy. c has macros too.

the main problem with our approach is that the server needs to have access to the edn files. that's not always doable ... but same problem with sqlite.

and it is oh so nice to investigate stuff in the repl:

  (:prio (load-file (filter #(= "secret-id" (:id %)) index))) 
btw, i loved to kill a mocking bird.


How is it not apples to oranges? You are comparing a library that implements a database (with a specific binary file format that any sqlite instance understands, and a query language, among other functionalities) to a programming language!

With sqlite, creating an index is a one-liner. How do you cover that in clojure? With sqlite, a whole database fits in one file that can be read by almost any macos / linux machine. With clojure, you need your specific program created on top of clojure installed in the machine. Etc.

At the end of the day, you can reimplement sqlite on clojure, which basically proves that we are talking about different abstraction levels, thus apples to oranges.


Nicely summed up


Oh good we've met our monthly quota for wildly exaggerating the capabilities of SQLite.


I feel like most of the hyperbole would be resolved if people would differentiate between SQLite the storage engine (which is indeed excellent, especially if you're working in a language like C or Rust that allows plugging into the virtual tables API) and SQLite the RDBMS, which is very much not and is actually guilty of most of the things people accuse MySQL of. There are plenty of reasons not to use SQLite that have nothing to do with its performance for your workload.


SQLite's RDBMS is fucking incredible because it doesn't make a bunch of assumptions. I don't think developers are paying attention to the most important feature.

SQLite is embedded in your application. You can write user-defined functions in your application logic that are then bound to SQL functions. These functions are then treated just like first-class functions built-in to the dialect:

https://www.sqlite.org/appfunc.html

The way Microsoft exposes this in their provider is really wonderful. Using aggregate UDFs feels like cheating. We have also done some next-level stuff where we can call a UDF that executes arbitrary SQL command (i.e. thus invoking subsequent UDFs recursively). We found that UDFs don't necessarily need to be deterministic and can mutate the outside world...

https://docs.microsoft.com/en-us/dotnet/standard/data/sqlite...

If you think along this axis, you would find SQLite as more a blank canvas that you can construct a very domain-specific view of the world upon and then leverage it for implementing complex business logic. You can get from "lots-of-code" to "no-code" applications pretty damn fast if you play this game right.


I usually try not to use the stuff that's strictly RDBMS-specific so that if I need to move it's not a huge hassle, but DAMN:

   connection.CreateFunction(
    "regexp",
    (string pattern, string input)
     => Regex.IsMatch(input, pattern));

   var command = connection.CreateCommand();
   command.CommandText =
   @"
    SELECT count()
    FROM user
    WHERE bio REGEXP '\w\. {2,}\w'
   ";
   var count = command.ExecuteScalar();


Yes. Also consider the implications of:

    var myUdfInstance = new MyUdfInstance(connection);
    
    connection.CreateFunction(
      "Execute",
      (string command)
      => myUdfInstance.ExecuteSql(command));
I can confirm this actually works. It stacks up nicely in the debugger too.


> There are plenty of reasons not to use SQLite that have nothing to do with its performance for your workload.

Go on


If you like foreign key constraints to actually be enforced automatically (without turning them on in a pragma and/or expecting that everyone who connects to your database does the same or uses a version compiled with custom flags), transactions not to allow an accidental commit after a syntax error, richer indices than btree ones, enforced typechecking without a CHECK constraint, DML commands other than SELECT telling you what rows were returned, etc., the functionality is simply missing in SQLite. Some of it is for backwards compatibility, some not, but at the end of the day it's just not as pleasant to use as something like Postgres as a SQL database. These have all caused issues I actually ran into while using the database, some of which led to integrity violations that took a long time to fix. Sure, they're documented behaviors, but that does not mean they're good behaviors, and they substantially restrict the situations in which I'm willing to deploy it far more than any scalability issues do.

If, on the other hand, you're just using it as a transactional storage engine, none of this stuff really matters and I understand the praise--hence why I think the two modes of use should be differentiated. The other thread responding to me is basically all about extending the very barebones provided interface with custom, application-level semantics. I agree that SQLite is excellent for this, but you have to remember that this is not the only thing people want out of a database.



