SQLite Internals: Pages and B-trees (fly.io)
667 points by eatonphil on July 27, 2022 | 142 comments



I love this.

I have been a fan of SQLite for years, and it makes me genuinely happy to see other people have the same enthusiasm for the software.

A couple of years ago, during a job search, I came across a couple of companies that I thought were a good match. I went through the interview process with all of them and, due to similarities in the interview process, we ended up talking about the same thing: databases. Oh the horror in the interviewers’ eyes whenever I mentioned SQLite. Experienced software engineers from The New York Times, for example, went as far as to mock my engineering choices in 3 out of 5 interviews, despite the success of the products my team built on top of SQLite.

The experience made me feel awful and I stopped talking about SQLite for a couple of years. Instead, whenever someone asked a question about databases, I would answer with the PostgreSQL equivalent, knowing that the basic features are all the same.

That is why, whenever I see projects like Litestream [1][2] used by excellent products like Tailscale [3], it brings me joy.

A friend of SQLite is a friend of mine :-)

[1] https://litestream.io

[2] https://fly.io/blog/all-in-on-sqlite-litestream/

[3] https://tailscale.com/blog/database-for-2022/


Mockery is never OK.

That said, their experience probably reflects my own: using SQLite on a personal project for config or something, loving the simplicity, loving the "ditch all that complexity" hype, then trying it on a more demanding project that according to their mental model it should have excelled at (a central job tracking process), watching it completely faceplant under load (20TPS? Surely it just needs an index or something? Maybe a ramdisk? Err.....), panicking when they realize that perf and monitoring and diagnostics and optimized backends were some of those "bells and whistles that nobody needs," deciding "fine I'll do it myself," throwing far too much time and effort at co-debugging python and C, recompiling SQLite with debug symbols, sprinkling printfs, running profilers, reading the source, and tracking down the issue... only to find that it was a bad default which people had been unsuccessfully begging the maintainers to change for years, which was guaranteed to torpedo perf under a wide variety of common workloads, and which the maintainers had been resisting based entirely on inertia and the misguided idea that this setting was easily discoverable. Yikes.

That's how someone picks up rules of thumb like "SQLite is for preferences, not perf, and definitely not prod."

Now, this experience is a decade stale. SQLite bit me in 2012 and the particular bug was fixed in 2015 IIRC. The people begging to fix the bad default finally won. Time passed, SQLite got better, and now, based on these new stories of SQLite doing well under load, I think it's time to lift my "once bit, twice shy" sanctions regarding perf. I had them in place for a reason, though, and if you had YOLO'd when I had YOLO'd, you would have faceplanted when I faceplanted. I fully defend the use of sanctions like this in general. They are the only tool we have against overenthusiastic marketing.

To be fair, the SQLite site is pretty good, it's the fans that tend to take things too far.


Speaking of potentially obsolete sanctions, conferences have been on hold so I might be hanging on to a few more of them past their due date. Does anyone have fresh experiences with these?

------------------

AMD GPUs (2014): Waiting for GPGPU support from Adobe and Blender that was supposed to be operational (Adobe) or getting there (Blender) but never landed.

AMD GPUs (2016): I burned an entire semester trying to port CUDA code to OpenCL only to run into bug after bug after bug. Lockups, black screens (no framebuffers were supposed to be involved but bugs find a way), leaks, stalls which were difficult to attribute, inscrutable errors leading to abandoned threads with pleading users from years past and no reply. I finally cracked when, after a long day of "I swear this should work, I have no idea how the changes I make correspond to the bugs I see", I ran my ailing OpenCL code on a weaker NVidia card, and it Just Worked. Not only did it just work, it was 10x faster. Even though the card was like 2 generations older, and I was using OpenCL, which NVidia had every reason to sandbag but was still winning at. Oh, and the debugging tools actually worked, too. I was persuaded to sell my red cards, eat the ebay tax, eat the nvidia tax, and buy green from then on. Time has passed since 2016, though. ROCm looks good and AMD has money now, so maybe they can afford to pay people to fix bugs. I want to see someone running Blender and a modern ML framework or two before I retire the sanctions, though. Anyone have fresh intuition here?

Numba CUDA (2018): I spent two weeks working through a succession of problems that I eventually resolved in two days by just using CUDA C++. Numba CUDA only worked with year-stale CUDA, it copied float8s over a float4 array despite having access to triply redundant type specs, the mechanism to pass buffers between kernels always forced copy off GPU, and one more that I forget. I made patches for the first three and gave up on the last one before fixing. Has anyone used it recently? Is it in better shape these days?


I was hired by AMD to work on ROCm in 2020. I'm not exactly an unbiased source, but IMO it has improved significantly. There's a number of working frameworks and Blender was recently ported [1]. Just be sure to choose appropriate hardware.

ROCm still has lots of rough edges, but we're slowly filing them down. The documentation is a bit patchy too. Overall, though, it's been a steady progression in the right direction.

[1]: https://www.phoronix.com/review/blender32-hip-cuda


For OLAP applications (heavy on reads, pretty much no writes) I find a local SQLite has far superior performance to either cloud-based Postgres or SQL Server instances. YMMV.


> went as far as to mock my engineering choices

That's not acceptable.

> Experienced software engineers

Measured in years, probably, not by ability.

I've heard supposedly 'experienced' engineers saying things like "No-one uses Lua in production!".


It’s amazing how otherwise competent engineers will look down on technology that so many use all around the world when generally their contribution to society is a few buggy Python classes in some internal codebase.


For a counterexample:

Some pretty big businesses have been built on the back of PHP. That doesn't make the language any less ridiculous.


Why?


Because there are objective measures by which a programming language can be judged and assessed to be better or worse than its peers, and those measures, like a powerful (not necessarily static) type system and consistency of the language's core design and libraries, put PHP at the bottom of a hierarchy of languages.

This need not imply condescension toward the language's users or designers, not as individuals at any rate.


Yes. And you might quibble about the exact criteria. But by almost any metric, old-style PHP was an awful language.

(Facebook's re-invented Hack isn't nearly as bad. It's about the best language they could have made starting from PHP.)


There was an article about PHP that summarized some of its issues back in the day: https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/

I think that it was excellently written and actually mentioned concrete issues, rather than vague personal dislikes. That is good, because you can revisit those in 10 years or so and see how many of those have been fixed in the actual language.

Someone actually tried to address a few of those: http://maettig.com/2020-09-16-revisiting-a-fractal-of-bad-de...


Thanks for the second link! It's good to see someone (sympathetic to PHP) address the issues reasonably objectively.

That's part of why I carefully wrote 'old-style PHP'.


Are they competent, or have they only written a few buggy classes? Your strawman needs work :)

That said, I'd agree that mockery is rarely ever appropriate, and is often undertaken by those without the credentials to do so.

It's also possible that someone is expressing hard won knowledge/battle scars in an unskilful way. And it takes investigation to know the difference.


When I hear that sort of mockery, especially from so-called "thought leaders" and other loud voices, in conference talks, in long-winded YouTube videos (rants), etc., it's such a big red flag to me. Even though I might agree with the individual in a general sense, it makes me wonder what they're missing or getting wrong or biased against in the areas where I don't have the (supposed) expertise that they do.


I sometimes wonder how much of this objection to SQLite is purely technical, and how much is due to the fact that Dr. Richard Hipp would not exactly be a great "cultural fit" at places like the New York Times or Mozilla.


I would guess that at most 1 in 50 people familiar with SQLite even know Dr. Hipp's name, much less anything about him that might influence their perception of his work.


SQLite is great. But the simple fact that it runs locally makes it a poor technical fit in many scenarios.


For every technology there are scenarios in which it is a poor fit.

You don't want to install Postgresql or Oracle on an embedded system nor would you want the extra weight of administering them for a system that has few users.

The SQLite site runs on SQLite and it has considerable traffic.


> The SQLite site runs on SQLite and it has considerable traffic.

The forum and source code trees run on sqlite (via the Fossil SCM, authored by sqlite's Richard Hipp). The main sqlite site is static HTML with a tiny sprinkling of JS.


"But the simple fact that it runs locally makes it a poor technical fit in many scenarios"

MySQL and PostgreSQL can run locally too. I assume you mean that SQLite doesn't provide a network server interface out of the box?


I've looked at his homepage, and only saw something about bible translations. Is there anything else that's iconically non-NYT about him?


FWIW, I just almost wet my pants laughing at that :).


Never underestimate programmer machismo. So many people seem to rule out SQLite because of programmer lore that it's not fast enough or doesn't scale. But ofc Postgres is a real database and therefore is the automatic correct choice.


PostgreSQL is now facing that too. Since the current fad is building everything as distributed micro-services (no matter the scale!) and PostgreSQL encourages a centralized database, it tends to get dismissed for other solutions (that often aren't anywhere near as good).

"Doesn't scale" gets thrown around often. Yeah sure, but sometimes it doesn't _have_ to. I've seen large fleets of NoSQL databases (won't name to avoid flamewars) running workloads they are not really optimal for. Or PostgreSQL databases completely bogged down and vertically scaled to ridiculous levels because teams don't bother to optimize their queries.

Query performance is not part of OKRs, but new releases usually are.


"Mongo DB is a web scale DB and doesn't use SQL or Joins so it's high-performance. Everybody knows that relational DBs don't scale because they use Joins and Write-To-Disk."

https://www.youtube.com/watch?v=b2F-DItXtZs


I always love watching this.


I've seen highly scalable distributed DBs used as poor choices, and have used Postgres as a replacement in some cases.

It really depends on the kind of problem and the scale, but a good implementation that matches the problem will sing compared to a mediocre implementation on inappropriate yet highly scalable infrastructure. Even if it's supposedly inferior because it's centralized, etc.

Knowing the constraints of the engineering challenge and choosing wisely within that is key. You can sometimes leverage constraints and simplifications in ways that can stack and remove many layers of complexity. Immediate consistency is a huge one, as are transactions.

I avoided relational tech early in my career, when I was really just turned off by examples of poorly tuned implementations, and badly written schemas and queries. Once I finally opened my mind, I was better able to appreciate its role as compared to other DB models.

I think that's a natural maturation as you progress in your career/get more exposure. It's too bad most of us seem to have to walk the same path though.


A lot of things don't need to scale (and never will) so using a simpler solution is more economical and efficient.


I'm looking to develop a self-hosted, open source Node.js web app, mainly to upgrade my resume (as well as to scratch an itch). I considered SQLite to be the perfect choice: the software will mostly be used locally in a single-user setup, so that saves RAM, and even if someone wanted to host it for many users it would still be able to handle a lot. Plus, the package will be so much more portable; with SQLite it can be distributed via npm install rather than as a Docker image. I initially wanted to use one of the "real" databases, concerned that an interviewer might look down on it, but decided to scratch that idea and use the right tool for the job. Your comment makes me reconsider that choice. I think I will go with SQLite anyway, because it's something I want to use myself, and I'll also be applying as a front-end rather than full-stack developer.


Crazy to hear that.

Those conversations are my favorite when interviewing candidates because it gives you a view into how their brains work.

I would’ve nerded out about how so many macOS apps use SQLite under the hood because it’s so solid.

SQLite is even used in the Airbus A350, and they signed (approximately) a 35-year support contract with Airbus for it, I remember hearing in a podcast a long time ago. (Please correct me if I’m way wrong here)


> Experienced software engineers from The New York Times, for example,

I can think of a couple of scenarios where they wouldn’t want to use SQLite, such as the high-availability database to support their subscription system, or the back-end for their interactive website (which was a MEAN stack at one point, so not even SQL).


For people who enjoy this article I recommend reading the two-parter “How does SQLite work?” [1] [2] by Julia Evans. It too is an exploration into the inner workings, but diving deeper into code. And for those interested in more details on b-trees, there’s “The Ubiquitous B-Tree” by Douglas Comer [3], which I enjoyed a lot.

[1] https://jvns.ca/blog/2014/09/27/how-does-sqlite-work-part-1-... [2] https://jvns.ca/blog/2014/10/02/how-does-sqlite-work-part-2-... [3] http://carlosproal.com/ir/papers/p121-comer.pdf


I dream of a SQLite-like embeddable database engine based on Datomic’s data model and queryable with Datalog. Written in something like C, Rust, or Zig. I’m toying around with the idea of hacking something up, but it’ll likely stay in the dream realm until I have Heaps of Free Time on My Hands (tm).

Looking into SQLite’s innards is a great source of inspiration. Thanks for this post.


I've been doing something very similar, but based on extendible hashing algorithms and LSM trees. I've come to the conclusion that a DAG of constraints, triggers and validations on a flat KV + entity model is probably the ideal data structure for 99% of projects I've worked on. You can get the benefits of the Datomic-like history by... skipping compaction and including a TX instant next to all record/assertions. I've found SQLite, Postgres, the LSM papers, Bitcask, and many other papers to be very helpful in terms of inspiration.

Edit: I'm prototyping in Python and implementing in Rust with intent to create a C API and embedding a Scheme runtime for "server"-side query and constraint parsing.
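
For anyone curious what the flat entity model with a TX instant looks like, here's a minimal, throwaway Python sketch of the general idea (not the actual prototype): an append-only log of (entity, attribute, value, tx) assertions, where history falls out of never compacting.

```python
import time

# Append-only log of assertions: (entity, attribute, value, tx_instant, asserted?).
# History comes for free because nothing is ever overwritten or compacted.
log = []

def transact(facts):
    """facts: iterable of (entity, attribute, value, asserted) tuples."""
    tx = time.time_ns()  # one transaction instant shared by every fact in the tx
    for e, a, v, asserted in facts:
        log.append((e, a, v, tx, asserted))
    return tx

def entity_at(entity, as_of=None):
    """Current (or historical, if as_of is given) attribute map for an entity."""
    state = {}
    for e, a, v, tx, asserted in log:
        if e != entity or (as_of is not None and tx > as_of):
            continue
        if asserted:
            state[a] = v
        else:
            state.pop(a, None)  # retraction
    return state

tx1 = transact([("user/1", "name", "Ada", True)])
tx2 = transact([("user/1", "name", "Ada Lovelace", True)])
print(entity_at("user/1"))             # {'name': 'Ada Lovelace'}
print(entity_at("user/1", as_of=tx1))  # {'name': 'Ada'}
```

The real work is in the indexes, constraints and compaction policy layered on top, but the core model really is that small.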


I've had some append-only tables in Postgres and only recently realised that Postgres' system columns (https://www.postgresql.org/docs/14/ddl-system-columns.html) already effectively enabled some Datomic-like structure for such append-only tables! Specifically the xmin column allows me to identify the rows to treat atomically as a unit, and to ignore if I'm querying for a historical view.
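
A minimal sketch of what that looks like, assuming a hypothetical append_only_events table and psycopg2 (xmin is a real Postgres system column, but the casts below are only safe for reasonably recent rows since xids wrap around):

```python
import psycopg2  # assumption: psycopg2 installed; dbname and table are placeholders

conn = psycopg2.connect("dbname=mydb")
with conn, conn.cursor() as cur:
    # xmin is the xid of the inserting transaction, so it groups the rows that
    # were written together and should be treated as a unit.
    cur.execute("""
        SELECT xmin::text::bigint AS inserting_tx, id, payload
        FROM append_only_events
        ORDER BY 1, id
    """)
    for inserting_tx, row_id, payload in cur.fetchall():
        print(inserting_tx, row_id, payload)
```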


You could probably do it with BRIN indexes similar to how TimescaleDB handles their time-series hypertables


TimescaleDB is packaged as a postgres extension, there's a GitHub project here if anyone is interested to check in on that https://github.com/timescale/timescaledb


Indeed! That seems like quite the ideal use-case for BRIN indexes.


Do you have a public repo for that yet? (Assuming you're planning to open-source)


It's got a LICENSE but not publicly listed yet. It's very rough, and has a ton of work left. Once I get it to a workable state I'll open it up under AGPL, though probably with a CLA because I'd like to turn it into a marketable product in the long-run. If I make significant progress on it, I'll reply to this thread with updates :)


I hope using an embeddable database will also free us from delegating the choice of query language to the database library. It should be possible to have a general purpose low level persistence API and many different query engines built on top of it.


I think the problem with querying is that efficient query planning requires understanding the indexes on the dataset, so you at least need to be able to expose indexes and their properties in an API.


Imho, 90% of query planning is not that hard at all in practice. If it's your data and your query, you'll probably have a pretty good idea which table you'll want to filter first and what to join, the same as you would with any data structures.

The hard part is getting all of that consistent with concurrent writes. Can rows change while you scan? Can indexes? How do you check that your write is valid immediately before committing? Things like that.

I think SQL makes that pretty hard already, but in a "database-as-a-bag-of-data-structures" mode I think that's going to get even harder.


IMNSHO query planning is pretty hard. I recently found exponential behavior in SELECT query processing that depends on the depth of subselects. This happened with a pretty seasoned database system, let me say.

To have good query optimization, you need to implement, at the very least, some form of dynamic programming, otherwise you will not be able to optimize queries that have more than half a dozen tables in selects. Then you have to implement selection of the best plan or an approximation to it, which would make you implement beam search through the space of all solutions you generated, and that's the simplest case. For guaranteed optimization, you need to implement or utilize a pseudoboolean optimization engine.

I am a database engine developer right now. ;)


If you just pick a worst-case optimal join algorithm with a limited table cardinality, i.e. triples like in the case of OP, so that you can materialise every possible index, you can perform a dynamic _instance optimal_ search (i.e. you can ignore skew), if you choose the best possible variable every time you have to pick one. This can be done by estimating the cardinality and Jaccard index of the variables, which is pretty straightforward with the right data structure. If you don't want to limit yourself to just three columns, you can also go for cutting-edge succinct data structure magic. https://aidanhogan.com/docs/wco-ring.pdf

Either way, using a WCO join combined with a data structure that allows for efficient range estimates and dynamic variable ordering completely obviates the need for query planning.
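
For intuition only, here's a toy Python sketch of the "join everything at once" idea on a triple store. It is not the ring structure from the paper and skips all the succinct-structure and min-hash machinery: it just intersects brute-force candidate sets per variable and branches on the variable with the fewest candidates.

```python
triples = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("alice", "likes", "sqlite"),
}

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def candidates(var, pattern, binding):
    """Values `var` can take so that `pattern` matches some triple under `binding`."""
    out = set()
    for triple in triples:
        ok, val = True, None
        for p, t in zip(pattern, triple):
            if p == var:
                if val is not None and val != t:
                    ok = False
                    break
                val = t
            elif is_var(p):
                if p in binding and binding[p] != t:
                    ok = False
                    break
            elif p != t:
                ok = False
                break
        if ok and val is not None:
            out.add(val)
    return out

def solve(patterns, binding=None):
    """WCO-flavoured join: bind one variable at a time, intersecting candidates
    from every pattern that mentions it (assumes each pattern has a variable)."""
    binding = binding or {}
    unbound = {v for pat in patterns for v in pat if is_var(v) and v not in binding}
    if not unbound:
        yield dict(binding)
        return
    best_var, best_vals = None, None
    for v in unbound:
        vals = None
        for pat in patterns:
            if v in pat:
                c = candidates(v, pat, binding)
                vals = c if vals is None else vals & c
        if best_vals is None or len(vals) < len(best_vals):
            best_var, best_vals = v, vals  # crude stand-in for cardinality estimation
    for val in best_vals:
        yield from solve(patterns, {**binding, best_var: val})

# Triangle query: ?a knows ?b, ?b knows ?c, ?c knows ?a
for result in solve([("?a", "knows", "?b"), ("?b", "knows", "?c"), ("?c", "knows", "?a")]):
    print(result)
```

The point is that no pattern is ever joined in isolation: every pattern that mentions the current variable constrains it at once, so a doomed triangle query dies early instead of producing a pile of intermediate rows.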


"Estimate the cardinality", "jaccard index", "succinct data structure" and, finally, some unknown abbreviation "WCO" (I guess, it stands for "worst case optimal").

Yes, of course, having an implementation of all that completely obviates the need for query planning. ;)

I can't help being sarcastic for a moment, sorry.

In my opinion, in your comment above you clearly demonstrated that even the avoidance of query planning is hard, using as an example a (multi)set of triples for which it is possible to realize all indices.

If what you described is simpler than query planning, then query planning is hard.


While the technical terms may be unfamiliar, it's all pretty straightforward, and requires between 1-4kloc depending on how fast and fancy you want things to be. (This includes everything from the basic copy on write data-structures to the query language.)

Building an immutable path-compressed radix tree is pretty straightforward and requires around 1-2kloc, and it's easy to keep track of the n smallest hashes of leaf values, as well as the total count of leaves in the nodes. The sampling done by the min-hashes gives you a good indication of how much two nodes overlap, which the Jaccard index is just a different name for.

The query engine itself is like 0.5kloc, and is just walking the different radix-trie indices simultaneously. The basic insight of worst case optimal (WCO) joins, is that it's a lot cheaper to join everything at once and treat it as a constraint propagation problem.

A LINQ style query parser takes up another 1-2kloc and is just a bunch of ol' boilerplate.

In total that's about as much code as your average large C++ codebase CMake file.

You could sketch the entire thing on a napkin and build it in a week if you've built something like it before.


"If you build something like that before". I think I can add that to the list of famous last words. ;)

Please excuse my sarcasm again. But tell me, what do I do if I haven't built something like this before? What if I built something like an equality saturation engine, pseudoboolean optimization using a SAT solver, and/or beam search? Will it help me somehow?

You estimated code size at 4.5KLOC max. Given that a C++ programmer delivers 20-25 debugged lines of code per hour in the long run (IBM's stats), it would take 225 hours of work. Given that PSP/TSP recommends planning for 4 hours on task per day, it will take 56 work days. My calculation suggests 2.5 work months to implement all that in the worst case of 4.5KLOC. Even the best case of 2.5KLOC would take a month and a half of work.

Your proposition is not a week's project. Not at all; you can't squeeze it that much.

Query planning is hard. Even if you try your best to avoid it.

And we have not even started talking about WHERE, GROUP BY and ORDER BY clauses' optimizations.

[1] shows the use of loop nest optimization combined with beam search. SQLite translates joins into loop nests, and it transforms loop nests into a graph where a path through the graph represents a solution. [1] shows a simplified nesting graph that is linear; in general it will be quadratic in the number of tables.

[1] https://www.sqlite.org/queryplanner-ng.html

I really like that approach. This is exactly an equality saturation [2] (saturate loop nesting through loop nesting commutativity) with the beam search as a selection phase.

[2] https://rosstate.org/publications/eqsat/

I think that equality saturation with beam search is a week-long project if you have already built something like that. The difference? It will work for arbitrary joins.

Of course I am making fun of your statements.

Equality saturation requires quite careful planning and will not work for a couple of months; you will keep finding something that fails. Beam search is just as hard. You can encode the optimal solution selection problem as a pseudoboolean optimization problem, which is more straightforward than beam search, and [2] shows that Pueblo is no slouch there.

This is not to say that query planning is easy when you do it my way. Query planning is hard. NP-hard, actually. Sometimes you can get away with a simpler less general solution, but it will bite you sooner than you expect.


C++ is probably the slowest programming language in terms of development speed there is, but even if it takes you 3 months to build something like that from scratch it'll still be an order of magnitude less time than building a clone of another SQL database.

`WHERE, GROUP BY and ORDER BY` are relatively straightforward to tack on into the variable ordering.

Having written a sat solver would kinda help you, because modern sat solving algorithms are fundamentally very similar to worst case optimal join algorithms.

The query plan produced by combining binary joins is always going to be off by up to an exponential factor when compared to a WCO join. If you're fine throwing all that complexity onto your problem to generate sub-par query plans, be my guest.


And how are modern SAT solving algorithms fundamentally similar to worst case optimal join algorithms?

There are at least two different types of them: conflict-driven clause learning and its variants, and stochastic search and its variants.

Which one is more similar to WCO join algorithm?



The paper you provided has two important omissions: 1) it does not mention Tseitin transformation to encode Disjunctive Normal Form into a Conjunctive Normal Form and 2) does not look at decision diagrams, especially, zero-suppressed decision diagrams for set representation.

Tseitin transformation prevents exponential expansion in conversion from DNF to CNF.

The fact that paper's algorithm employs disjunctive normal form suggests the use of binary decision diagrams. ROBDDs represent DNFs naturally, for one example. ZDDs represent sets of sets and were relatively successfully used in non-trivial approaches to the SAT solving problem like [1].

[1] https://web.eecs.umich.edu/~imarkov/pubs/book/b002.pdf

Also, the paper you mentioned has this right in abstract: "However, there is still the quest of making the new worst-case optimal join algorithms truly practical in terms of (1) ease of implementation and (2) secondary index efficiency in terms of number of indexes created to answer a query."

WCO is hard to implement and it might be computation-wise prohibitive.

Yet, it's quite interesting, thank you very much.


>The query plan produced by combining binary joins is always going to be off by up to an exponential factor when compared to a WCO join.

Citation greatly needed. Why is it so?


The intuition is that if your join is a triangle you might be forced to join every relation only to discover that the result is empty on the last join, which requires m^n intermediary results, where m is the size of the relations and n is the number of relations. A WCO join algorithm will figure out quite quickly that the output is empty and is in practice much more dependent on the actual output size, than the size of the input relations.

See:

https://www.cs.stanford.edu/people/chrismre/papers/paper49.N...

https://justinjaffray.com/a-gentle-ish-introduction-to-worst...


Thank you, second link of yours starts with the explanation of benefits expected, which is great.

The difference is not exponential, if I may nitpick. It is sublinear in the case of the "triangles example" - WCO would produce O(n^(1.5)), binary join will produce O(n^2), the difference is O(n^(0.5)) or O(sqrt(n)). The difference is big, but not exponential.


That blogpost is a godsend, it provides a great explanation and intuition, and resolved a few questions that I was left with after reading a lot of WCO join papers.

Good nitpick, I think you can construct larger rings where the difference becomes larger than one, but I might be wrong. The dynamic variable ordering trick makes a huge difference in practice, especially when skew is in play. (https://arxiv.org/pdf/1310.3314.pdf)

In general you're right, WCO joins are a relatively young field of study and sometimes struggle with large constant factors, but they are maturing quickly and in a limited (triple) setting like the one OP "wished for", a lot more feasible than for the general case.

Thanks for the interesting discussions, looking forward to reading the references you provided in depth!

Edit: I just remembered this paper, which might be of interest to you. They seem to recover WCO bounds in a pairwise setting, by choosing very smart intermediary join representations: http://www.cs.ox.ac.uk/dan.olteanu/papers/co-tr16.pdf


Exactly what I needed, thank you again very much!

My old idea was to perform planning after some of the work has been done, because remaining statistics can be different. These papers are of great help!


>you'll probably have a pretty good idea which table you'll want to filter first, and what to join

Until the day you load a bunch of new data and it gets skewed, or you delete a bunch of data without shrinking and oops, your join order and method is not efficient anymore.


The API should definitely either allow directly managing indexes or provide even lower level primitives that let the query engine create its own indexes.


I mean, if you have a KV-like store that supports enumeration, you can pretty much always index the data yourself.


Have you had a look at arrow? It has those capabilities


> I dream of a SQLite-like embeddable database engine based on Datomic’s data model and queryable with Datalog.

You can have this today by running XTDB[1] on top of SQLite via JDBC.

> Written in something like C, Rust, or Zig.

And then compiling your application into native executables with GraalVM Native Image[2].

[1] https://xtdb.com/

[2] https://www.graalvm.org/native-image/


Sounds like Datalevin https://github.com/juji-io/datalevin

Embeddable, check. Datomic data model and Datalog query, check. Storage written in C, check.


Embeddable... in the java ecosystem. I often see comments about datalog/datomic on HN and it seems interesting but I never see it anywhere else, is it because it's mostly known and used in the java and clojure ecosystem? Do you know of any free database with similar models usable from e.g. python?


Ooh, this looks good. LMDB backend though, meh.

Edit: it's written in Clojure, so JVM. Extra bleh



Obligatory link to Project Mentat: https://github.com/mozilla/mentat

No longer actively maintained, but maybe a nice starting point for hacking on your dream!


mentat was archived by mozilla back in 2017, but there are a bunch of forks. Because github is dumb and has a terrible interface for exploring forks [0], I used the Active GitHub Forks tool [1] that helped to find:

qpdb/mentat [2] seems to be the largest (+131 commits) and most recently modified (May this year) fork of mozilla/mentat.

[0]: https://github.com/mozilla/mentat/network/members - Seriously, how am I supposed to use this? Hundreds of entries, but no counts for stars, contributors, or commits, no details about recent commits. Just click every one?

[1]: https://techgaun.github.io/active-forks/index.html

[2]: https://github.com/qpdb/mentat


Your dream sounds very nice.


if litestream ever enables logical replication, I think you could do `SQLite logical replication --> embedded materialize-db --> back to SQLite`


What about DataScript/Datahike?

The obvious issue is that they're fairly deeply embedded in the Clojure(Script) ecosystem.


I’m surprised something like this doesn’t exist yet - I wonder if it’s possible to build it on top of SQLite somehow?


I tried. It's not easy because of how limiting SQLite's indexes are. You have to build your own indexes using `TRIGGER`s, or in a software wrapper and tables.

You can see me prototype here: https://git.sr.ht/~chiefnoah/quark
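
Not from the linked prototype, just a minimal sketch of the "maintain your own index with a TRIGGER" approach on an EAV-style table, using Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Append-only log of (entity, attribute, value, tx) assertions.
    CREATE TABLE eavt (e TEXT, a TEXT, v TEXT, tx INTEGER);

    -- Hand-maintained "current value" index, kept up to date by a trigger,
    -- since SQLite's native indexes can't express this access pattern directly.
    CREATE TABLE current_ea (e TEXT, a TEXT, v TEXT, PRIMARY KEY (e, a));

    CREATE TRIGGER eavt_current AFTER INSERT ON eavt BEGIN
        INSERT OR REPLACE INTO current_ea (e, a, v) VALUES (NEW.e, NEW.a, NEW.v);
    END;
""")

con.execute("INSERT INTO eavt VALUES ('user/1', 'name', 'Ada', 1)")
con.execute("INSERT INTO eavt VALUES ('user/1', 'name', 'Ada Lovelace', 2)")
print(con.execute("SELECT v FROM current_ea WHERE e='user/1' AND a='name'").fetchone())
# ('Ada Lovelace',) -- while the full history is still in eavt
```

Once you need AVET/VAET-style orderings or retractions, the triggers multiply and the SQL gets gnarly fast, which matches my experience above.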


Commendable attempt! I've considered writing a datalog storage backend on sqlite just like your prototype. Thank you for sharing, now I can lazily study your prototype instead of doing the hard work myself. :) I'm curious, what kinds of limitations of SQLite indexes are you referring to?


Sparse indexes are pretty limited and it only supports B-trees, which makes implementing AVET and VAET difficult. Further, efficiently finding the current value for an E + A is difficult to do in SQL in a way that doesn't require maintaining a whole copy of the data. I actually bumped up against what I believe are weird edge-case bugs in the SQLite query planner when dealing with sparse indexes as well.

I think I gave up when trying to implement one-many relationships because the SQL was getting too gnarly.


There's a bunch of resources on r/databasedesign.


I can't seem to find this subreddit. Do you have a link?



Interesting article. When I implemented a no-SQL database using the SQLite backend for https://github.com/rochus-keller/DoorScope and https://github.com/rochus-keller/CrossLine 15 years ago, there was little literature about it (actually the referenced article is also rather light on the pages and b-tree implementation, but e.g. section 9 of https://www.amazon.com/Definitive-Guide-SQLite-Experts-Sourc... was helpful). There was no LMDB or LevelDB yet; though there was Berkeley DB, and I did prototypes with it, but the SQLite backend turned out to be much leaner and faster for the task at hand.


A bit of a tangent, but the Corecursive podcast has a really interesting interview with the creator of SQLite: [0]

[0] https://corecursive.com/066-sqlite-with-richard-hipp/


It’s always good for this kind of article to refer to the source code: provide links to the code responsible for some of the implementation details.


Here's a header file that basically mirrors some of what the article is talking about, the layout of pages and the btree and so on (~lines 100-200)

https://github.com/sqlite/sqlite/blob/master/src/btreeInt.h

The code for the btree functions is here and is a bit over my head TBH with all the locks and permissions and so on but it's a nice example of how to comment code I think:

https://github.com/sqlite/sqlite/blob/master/src/btree.c


Yeah that would be awesome! I'd rephrase your comment as a suggestion rather than a demand though personally. :)


GP's account page says From Morocco With Love, and I'm sure their Arabic and French are far beyond mine.

This reads to me like an accidental slip into the imperative, breaking a sentence in two parts, and not adding a phrase such as "For example,".

In fact, changing the period to a colon would defuse the imperative. It's subtle!


That's a good idea. There's a careful balance though since once you start talking about specific functions in SQLite then the post becomes very low-level and starts to become less accessible. I'll try to integrate that more in the future though.


Yeah maybe it would force you to get too close to the code, I don't know SQLite well enough.

When I do surveys of software I try to provide links to source (mostly just the relevant file or directory) so folks can verify for themselves. For example when looking into the parsers behind Postgres-compatible DBs [0] and parser techniques in a few major language implementations [1]. But both those posts are at a higher-level than what you're doing here.

I'm sure you'll have good reason either way for this series.

[0] https://datastation.multiprocess.io/blog/2022-02-08-the-worl...

[1] https://notes.eatonphil.com/parser-generators-vs-handwritten...


> https://github.com/sqlite/sqlite/blob/master/src/btreeInt.h

Woah, Ben! I'm so glad that link was shared; thanks to the poster as well. There's a very good amount of in-depth knowledge shared via the comments, and a reference to Donald Knuth's book.


There's a lot of goodies in the comments. I like this one about how POSIX locks are broken: https://github.com/sqlite/sqlite/blob/master/src/os_unix.c#L...


Thanks! Enough catalyst for me to read through all of SQLite source files


The SQLite documentation is pretty good as well: https://sqlite.org/fileformat.html


I'll second that. I was able to use that documentation to write javascript code that reads and queries a sqlite database. (Just as a learning exercise, I don't have a practical use for this.)
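
The first steps are surprisingly small. Here's a minimal Python sketch of reading the documented 100-byte header and the b-tree page header of page 1 (the file path is a placeholder):

```python
import struct

with open("example.db", "rb") as f:
    header = f.read(100)  # the database header is the first 100 bytes

assert header[:16] == b"SQLite format 3\x00"
page_size = struct.unpack(">H", header[16:18])[0]
if page_size == 1:                      # the stored value 1 means 65536
    page_size = 65536
page_count = struct.unpack(">I", header[28:32])[0]  # in-header size; may be 0 in very old files
print(f"page size: {page_size}, pages: {page_count}")

# Page 1 holds the 100-byte header followed by the b-tree page for sqlite_schema,
# so its b-tree page header starts at offset 100.
with open("example.db", "rb") as f:
    page1 = f.read(page_size)
page_type = page1[100]                  # 0x0d = table leaf, 0x05 = table interior
cell_count = struct.unpack(">H", page1[103:105])[0]
print(f"page 1 b-tree type: {page_type:#x}, cells: {cell_count}")
```

From there it's cell pointers, varints, and record headers, all of which the fileformat page spells out.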


SQLite is a great choice as an in-memory database or when Postgres is clearly overkill, e.g. storing some configs. I'm using it myself.

However, given that so many hosted Postgres solutions exist and that Postgres has so many essential features that SQLite doesn't, there really is no reason to choose a new hosted SQLite over one of the mature hosted Postgres services. The type system and MVCC especially come to mind, but there's much more Postgres offers over SQLite.


Off topic: does it bother anyone else when the author byline and publication date are displayed at the very end of the blog/article as opposed to the start/top?

I’m not even a journalist/communications person, but I do immediately look at those fields to get a quick sense of relevance (is this an old article, possibly out of date?) and of the author's authority on said topic.


A date published anywhere at all is fortunate.


If it's accurate. I went searching for something the other day, and Google listed it as being written in the middle of May. I figured it would contain pretty recent and relevant info. Nope. The actual article was written in 2012, when the tech in question barely existed.


There are so many "evergreen" articles in the world which do not even include a post date that it's hard to be annoyed about this.


You can get hands on with this at codecrafters.io. They have a ‘do it from scratch’ challenge that takes you from the .dbinfo command to implementing a full table scan with a WHERE clause. I got up to implementing a basic SELECT over multiple columns.


> Simplicity leads to reliability and I don't know of a more reliable database than SQLite.

JSON file on disk might be a reasonable competitor here. This scales poorly, but sometimes you don't need to scale up to something as crazy as full-blown SQLite.


The funny thing is that SQLite is faster for storing JSON blobs than the file system is: https://www.sqlite.org/fasterthanfs.html

I would also say that the second you need any sort of writing or mutating in any sort of production environment SQLite is also simpler than the file system!

Race conditions in file system operations are easy to hit, and files have a lot more moving pieces (directory entries, filenames, file descriptors, advisory locks, permissions, the read/write API, etc) than a simple SQL schema

Whenever I see “full-blown” next to “SQLite” there is usually some sort of misconception. It’s often simpler and more reliable than the next best option!
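
Here's a minimal sketch of that "JSON blobs in SQLite instead of one file per blob" setup, using Python's built-in sqlite3 (file and table names are placeholders):

```python
import json
import sqlite3

con = sqlite3.connect("blobs.db")
con.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
con.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT NOT NULL)")

def put(doc_id, obj):
    with con:  # one transaction: no partial writes, no temp-file-and-rename dance
        con.execute("INSERT OR REPLACE INTO docs (id, body) VALUES (?, ?)",
                    (doc_id, json.dumps(obj)))

def get(doc_id):
    row = con.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(row[0]) if row else None

put("config", {"theme": "dark", "retries": 3})
print(get("config"))
```

Crash midway through `put` and you have either the old document or the new one, never half a file.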


It looks like sqlite is not much faster than fs on Linux.

I've also found that ripgrep performs surprisingly well with folders containing large numbers of text files - this makes raw json on disk more convenient for many simple usecases which don't require complex queries.


It's on write that using SQLite really shines.

Imagine a moderately large JSON object (a lot of them, actually): not hard to read or write exactly, but they do carry some common references.

SQLite can: only rewrite the changed fields with a query, and transact the update across the objects. The filesystem can do atomic swaps of each file, usually, if nothing goes wrong. SQLite would also be faster in this case, but I'm more motivated by the ACID guarantees.
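
For example, with the JSON1 functions (built into recent SQLite builds), changing one field across many stored objects is a single transactional UPDATE. A small sketch with placeholder table and field names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
con.executemany("INSERT INTO objects VALUES (?, ?)",
                [("a", '{"status": "draft", "rev": 1}'),
                 ("b", '{"status": "draft", "rev": 7}')])

# Update one field in several objects atomically: json_set() expresses the change
# as a query, and the UPDATE runs inside a single transaction.
with con:
    con.execute("UPDATE objects "
                "SET body = json_set(body, '$.status', 'published') "
                "WHERE id IN ('a', 'b')")

print(con.execute("SELECT id, body FROM objects").fetchall())
```

Doing the same with a directory of JSON files means either both files get rewritten and renamed without a crash in between, or you write your own journal.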


JSON file (or any file you directly write to) is simple, but has few of the guarantees SQLite gives you with regards to persistence. SQLite abstracts away all of that for you by handling the fancy platform-specific I/O stuff under the hood.

SQLite does increase complexity in the sense that it's a new dependency. But as far as dependencies go, SQLite has bindings in almost all languages. IMO even for simple usecases, it's worth using it from the start. Refactors are easier down the line than with a bespoke JSON/yaml/flat encoding.

Read only configuration, sure, JSON (etc) file makes sense. The second you start writing to it, probably not.


Could you elaborate more on those things abstracted by SQLite, or point to further reading? You've got me interested.


Not OP, but I remember reading an interesting article about issues that simple write() might face: https://danluu.com/deconstruct-files/

And here's what SQLite does to ensure that in case of application or computer crash database contains either new or old data, but not their mix or some other garbage: https://sqlite.org/atomiccommit.html


Implied is “as simple as possible but no simpler”. We used JSON on disk at one job BUT.

1. We knew what we were doing with respect to fsync+rename to guarantee transactions (see the sketch below). I bet you the vast majority of people who go to do that won't (SQLite abstracts you from needing to understand that).

2. We ended up switching to SQLite anyway and that was a painful migration.

Just pick SQLite to manage data and bypass headaches. There’s a huge ecosystem of SQL-based tools that can help you manage your growth too (eg if you need to migrate from SQLite to Postgres or something)
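
For the curious, "knowing what we were doing" meant roughly the pattern below (a POSIX-flavoured sketch; SQLite does the equivalent bookkeeping for you):

```python
import json
import os
import tempfile

def atomic_write_json(path, obj):
    """Write to a temp file, fsync it, rename it over the target, fsync the directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())   # data must be durable before the rename
        os.replace(tmp, path)      # atomic replacement on POSIX filesystems
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)       # persist the directory entry itself
        finally:
            os.close(dir_fd)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

atomic_write_json("state.json", {"jobs": 42})
```

Miss any one of those steps and you get the occasional empty or half-written file after a crash, which is exactly the failure mode described downthread.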


At what point does one “need” to move to Postgres/MySQL? It feels like much of DBA work is all style points, not necessarily objective.


Does SQLite have perf tools these days? I got bit badly by SQLite perf a decade ago (a bad default tanked performance and took a week of autotools, profiling, and mucking around in source to fix). Now I just use Postgres whenever I need anything resembling performance or scale. A decade is a long time, though -- do the tools exist now?

If not, some jobs will benefit from a better perf insurance policy than "I could probably get this into a profiler if I needed to."


If your database is write heavy, and receives a bunch of traffic, then PostgreSQL is probably a better fit.

Also, if the SQL that's needed in an application is more "enterprise" style, then SQLite isn't there (yet). ;)


Also if you want things like advanced datatypes or indexes, or features sqlite doesn't have (of which there are many, though in the last few years the support for "advanced" SQL has really taken a leap forward making it much more enjoyable).


Anyone looking at hand-rolled storage on files isn't needing anything SQLite won't give them. Plus, getting people comfortable with SQL and then migrating from SQLite to Postgres isn't a bad tradeoff.


If at least one of the following is true (non-exhaustive list):

- More than one machine needs to read from the database

- More than one machine needs to write to the database

- The database is too large to hold on one machine


It becomes necessary at some (low) level of concurrent write usage, basically. Or if you want to lean on logic built into one of the "proper" RDBMSs (GIS, granular access management, ...).


> It becomes necessary at some (low) value of concurrent write usage basically.

Much less so since the introduction of WAL mode 12 years ago, tho.


I often see this discussed, but I've not seen any benchmarks testing throughput of concurrent writes in SQLite compared to Postgres, so it's hard to know exactly how far WAL mode makes it scale before it's simply the wrong tool for the job. I'm quite curious to know, so at some point I'll get around to doing my own comparison. If you are able to give some indication then that would be very handy. I also think it can be made to scale much further if you have a multi-tenant architecture where you essentially do a database per tenant, though that has its drawbacks along other dimensions (namely schema changes). At some point it's going to come down to disk throughput.


SQLite will not scale writes anywhere near Postgres as there can still only be one writer at a time.

However much of the lore around writes remains from the “rwlock” mode where writers would block not just other writers but other readers as well.

That would kill your system at very low write throughputs, since it would stop the system entirely for however long it took for the write transaction to complete.

If the system is concurrent-writes-heavy it remains an issue (since they’ll be serialised)


What is considered a "very low write throughput"? Are we talking 10s of transactions per second? 100s of transactions per second? 1000s of transactions per second? 10,000s of transactions per second?

https://stackoverflow.com/questions/35804884/sqlite-concurre... has some pretty interesting numbers in it. Particularly the difference between using WAL mode vs not. The other aspect to consider is that the concurrent writes limitations apply per database file. Presumably having a database file per tenant improves aggregate concurrent write throughput across tenants? Though at the end of the day the raw disk itself will become a bottleneck, and at least the sqlite client library I'm using does state that it can interact with many different database files concurrently, but that it does have limitations to around ~150 database files. Again, depending on the architecture you can queue up writes so they aren't happening concurrently. I'd be very interested to know given all the possible tools/techniques you could throw at it just how far you could truly push it and would that be woefully underpowered for a typical OLTP system, or would it absolutely blow people's minds at how much it could actually handle?

When it comes to SQLite performance claims I tend to see two categories of people making quite different claims. One tends to be people who used it without knowing its quirks, got terrible performance out of it as a result, and then just wrote it off after that. The other tends to be people who for some reason or other learned the proper incantations required to make it perform at its maximum throughput and were generally satisfied with its performance. The latter group appear to be people who initially made most of the same mistakes as the former, but simply stuck it out long enough to figure it out, like this guy: https://www.youtube.com/watch?v=j7WnQhwBwqA&ab_channel=Xamar...

I'm building an app using SQLite at the moment, but am not quite at the point in the process where I've got the time to spend swapping it out for Postgres and then benchmarking the two. I dare say I will, though, as I have a hard time trusting a lot of the claims people make about SQLite vs Postgres, and I'm mainly doing it as a learning exercise.
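
For what it's worth, the "incantations" usually amount to a handful of pragmas plus batching writes into explicit transactions. A sketch of a typical starting point with Python's sqlite3 (the exact values are workload-dependent, not gospel):

```python
import sqlite3

con = sqlite3.connect("app.db", timeout=5.0)

con.execute("PRAGMA journal_mode=WAL")    # writers no longer block readers
con.execute("PRAGMA synchronous=NORMAL")  # durable enough for most apps in WAL mode
con.execute("PRAGMA busy_timeout=5000")   # wait instead of failing with SQLITE_BUSY
con.execute("PRAGMA cache_size=-64000")   # ~64 MB page cache (negative means KiB)

# There is still only one write transaction at a time, so the classic perf trap
# is thousands of tiny autocommit INSERTs; batch them instead.
with con:
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
    con.executemany("INSERT INTO events (payload) VALUES (?)",
                    [(f"event-{i}",) for i in range(10_000)])
```

I'd want any SQLite-vs-Postgres benchmark to include both the default configuration and something like the above, since the gap between them is where most of the conflicting anecdotes come from.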


SQLite is not good at OLTP workloads. If that is what you are doing then SQLite runs out of steam pretty quickly.


> multiple-user OLTP

It works fine for single user applications.


if you see "database locked" more often than you'd like


JSON is a pretty poor storage format though, isn't it? I use it for the wordle clone I made, but the code would have been much simpler with SQL statements. And that was without any kind of complicated relational data https://github.com/schwartzworld/schwardle


We did have a big system that used JSON file on disk. The problem is over time we ended up with files with half a JSON, and needed a "cleanup" pass. Except of course the cleanup might find a file in the process of being written, and life slowly got more and more complicated.


Worked at a place that used on-disk XML as their database with multiple (10s of) writers. Started off fragile, ended up hideously complex and fragile. Unfortunately it was originally written by one of the powers that be and his word held sway over every technical decision, good or disastrous.



One of the projects I worked on was an embedded system running on a 200MHz CPU with 8MB RAM and a micro SD card for storage, with a single data channel. SQLite was chosen because of the power to perform report calculations in SQL, and it is perfect for configuration storage in a simple key/value relation, like Firefox's about:config.

SQLite is my go-to database for application configuration and even non-enterprise web applications.


> something as crazy as full-blown SQLite.

The sqlite library is tiny. Often smaller than some JSON parsers!


> The sqlite library is tiny. Often smaller than some JSON parsers!

Stripped of all comments, the current "amalgamation" build of sqlite (which is the one most projects use) has 178k lines and 4.7MB of code. If your JSON parsers are anywhere near 1/10th of that size then they're seriously overengineered ;).


I cannot imagine such a json parser. You can parse json in only hundreds of lines of code.


I wrote a new, general-purpose data management system that was originally designed to compete with file systems for storing large amounts of unstructured data (my background is in disk utilities like PartitionMagic and Drive Image). It can be used to convert 100M+ files (photos, documents, videos, etc.) into data objects and attach searchable, contextual meta-data tags to them.

The tags I invented were essentially highly-optimized key-value stores that worked really well. I then discovered that these same KV stores could also be used to form relational tables (a columnar store). It could be used to store both highly structured data or the more semi-structured data found in things like Json documents where every value could be an array. Queries were lightning fast and it could perform analytic operations while still performing transactions at a high speed.

My hobby turned into a massive project as I figured out how to import/export CSV, Json, Json lines, XML, etc. files to/from tables easily and quickly. It still has a lot of work to go before it is a 'full-blown' database, but it is now a great tool for cleaning and analyzing some pretty big tables (tested to 2500 columns and 200M rows).

It is in open beta at https://didgets.com/download and there are a bunch of short videos that show some of the things it can do on my youtube channel: https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ


Yeah, I like JSON on disk too. There's some "gotchas" to avoid to make it safe though like atomic renaming or fsyncing the file and its parent directory. But as long as you avoid those and don't have a lot of data then JSON-on-disk works great.


For reliability, SQLite probably still wins. Even if you write the JSON all at once, what happens when the power goes out at that instant?


It looks like SQLite is more popular in China than other regions: https://ossinsight.io/analyze/sqlite/sqlite


This map looks like a surprisingly good proxy for GDP, aside from China being really high (mostly attributed to users somehow associated with three Chinese companies?)


Good catch! Actually, I was thinking the popularity of SQLite somehow relates to the adoption rate of mobile devices, since the majority of these devices embed SQLite to manage data like pictures.


You should be careful what kind of links you post in your articles. Once I've clicked on /r/eatsandwiches I was too distracted to read the rest of the blog post.


SQLite is 21 years old, and for me it just keeps getting better and better. More performant, more relevant features, all while being just as easy to embed and administer.


How much of SQLite is D. Richard Hipp versus companies that are contributing to the code?


I can't directly answer your question because I don't know, but I will note that the SQLite Copyright page states

> Contributed Code

> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches.

https://www.sqlite.org/copyright.html

I think the number of people who have worked directly on the code for the canonical SQLite distribution is quite small.


I talked with the SQLite team about a year or two ago and IIRC there were four people working on it (including Dr. Hipp). I believe they were all full time but I didn't ask.


Good read.

I wonder how the underlying physical disk or file systems handle the writes or updates


A bit off topic but fly related:

- How can I forward a range of ports instead of generating 1000s of lines? This is a major blocker for me.

edit: not sure why this is being downvoted; I've received no response from fly.io about this and this is the only way to get their attention on this shortcoming. You can't really expect people to write ports 10,000 to 50,000 individually, that's like 40,000 lines. Fly.io really needs to support port ranges like 10,000-50,000.


community.fly.io is the best place to ask about this.

We don't support tcp port ranges yet. We will someday: https://community.fly.io/t/new-feature-every-public-port-now...




