Limbo: A complete rewrite of SQLite in Rust (turso.tech)
373 points by avinassh 19 days ago | 232 comments



Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten. It'd be great to see all other C code rewritten first!


That was my take when LibSQL was announced. And it still is and would be my take if LibSQL remains C-coded. But a Rust-coded rewrite of SQLite3 or LibSQL is a different story.

The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary, and they don't accept contributions to either. This incentivizes anyone who needs support and/or new features in SQLite3 to join the SQLite Consortium. It's a great business model -- I love it. But there are many users who want more of a say than even being a consortium member would grant them, and they want to contribute. For those users only a fork would make sense. But a fork would never gain much traction given that test suite being proprietary, and the SQLite3 team being so awesome.

However, a memory-safe language re-implementation of SQLite3 is a very different story. The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3? Apart from that there's also just a general interest and need to use memory-safe languages.

That said, you're right that there are many other projects that call for a rewrite in Rust way before SQLite3. The thing is: if you have the need and the funding, why wouldn't you rewrite the things you need first? And if SQLite3 is the first thing you need rewritten, why not?


>> The SQLite3 business model is that SQLite3 is open source

This is going to sound pedantic, but SQLite is not Open Source. It's Public Domain. The distinction is subtle, but it is important.


>This is going to sound pedantic, but SQLite is not Open Source. It's Public Domain.

Well, there are 2 different modes of communication:

(1) official language-lawyer pedantic communication: "open source" != "public domain"

(2) conversational casual chitchat : "open source" includes "public domain"

Yes, the SQLite home page does say "public domain". However, when people interview SQLite's creator, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they also call it "open source". Excerpt of R Hipp:

  So, I thought, well, why can't I have a database that just
  reads directly off the disc? And I looked around and
  there were none available. I thought, “oh, I'll just write
  my own, how hard can that be?” Well, it turns out to be
  harder than you might think at first, but I didn't know
  that at the time. But we got it out there and I just put it
  out as open source. And before long, I started getting
  these phone calls from the big tech companies of the
  day, like Motorola and AOL, and, “Hey, can you
  support this?”, and “Sure!” And it's like, wow, you can
  make money by supporting open source software?
https://sigmodrecord.org/publications/sigmodRecord/1906/pdfs...


> (2) conversational casual chitchat : "open source" includes "public domain"

it's wrong though. like, can't be more wrong than that. you can't do whatever you want with open source software, the license tells you what you can and cannot do.

with public domain software you can do most things.


Open source means just that: that the source is open. The OSI and co. re-defining the term to suit their ideological preferences doesn’t really change that. SQLite is open source, even if it’s not Open Source.

Edit: FSF should have been OSI, I think. Fixed.


> The OSI and co. re-defining the term

I don't know where you got this idea but it's not true. The OSI is simply defending the definition as it has been generally understood since the start of its usage in the 1980s by Stallman and others.

The only group of people "re-defining" -- quite successfully I suppose, which you are an example of -- what open source software means are those that have a profit motive to use the term to gain traction during the initial phase where a proprietary model would not have benefited them.

I don't think I need to provide concrete examples of companies that begin with an open source licensing model, only to rug-pull their users as soon as they feel it might benefit them financially, these re-licensing discussions show up on HN quite often.


In the 1980s we had Shareware, Beerware, Postcardware, whateverWare, Public Domain, "send me a coffee", "I don't care" open source, magazine and book listings under their own copyright licenses (free for typing, not distribution).

Most of us on 8 and 16 bit home computers didn't even know who "Stallman and others" were.

Additionally, GCC only took off after Sun became the first UNIX vendor to split UNIX into two SKUs, making the development tools their own product. Others quickly followed suit.

Also, regarding Ada adoption hurdles: when they made an Ada compiler, it was its own SKU, not included in the base UNIX SDK package.


I don't really understand what your point is, but shareware has never been "open source".

Nobody's arguing that public domain code, or the MIT, or whatever is not open source; it's obviously open source because it's _more_ free than the GPL.

Sure, devs can call any "source available" project "open source" because it gets people interested, even though they have zero interest in using an open source development model or allowing others to make changes to the code. Devs can also expect well-deserved flak from people who understand that "open source" is not marketing speak.


I don't understand why OSI didn't pick an actually trademarkable term and license its use to projects that meet its ideals of open-sourceness. OSI knows it has no right to redefine common language and police its usage, any more than a grammar pedant has the right to levy fines against those of us who split infinitives.

(To be fair to OSI, I've never seen any of their representatives do this. But the internet vigilante squad they've spawned feels quite empowered to let us know we've broken the rules.)


> conversational casual chitchat : "open source" includes "public domain"

No. What are you talking about? They are not related... other than for people virtually completely new to, well, open source.

You are also completely confused, here, too:

> Yes, the SQLite home page does say "public domain". However, when people interview SQLite's creator, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they also call it "open source".

They are different things. A project can be both; a person can talk about these two aspects of one project.


This quickly gets into the details of definitions, but I think by most people's definitions of 'open source', something that is 'public domain' qualifies as such (see also 'source available' or 'copyleft/free software', one of which is not quite open source and the other is a more restrictive kind of open source. 'Permissive' licenses like MIT and similar are closer to public domain but are different to varying degrees of technicality: one of the main problems with 'public domain' is that it's not universally accepted that there's any means to deliberately place a copyrightable work into it, so something like sqlite where the authors are not long dead is not actually public domain according to many jurisdictions)


^ is a confused demonstration of my point:

> They are different things. A project can be both; a person can talk about these two aspects of one project.

BTW, your pouring on of qualifiers (elsewhere "weasel words") shows your (correct) lack of conviction:

> the details of definitions, but I think by most people's definitions of 'open source', something that is 'public domain' qualifies as such


It's a difference only insofar as, in many jurisdictions, their claim that it's public domain has no legal value. If it were truly public domain (e.g. if the authors were long dead) it would be open source. But far from all places allow you to arbitrarily put things in the public domain.

I'm a bit puzzled why SQLite doesn't solve this trivial issue by claiming the code is CC0-licensed. CC0 is made just for that: a very wordy way to make it as close to public domain as possible in each jurisdiction.

On the other hand, hobbyists won't care. As long as you trust them in their intention to have it open source they won't sue you for infringement either. And if as a company you need more assurance than "it's public domain", they are so nice to sell you a fancy legally-satisfying piece of paper for an undisclosed price. It's a subtle but clever way to get income from users with too much money.


SQLite didn't just say, "it's public domain."

They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."

One can buy a "license" if one's company is run by idle lawyers: https://www.sqlite.org/purchase/license


> They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."

It's not clear this is a license grant rather than legal advice (which would be correct legal advice if the code were public domain, but it is not).


> sell you a fancy legally-satisfying piece of paper for an undisclosed price

It's $6,000 https://sqlite.org/prosupport.html


> It's Public Domain.

Is it though? The website does say "All of the code and documentation in SQLite has been dedicated to the public domain by the authors" but copyright law has no exception for "dedications" to the public domain. At best the authors are estopped from bringing suit but even that is unclear.


Companies can buy licences if they're uncomfortable with the Public Domain dedication:

[quote]

Licenses are available to satisfy the following needs:

    * You want indemnity against claims of copyright infringement.
    * You are using SQLite in a jurisdiction that does not recognize the public domain.
    * You are using SQLite in a jurisdiction that does not recognize the right of authors to dedicate their work to the public domain.
    * You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite.
    * Your legal department tells you that you have to purchase a license.
[end quote]

https://www.sqlite.org/purchase/license


They could have CC0 licensed the code or they could have said they would not enforce their copyright. They did neither. SQLite is closed source. The "dedication" (which has no legal effect, what does it even mean?) encourages widespread adoption and big players are spooked into paying for a license (or "warranty of title"). That's quite a strategy.


open source != Open Source. If I had meant the latter I would have written Open Source, but I wrote open source because I meant the former.

How's that for being pedantic?


not helpful.

capitalization doesn't carry meaning in these contexts.

open source means OSI compliant, broadly speaking, and licensed as such.

in contrast, public domain doesn't exist in some jurisdictions, which is why sqlite as a company had to create an option to provide an official license. which they found so annoying that they charged a sweet fee to send a signed printed letter...



They don't own the words "open source" no matter how much they might like to.

> “Open Source” describes a subset of free software that is made available under a copyright license approved by the Open Source Initiative as conforming with the Open Source Definition.

No it doesn't. It describes software whose source is "open" which is generally understood to mean that you can read, modify and reuse the code for free.

Public domain definitely fits that. The "public domain doesn't exist in some countries" arguments are spurious as far as I can tell.


Public domain is a form of open source.


No. At least according to the Open Source Initiative, public domain is not open source: https://opensource.org/blog/public-domain-is-not-open-source


It is absolutely true that a work can be in the public domain and not have source available (or even contributable). But that doesn't really matter to most people. The question for most people is not whether something is open source, but whether they can copy and make use of a work without being held liable for copyright infringement. SQLite happens to be both public domain and open-source to an extent (i.e., source available).

Conversely, open source doesn't necessarily mean "free to use without encumbrance." There are many open-source licenses that forbid certain uses (e.g. Business Source License). On the other hand, a work in the public domain is free to be used by all without restriction.

A better analysis of open source vs. public domain would be in the form of a square, where one dimension would be the right to use the work, and the other dimension would be the ability to obtain and contribute source code.


The Business Source License is not an open source license. Open source does mean "free to use without encumbrance" - see points 5 and 6 of the Open Source Definition at https://opensource.org/osd


Yea - software released under the BSL is “source available”, not open source.


Approximately zero people who make real business decisions care what the OSI considers a "real open-source license" to be. They care what the text of the license says.

Also, many licenses, such as the GPL (one of the very first "open source" licenses), have certain encumbrances; you cannot redistribute GPL-licensed software without either including its source code or making it readily available.


No one's saying public domain isn't useful. You're replying to a comment that's specifically and solely combatting the idea that public domain means open source.


Any definition of open source that doesn't include the public domain is out of touch with how real people use the words "open source" and is therefore useless. You can make up any definition you want, but if you insist on calling elephants "bananas", I'm not going to take you seriously.


The problem with your analogy is that open source has a definition. As does public domain. As do elephants and bananas.

In your analogy we're not the ones calling elephants bananas, you are. We want to keep calling one bananas and the other elephants. You are suggesting that since elephants are similar to bananas you can simply use either word.

Legally, Open Source and Public Domain are -very- different animals. Open Source comes with a copyright, and a license (which has requirements); public domain does not.

Of course public domain and open source are both "shipped as source code". Then again so is a fair bit of proprietary software. That doesn't make it open source either.


How people use the term "fair use" is out of touch with the legal definition. That doesn't change the legal definition, it means people use the term incorrectly.


It means the common use of "fair use" is different to the legal definition. It doesn't mean either are wrong. It isn't wrong to say a tomato is a vegetable. In common use it is.

Similarly the common use of "open source" is different to the OSI's preferred definition. Note that the OSI's preferred definition is not a legal definition. It's just what they prefer.


The linked article mentions public domain.

Please note that public domain laws vary depending on the country. What you call a banana might mean something different elsewhere.


You should read the text; it's not about calling elephants bananas but about real issues with software in the public domain.


I read the text: it's license hermeneutics at best and FUD at worst. Has there been a single instance in recorded history of the author of a public domain work trying to enforce usage, modification, or distribution permissions? Sure, you can point to theoretical variation in the precise semantics of the public domain in various jurisdictions, but it feels like a bar exam puzzle, not a real-world practical concern. In the real world, you can safely do whatever you want with public domain software. It counts as free software. That half the planet nowadays uses SQLite and treats it as free software is testament to this reality. Obscure license pedantry just doesn't inform the choices of anyone actually building.


Public Domain is not Free Software (in the FSF sense) because it has none of the encumbrances of a Free Software license.

In other words you don't use PD software "like Free Software". You can use it in many places where Free Software would not be permissible.

In terms of -developer- freedom, public domain is top of the pile, then Open Source, then Free Software.

In terms of -user- freedoms Free Software is top of the pile, OSS in the middle, public domain is similar to commercial software.


Open source and Free software have different philosophies, but in practice they are essentially the same. You are thinking about copyleft vs non-copyleft. BSD, MIT, CC0 are all Free Software licenses but not copyleft.


You’re making the common mistake of confusing the copyleft vs. permissive distinction with the free software vs. open source distinction.

GPL is copyleft. MIT, BSD etc. are permissive. But all of those are both free software and open source, which are essentially synonyms.

The reason so many people get confused by this is that some of the people who prefer copyleft licenses (notably the FSF) also tend to prefer the term “free software”, for philosophical reasons.


It might seem really unlikely any acquirer would ever sue, but if your big company has compliance auditors they will need to see something in black and white.


Lots of big companies somehow manage to use SQLite. I've never heard of a company prohibiting it on license grounds.


Technically true, though this distinction only matters in a few countries.


> public domain software may be free software but is not certain to be.

Open Source relies on copyright and contract law (which are somewhat standardized or at least understood due to their importance in commerce). Public domain relies on other laws that can vary significantly.

https://opensource.org/blog/public-domain-is-not-open-source


Yes, public domain is a form of open source, but not of Open Source (for reasons that IMO are silly).


Bugs are fixed along with regression tests. Here's a recent example: https://www.sqlite.org/src/info/289daf6cee39625e

As far as I can see, these tests come with the same public domain dedication as the rest of the code.

You may be referring to the TH3 tests (https://sqlite.org/th3.html). The main goal (100% branch coverage, 100% MC/DC) would not be achievable for a Rust implementation (or at least an idiomatic Rust implementation …) because of the remaining dynamic run-time checks Rust requires for safety.


sqlite also has some runtime checks that are expected to be always true or always false, and solves that by using a custom macro that removes these branches during branch coverage testing.

The same would be possible in Rust. Everything that could panic has a non-panicking alternative, and you could conditionally insert `unreachable_unchecked()` into error handling branches to remove them. That wouldn't be the most idiomatic, but SQLite's solution is also a custom one.
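
For example, here's a rough sketch of what that could look like in Rust (the `never!` macro name and the `coverage_build` cfg flag are made up for illustration; neither SQLite nor Limbo spells it this way):

    // Normal builds keep the defensive branch as real code.
    #[cfg(not(coverage_build))]
    macro_rules! never {
        ($e:expr) => {
            $e
        };
    }

    // A hypothetical coverage build compiles the "impossible" case down to
    // unreachable_unchecked(), so the branch no longer counts against
    // 100% branch coverage -- the same trick as SQLite's NEVER() macro.
    #[cfg(coverage_build)]
    macro_rules! never {
        ($e:expr) => {
            (if $e {
                // SAFETY: asserted to be impossible in practice.
                unsafe { core::hint::unreachable_unchecked() }
            } else {
                false
            })
        };
    }

    fn read_page(buf: &[u8]) -> Result<&[u8], &'static str> {
        if never!(buf.is_empty()) {
            return Err("empty page"); // kept in normal builds, stripped for coverage
        }
        Ok(buf)
    }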


> The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary

no.

the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the days when we had these little flip phones, or desk phones that had a rolodex built in, or many other embedded uses of a little lovely dependable data store.

the services include porting to and "certification" on specifically requested hardware and OS combinations, with indeed proprietary test suites. now these are not owned by sqlite, but by third parties. which license them to sqlite (the company).

and it started with being paid by the likes of nokia or IBM to make sqlite production ready, add mc/dc coverage, implement fuzzing, etc.

their license asks you to do good not evil. and they take that seriously and try their best to do the same. their own stuff is to an extreme extent in the public domain.


> companies who use sqlite in production

It's not just old Nokias or desk phones, nor just embedded systems. sqlite is almost everywhere. Adobe, Apple, Microsoft, Google, Mozilla and many other companies use it in very widely deployed software.


> > The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary

> no.

> the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the days when we had these little flip phones, or desk phones had a "rolodesk" built in, or many other embedded uses of a little lovely dependable data store.

Members of the SQLite Consortium surely have this "red phone" you speak of. So in what way was my characterization of their business model wrong?


> The U.S. government wants everyone to abandon C/C++

That's the position of two federal agencies, namely, FBI and CISA. They don't describe how this change will reduce CVEs or why the languages they prefer still produce projects with CVEs.

I don't hold the technical or social acumen of the FBI or CISA in particularly high regard, and I'm not sure why anyone would by default either. Mostly because they say things like "switch to python!" without once accounting for the fact that python is written in C.

It's an absurd point to invoke as a defense of this idea.


Well Google's Rust report is already out with great results.. so..


Why does the fork have to gain traction?

You keep and maintain your local fork that does what you need it to do. Perhaps, if you are charitable, you share it with others. But you don't need to do this, and it just adds support burden.


If you don't care about traction then you wouldn't seek press for your fork to begin with. TFA is clearly seeking traction.


Also the very rigid testing makes the rewrite a lot easier to validate.


...as long as you can persuade the keepers of the proprietary test suite to agree to run it against your code.


Even without that, it’s helpful. It means there is less (no?) undefined behavior that you will need to emulate to maintain compatibility. You can just follow the spec.


If you cannot run the test suite, then how do you know that you properly followed the spec? And did so securely? And in a performant manner? Even for edge cases? On obscure hardware, filesystems, and OSes? Even if the power cuts out? Or the cable to the hard drive (transactions)? Even if a stray cosmic ray flips a bit?

By the way, SQLite itself does not meet one of these criteria. Know which one? ))


I’m not sure what your point is? Yes, it would be better if they would run their tests against your fork. But they won’t. Still, it’s better for the fork writer that they exist.


> The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3?

ABI, the same way you don't need the Linux kernel to be rewritten to remove your app dependency on C/C++


Just stumbled onto this forum. Really, appreciated such a thoughtful and insightful comment. Nice corner of the internet you have here.


Welcome to HN! (I'm a mod here.) I'm curious—how did you find us?


> Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten.

That was my take for many years, but I have come around 180 degrees on this. I think at this point it's very likely, and probably necessary, that SQLite will eventually be rewritten. In part because of what is called out in the blog post: the tests are not public. More importantly, the entire project is not really open. And to be clear: that is okay. The folks building it want to have it that way, and that's the contract we have as users.

But that does make certain things that would be quite exciting really tricky. So yes, I do think that SQLite could use some competition, even just for finding new ways to influence the original project.


This reminds me of VIM - and after quite some time I believe that all VIM users will agree that adding NeoVIM to the ecosystem improved VIM itself. VIM 8 addressed over half the issues that led to the NeoVIM fork in the first place - with the exception of the issue of user contributions, of course.


A company that works with SQLite and prefers to write Rust has the expertise needed to rewrite SQLite in Rust. That’s what they’re doing.

All the other C code could be rewritten, this doesn’t stop or slow down any such effort. But for sure it was never going to be possible for a database provider to start making a memory safe implementation of libpng or something.


From https://news.ycombinator.com/item?id=42379402

> It uses Asynchronous I/O, considers WASM as first class, and has Deterministic Simulation Testing support from the beginning.

These are all very hard to do in straight C if your goal is to program in Rust.


Seems like a potentially interesting project to get rid of sqlite's compatibility baggage e.g. non-strict tables, opt-in foreign keys, the oddities around rowid tables, etc... as well as progress the dialect a bit (types and domains for instance).


But the article mentions that they intend to have full compatibility:

  > Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.


author here: fully compatible at the language and file format level.

Further down the post I actually call out explicitly that we do intend to get rid of some of the baggage.


If you "intend to get rid of some of the baggage" you won't be fully compatible.

libSQL already isn't fully compatible: as soon as you add a RANDOM ROWID table, you get "malformed database schema" when using the (e.g.) sqlite3 shell to open your file (also Litestream doesn't work, etc).

And that's fine, as there probably is no better way of doing what you needed to do. But it's also taking what SQLite offers and breaking the ecosystem, under the banner of "we're compatible", without ever calling out what compromises are being made.

Note how the SQLite documentation that introduces STRICT tables very clearly documents the backwards compatibility issues of the feature and how to get around them: https://sqlite.org/stricttables.html#backwards_compatibility

You also never got round to documenting the internal Virtual WAL APIs you exposed. This is something where SQLite is lacking, where you could've made an impact without any compatibility issues, and pressure upstream to release something by doing it first/better. Alas, you did it for Turso's exclusive benefit.


That's fine though. Full compatibility doesn't have to mean full backwards compatibility. I think of it as what's Typescript to Javascript.


Once you compile your Typescript to Javascript, Javascript runtimes can run it, Javascript code can call it, etc. Even source maps work.

Once you start using libSQL features, SQLite tools will simply stop working with your databases.

That means the sqlite3 shell stops working, backup solutions like Litestream and sqlite-rsync stop working, SQLite GUIs like SQLiteStudio stop working, forensic and data recovery tools will have a harder time working, etc.

Maybe it's all worth it, but it's not full compatibility, and it should at least be documented.


I would guess "full memory safety" is going to be impossible, at least at compile time. I'd guess that, if for no other reason than performance, SQLite uses data-oriented techniques that effectively reduce pointers to indices, which will no longer have ownership or lifetime tracking in the Rust compiler.


As a counterpoint, doing a rewrite of an example of the best C codebases gives you a much more interesting comparison between the languages. Rewriting a crappy C codebase in a modern, memory safe language is virtually guaranteed to result in something better. If a carefully executed rewrite of SQLite in Rust doesn't produce improvements (or some difficult tradeoffs), that's very informative about the relative virtues of C and Rust.


Code quality is not the only thing to consider. Some people would love to see something like SQLite with 2 important changes: referential integrity that respects the DDL, and strict tables that also respect the DDL.


I might be missing something—is there a reason why rewriting it in Rust would be a prerequisite to adding these features, vs just starting a fork?

And in this case the project intends to be fully compatible, so they wouldn't be able to unilaterally adopt or drop features anyway.


An SQLite fork will have a hard time being compelling enough to draw users away from the main project. Being written in Rust is the most compelling reason that I could think of. SQLite has many annoying quirks (foreign key constraints disabled by default and non-strongly-typed columns are my two pain points) but a fork that addresses them would still not pull me away from the original project that I have so much trust in.


If I were to fork SQLite, drawing users away from the main project would be a non-goal. The goal would be to get strict tables and foreign key constraints enforced 100% of the time.


Yeah, I would assume that any project like this would strive to be a soft fork that just has a few minimal patches to address specific needs, not something that actually tries to compete with the original.


If that was the case, they wouldn't introduce cross incompatibilities in the changes they made (or would at least discuss compatibility in the docs), and they'd make any added features useful to others by properly documenting them.

Compatibility for libSQL is a one way street. I don't expect Limbo to be any different.


Agreed! If I were to fork sqlite, that would be my aim.


Agreed! Rewriting in Rust (or any other language) is not required for those features. A fork and modifying the existing C code could also result in those features (and I might do just that if it doesn't come around soon).


They did add support for strict type checking fairly recently, and you can turn on foreign key checking I think.


Here is the STRICT table type page: https://www.sqlite.org/stricttables.html It is fairly straightforward: you just have to add STRICT to your table definition and you have it.

And the FOREIGN KEY support is here: https://www.sqlite.org/foreignkeys.html The two requirements are that your build not have it disabled, and that you execute `PRAGMA foreign_keys = ON;` when you open the database (every time you open the database).
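
To make the two opt-ins concrete, here's a minimal sketch assuming the rusqlite crate (the schema is just an example):

    use rusqlite::{Connection, Result};

    fn open_db(path: &str) -> Result<Connection> {
        let conn = Connection::open(path)?;
        // The pragma is per-connection and not stored in the file,
        // so it has to run every time the database is opened.
        conn.execute_batch("PRAGMA foreign_keys = ON;")?;
        // STRICT is part of the table definition, so once created it
        // is enforced for every connection from then on.
        conn.execute_batch(
            "CREATE TABLE IF NOT EXISTS users (
                 id   INTEGER PRIMARY KEY,
                 name TEXT NOT NULL
             ) STRICT;
             CREATE TABLE IF NOT EXISTS posts (
                 id      INTEGER PRIMARY KEY,
                 user_id INTEGER NOT NULL REFERENCES users(id),
                 body    TEXT NOT NULL
             ) STRICT;",
        )?;
        Ok(conn)
    }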


I don't view opt-in as a very good default for these.


Then build with SQLITE_DEFAULT_FOREIGN_KEYS=1 to make it opt-out (and to opt-out you'd need to inject SQL).

As for STRICT: if you make your tables STRICT, there's no opt-out.

So why is this an issue? Do you want them to break the file format to say "from this version forward, all tables are STRICT"? What does that really buy you?

It's an embedded database: anyone who can mess with your database and circumvent integrity can also open the file and corrupt it.


> What does that really buy you

It removes a footgun for new users. That's not an insignificant benefit.

Probably not worth the backwards compatibility cost, but it definitely is an issue.


My builds of SQLite do enable foreign keys by default.

This is easy to do for any project using the amalgamation.


I agree on a level that SQLite is a master class in testing and quality. However, considering how widely used it is (essentially every client application on the planet) and that it does get several memory safety CVEs every year, there is some merit in a rewrite in a memory safe language.


While I agree with you on one level, that code rigidity and testing also mean that a port of SQLite is much more viable than a port of most other C-based projects. And I'm intrigued by what this would enable, e.g. the WASM stuff the authors mention. It's not that it couldn't be done in C, but it'll be easier for a wider range of contributors to do it in Rust.


It'd be cool if it was possible to run user provided SQL queries safely without sandboxing/wasm.


When the initial SQLite3->LibSQL fork was announced I was pretty negative about it because SQLite3 has a wonderful, 100% branch coverage test suite that is proprietary, and so without access to that any fork would be bound to fail.

However, if there's a big product behind the fork, and even better, a rewrite to a memory-safe language, then the fork begins to make a lot of sense. So, hats off to y'all for pulling this off!


Good luck for sure, but browsing their compatibility matrix, it looks like they are a LONG way off. By the looks of it, they have mostly read compatibility with little write capabilities (no alter table, for example).


That's fully in line with what they're announcing here. It's the announcement of a new project that has passed the prototyping stage, but one that has not reached the 1.0 stage.


correct, this is just the project being moved from a fun personal side project of the company's CTO to an experimental stage as a company project.

There isn't a long term roadmap, or anything like that. I got pretty excited when I saw the results, though. It's less about the number of GitHub stars - who the hell cares about those - and more about the contributors. Limbo already has a very nice list of contributors, which led me to believe there is something here!


And it never will

The first 90% is easy, it's the second 90% that is very hard.


dunno, we rewrote a database before, much larger and harder than sqlite, and it was pretty successful.

In my experience the 10% that doesn't get done is the 10% that people don't care too much about anyway.


Which one?


Someone has done a rewrite before…


As long as they have the funding I'm sure they can get there.


All this talk of “SQLite is not open contribution” never seems to consider that a project being “open contribution” doesn't mean the maintainers will accept your contributions.

They have a process for contributions to follow: you suggest a feature, they implement it. It's far from the only project to take such a stance.

Just in the SQLite “ecosystem”, see the contribution policies of Litestream and LiteFS. I don't see people brandishing the “not open contribution” argument against Ben's projects.

https://github.com/superfly/litefs?tab=readme-ov-file#contri...

https://github.com/benbjohnson/litestream?tab=readme-ov-file...


> SQLite’s test suite is proprietary

This is literally the first time I've ever heard of this, for any project anywhere. I suppose Android is built a bit in this way, but that's a whole other can of worms.


Well, Java during its initial life was controlled to a degree through the control of the tests: https://en.wikipedia.org/wiki/Technology_Compatibility_Kit .


Huh, I wasn't aware that Java was initially open source


I think Java/JDK was closed source initially, then went open source in 2006/2007 (?), but without the TCK. The TCK was never open sourced but the JCK is now kind of "open": https://openjdk.org/groups/conformance/JckAccess/


They have a test suite that is part of the SQLite3 public domain product, and they have a much bigger and better test suite that is proprietary.


they do not fully own said proprietary SQL test suite. they've licensed it. that's why they can _run_ it but not publish it or share it. That's at least how I remember Richard Hipp describing the situation at a talk.


It could be simply to prevent forks, but if it really is 100% branch coverage, why do they still have memory safety related CVEs coming out? With ASan turned on, and full static analysis, that should make such errors exceedingly rare. Part of the benefit of Rust is that it makes coverage both easier to get due to its type system, and less necessary because of the guarantees it makes. But if they really went all the way to 100% branch coverage, that should be almost as good if all the sanitizers are running.


They claim 100% on https://en.wikipedia.org/wiki/Modified_condition/decision_co...

However, unless you can guarantee that every branch tested has been covered for all possibly relevant application states, that does not preclude CVEs.


Large chunks of the test suite are open source, committed to the repo and easy to run with a `make test`.

Every time a bug is reported in the forums, the open source tests are updated as part of the bug fix for everyone to see.

There's a separate test suite that offers 100% coverage, that is proprietary, and which was created for certification for use in safety critical environments.

HN loves to discuss business models for open source, but apparently has a problem with this one. Why?


> For maximum performance, users have to choose WAL mode over journal mode, disable POSIX advisory locks, etc.

Speaking of "wal" mode, is "wal2" mode [1] on your radar for this project to prevent wal files from growing indefinitely in a busy system?

[1] https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md


Wondering about this as well :) That would be a game changer.


You can have a memory safe SQLite today if you compile it with Fil-C. Only tiny changes required. I almost have it passing the test suite (only two test failures left, both of which look specious).


Did a little reading on Fil-C and… “Also, it's slow – about 1.5x-5x slower than legacy C.”

So that’s dead on arrival.


Clearly still in very early days:

    uv run --with pylimbo --python 3.13 python
Then:

    >>> import limbo
    >>> con = limbo.connect("/tmp/content.db")
    thread '<unnamed>' panicked at core/schema.rs:186:18:
    not yet implemented: Expected CREATE TABLE statement
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
With that environment variable:

    stack backtrace:
    0: _rust_begin_unwind
    1: core::panicking::panic_fmt
    2: limbo_core::util::parse_schema_rows
    3: _limbo::__pyfunction_connect
    4: pyo3::impl_::trampoline::trampoline


Hey Simon! However early you think it is, I can guarantee it is even earlier =)

If this is just a standard sqlite database that you are trying to open, though, I'd have expected it to work.


It's this database from here: https://datasette.io/content.db - it uses the SQLite FTS extension though so it's not surprising there was something in there that caused problems!


I'm not buying the rationale in the "async IO" section.

First, there's no need to rewrite anything to add an async interface to sqlite if you want (many clients do, whether local or remote).

The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO. But I wonder how much of an issue that really is. sqlite is designed to run very locally to the storage, and can make use of native file caching, etc., which makes IO blocking very short if not zero. You have to wonder whether applications have enough idling sqlite threads to justify the switching. (It's not free and would be at quite a fine-grained level.)

The section does mention remote storage, but in that case you're much better off with an async client talking to compute running sqlite, sync interface and all, that is very local to the storage. AKA, a client/server database.

Also, in the WASM section, we're still talking about something that would best be implemented as a sqlite client/wrapper, with no need at all to rewrite it.


> The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO

That's not the only issue. Waiting for the result of every read to be able to queue the next read is also an issue, particularly for a VFS that exists on a network (which is a target of theirs; they explicitly mention S3).

I'm not sure if they also are doing work on improving this, but I'm sure that theoretically many reads and writes that SQLite does do not depend on all previous reads and writes, which means you could queue many of them earlier. If your latency to storage is large, this can be a huge performance difference.


You can get more total IO throughput (at the cost of latency) by queueing up multiple reads and writes concurrently. You can do this with threads, but io_uring should theoretically go faster (but don't take my word for it, let's wait for benchmarks).

I'm personally interested in the potential for async bindings for Python. Making fast async wrappers for blocking APIs in Python-land is painful (although it might improve in the future with nogil).
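
As a rough illustration of the thread-based fan-out mentioned above (assuming the rusqlite crate): each query runs on its own thread with its own connection, trading per-query latency for total throughput; io_uring would give similar overlap without the extra threads.

    use rusqlite::{Connection, Result};
    use std::thread;

    // Issue several read queries against the same database file concurrently.
    // SQLite allows many reader connections at once, so each thread opens its own.
    // Table names here are assumed to come from a trusted, hard-coded list.
    fn concurrent_counts(path: &str, tables: &[&str]) -> Result<Vec<i64>> {
        let handles: Vec<_> = tables
            .iter()
            .map(|t| {
                let path = path.to_owned();
                let sql = format!("SELECT COUNT(*) FROM {t}");
                thread::spawn(move || -> Result<i64> {
                    let conn = Connection::open(path)?; // one connection per thread
                    conn.query_row(&sql, [], |row| row.get(0))
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    }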


They had been talking about making the high-level interface to sqlite async (sqlite3_step()).

With io_uring you're talking about the low-level, where blocks are actually read and written.

As-is, sqlite is agnostic on that point. It doesn't do I/O directly, but uses an OS abstraction layer, called VFS. VFS implementations for common platforms are built-in, but you can create your own that handles storage IO any way you like, including queuing reads and writes concurrently using io_uring.

So that's not a reason to rewrite sqlite.

(In fact, I'd be surprised if they weren't looking at io_uring, and, if it seemed likely to generally improve performance, to provide an option to use it, either in the existing linux-vfs or in some other way.)

> I'm personally interested in the potential for async bindings for Python.

Well, it's perfectly possible to do that with the current sqlite. It may be painful, as you say, but not even remotely at the level of pain a complete rewrite entails.


The VFS interface is synchronous, I don't see how a custom VFS could meaningfully implement asynchronous IO.

> Well, it's perfectly possible to do that with the current sqlite.

If you want to wrap a blocking API in python, with actual parallelism, you have to use multiple processes with communication between them. The main advantage of sqlite in the first place is that it's in-process, and you'd lose that.


> The VFS interface is synchronous

On a single thread. There can be multiple threads.

Of course leaving a thread idle while waiting for IO isn't great. That's why I noted it at the beginning. But it doesn't seem idling threads has proven to be much of a problem with sqlite, so it wouldn't be much justification for a rewrite.

> If you want to wrap a blocking API in python, with actual parallelism, you have to use multiple processes

You can use multiple threads in the same process.

(Python has some limitations in that respect, but that's not a sqlite issue and can't be fixed by a sqlite rewrite.)


In case anyone else was wondering, SQLite is about 156k lines of code with 92,000k lines of test code.


Do you have a source for that?


Why do you need a source? You can clone SQLite3 and count the lines yourself.


The test code of sqlite is not public.


Yes and no. Part of it is public, just not the "best" part: https://www.sqlite.org/testing.html


Thanks for the link. It looks like the public part is 27k lines of code (vs the 92,000k lines of code in the proprietary closed-source part).


So three orders of magnitude more tests than code. Yikes, if that is what 100% branch coverage looks like then count me out!


The license is "Copyright 2024 the Limbo authors". How is that possible if Limbo is based on a rewrite?

Do they claim a clean room implementation?

It seems wise of SQLite to close down their test suite. That's a great idea I wish I had heard about earlier.


SQLite is in the public domain. It is perfectly legal to create a derivative works from a public domain project and license it however you want. It's not cool and kind of a dick move to put it under a more restrictive license, but it's legal.


It's not a dick move if you are making legitimate improvements -- especially if you still reference the origin. That's literally the idea behind public domain.


However that only really works for people who are satisfied with SQLite's public domain licensing. If you are in a jurisdiction that doesn't allow you to dedicate a work to the public domain and are worried about the SQLite developers suing you for infringement at some point, Limbo holds the exact same risk of SQLite suing you.


SQLite is in the public domain (i.e. not copyrighted), so a clean room implementation is unnecessary.


So many good things with incremental improvements in the space, but as a consumer it kinda stresses me out having to worry about libsql vs sqlite vs duckdb etc.

I personally use SQLite and DuckDB daily, but recently adopted turso in lieu of litestream for something. I appreciate that they all are relatively compatible but I'd love to just have a tool.

Even then, that's why I love the relationship between SQLite and DuckDB. I can backend my system with SQLite and run analytics and processing via DuckDB, and they serve specific purposes.

The hard thing with this for me is being a split consumer and not having the bandwidth to split my attention between who is doing better innovation and just using a tool I can rely on to predictably get the job done for me.

That being said, hats off this is awesome. I really appreciate turso.


I am assuming that DO-178B certification for the Rust variant is not on the table.

https://www.sqlite.org/hirely.html

https://www.sqlite.org/qmplan.html

https://www.sqlite.org/th3.html

The name "Limbo" is also used by a post-C/UNIX language from AT&T for the Inferno operating system.

https://en.wikipedia.org/wiki/Limbo_(programming_language)



Interesting, thanks for pointing that out.

If there was a solid certification, it would likely be for a specific version.


> To complete the puzzle, we wanted to deterministically test the behavior of the database when interacting with the operating system and other components. To do that, we are partnering with Antithesis

Are there any open source DST projects, even just getting started? I don't even know how/where to start if I would want to do the same on a small app, but can't afford nor want to depend long term on a commercial license.


A side topic: is there a nice big extensive free test suite for sql, for people interested in making toy databases to use?


The standard TPC-C and TPC-H benchmarks are available online at https://www.tpc.org/tpc_documents_current_versions/current_s...

They aren’t fully open source but are free to use, including use with open source software. They may be a bit on the complex side though.


Which SQL? Every database system implements their own variant, and (almost?) none are fully ANSI SQL compliant.



OT: for just a second I thought it was a rewrite of the Limbo programming language. Might be a fun side project! :)


Are there any plans for the python bindings to support an async interface?


That would be a really cool feature! I've been running sqlite3 in async python for Datasette for six years now but it involves some pretty convoluted threading mechanisms, having native async would be fantastic.


I would love to see it succeed!

They mention testing that bytecode generation generates the exact same results as SQLite... Does this exclude writing new optimization passes that are not in sqlite?


Nice. They're just starting out, but it's a good idea.


One killer feature I miss from SQLite is table compression. Especially important on various embedded devices where you collect data from sensors or logs.


20% faster for some operations now, but with only a small subset of SQL implemented. Sounds like making it as fast as SQLite will be hard with full compatibility, given that I expect a lot more branching etc.?

https://github.com/tursodatabase/limbo/blob/main/COMPAT.md


Not as fast as /dev/null


sqlite3 is 1.6MB while limbo is 6MB, size matters for many low-end but huge-volume embedded boards.


This is slightly OT, but I wanted to mention that Limbo is also the name of a programming language:

https://en.wikipedia.org/wiki/Limbo_(programming_language)


disclosure: I work here. I am happy to answer any questions

tl;dr We are rewriting SQLite in Rust. It uses Asynchronous I/O, considers WASM as first class, and has Deterministic Simulation Testing support from the beginning.

source: https://github.com/tursodatabase/limbo


Will you release the code into the public domain, like SQLite?


absolutely not.

We're not fans of public domain, which is one of the things that led us to create libSQL in the first place.

It is MIT.


Why not do both? Release as public domain or MIT license. Take your pick.


The MIT license requires you to attribute the original authors, I would assume.


What's wrong with public domain?


Public domain rights differ across countries. So someone could put a work into the public domain and then sue you in the one or two major countries where copyright can't be disclaimed by the original author without a license, or where there isn't legal precedent for it (this is purely theoretical and has never actually occurred before; it would also not be likely to succeed if it did happen).

Also, it is technically possible for someone to claim public domain software as entirely their own work, while MIT requires attribution.


Do you honestly believe any of the SQLite authors or its millions of users (especially deep-pocketed companies like Apple) lose any sleep at night over this?


No. It’s only really an issue in theory.


Is your company a member of the SQLite Consortium? https://www.sqlite.org/consortium.html


> It uses Asynchronous I/O

Can it have more than 1 writer?


As of now it has a single writer, same as SQLite. But we plan to add MVCC with multiple writers in the future. Pekka has experimented with MVCC earlier: https://github.com/penberg/tihku


I assume this will need to change the shared memory header protocol so either SQLite comes along for the ride or this is an incompatible ABI?


So a hard fork?


What is meant by a 'hard fork'? Are there different kinds of forks?


To me, a "hard" fork is one where you plan not to maintain any compatibility with your upstream and not to share any future code in either way (neither from original to fork nor back). "Soft" forks often retain a degree of compatibility, and some future developments can be shared.

(In this case, since it's a rewrite in Rust, it's not actually a fork at all, I think)


Got it! Thanks.


The kinds that keep file format compatibility?

Or are they claiming they will support MVCC with full file format and interprocess synchronization compatibility.


Got it! Thanks.


How much has Turso raised so far? Rewriting SQLite sounds like it may cost a sum.



I am not sure how feasible it is, but can't SQLite be partially rewritten step by step on the main branch instead of being forked?

As the article mentioned, a complete rewrite will not be as stable as the original.


SQLite is open-source but not open-contribution – they don't accept contributions of that sort. They follow "cathedral" style development and invented a whole alternative to git for that purpose https://fossil-scm.org/home/doc/43c3d95a/www/fossil-v-git.wi...

https://www.sqlite.org/copyright.html

> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to suggest a change and you include a patch as a proof-of-concept, that would be great. However, please do not be offended if we rewrite your patch from scratch.


As other commenters have already pointed out, SQLite does not take outside contributions.

We already have a fork, called libSQL. However, the goals of Limbo are far more ambitious and we cannot rewrite some parts step by step. We want to have DST (Deterministic Simulation Testing), a testing methodology pioneered by FoundationDB and TigerBeetle. It is not easy to do that in an existing codebase.


> can't SQLite be partially rewritten step by step on the main branch

Only by the SQLite team. They don't accept contributions of anything other than spelling fixes and such.


what do you mean by "on the main branch"? i doubt that migrating from C makes sense for their constraints and expertise; you would not want someone to come into your house and change your furniture. forking is the right political and technical approach for this team. also rust does not support a lot of sqlite target platforms


Name clash, with this cool-looking emulator:

https://virtualmachinery.weebly.com/


Such a missed opportunity to call it "Lambo"


then we'd have to ship with a PHP driver from day1.


Limbo has been taken (as the name of a language), so this should be SQuaLor or something...


this is a codename, and if the project is to be successful, we don't expect to keep it.


Nothing is as permanent as something temporary.

Congrats on a great new undertaking!


Is there any big open soure, long term, community contributed, in rust?


Edit: Is there any big open source project, long term, community contributed, in rust?


As others have said, the “Performance” section is asinine, because they haven't fully implemented 100% of SQLite. Not disclaiming this obvious fact in the “Performance” section is incredibly misleading.

I could trivially write “an SQLite clone” that could execute `SELECT * FROM users LIMIT 1` even faster than either this or SQLite—if that's the only string I accepted as input!


Is there any software that lets me make a graphical user interface to a connected database and make data visualizations, all automatic and interactive?

Like a node editor or spreadsheet? It needs to be suitable for the general public.


Dunno. Good luck to them, but I never saw a need to rewrite sqlite.


Guessing the shortcomings become starker if you’re spending lots of time in the codebase/building a company on top of it.


Yeah… Attempting to integrate MVCC and then doing vector search gave enough perspective to do this!


> building a company on top of it.

So be sure you proceed in such a way that never contributes any money or code back to the original project.


I do see a need for multiple implementations of SQLite3. First there's the need for multiple implementations for the reasons given by the LibSQL folks, second there's the need for a memory-safe language implementation of SQLite3, and third there's the need for a native language implementation for languages whose runtimes really want not to have C involved (e.g., Go).


The fact that there's no alternative implementation of SQLite also seems to play a part in preventing standardization of WebSQL.

https://www.w3.org/TR/webdatabase/

"The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path."


I was completely unaware of that! How old is that document? I should reach out.


That effort died about 14 years ago.


damn =(


an opportunity for all of us to celebrate the astonishing power of necromancy


Indeed! This sort of thing is a problem. It's the same with Internet protocols: you need at least two implementations to get to Standard.


Most of us don't see the need to do something until after it is done.


The sqlite implementation that matters the most the next 10 years will be the one with the smallest WASM build.


Why do you believe that? I'm guessing you're thinking about in-browser or sandboxed lambdas?


Local-first primarily.


Ok I understand that, but why wasm size? Local-first works pretty well for a shared library, and it can already be compiled to wasm. But you're predicting based on the _size_ of the wasm bundle being the determining factor which is a really interesting opinion so are you able to explain that?

I could understand if you said 'the fastest' or 'the safest' but 'the smallest' is what I'm hung up on


I'm biased for sure, but the biggest thing keeping me from using sqlite or pglite in the browser is the size of the WASM payloads. They dwarf every other part of a well designed app, at least for the simple things I like to build.


Ah I see! If your use case was read-only you might be able to cut out some of the binary size, but it sounds like Web SQL would've been what you need unfortunately


I’ll take the faster C version any day over the Rust one. How are those conditional if statements working for you all?


The benchmarks in the post indicate that Limbo is more performant than SQLite, not less.


> Executing cargo bench on Limbo’s main directory, we can compare SQLite running SELECT * FROM users LIMIT 1 (620ns on my Macbook Air M2), with Limbo executing the same query (506ns), which is 20% faster.

Faster on a single query, returning a single result, on a single computer. That's not how database performance should be measured or compared.

In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway


The goal here is not to claim that it is faster, though (it isn't; in a lot of other things it is slower, and if you run cargo bench you will see).

It is to highlight that we already reached a good level of performance this early in the project.

Your claim about the programming language having no impact is just false, though. It's exactly what people said back in 2015 when we released Scylla. It was already false then, it is even more false now.

The main reason is that storage is so incredibly fast today, the CPU architecture (of which the language is a part) does make a lot of difference.


Yo glommer, I am ... very surprised to see any benchmark beat the micro-tuned sqlite so early, congrats. Where do you think rust is picking up the extra 100ns or so from a full table scan?


> It is to highlight that we already reached a good level of performance this early in the project.

This is the right thing to do. It's a pity so many projects don't keep an eye on performance from the very first day. Getting a high-performing product is a process, not a single task you apply at the end. Especially in a performance-critical system like a database, if you don't pay attention to performance and instead delay optimizing till the end, at the end of the day you'll need to do a major rewrite.


thanks. I am sad, but not that surprised, that a lot of people here are interpreting this as us claiming that we're already faster than sqlite all over.

I don't even care about being faster than sqlite, just not being slower, this early, is already the win I'm looking for.


Not sure which other comments you're seeing, but my original comment wasn't intended that way.


> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway

That was true maybe 30 years ago with spinning disks and 100 mbit ethernet. Currently, with storage easily approaching speeds of 10 GB/s and networks at 25+ Gbit/s it is quite hard to saturate local I/O in a database system. Like, you need not just a fast language (C, C++, Rust) but also be very smart about how you write code.


I can’t trust benchmarks on code that isn’t feature complete. The missing functionality is never free.


> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway

may be! However, Rust makes some things easier. It is also easy to maintain and reiterate


Related: https://old.reddit.com/r/rust/comments/1ha7uyi/memorysafe_pn...

C libraries aren't automatically the fastest option, there's a lot of C code which has stagnated on the performance front but is still widely used because it's battle tested and known to be robust by C standards.


There's still an element of truth in the idea that C is going to be faster by default. There's simply a much lower bar to writing fast (and unsafe) C. Fast Rust demands considerably more thoughtfulness from the programmer (at least for me).


> There's simply a much lower bar to writing fast (and unsafe) C.

That's kind of my point, writing a faster PNG decoder in C may be easier for you but convincing anyone to actually use it instead of the slower but proven safe-ish libpng would be an uphill battle. Trust in C code is extremely hard-won compared to Rust which uses little if any unsafe. The 'png' crate that Chrome is considering to replace libpng has no unsafe whatsoever and is still faster.


> Fast Rust demands considerably more thoughtfulness from the programmer (at least for me).

While fast code requires thoughtfulness regardless of the language, I think rust lets you focus on the fast aspect more because rustc ensures _some_ safety and correctness.

I can write fast and very unsafe C code quickly, but I can write code that is just as fast, but safer, in Rust faster than in C.


It's entirely likely that you could write faster Rust in the same (nigh-infinite) time it'd take to write equally safe C. I intentionally avoided that comparison though. If you take a normal 10min function in C, it's going to compile into something reasonable and run fast. If you take the same 10min rust function, the language surface area is so much larger that there's a much higher chance that it won't.

Here's a more concrete, albeit irrelevant in practice example from writing most things in both languages:

https://news.ycombinator.com/item?id=42342382#42352053

Implemented in Rust over generic T, you need Wrapping<T> or the equivalent num_traits traits. The implementations for these take borrowed references. Rustc is pretty good at ensuring this becomes pass by value under the hood, but it's imperfect. I found instances of it failing in the test disassembly, even though an implementation for this never has to touch anything but registers. That's performance work that wouldn't have existed in C/C++ for these particular types.
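
For the curious, a tiny sketch of the generic pattern being described (assuming the num_traits crate):

    use num_traits::WrappingAdd;

    // num_traits' WrappingAdd takes `&self` and `&Self`, so every addition in
    // the generic version goes through references; the optimizer usually turns
    // that back into plain register passing, but not always.
    fn sum_wrapping<T: WrappingAdd + Default>(xs: &[T]) -> T {
        xs.iter().fold(T::default(), |acc, x| acc.wrapping_add(x))
    }

    fn main() {
        let data: [u8; 4] = [250, 10, 3, 200];
        println!("{}", sum_wrapping(&data)); // prints 207 (wraps modulo 256)
    }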


For very simple things C might be accidentally faster.

But C lacks many modern data structures, which are critical for performance on modern hardware and bigger input data.


What do you mean by C lacks many modern data structures?


This is often true on the journey to reach feature-parity with an original codebase.

The reason is obvious, of course: it has less features, and doing nothing is always faster than doing something.

Once it's feature complete, then meaningful comparisons can be made. For now, it's puffery.


be careful with that, though. In a lot of ways it is still slower.

The goal with that was just to demonstrate that there's nothing really there that is fundamentally slower, and the perf is already on par in the areas we spent cycles on.


Microbenchmarks are not particularly predictive of performance with real workloads. And there's just one microbenchmark claimed here.


That doesn't mean much until it's got feature parity.


what on earth do you mean by 'conditional if statements' if not a tautology?



