Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten. It'd be great to see all other C code rewritten first!
That was my take when LibSQL was announced, and it remains my take as long as LibSQL stays C-coded. But a Rust-coded rewrite of SQLite3 or LibSQL is a different story.
The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary, and they don't accept contributions to either. This incentivizes anyone who needs support and/or new features in SQLite3 to join the SQLite Consortium. It's a great business model -- I love it. But there are many users who want more of a say than even being a consortium member would grant them, and they want to contribute. For those users only a fork would make sense. But a fork would never gain much traction, given that test suite being proprietary and the SQLite3 team being so awesome.
However, a memory-safe language re-implementation of SQLite3 is a very different story. The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3? Apart from that there's also just a general interest and need to use memory-safe languages.
That said, you're right that there are many other projects that call for a rewrite in Rust way before SQLite3. The thing is: if you have the need and the funding, why wouldn't you rewrite the things you need first? And if SQLite3 is the first thing you need rewritten, why not?
>This is going to sound pedantic, but SQLite is not Open Source. It's Public Domain.
Well, there are 2 different modes of communication:
(1) official language-lawyer pedantic communication: "open source" != "public domain"
(2) conversational casual chitchat : "open source" includes "public domain"
Yes, the SQLite home page does say "public domain". However, when people interview SQLite's creator, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they call it "open source". Excerpt from R. Hipp:
> So, I thought, well, why can't I have a database that just reads directly off the disc? And I looked around and there were none available. I thought, “oh, I'll just write my own, how hard can that be?” Well, it turns out to be harder than you might think at first, but I didn't know that at the time. But we got it out there and I just put it out as open source. And before long, I started getting these phone calls from the big tech companies of the day, like Motorola and AOL, and, “Hey, can you support this?”, and “Sure!” And it's like, wow, you can make money by supporting open source software?
It's wrong, though. Like, it can't be more wrong than that. You can't do whatever you want with open source software; the license tells you what you can and cannot do.
With public domain software you can do most things.
Open source means just that: that the source is open. The OSI and co. re-defining the term to suit their ideological preferences doesn’t really change that. SQLite is open source, even if it’s not Open Source.
I don't know where you got this idea but it's not true. The OSI is simply defending the definition as it has been generally understood since the start of its usage in the 1980s by Stallman and others.
The only group of people "re-defining" -- quite successfully I suppose, which you are an example of -- what open source software means are those that have a profit motive to use the term to gain traction during the initial phase where a proprietary model would not have benefited them.
I don't think I need to provide concrete examples of companies that begin with an open source licensing model, only to rug-pull their users as soon as they feel it might benefit them financially, these re-licensing discussions show up on HN quite often.
In the 1980s we had Shareware, Beerware, Postware, whateverWare, Public Domain, "send me a coffee", "I don't care" open source, magazine and book listings under their own copyright licenses (free for typing, not distribution).
Most of us on 8 and 16 bit home computers didn't even know who "Stallman and others" were.
Additionally, GCC only took off after Sun became the first UNIX vendor to split UNIX into two SKUs, making the development tools their own product. Others quickly followed suit.
Also, regarding Ada adoption hurdles: when vendors made an Ada compiler, it was its own SKU, not included in the base UNIX SDK package.
I don't really understand what your point is, but shareware has never been "open source".
Nobody's arguing that public domain code, or the MIT, or whatever is not open source; it's obviously open source because it's _more_ free than the GPL.
Sure, devs can call any "source available" project "open source" because it gets people interested, even though they have zero interest in using an open source development model or in allowing others to make changes to the code. Devs can also expect well-deserved flak from people who understand that "open source" is not marketing speak.
I don't understand why OSI didn't pick an actually trademarkable term and license its use to projects that meet its ideals of open-sourceness. OSI knows it has no right to redefine common language and police its usage, any more than a grammar pedant has the right to levy fines against those of us who split infinitives.
(To be fair to OSI, I've never seen any of their representatives do this. But the internet vigilante squad they've spawned feels quite empowered to let us know we've broken the rules.)
> conversational casual chitchat : "open source" includes "public domain"
No. What are you talking about? They are not related... other than for people virtually completely new to, well, open source.
You are also completely confused, here, too:
> Yes, the SQLite home page does say "public domain". However, when people interview SQLite create, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they also call it "open source".
They are different things. A project can be both; a person can talk about these two aspects of one project.
This quickly gets into the details of definitions, but I think that by most people's definition of 'open source', something that is 'public domain' qualifies. (Compare 'source available' and 'copyleft/free software': one is not quite open source, and the other is a more restrictive kind of open source. 'Permissive' licenses like MIT are closer to public domain but differ to varying degrees of technicality. One of the main problems with 'public domain' is that it's not universally accepted that there is any means to deliberately place a copyrightable work into it, so something like SQLite, whose authors are not long dead, is not actually public domain according to many jurisdictions.)
It's a difference only insofar as, in many jurisdictions, their claim that it's public domain has no legal value. If it were truly public domain (e.g. if the authors were long dead) it would be open source. But far from all places allow you to arbitrarily put things in the public domain.
I'm a bit puzzled why SQLite doesn't solve this trivial issue by claiming the code is CC0-licensed. CC0 is made just for that: a very wordy way to make it as close to public domain as possible in each jurisdiction.
On the other hand, hobbyists won't care. As long as you trust their intention to have it open source, they won't sue you for infringement either. And if, as a company, you need more assurance than "it's public domain", they are nice enough to sell you a fancy, legally satisfying piece of paper for an undisclosed price. It's a subtle but clever way to get income from users with too much money.
They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."
> They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."
It's not clear this is a license grant rather than legal advice (which would be correct legal advice if the code were public domain, but it is not).
Is it though? The website does say "All of the code and documentation in SQLite has been dedicated to the public domain by the authors" but copyright law has no exception for "dedications" to the public domain. At best the authors are estopped from bringing suit but even that is unclear.
Companies can buy licences if they're uncomfortable with the Public Domain dedication:
> Licenses are available to satisfy the following needs:
>
> * You want indemnity against claims of copyright infringement.
> * You are using SQLite in a jurisdiction that does not recognize the public domain.
> * You are using SQLite in a jurisdiction that does not recognize the right of authors to dedicate their work to the public domain.
> * You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite.
> * Your legal department tells you that you have to purchase a license.
They could have CC0 licensed the code or they could have said they would not enforce their copyright. They did neither. SQLite is closed source. The "dedication" (which has no legal effect, what does it even mean?) encourages widespread adoption and big players are spooked into paying for a license (or "warranty of title"). That's quite a strategy.
Capitalization doesn't carry meaning in these contexts.
Open source means OSI-compliant, broadly speaking, and licensed as such.
In contrast, public domain doesn't exist in some jurisdictions, which is why SQLite as a company had to create an option to provide an official license. They found this so annoying that they charge a sweet fee to send a signed printed letter...
They don't own the words "open source" no matter how much they might like to.
> “Open Source” describes a subset of free software that is made available under a copyright license approved by the Open Source Initiative as conforming with the Open Source Definition.
No it doesn't. It describes software whose source is "open" which is generally understood to mean that you can read, modify and reuse the code for free.
Public domain definitely fits that. The "public domain doesn't exist in some countries" arguments are spurious as far as I can tell.
It is absolutely true that a work can be in the public domain and not have source available (or even contributable). But that doesn't really matter to most people. The question for most people is not whether something is open source, but whether they can copy and make use of a work without being held liable for copyright infringement. SQLite happens to be both public domain and open-source to an extent (i.e., source available).
Conversely, open source doesn't necessarily mean "free to use without encumbrance." There are many open-source licenses that forbid certain uses (e.g. Business Source License). On the other hand, a work in the public domain is free to be used by all without restriction.
A better analysis of open source vs. public domain would be in the form of a square, where one dimension would be the right to use the work, and the other dimension would be the ability to obtain and contribute source code.
The Business Source License is not an open source license. Open source does mean "free to use without encumbrance" - see points 5 and 6 of the Open Source Definition at https://opensource.org/osd
Approximately zero people who make real business decisions care what the OSI considers a "real open-source license" to be. They care what the text of the license says.
Also, many licenses, such as the GPL (one of the very first "open source" licenses), have certain encumbrances; you cannot redistribute GPL-licensed software without either including its source code or making it readily available.
No one's saying public domain isn't useful. You're replying to a comment that's specifically and solely combatting the idea that public domain means open source.
Any definition of open source that doesn't include the public domain is out of touch with how real people use the words "open source" and is therefore useless. You can make up any definition you want, but if you insist on calling elephants "bananas", I'm not going to take you seriously.
The problem with your analogy is that open source has a definition. As does public domain. As do elephants and bananas.
In your analogy we're not the ones calling elephants bananas, you are. We want to keep calling one bananas and the other elephants. You are suggesting that since elephants are similar to bananas you can simply use either word.
Legally, Open Source and Public Domain are -very- different animals. Open Source comes with a copyright and a license (which has requirements); public domain does not.
Of course public domain and open source are both "shipped as source code". Then again so is a fair bit of proprietary software. That doesn't make it open source either.
How people use the term "fair use" is out of touch with the legal definition. That doesn't change the legal definition, it means people use the term incorrectly.
It means the common use of "fair use" is different to the legal definition. It doesn't mean either are wrong. It isn't wrong to say a tomato is a vegetable. In common use it is.
Similarly the common use of "open source" is different to the OSI's preferred definition. Note that the OSI's preferred definition is not a legal definition. It's just what they prefer.
I read the text: it's license hermeneutics at best and FUD at worst. Has there been a single instance in recorded history of the author of a public domain work trying to enforce usage, modification, or distribution permissions? Sure, you can point to theoretical variation in the precise semantics of the public domain in various jurisdictions, but it feels like a bar exam puzzle, not a real-world practical concern. In the real world, you can safely do whatever you want with public domain software. It counts as free software. That half the planet nowadays uses SQLite and treats it as free software is testament to this reality. Obscure license pedantry just doesn't inform the choices of anyone actually building.
Open source and Free software have different philosophies, but in practice they are essentially the same. You are thinking about copyleft vs non-copyleft. BSD, MIT, CC0 are all Free Software licenses but not copyleft.
You’re making the common mistake of confusing the copyleft vs. permissive distinction with the free software vs. open source distinction.
GPL is copyleft. MIT, BSD etc. are permissive. But all of those are both free software and open source, which are essentially synonyms.
The reason so many people get confused by this is that some of the people who prefer copyleft licenses (notably the FSF) also tend to prefer the term “free software”, for philosophical reasons.
It might seem really unlikely any acquirer would ever sue, but if your big company has compliance auditors they will need to see something in black and white.
> public domain software may be free software but is not certain to be.
Open Source relies on copyright and contract law (which are somewhat standardized or at least understood due to their importance in commerce). Public domain relies on other laws that can vary significantly.
As far as I can see, these tests come with the same public domain dedication as the rest of the code.
You may be referring to the TH3 tests (https://sqlite.org/th3.html). The main goal (100% branch coverage, 100% MC/DC) would not be achievable for a Rust implementation (or at least an idiomatic Rust implementation …) because of the remaining dynamic run-time checks Rust requires for safety.
SQLite also has some runtime checks that are expected to be always true or always false, and it solves that with custom macros (ALWAYS() and NEVER()) that remove these branches during branch-coverage testing.
The same would be possible in Rust. Everything that can panic has a non-panicking alternative, and you could conditionally insert `unreachable_unchecked()` into error-handling branches to remove them. That wouldn't be the most idiomatic, but SQLite's solution is also a custom one.
> The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary
no.
The business model is services, plus a red phone for companies that use SQLite in production: Nokia back in the days when we had those little flip phones, desk phones with a "rolodex" built in, and many other embedded uses of a little lovely dependable data store.
The services include porting to, and "certification" on, specifically requested hardware and OS combinations, with indeed proprietary test suites. Now, these are not owned by SQLite, but by third parties, which license them to SQLite (the company).
And it started with being paid by the likes of Nokia and IBM to make SQLite production-ready, add MC/DC coverage, implement fuzzing, and so on.
Their license asks you to do good, not evil, and they take that seriously and try their best to do the same. Their own stuff is, to an extreme extent, in the public domain.
It's not just old Nokias or desk phones, nor just embedded systems. SQLite is almost everywhere: Adobe, Apple, Microsoft, Google, Mozilla and many other companies use it in very widely deployed software.
> > The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary
> no.
> the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the days when we had these little flip phones, or desk phones had a "rolodesk" built in, or many other embedded uses of a little lovely dependable data store.
Members of the SQLite Consortium surely have this "red phone" you speak of. So in what way was my characterization of their business model wrong?
> The U.S. government wants everyone to abandon C/C++
That's the position of two federal agencies, namely, FBI and CISA. They don't describe how this change will reduce CVEs or why the languages they prefer still produce projects with CVEs.
I don't hold the technical or social acumen of the FBI or CISA in particularly high regard, and I'm not sure why anyone would by default, either. Mostly because they say things like "switch to Python!" without once accounting for the fact that Python is written in C.
It's an absurd point to invoke as a defense of this idea.
You keep and maintain your local fork that does what you need it to do. Perhaps, if you are charitable, you share it with others. But you don't need to, and it just adds support burden.
Even without that, it’s helpful. It means there is less (no?) undefined behavior that you will need to emulate to maintain compatibility. You can just follow the spec.
If you cannot run the test suite, then how do you know that you properly followed the spec? And did so securely? And in a performant manner? Even for edge cases? On obscure hardware, filesystems, and OSes? Even if the power cuts out? Or the cable to the hard drive (transactions)? Even if a stray cosmic ray flips a bit?
By the way, SQLite itself does not meet one of these criteria. Know which one? ))
I’m not sure what your point is? Yes, it would be better if they would run their tests against your fork. But they won’t. Still, it’s better for the fork writer that they exist.
> Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten.
That was my take for many years, but I have come around 180 degrees on this. I think at this point it's very likely, and probably necessary, that SQLite will eventually be rewritten. In part because of what is called out in the blog post: the tests are not public. More importantly, the entire project is not really open. And to be clear: that is okay. The folks building it want to have it that way, and that's the contract we have as users.
But that does make certain things that would be quite exciting really tricky. So yes, I do think that SQLite could use some competition, even just for finding new ways to influence the original project.
This reminds me of VIM - and after quite some time I believe that all VIM users will agree that adding NeoVIM to the ecosystem improved VIM itself. VIM 8 addressed over half the issues that led to the NeoVIM fork in the first place - with the exception of the issue of user contributions, of course.
A company that works with SQLite and prefers to write Rust has the expertise needed to rewrite SQLite in Rust. That’s what they’re doing.
All the other C code could be rewritten, this doesn’t stop or slow down any such effort. But for sure it was never going to be possible for a database provider to start making a memory safe implementation of libpng or something.
Seems like a potentially interesting project to get rid of SQLite's compatibility baggage, e.g. non-strict tables, opt-in foreign keys, the oddities around rowid tables, etc., as well as to progress the dialect a bit (types and domains, for instance).
But the article mentions that they intend to have full compatibility:
> Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.
If you "intend to get rid of some of the baggage" you won't be fully compatible.
libSQL already isn't fully compatible: as soon as you add a RANDOM ROWID table, you get "malformed database schema" when using the (e.g.) sqlite3 shell to open your file (also Litestream doesn't work, etc).
And that's fine, as there probably is no better way of doing what you needed to do. But it's also taking what SQLite offers and breaking the ecosystem, under the covers of "we're compatible" without ever calling out what compromises are being made.
You also never got round to documenting the internal Virtual WAL APIs you exposed. This is something where SQLite is lacking, where you could've made an impact without any compatibility issues, and pressure upstream to release something by doing it first/better. Alas, you did it for Turso's exclusive benefit.
Once you compile your Typescript to Javascript, Javascript runtimes can run it, Javascript code can call it, etc. Even source maps work.
Once you start using libSQL features, SQLite tools will simply stop working with your databases.
That means the sqlite3 shell stops working, backup solutions like Litestream and sqlite-rsync stop working, SQLite GUIs like SQLiteStudio stop working, forensic and data-recovery tools have a harder time working, etc.
Maybe it's all worth it, but it's not full compatibility, and it should at least be documented.
I would guess "full memory safety" is going to be impossible, at least at compile time, if for no other reason than that SQLite, for performance, uses data-oriented techniques that effectively reduce pointers to indices, which no longer have ownership or lifetime tracking in the Rust compiler.
As a counterpoint, doing a rewrite of an example of the best C codebases gives you a much more interesting comparison between the languages. Rewriting a crappy C codebase in a modern, memory safe language is virtually guaranteed to result in something better. If a carefully executed rewrite of SQLite in Rust doesn't produce improvements (or some difficult tradeoffs), that's very informative about the relative virtues of C and Rust.
Code quality is not the only thing to consider. Some people would love to see something like SQLite with 2 important changes: referential integrity that respects the DDL and strict tables that also respects the DDL.
An SQLite fork will have a hard time being compelling enough to draw users away from the main project. Being written in Rust is the most compelling reason that I could think of. SQLite has many annoying quirks (foreign key constraints disabled by default and non-strongly-typed columns are my two pain points) but a fork that addresses them would still not pull me away from the original project that I have so much trust in.
If I were to fork SQLite, drawing users away from the main project would be a non-goal. The goal would be to get strict tables and foreign key constraints enforced 100% of the time.
Yeah, I would assume that any project like this would strive to be a soft fork that just has a few minimal patches to address specific needs, not something that actually tries to compete with the original.
If that was the case, they wouldn't introduce cross incompatibilities in the changes they made (or would at least discuss compatibility in the docs), and they'd make any added features useful to others by properly documenting them.
Compatibility for libSQL is a one way street. I don't expect Limbo to be any different.
Agreed! Rewriting in Rust (or any other language) is not required for those features. A fork and modifying the existing C code could also result in those features (and I might do just that if it doesn't come around soon).
Here is the STRICT table type page: https://www.sqlite.org/stricttables.html
It is fairly straightforward: you just have to add STRICT to your table definition and you have it.
And the FOREIGN KEY support is here: https://www.sqlite.org/foreignkeys.html
The two requirements are that your build not have it disabled, and that you execute `PRAGMA foreign_keys = ON;` when you open the database (every time you open the database).
Then build with SQLITE_DEFAULT_FOREIGN_KEYS=1 to make it opt-out (and to opt-out you'd need to inject SQL).
As for STRICT: if you make your tables STRICT, there's no opt-out.
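The per-connection behaviour of that pragma can be seen with Python's built-in sqlite3 module. A minimal sketch (the connection uses autocommit mode, since the pragma is a no-op inside an open transaction):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# PRAGMA below isn't swallowed by an implicitly opened transaction.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE parent(id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE child(pid INTEGER REFERENCES parent(id))")

# Foreign keys are OFF by default: this orphan row is silently accepted.
con.execute("INSERT INTO child VALUES (999)")

# After enabling the pragma (per connection!), the constraint is enforced.
con.execute("PRAGMA foreign_keys = ON")
try:
    con.execute("INSERT INTO child VALUES (1000)")
except sqlite3.IntegrityError as exc:
    print(exc)  # FOREIGN KEY constraint failed
```

Note that the pragma has to be re-issued on every new connection, which is exactly the opt-in footgun being complained about above.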
So why is this an issue? Do you want them to break the file format to say "from this version forward, all tables are STRICT"? What does that really buy you?
It's an embedded database: anyone who can mess with your database and circumvent integrity can also open the file and corrupt it.
I agree on a level that SQLite is a master class in testing and quality. However, considering how widely used it is (essentially every client application on the planet), and that it does get several memory-safety CVEs every year, there is some merit in a rewrite in a memory-safe language.
I agree with you on one level, but that code rigor and testing mean a port of SQLite is much more viable than for most other C-based projects. And I'm intrigued by what this would enable, e.g. the WASM stuff the authors mention. It's not that it couldn't be done in C, but it'll be easier for a wider range of contributors to do it in Rust.
When the initial SQLite3->LibSQL fork was announced I was pretty negative about it because SQLite3 has a wonderful, 100% branch coverage test suite that is proprietary, and so without access to that any fork would be bound to fail.
However, if there's a big product behind the fork, and even better, a rewrite to a memory-safe language, then the fork begins to make a lot of sense. So, hats off to y'all for pulling this off!
Good luck for sure, but browsing their compatibility matrix, it looks like they are a LONG way off. By the looks of it, they have mostly read compatibility with little write capabilities (no alter table, for example).
That's fully in line with what they're announcing here. It's the announcement of a new project that has passed the prototyping stage, but one that has not reached the 1.0 stage.
Correct, this is just the project moving from a fun personal side project of the company's CTO to an experimental stage as a company project.
There isn't a long-term roadmap or anything like that. I got pretty excited when I saw the results, though. It's less about the number of GitHub stars (who the hell cares about those) and more about the contributors. Limbo already has a very nice list of contributors, which leads me to believe there is something here!
All this talk of “SQLite is not open contribution” never seems to consider that a project being “open contribution” doesn't mean the maintainers will accept your contributions.
They have a process for contributions to follow: you suggest a feature, they implement it. It's far from the only project to take such a stance.
Just in the SQLite “ecosystem” see the contribution policies of Litestream and LiteFS. I don't see people brandishing the ”not open contribution” to Ben's projects.
This is literally the first time I've ever heard of this, for any project anywhere. I suppose Android is built a bit in this way, but that's a whole other can of worms.
I think Java/JDK was closed source initially, then went open source in 2006/2007 (?), but without the TCK. The TCK was never open sourced but the JCK is now kind of "open": https://openjdk.org/groups/conformance/JckAccess/
They do not fully own said proprietary SQL test suite; they've licensed it. That's why they can _run_ it but not publish or share it. That's at least how I remember Richard Hipp describing the situation at a talk.
It could be simply to prevent forks, but if it really is 100% branch coverage, why do they still have memory-safety-related CVEs coming out? With ASan turned on and full static analysis, such errors should be exceedingly rare. Part of the benefit of Rust is that it makes coverage both easier to get, due to its type system, and less necessary, because of the guarantees it makes. But if they really went all the way to 100% branch coverage, that should be almost as good, if all the sanitizers are running.
Large chunks of the test suite are open source, committed to the repo and easy to run with a `make test`.
Every time a bug is reported in the forums, the open source tests are updated as part of the bug fix for everyone to see.
There's a separate test suite that offers 100% coverage, that is proprietary, and which was created for certification for use in safety critical environments.
HN loves to discuss business models for open source, but apparently has a problem with this one. Why?
You can have a memory safe SQLite today if you compile it with Fil-C. Only tiny changes required. I almost have it passing the test suite (only two test failures left, both of which look specious).
>>> import limbo
>>> con = limbo.connect("/tmp/content.db")
thread '<unnamed>' panicked at core/schema.rs:186:18:
not yet implemented: Expected CREATE TABLE statement
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
It's this database from here: https://datasette.io/content.db - it uses the SQLite FTS extension though so it's not surprising there was something in there that caused problems!
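For context, an FTS table is a virtual table, so its schema entry isn't a plain CREATE TABLE statement, which is plausibly what the parser choked on. A small illustration with Python's bundled sqlite3 (assuming its SQLite was compiled with FTS5, as most builds are):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A virtual table: the schema stores "CREATE VIRTUAL TABLE ... USING fts5(...)",
# not an ordinary CREATE TABLE statement.
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.execute("INSERT INTO docs VALUES ('hello full text search')")
print(con.execute("SELECT body FROM docs WHERE docs MATCH 'hello'").fetchall())
```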
I'm not buying the rationale in the "async IO" section.
First, there's no need to rewrite anything to add an async interface to sqlite if you want (many clients do, whether local or remote).
The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO. But I wonder how much of an issue that really is. SQLite is designed to run very close to the storage, and can make use of native file caching, etc., which makes IO blocking very short if not zero. You have to wonder if applications have enough idling SQLite threads to justify the switching. (It's not free, and would happen at quite a fine-grained level.)
The section does mention remote storage, but in that case you're much better off with an async client talking to compute running sqlite, sync interface and all, that is very local to the storage. AKA, a client/server database.
Also, in the WASM section, we're still talking about something that would best be implemented as a sqlite client/wrapper, with no need at all to rewrite it.
> The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO
That's not the only issue. Waiting for the result of every read before you can queue the next read is also an issue, particularly for a VFS that lives on a network (which is a target of theirs; they explicitly mention S3).
I'm not sure if they also are doing work on improving this, but I'm sure that theoretically many reads and writes that SQLite does do not depend on all previous reads and writes, which means you could queue many of them earlier. If your latency to storage is large, this can be a huge performance difference.
You can get more total IO throughput (at the cost of latency) by queueing up multiple reads and writes concurrently. You can do this with threads, but io_uring should theoretically go faster (but don't take my word for it, let's wait for benchmarks).
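As a toy illustration (not how SQLite or Limbo actually performs I/O), here's a Python sketch of the difference between serial and concurrent page reads, where `read_page` just fakes a 50 ms network round-trip:

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # pretend each page read costs a 50 ms network round-trip

def read_page(page_no):
    time.sleep(LATENCY)  # stand-in for remote storage I/O (e.g. S3)
    return b"page-%d" % page_no

def read_serial(pages):
    # One read at a time: total latency is len(pages) * LATENCY.
    return [read_page(p) for p in pages]

def read_concurrent(pages):
    # Independent reads queued together: total latency is ~one round-trip.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(read_page, pages))
```

With 8 independent pages, the serial version pays roughly 8 round-trips (~400 ms) while the concurrent one pays about one (~50 ms). io_uring aims at the same effect without needing a thread per in-flight read.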
I'm personally interested in the potential for async bindings for Python. Making fast async wrappers for blocking APIs in Python-land is painful (although it might improve in the future with nogil).
They had been talking about making the high-level interface to sqlite async (sqlite3_step()).
With io_uring you're talking about the low-level, where blocks are actually read and written.
As-is, sqlite is agnostic on that point. It doesn't do I/O directly, but uses an OS abstraction layer, called VFS. VFS implementations for common platforms are built-in, but you can create your own that handles storage IO any way you like, including queuing reads and writes concurrently using io_uring.
So that's not a reason to rewrite sqlite.
(In fact, I'd be surprised if they weren't looking at io_uring, and, if it seemed likely to generally improve performance, to provide an option to use it, either in the existing linux-vfs or in some other way.)
> I'm personally interested in the potential for async bindings for Python.
Well, it's perfectly possible to do that with the current sqlite. It may be painful, as you say, but not even remotely at the level of pain a complete rewrite entails.
The VFS interface is synchronous; I don't see how a custom VFS could meaningfully implement asynchronous IO.
> Well, it's perfectly possible to do that with the current sqlite.
If you want to wrap a blocking API in python, with actual parallelism, you have to use multiple processes with communication between them. The main advantage of sqlite in the first place is that it's in-process, and you'd lose that.
On a single thread. There can be multiple threads.
Of course leaving a thread idle while waiting for IO isn't great. That's why I noted it at the beginning. But it doesn't seem idling threads has proven to be much of a problem with sqlite, so it wouldn't be much justification for a rewrite.
> If you want to wrap a blocking API in python, with actual parallelism, you have to use multiple processes
You can use multiple threads in the same process.
(Python has some limitations in that respect, but that's not a sqlite issue and can't be fixed by a sqlite rewrite.)
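For what it's worth, a minimal sketch of that approach with the stdlib `sqlite3` and `asyncio` (the helper names here are made up, not from any real binding):

```python
import asyncio
import sqlite3

def run_query(db_path, sql, params=()):
    # Open a connection per call so each worker thread owns its own handle;
    # sqlite3 connections are not meant to be shared across threads by default.
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()

async def async_query(db_path, sql, params=()):
    # Push the blocking sqlite3 call onto a thread from the default pool,
    # leaving the event loop free while the query runs.
    return await asyncio.to_thread(run_query, db_path, sql, params)
```

The GIL caveat still applies: threads mainly buy you something here when the query is waiting on I/O, not when it's CPU-bound.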
SQLite is in the public domain. It is perfectly legal to create derivative works from a public domain project and license them however you want. It's not cool, and kind of a dick move, to put it under a more restrictive license, but it's legal.
It's not a dick move if you are making legitimate improvements -- especially if you still reference the origin. That's literally the idea behind public domain
However, that only really works for people who are satisfied with SQLite's public domain licensing. If you are in a jurisdiction that doesn't allow dedicating a work to the public domain and are worried about the SQLite developers suing you for infringement at some point, Limbo carries the exact same risk.
So many good things with incremental improvements in the space, but as a consumer it kinda stresses me out having to worry about libsql vs sqlite vs duckdb etc.
I personally use SQLite and DuckDB daily, but recently adopted turso in lieu of litestream for something. I appreciate that they're all relatively compatible, but I'd love to just have one tool.
Even then, that's why I love the relationship between SQLite and DuckDB. I can back my system with SQLite and run analytics and processing via DuckDB, and they serve specific purposes.
The hard thing for me is being a split consumer: I don't have the bandwidth to track who is doing the better innovation, I just want a tool I can rely on to predictably get the job done.
That being said, hats off this is awesome. I really appreciate turso.
> To complete the puzzle, we wanted to deterministically test the behavior of the database when interacting with the operating system and other components. To do that, we are partnering with Antithesis
Are there any open source DST projects, even just getting started? I don't even know how/where to start if I would want to do the same on a small app, but can't afford nor want to depend long term on a commercial license.
That would be a really cool feature! I've been running sqlite3 in async python for Datasette for six years now but it involves some pretty convoluted threading mechanisms, having native async would be fantastic.
They mention testing that bytecode generation generates the exact same results as SQLite... Does this exclude writing new optimization passes that are not in sqlite?
One killer feature I miss from SQLite is table compression. Especially important on various embedded devices where you collect data from sensors or logs.
20% faster for some operations now, but with only a small subset of SQL implemented. Sounds like making it as fast as SQLite with full compatibility will be hard, given that I'd expect a lot more branching etc.?
disclosure: I work here. I am happy to answer any questions
tl;dr We are rewriting SQLite in Rust. It uses Asynchronous I/O, considers WASM as first class, and has Deterministic Simulation Testing support from the beginning.
Public domain rights differ across countries. So someone could put a work into the public domain and sue you in the one or two major countries where copyright can’t be disclaimed by the original author without a license, or where there isn’t legal precedent for it (this is purely theoretical and has never actually occurred before, it would also not be likely to succeed if it did happen).
Also, it is technically possible for someone to claim public domain software as entirely their own work, while MIT requires attribution.
Do you honestly believe any of the SQLite authors or its millions of users (especially deep-pocketed companies like Apple) lose any sleep at night over this?
As of now it has a single writer, same as SQLite. But we plan to add MVCC with multiple writers in the future. Pekka has experimented with MVCC earlier: https://github.com/penberg/tihku
To me, a "hard" fork is one where you plan not to maintain any compatibility with your upstream and not to share any future code in either way (neither from original to fork nor back). "Soft" forks often retain a degree of compatibility, and some future developments can be shared.
(In this case, since it's a rewrite in Rust, it's not actually a fork at all, I think)
SQLite is open-source but not open-contribution – they don't accept contributions of that sort. They follow "cathedral" style development and invented a whole alternative to git for that purpose https://fossil-scm.org/home/doc/43c3d95a/www/fossil-v-git.wi...
> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to suggest a change and you include a patch as a proof-of-concept, that would be great. However, please do not be offended if we rewrite your patch from scratch.
As other commenters have already pointed out, SQLite does not take outside contributions.
We already have a fork, called libSQL. However, the goals of Limbo are far more ambitious, and we cannot reach them by rewriting parts step by step. We want to have DST (Deterministic Simulation Testing), a testing methodology pioneered by FoundationDB and TigerBeetle. It is not easy to do that in an existing codebase.
what do you mean by "on the main branch"? i doubt that migrating from c makes sense given their constraints and expertise; you wouldn't want someone to come into your house and change your furniture.
forking is the right political and technical approach for this team.
also rust does not support a lot of sqlite target platforms
As others have said, the “Performance” section is asinine, because they haven't fully implemented 100% of SQLite. Not disclaiming this obvious fact in the “Performance” section is incredibly misleading.
I could trivially write “an SQLite clone” that could execute `SELECT * FROM users LIMIT 1` even faster than either this or SQLite—if that's the only string I accepted as input!
Is there any software that lets me make a graphical user interface to a connected database and allows me to make data visualizations, all automatic and interactive?
Like a node editor or spreadsheet? It needs to be suitable for the general public.
I do see a need for multiple implementations of SQLite3. First there's the need for multiple implementations for the reasons given by the LibSQL folks, second there's the need for a memory-safe language implementation of SQLite3, and third there's the need for a native language implementation for languages whose runtimes really want not to have C involved (e.g., Go).
"The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path."
Ok I understand that, but why wasm size? Local-first works pretty well for a shared library, and it can already be compiled to wasm. But you're predicting based on the _size_ of the wasm bundle being the determining factor which is a really interesting opinion so are you able to explain that?
I could understand if you said 'the fastest' or 'the safest' but 'the smallest' is what I'm hung up on
I'm biased for sure, but the biggest thing keeping me from using sqlite or pglite in the browser is the size of the WASM payloads. They dwarf every other part of a well designed app, at least for the simple things I like to build.
Ah I see! If your use case was read-only you might be able to cut out some of the binary size, but it sounds like Web SQL would've been what you need unfortunately
> Executing cargo bench on Limbo’s main directory, we can compare SQLite running SELECT * FROM users LIMIT 1 (620ns on my Macbook Air M2), with Limbo executing the same query (506ns), which is 20% faster.
Faster on a single query, returning a single result, on a single computer. That's not how database performance should be measured or compared.
In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway
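To make the first point concrete, here's a hypothetical microbenchmark sketch using the stdlib `sqlite3` and `timeit`, showing how much the shape of the query dominates the number you measure (absolute timings are machine-dependent):

```python
import sqlite3
import timeit

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO users (name) VALUES (?)",
                [(f"user{i}",) for i in range(10_000)])

def time_query(sql, number=50):
    # Average wall-clock time per execution of `sql`, fetching all rows.
    return timeit.timeit(lambda: con.execute(sql).fetchall(), number=number) / number

t_limit = time_query("SELECT * FROM users LIMIT 1")
t_scan = time_query("SELECT * FROM users")
print(f"LIMIT 1: {t_limit * 1e6:.1f} us/op, full scan: {t_scan * 1e6:.1f} us/op")
```

The full scan is orders of magnitude slower than the `LIMIT 1` query on the same engine, which is why a single sub-microsecond number says little about overall database performance.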
The goal here is not to claim that it is faster, though (it isn't, in a lot of other things it is slower and if you run cargo bench you will see)
It is to highlight that we already reached a good level of performance this early in the project.
Your claim about the programming language having no impact is just false, though. It's exactly what people said back in 2015 when we released Scylla. It was already false then, it is even more false now.
The main reason is that storage is so incredibly fast today, the CPU architecture (of which the language is a part) does make a lot of difference.
Yo glommer, I am ... very surprised to see any benchmark beat the micro-tuned sqlite so early, congrats. Where do you think rust is picking up the extra 100ns or so from a full table scan?
> It is to highlight that we already reached a good level of performance this early in the project.
This is the right thing to do. It's a pity so many projects don't keep an eye on performance from the very first day. Getting a high-performing product is a process, not a single task you apply at the end. Especially in a performance-critical system like a database, if you don't pay attention to performance and instead delay optimizing till the end, you'll eventually need a major rewrite.
thanks. I am sad, but not that surprised, that a lot of people here are interpreting this as we claiming that we're already faster than sqlite all over.
I don't even care about being faster than sqlite, just not being slower, this early, is already the win I'm looking for.
> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway
That was true maybe 30 years ago with spinning disks and 100 mbit ethernet.
Currently, with storage easily approaching speeds of 10 GB/s and networks at 25+ Gbit/s, it is quite hard to saturate local I/O in a database system. Like, you need not just a fast language (C, C++, Rust), you also have to be very smart about how you write code.
> In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on io anyway
Maybe! However, Rust makes some things easier. It is also easier to maintain and iterate on.
C libraries aren't automatically the fastest option, there's a lot of C code which has stagnated on the performance front but is still widely used because it's battle tested and known to be robust by C standards.
There's still an element of truth in the idea that C is going to be faster by default. There's simply a much lower bar to writing fast (and unsafe) C. Fast Rust demands considerably more thoughtfulness from the programmer (at least for me).
> There's simply a much lower bar to writing fast (and unsafe) C.
That's kind of my point, writing a faster PNG decoder in C may be easier for you but convincing anyone to actually use it instead of the slower but proven safe-ish libpng would be an uphill battle. Trust in C code is extremely hard-won compared to Rust which uses little if any unsafe. The 'png' crate that Chrome is considering to replace libpng has no unsafe whatsoever and is still faster.
> Fast Rust demands considerably more thoughtfulness from the programmer (at least for me).
While fast code requires thoughtfulness regardless of the language, I think rust lets you focus on the fast aspect more because rustc ensures _some_ safety and correctness.
I can write fast and very unsafe C code quickly, but in Rust I can write code that's just as fast, yet safer, faster than I can in C.
It's entirely likely that you could write faster Rust in the same (nigh-infinite) time it'd take to write equally safe C. I intentionally avoided that comparison though. If you take a normal 10min function in C, it's going to compile into something reasonable and run fast. If you take the same 10min rust function, the language surface area is so much larger that there's a much higher chance that it won't.
Here's a more concrete, albeit irrelevant in practice example from writing most things in both languages:
Implemented in Rust over generic T, you need Wrapping<T> or the equivalent num_traits traits. The implementations for these take borrowed references. Rustc is pretty good at ensuring this becomes pass by value under the hood, but it's imperfect. I found instances of it failing in the test disassembly, even though an implementation for this never has to touch anything but registers. That's performance work that wouldn't have existed in C/C++ for these particular types.
be careful with that, though. In a lot of ways it is still slower.
The goal with that was just to demonstrate that there's nothing fundamentally slower there, and that perf is already on par in the areas we've spent cycles on.