SQLite is a wonderful database. We use it in production many times over for all of our clients. Not having to worry about whatever arbitrary SQL Server installation is available in a particular environment has saved us so much time and frustration. Combining SQLite with .NET self-contained deployments means that we (our automated tools) can now copy our software distribution zip to the target, extract it to a path, run the executable as administrator, and walk away. Without SQLite we could not do this.
Migrations are also hilariously easy with SQLite if you use the user_version pragma. We have a monotonically-incrementing number to indicate each version. Our migrator first queries this upon startup and runs all the migrations that exist between current user_version and whatever our highest sequence # is (defined as a constant in code). Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach. I cannot speak for EF Core, but EF6 performance was absolutely abysmal in some more complex cases.
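A minimal sketch of this kind of `user_version` migrator in Python (the migration statements and table names here are invented for illustration, not the commenter's actual schema):

```python
import sqlite3

# Ordered migration list; index + 1 is the user_version it produces.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]

def migrate(conn):
    # Read the current schema version from the database header.
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    # Run everything between the current version and the highest known one.
    for version, sql in enumerate(MIGRATIONS[current:], start=current + 1):
        with conn:  # commit (or roll back) each migration atomically
            conn.execute(sql)
            # PRAGMA assignments don't accept bound parameters, hence the f-string.
            conn.execute(f"PRAGMA user_version = {version}")

conn = sqlite3.connect(":memory:")
migrate(conn)
assert conn.execute("PRAGMA user_version").fetchone()[0] == len(MIGRATIONS)
migrate(conn)  # re-running is a no-op: nothing left between current and highest
```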
I would also say that performance might surprise you if you turn on WAL. I have saturated NVMe disks inserting lots of data into SQLite. The trick is to use a single connection and let SQLite handle the serialization for you, rather than try to manage multiple connections to the same file in code (i.e. one connection per logical operation sucks for SQLite).
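The single-connection-plus-WAL setup looks roughly like this in Python (the table and the lock-based serialization are illustrative choices, not the only way to funnel access through one connection):

```python
import os
import sqlite3
import tempfile
import threading

# One long-lived connection for the whole process.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path, check_same_thread=False)

mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
assert mode == "wal"                          # WAL needs a file-backed database
conn.execute("PRAGMA synchronous = NORMAL")   # fewer fsyncs; still safe under WAL
conn.execute("CREATE TABLE events (payload TEXT)")

lock = threading.Lock()

def write(payload):
    # Serialize in our code rather than opening a connection per operation.
    with lock, conn:
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))

write("hello")
```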
I didn't know about user_version or application_id either until just now. I think I'll start setting application_id. Probably not registering it in the SQLite repository - there's no procedure to do that at the top of the file and it looks like almost no one does. I'll just pick a random 32-bit int and assume I don't collide.
I don't think I'll use user_version though. I prefer my current solution: a version table with multiple rows. One for initial installation and one for each schema upgrade with timestamps and the software version that performed the operation. The current version is just the max. If my migration procedure goes badly, I have some clue what happened.
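A sketch of such a version table (column names, version numbers, and app versions are all made up for illustration):

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE schema_history (
        schema_version INTEGER NOT NULL,
        applied_at     TEXT    NOT NULL,
        app_version    TEXT    NOT NULL
    )""")

def record_migration(schema_version, app_version):
    # One row per install/upgrade, with a timestamp and the software version
    # that performed the operation.
    conn.execute(
        "INSERT INTO schema_history VALUES (?, ?, ?)",
        (schema_version, datetime.datetime.utcnow().isoformat(), app_version),
    )

record_migration(1, "2.4.0")   # initial installation
record_migration(2, "2.5.1")   # later schema upgrade

# The current version is just the max.
current = conn.execute(
    "SELECT MAX(schema_version) FROM schema_history").fetchone()[0]
assert current == 2
```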
If my application supported schema downgrades, I'd want to have a separate operation_id and schema_version; the max operation_id would win rather than the max schema_version.
application_id is really meant for scenarios where the SQLite database is treated as a document - a file that is intended to be copied around by the user, and opened by the same or another app elsewhere.
According to their mailing list, the way to register your own ID is to submit a patch that adds your ID to the file.
I tried to do that, but maybe the message didn't go through.
Some of the application IDs are made up using hexadecimal (such as Fossil), and some using ASCII (everything else listed in the magic.txt file in the SQLite repository). I have used base thirty-six; I later saw RADIX-50 mentioned, which would be another way.
We also do something very similar, but our version table has two columns: module_name and version. Instead of doing central schema management like in a usual monolith, every module manages its own schema.
I came here to say this. It succeeded because it's an extraordinarily high quality piece of code!
The 'lite' is even a bit of a misnomer. Properly tuned I've seen 1TB+ SQLite databases work just fine. It doesn't scale horizontally with CPU cores as well as a "real" database but it can perform very well with large data sets if there's only one or two writers.
There is actually no "lite" word in SQLite. Think of it as SQL-ite: the "ite" is a deliberate allusion to rocks/geology, as in granite, magnetite, etc. Make of that what you will. This is documented in the SQLite docs, BTW.
If you want TB+ (in the context of indefinite scaling), you are talking about multiple computers.
For most who browse HN that immediately means you reach for Postgres (and you would be wise to do so), but I would argue you can continue to use SQLite indefinitely (one instance per node) if you manage replication and transactions between nodes at your application layer.
I realize this is probably beyond what most developers want to screw with, but I have seen some extremely promising results in experimental implementations of this scheme. Being able to replicate items conditionally based directly on business needs is a very powerful capability on paper - e.g. user session data must be synchronously replicated to all nodes, while other entities can be asynchronously replicated based on their use cases. You could handle replication and transactions based upon any business factor or state if you go down this path.
It’s not latency, it’s throughput. Broadcast and incast network communication patterns in the LAN kill throughput, and are present in any quorum protocol. Atomic broadcast protocols like chain replication or LCR which use strictly sequential communication patterns (via chain or ring topologies) have much better throughput in LAN environments than Paxos/Raft (except for Ring Paxos, which requires IP multicast). In addition, because they outsource group membership and failure detection to a separate (likely Paxos-based) service, they require only f+1 members to tolerate f faults, rather than 2f+1 (see Lamport et al’s Vertical Paxos work). But they have two main downsides relative to Paxos/Raft-like protocols: latency grows linearly with cluster size and is bounded below by the slowest node (so they aren’t a good fit for high-latency WANs), and as noted above, they can’t “self-host” like Paxos/Raft.
PS Apparently my previous comment outraged the HN hive mind, judging by downvotes. I find this hilarious and revealing given that all I’ve said is obvious to any actual distsys researcher (ask eg the HyperDex or FawnKV folks about why they chose CR instead of Paxos). And at one large cloud provider I worked at, this was taken for granted back in 2012. Apparently “distsys” pop culture has yet to catch up.
I should also mention that a great virtue of quorum-based protocols (for majority or basic read/write quorum systems) is their ability to mask tail latency/transient faults. That is difficult to replicate in sequential protocols and is a good reason to use quorum protocols in high/variable-latency and otherwise unreliable environments like geo-distributed databases.
I’m not sure how often Fast Paxos is really a good tradeoff, since in the LAN it has inferior throughput to the other protocols I mention below, while in the WAN >3/4 quorums expose you to additional tail latency, probably enough to negate the advantage of eliminating a single message delay if clients are distant from the cluster (see the Google SRE book for discussion).
> Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach.
Agreed. I wrote a performance-oriented lightweight ORM on top of SQLite calls: https://github.com/zmj/sqlite-fast . Its focus is minimizing allocations - treating SQLite as something closer to an in-memory store than a network API.
>Some might argue that you should just use EF (or whatever language-specific ORM), but I prefer the level of control and the substantial performance boost you get with raw SQL and a simple migrator approach. I cannot speak for EF Core, but EF6 performance was absolutely abysmal in some more complex cases.
Look into Linq2Db [1] (Disclaimer, I contribute to the project a little bit.)
It's the perfect in-between if you want LINQ-like abstractions but don't want the overhead of a full ORM; it's more like Dapper than, say, EF or NHibernate.
Best of both worlds: I can quickly prototype a design using SQLite, and then flip it over to SQL Server/Postgres by changing a config string.
EF Core is somewhat improved from EF in some ways (it's faster, often as fast as Linq2Db or raw SQL) but has moved backwards in others (lots of stuff still doesn't work right - more importantly, things that -used- to work in EF6). IMO EF Core tries to do too much: it tries to give the same API for interfacing with both relational and KV stores, and ends up not being great at either.
We exclusively use Dapper for all database access. I have zero problems with a few string constants in my services that contain the requisite SQL statements for managing my business entities against the database. It's really nice having such a consolidated context where your SQLite implementations are just single partial classes tacked onto the services you are providing persistence for.
That is one other point - SQLite has enabled us to do some really deep consolidation of our codebase. We generally use one SQLite database per business entity or abstraction: Users.db, Logs.db, Sessions.db, Settings.db, etc. It allows for a truly 100% independent vertical stack (aside from coupling against business models) for developing various pieces of business functionality. This, in my view, is the perfect ideal of 'microservice' architecture: one big folder called services, one class per 'microservice', one SQLite database per class, one collection of mappers per class, etc. I honestly feel like we have settled on the perfect grain for our codebase structure (at least in C#/.NET Core land). The benefit in our world is that instead of HTTP-based RPC, we are doing direct method invocation throughout. We also use Blazor in order to extend this concept as close to the screen as technically feasible.
Thanks for reminding me about this, just taking a quick look at it now. Quick question: do you know why it uses methods like db.InsertWithInt32Identity(product) rather than db.InsertWithIdentity<int>(product) or similar?
which is also potentially confusing if you're trying to glance at the code.
But wait, some might ask, Select has two types, how come it doesn't have this problem?
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
But as the signature shows, the return type is based on the expression passed as a parameter, which is why you don't have to do it there. :)
Edit: That said, there's facilitation for inserting with granular control over columns... which is when I stopped waiting for EF to get its act together. :)
Thanks, I really appreciate the reply and hadn't thought about those side effects. I then thought "why not derive the identity type from an attribute on the identity property of the table class?", but then you have a dynamic return type, so you either return an object needing a cast, or it leads you back to the same problem you identified.
Inserting rows with ~500KB blob columns (JSON data) on a single connection, with WAL enabled and PRAGMA synchronous = NORMAL or OFF.
I will admit that it would be very difficult to saturate a disk that can write on the order of gigabytes per second if you are trying to insert rows that are only a few KB in size (at least without batching these into larger entities or requests).
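For small rows, the usual workaround is exactly that batching: group many inserts into one transaction so there is one fsync per batch instead of one per row. A rough sketch (table and row sizes invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (v TEXT)")
rows = [("x" * 1024,) for _ in range(10_000)]  # ~1KB rows

# One transaction for the whole batch: one commit instead of ten thousand.
with conn:
    conn.executemany("INSERT INTO samples VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
assert count == 10_000
```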
I know SQLite's SQL query engine is implemented as a VM. Are there ways of optimizing repeated queries, similar to compiling a query to avoid the VM churn? Perhaps more importantly, have you ever been disappointed by the query engine's performance?
You can prepare a query to use multiple times (if you are using the C interface, at least; as far as I know the command-line interface does not support this), as another comment mentions. You can also prefix a SQL statement by EXPLAIN to view the VDBE codes. But, I see that sometimes the VDBE code is not as good as it should be.
You can put each query and the result of preparing that query into an LRU cache (or even just a static pointer in C if your application only ever deals with a single connection). Then you can reuse these without preparing them over and over.
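Python's sqlite3 module happens to implement this pattern for you: its `cached_statements` connection parameter sizes an internal per-connection LRU cache of prepared statements, so identical SQL text reuses the compiled VDBE program. A small sketch:

```python
import sqlite3

# cached_statements sizes the module's internal per-connection LRU cache
# of prepared statements (the default is 128).
conn = sqlite3.connect(":memory:", cached_statements=256)
conn.execute("CREATE TABLE t (a INTEGER)")

INSERT = "INSERT INTO t VALUES (?)"  # keep the SQL text identical across calls
for i in range(1000):
    conn.execute(INSERT, (i,))       # prepared once, then served from the cache

assert conn.execute("SELECT COUNT(*) FROM t").fetchone()[0] == 1000
```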
It is possible, but it is somewhat impractical, because you need to convert numbers back and forth every time you touch them at the application level. Not worth the extra headache.
Speaking from experience, if you exclusively deal with integers internally and then apply the appropriate formatting when outputting to the user (which you need to deal with regardless of internal representation), it makes dealing with monetary data so much easier. Integers are generally better supported, faster, and more convenient for most languages, databases, and serialization formats.
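A minimal sketch of that store-integers-format-at-the-edge approach (assuming a two-decimal currency; the formatter is a made-up helper, not a library function):

```python
# Keep all arithmetic in exact integer cents; format only at the output boundary.
def format_usd(cents):
    sign = "-" if cents < 0 else ""
    dollars, rem = divmod(abs(cents), 100)
    return f"{sign}${dollars:,}.{rem:02d}"

total = 1999 + 500 - 250          # exact: no rounding, no representation error
assert total == 2249
assert format_usd(total) == "$22.49"
assert format_usd(-250) == "-$2.50"
```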
Floating point (up to a certain amount of absurdity) you can describe with integers (value, scale).
Rational numbers you can describe with integers (numerator, denominator, scale).
Irrationals are more difficult. That's why we try not to use them :). Hopefully your company's commission tiers are not defined in terms of circles or euclidean distances.
Accounting rules often care more about consistency than precision per se. As long as the code follows accepted accounting rules (say bankers rounding) choosing a known precision works. Often GAAP and other standards don’t define all details for, say amortization, but if you use a scheme the code should always use that method.
I'm not insisting, and this could be seen as bad practice, but I think I have experience worth sharing.
Unless you operate on 2+ degree currency polynomials, it may be okay to store money in minimal integer units at the application level too, adjusting it only in UI formatters. One should be careful with constants in code and external APIs, but it's nothing you can't get used to or catch with a test. We store timeouts in milliseconds, weights in grams, and so on anyway - not a new thing.
As an example, the primary database I maintain for our company is able to hold decimal .00 natively, but we use its fields as if they were multiplied by 1000, i.e. 12,345.00 is stored and operated as 12.35 (and goes as int64 1235 to ODS) (rounding 5.00 off is fine for this particular accounting model). Nobody made a three-digit mistake ever in a whole decade, because square/cubic money is not a thing naturally, and ratios & exchange rates work as usual. It is like storing cents, but the other way round.
Storing in-app amounts in floats has the downside of accumulating error in for-looped calculations and needing periodic rounding (although it takes a really big dataset to accumulate enough error to matter). If a language has a money type and it can be programmed accordingly, the problem disappears. If not, it persists either way, just in different forms.
I dunno... the only alternative seems to be storing as a float everywhere, since most coding languages don't support a decimal type... and using floats for currency honestly just feels scary to me, since they're not fixed-precision.
I mean I know there are min/max ranges in the float specification that can guarantee no loss of accuracy to two decimal places... but then I need to manually make sure there's never anywhere in the code that might ever sum up transactions to beyond that, or whatever. Or people thinking it's OK to divide and then multiply, which you're hopefully less likely to do if it's an integer.
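The classic demonstration of that fear, in a few lines of Python:

```python
# Binary floats cannot represent most decimal fractions exactly,
# so repeated addition accumulates representation error.
total = sum(0.10 for _ in range(10))   # "one dollar" summed in floats
assert total != 1.0                    # off by a tiny representation error

cents = sum(10 for _ in range(10))     # the same sum in integer cents
assert cents == 100                    # exact
```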
I feel much more peace of mind storing currency data as integers and doing the conversion. It feels quite natural really, since I'm already used to doing that with datetimes for different timezones.
Python has Decimal, JavaScript has BigInt, and so on. I disagree with you - most languages support arbitrary-precision numbers. Floats are not needed or wanted.
You do not want to invent your own arbitrary-precision numbers.
Applications can handle integers just fine, even more of a reason to use them. 64-bit integers will work everywhere seamlessly and you only need to convert once, when showing them in the UI and/or collecting user input.
> Integers have complete precision, but very limited range
You can store up to 900 trillion dollars with 0.01 cent resolution in an int64. I doubt many applications need more range than that.
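The arithmetic behind that claim checks out:

```python
# int64 storing units of 0.01 cent, i.e. 1/10,000 of a dollar.
INT64_MAX = 2**63 - 1                 # 9,223,372,036,854,775,807
max_dollars = INT64_MAX // 10_000     # ~922 trillion dollars
assert max_dollars > 900 * 10**12     # comfortably over 900 trillion
```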
> when [integers] overflow, they usually “wrap around” silently
If you want to write an application handling money, you really want to use a Money type of some description, regardless of whether the underlying storage format is an integer. So it's very easy to avoid silent overflows, since basically all languages that are prone to them also offer primitives that perform addition, subtraction, and multiplication with checks for over/underflow.
> The implicit decimal point is hard to change and extremely inflexible
If you have a Money type, it's not hard to change. The pain point would normally be serialization, but that's not necessarily different from using decimal floating point internally, since very few APIs, serialization protocols or even databases have native support for fp decimals either.
> The number of subunits in currencies change with time in the world.
I suspect governments are equally or more likely to drop trailing zeros (e.g. Iraq did not so long ago), which you also need to deal with anyway and which does not get easier by using a floating-point format.
You know, it's strange - we work with an incredible number of decimal facts (financial data), but we find that this has zero impact on our SQLite usage, because we have a policy where extremely complex business entities that are prone to rapid change are serialized to JSON and stored as BLOB columns. Virtually all of our decimal values reside within this JSON-serialized data. Obviously, it's not a solution for any arbitrary case, but you can usually find workarounds in interesting places further up the stack. We use only 3 data types in SQLite - BINARY, TEXT and INTEGER.
There is no BINARY type in SQLite. (However, the declared type mostly does not restrict the data a column may store. All columns can store blobs and nulls (unless NOT NULL is specified). Columns declared CHAR, VARCHAR, CLOB, CHARACTER, or TEXT can store only text, blobs, and nulls; trying to store a number will convert it to text. Columns declared REAL, FLOAT, or DOUBLE cannot store integers; those are converted into floating-point numbers, and if you try to store text representing a number, it is converted to that number. Columns declared INT, INTEGER, NUMERIC, or an unrecognized type (such as BINARY) can store any value, but text representing a number is converted to a number, and floating-point numbers that can be losslessly represented as integers are converted to integers. Columns declared BLOB or with no type specified can store any value.)
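These affinity rules are easy to observe from code. A quick Python sketch (table and column names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# BINARY is not a recognized type name, so that column gets NUMERIC affinity.
conn.execute("CREATE TABLE t (i INTEGER, s TEXT, b BINARY)")
# Deliberately mismatched values: text into the numeric columns, a number into TEXT.
conn.execute("INSERT INTO t VALUES ('52', 52, '52')")

types = conn.execute("SELECT typeof(i), typeof(s), typeof(b) FROM t").fetchone()
assert types == ("integer", "text", "integer")  # both '52' strings were coerced
```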
In addition to the other workarounds suggested, you can store the numbers in two columns - one for the coefficient and one for the power of ten, negated. For instance 2.50 becomes 250 and 2, meaning 250 x 10 ^ -2. You can then do math at the application level, or even in SQLite with user-defined functions or even with recursive CTEs if you’re a bit of a masochist.
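A tiny sketch of the two-column scheme, doing the math at the application level with exact rationals (table and column names invented):

```python
import sqlite3
from fractions import Fraction

conn = sqlite3.connect(":memory:")
# value = coef * 10^-exp, so 2.50 is stored as (250, 2).
conn.execute("CREATE TABLE price (coef INTEGER NOT NULL, exp INTEGER NOT NULL)")
conn.execute("INSERT INTO price VALUES (250, 2)")

coef, exp = conn.execute("SELECT coef, exp FROM price").fetchone()
value = Fraction(coef, 10 ** exp)  # exact arithmetic, no float involved
assert value == Fraction(5, 2)     # i.e. 2.50
```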
> The trick is to use a single connection and let SQLite handle the serialization for you, rather than try to manage multiple connections to the same file in code (i.e. one connection per logical operation sucks for SQLite).
It seems to me that in that case you cannot use transactions though, or am I wrong ?
For programming, the use-case for EF or expression-tree queries is the specification pattern.
Although you are correct that it (the ORM) doesn't handle complex queries very fast, there is definitely a valid motivation to use/enforce it.
Complex queries are mostly handled through something like Elasticsearch or a hosted search engine (e.g. Azure's); the goal is to keep queries simple, and if the code is loosely coupled, it's easier to do this.
There's a level of write-heaviness after which SQLite, even in WAL mode, is the wrong choice.
But one can get pretty far with this design. Thousands of writes per second from a few dozen writers: no problem. Millions of writes, or hundreds of writers - I'd use something else for that.
I remember listening to this when it was recorded and I still remember details from it years later – probably my favorite Changelog podcast ever. It's obviously very foolish to make software choices based on how much you like its creator as a person. But Richard Hipp comes off as a guy so likable that goshdarnit, I hope his little project succeeds. Jokes aside, though, I don't think it's entirely coincidence that SQLite, something that is so good and reliable, was made by someone who seems so conscientious and thoughtful.
Though I have to admit, I was and am still disappointed to learn that the official way to pronounce SQLite is "S-Q-L-ite", i.e. "like a mineral".
Conscientiousness and empathy are rare commodities in software development. It's difficult for somebody to get out of their own head and consider a problem fresh, and it's not always personally rewarding to do so. It's also expensive, if you want to back it up with user testing.
You see it, even, in ostensibly public-consumption open-source projects, where the goal is clearly not merely to scratch one's own itch but to market the project for uptake for this reason or that. I feel like SQLite is a great example of what happens when a team is able to foster the kind of empathy to really understand how the user wants it to be.
I think there's a case for disputing the official pronunciation because it's based off an "official" pronunciation of SQL. I've heard it ess-queue-elle and sequel, but personally side with sequel. It was originally called “Structured English Query Language”, abbreviated and pronounced "SEQUEL"; later shortened to "SQL" but same pronunciation. So I figure sequel-ite is a perfectly legitimate pronunciation.
That's how I do it. I understand the etymological reasons for "S-Q-L-ite", but SQLite is just too ubiquitous in my everyday work for me to speak 4 syllables when referring to it.
PostgreSQL is the fun one out because they officially say it is Postgres-Q-L.
The "it's always S-Q-L and never 'sequel'" rule is IBM's fault, and an early trademark-infringement issue in computing. (IBM was told by an aircraft company that it couldn't call it "SEQUEL".)
For anyone curious about the trademark infringement, I hunted down the excerpt from the book that Wikipedia uses as a source (a book called Databases Demystified) and this is what it says in the book:
>> The forerunner of SQL, which was called SEQUEL (for Structured English Query Language), first emerged in the specifications for System R, IBM’s experimental relational database, in the late 1970s. However, two other products, with various names for their query language, beat IBM to the marketplace with the first commercial relational database products: Relational Software’s Oracle and Relational Technology’s Ingres. IBM released SQL/DS in 1982, with the query language name shortened to “SQL” after IBM discovered that “SEQUEL” was a trademark of the Hawker-Siddeley Aircraft Company. When IBM released its next generation RDBMS, called DB2, the SQL acronym remained. To this day, you will hear the name pronounced as an acronym (S-Q-L) and as a word (see-quel), and both are considered correct pronunciations.
I was hoping for a bit more interesting or detailed story.
I've heard that, but assumed it came from splitting up the name as "Postgre"+"SQL"; if you leave off the SQL part you get a word ending in E, and "postgreh" can't be right, can it? :)
Which is why I've also heard developers that either assumed the 'g' was silent or a transposition problem and you get "poster SQL", "postreg SQL", or worst of all "posgret SQL".
Somewhere, I believe in an HN comment, I saw a Postgres developer say that one of the biggest regrets of the project naming was capitalizing that "shared S".
The norm is to pronounce it by how likely you think you might get yelled at for trademark infringement by a holding company of the assets of an old aircraft company: https://en.wikipedia.org/wiki/Hawker_Siddeley
The name confusion is all IBM's fault.
> Chamberlin and Boyce's first attempt of a relational database language was Square, but it was difficult to use due to subscript notation. After moving to the San Jose Research Laboratory in 1973, they began work on SEQUEL. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley Dynamics Engineering Limited company.
I looked it up, you're right, it is officially my-ess-que-ell. This leads to another question, how do you pronounce "NoSQL"? I couldn't find a convincing answer on the web. I used to say "sequel" for everything SQL related but clearly I was wrong.
One of the many reasons I love sqlite and redis are because of the attitude of the developers. Sqlite has just an uncanny amount of testing, I wonder if it's the most formally tested piece of user software (outside of formally proven software and the space/medical industry I suppose).
And Salvatore (redis) has a way of simplifying issues that gives me a great deal of confidence in the decisions he makes and the code he writes. I really like the code and reasoning behind his 'linenoise' library in particular
> Jokes aside, though, I don't think it's entirely coincidence that SQLite, something that is so good and reliable, was made by someone who seems so conscientious and thoughtful.
The same could probably be said about Redis and Salvatore Sanfilippo.
One thing that I wish more people realized, especially ones developing native apps, is this: unless you really need a specific file format for your application, don't design your own. Just use SQLite. You get so much out of it (including features like undo), as well as a huge toolkit.
Even Apple, home of not-invented-here, built their "Core Data" object storage API (which is extremely prevalent throughout iOS and macOS apps) using SQLite as a backing store.
And you lose the ability to quickly change settings in a text editor, to have multiple settings files, to have tiered (system, local, project) settings files, to version them in source control, etc. Instead of learning INI, JSON and XML you'll have to learn whatever schema project X uses, and there will be more schemas than file formats. Plus SQLite might be fast, but it's still a lot more overhead than parsing an INI file.
> unless you really need an specific file format for your application
It's all about choosing the right tool for the job.
There are great reasons to use XML files over SQLite, but it mostly comes down to needing a file to be human-readable, versus file size.
In the product that I work on, I moved very small tables out to xml files because it's easier to inspect them in a text editor instead of expecting non-developers to figure out a database. These files are also small enough and are written infrequently enough that any performance difference between xml and SQLite is negligible.
The vast majority of data is in SQLite. An older version of the product would always write out a giant xml file, but when the xml got big, performance suffered. In this case, SQLite is the correct choice because it can make a small change to the database without needing to rewrite the entire file.
You don't lose the ability to have multiple or tiered settings files - there's nothing preventing you from using as many SQLite database files as you want.
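For instance, ATTACH lets one connection layer several database files, which covers the tiered-settings case. A sketch (file names, table, and keys are all made up):

```python
import os
import sqlite3
import tempfile

base = tempfile.mkdtemp()

# "System" tier in one file...
conn = sqlite3.connect(os.path.join(base, "system.db"))
conn.execute("CREATE TABLE settings (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
conn.commit()

# ...and a "local" tier layered on top via ATTACH.
local_path = os.path.join(base, "local.db")
conn.execute(f"ATTACH DATABASE '{local_path}' AS local")
conn.execute("CREATE TABLE local.settings (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO local.settings VALUES ('theme', 'light')")
conn.commit()

# Tiered lookup: the local value wins when present.
row = conn.execute("""
    SELECT COALESCE(
        (SELECT v FROM local.settings WHERE k = 'theme'),
        (SELECT v FROM main.settings  WHERE k = 'theme'))
""").fetchone()
assert row[0] == "light"
```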
The other points are valid, but not everything can be shoehorned into JSON or XML. Sometimes the data size is too large for the read-everything-into-memory approach. Sometimes you need to deal with a lot of binary blobs. And so on.
Transaction rollback is probably what he meant. If that is a familiar topic to you the relation to undo should make sense and his point should be clear. If that doesn't make sense, look up ACID, Transactions, and Migrations in the context of a database.
I had a different interpretation. It's not uncommon in commercial code to use a sort of copy-on-write behavior where every edit creates a new row and marks the other one as dead (for instance, filling in or blanking out a column to mark it defunct). You leave an audit trail behind.
Which, with a little bit of work, can be used to revert to earlier versions.
While everyone's praising SQLite, I want to note that SQLite's type system is absolutely horrible. When you can store a string in an integer column, you don't have a type system. And it's absolutely not intuitive for people with experience in other databases.
I finished the podcast a couple hours ago. It was pretty clear to me that Hipp has the perspective of a person working on small teams, where most of the changes to the system either don't affect you or are highly visible.
This is not a man who fears the existential dread of discovering, just after you've stated loudly that some part of the system could not possibly break that way, that some other developer has pulled the rug out from under you when you weren't looking.
The odds that I add the string "52" to a column I know damn well has integers is pretty low, but non-zero. And I'm gonna transpose some fields eventually. The odds of any particular coworker doing the same is some multiple of that. But q^n across my entire team, across months and years, across new hires and people switching into and out of the team? We are so deep into Murphy's Law territory that it's almost a certainty.
It's a fool's errand to try to prevent absolutely every one of those scenarios happening. All I ask is that we have a way to discover them early and cheaply.
Hipp talks about that in the interview a little bit:
> SQLite really started life as a Tcl extension, Tcl being the programming language, the Tcl/Tk. The project I was working on was written in Tcl/Tk, and so SQLite began as a Tcl extension, and as in a scripting language like Perl or Python, where you can put any type of value you want in a variable. So a variable might hold a string, a number, a byte array or whatever. So I made SQLite the same way, where just because you’ve declared a column of a table to be text doesn’t mean you can’t put an integer in there, or just because you declared a column in the table to be a short int, why not put a 2-megabyte blob there? So what? It’ll do that.
> Or if you have a column that’s declared integer and you try to put text in it, it looks like an integer and it can be converted without loss. It will convert and store it as an integer. But if you try and put a blob into a short int or something, there’s no way to convert that, so it just stores the original and it gives flexibility there. And this is useful in a lot of cases, because sometimes you just have a miscellaneous column in a table that you might need to store lots of different things in. And in traditional database systems you actually have to have multiple columns, one for each possible data type, whereas in SQLite you put it all in one column. So it works well.
> And for that matter, with SQLite you don’t have to give the column a type at all. You can just say, CreateTable T1 (a,b,c) and then you’ve got a table with three columns named a, b and c and you put whatever you want there.
> Well, it flows directly out of the scripting language traditions. You don’t declare types for variables in Tcl; you didn’t used to do it in Python, I guess you can do it some, now. You don’t do it in JavaScript… You just say it’s a var.
This apparently has roots in some ancient practices (Tcl is not exactly young). In the 21st century, we likely don't need to save a few bytes by reusing a variable, and can afford to preserve our sanity instead. Well, we can even afford a built-in ACID-compliant SQL database!..
SQLite's quirks page [1] actually suggests the same thing:
> In retrospect, perhaps it would have been better if SQLite had merely implemented an ANY datatype so that developers could explicitly state when they wanted to use flexible typing, rather than making flexible typing the default. But that is not something that can be changed now without breaking the millions of applications and trillions of database files that already use SQLite's flexible typing feature.
I wish they had done this in the first place. I wish even now that they implemented a pragma to enforce types (and introduce the ANY type for folks who explicitly want it). But SQLite has enough good things going for it that I'm happy to use it in spite of this problem.
Seems like (1) is doable, as long as it's opt-in. But if it becomes the default behavior, existing applications would break when they do something against the type system. I'm not sure what (2) buys you.
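Until such a pragma exists, a common opt-in workaround is a CHECK constraint on typeof(); a minimal sketch (table and column names are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Emulate enforced typing by constraining the stored type
con.execute("CREATE TABLE t (a INTEGER CHECK (typeof(a) = 'integer'))")

con.execute("INSERT INTO t VALUES (42)")  # fine
try:
    # 'abc' can't be coerced, stays TEXT, and fails the CHECK
    con.execute("INSERT INTO t VALUES ('abc')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note the caveat: inserting '123' would still succeed, because affinity converts it to an integer before the CHECK runs. That matches the opt-in spirit of (1) without changing default behavior.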
Forgive me, I was confusing strongly typed with statically typed. My point still stands though, for someone used to a dynamically typed language like Python having database columns that aren't locked to a single rigid type will seem quite natural.
It doesn't, though. There are almost no dynamically typed languages where you supply a type yet it isn't enforced; maybe Forth is the closest, with its stack-effect comments.
If you give me a type, I expect a type; if you omit the type, fine, I'll assume it's dynamic at runtime.
All these dynamic languages are also strongly typed for a reason; even JavaScript uses === everywhere instead of the weakly typed ==. The bugs that occur because of automagic conversion are ridiculously hard to track down, and even statically typed languages can fall into this pit, for example Scala with its implicits (shudder).
But SQLite has the worst of all worlds: you have to specify a type, yet you cannot rely on it, and it sometimes does type conversion for you (e.g. int to string and back) and sometimes does not.
They should've just implemented a single ANY type or have the syntax not use types at all and deviate from the SQL spec (which they do a lot anyways).
But I have also run into hard problems using it. Deleting and renaming columns is not supported natively. Procedures for doing this can be found on the internet, but they generally do not take foreign key references into account, and getting that right is a hard problem. In the end we had a function that could rename or delete a column even without switching off foreign key checking, but it was not simple, and it was not the first iteration of said function; earlier iterations caused lots of subtle problems, like a column having a default value or not depending on whether the database was created fresh or via schema updates.
Not disagreeing with your overall point about SQLite's feature set, but IIRC, `ALTER TABLE RENAME` was added in 2018, for version 3.25.0: https://www.sqlite.org/releaselog/3_25_0.html
(I chime in only to remark that I myself am constantly surprised that SQLite continues to add these kinds of features – e.g. WINDOW functions were also in 3.25 – long after I had accepted their absence as a fact of life)
Yeah, it's a pity it can't do anything beyond ALTER TABLE statements that add to a schema. Having to rename the table, create a new one with the new schema, and copy the contents over from the old table can be quite tricky and troublesome if you've got a live system running with users on it.
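For anyone who hasn't seen it, the recreate-and-copy dance looks roughly like this (a sketch with a hypothetical `users` table; the full procedure in the SQLite docs also covers recreating indexes, triggers, and views):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, obsolete TEXT);
    INSERT INTO users VALUES (1, 'alice', 'drop me');
""")

# Drop the 'obsolete' column by rebuilding the table
con.executescript("""
    PRAGMA foreign_keys = OFF;
    BEGIN;
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new (id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
    PRAGMA foreign_keys = ON;
""")

# Verify no foreign key references were left dangling by the rebuild
assert con.execute("PRAGMA foreign_key_check").fetchall() == []
print(con.execute("SELECT * FROM users").fetchall())  # [(1, 'alice')]
```

The foreign_key_check at the end is the step the copy-paste recipes tend to omit, and it's exactly where the subtle breakage described above hides.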
I use EF core and instrument all my queries - if I see a slow one I hand optimize it. Best of both worlds I think - and I say that as someone who loves to fiddle with SQL.
Also worst of both worlds - you need to know very well both the underlying database (SQL) and also the complex leaky abstraction (EF Core) and how are those abstraction translated into the SQL.
I'm no ORM hater (I use Hibernate daily), but claiming that using an ORM will simplify working with the database is misleading at best.
For some reason people think it's okay to be ignorant while using an ORM. What you are interacting with is still a database! Treat it like one and you will save a lot of time in the easy situations.
Pretty sure SQLA the ORM has nothing to do with schema migration, which is usually handled with alembic, the sister project. Not sure about Sequelize.
Now, there’s nothing magical about alembic’s handling of renaming or deleting columns for SQLite; it has to recreate the table just like what you would do if you’re hand rolling the SQL, so it has the same problems.
While on SQLite, Hipp's fossil-scm is also another great piece of work. My personal favourite to start projects since I can keep code, docs and issues all in a single .. SQLite file! fossil is also just a single executable with no other dependencies and comes with a built-in web server complete with user management if you want to expose to a small group.
The main and real reason I love fossil is that I'm always at peace knowing that I can't accidentally bungle up the history. I'm always nervous about that when I use git .. still.
> knowing that I can't accidentally bungle up the history. I'm always nervous about that when I use git
I just read a great insight yesterday:
> Understanding there’s no harm to be done eases things a lot. It encourages experimenting, which is one of the most effective ways to learn new things and to get familiar with them. Imagine this: you start rebase, mess things up, hit Ctrl+Z several times and try again. Good news is, we can implement Undo/Redo given just ordinary Git implementation. Git already stores all the possible states our repo was in. All Undo/Redo will have to do is to shuffle some branch pointers here and there.
You can still bungle history with badly authored commits, though. This is exactly why I like Git - no need to think about how messy the history is while working on something, I can create lots of small commits tweaking things as I go, and then clean it all up into larger commits representing logical groups of changes, with proper description etc.
I'm fine with my commit flow and commenting, which perhaps helps. I do use orphaned dirty branches. It helps that fossil's tags are reusable, unlike git's.
For me, there've been cases where I've worked on an angle, only to discard it for another approach .. and much later wanting to revisit that angle because it was a good idea. If I discard history to "clean it up", such a resurrection becomes impossible.
We used SQLite at my company to allow users to write SQL queries against their db. When we hit its limits, we had to switch to Postgres. That migration was quite difficult, and I wish we had used Postgres from the start. Hindsight is 20/20, but that was my first thought.
For anyone wondering why: I was working on a hobby project a few years ago that had to do a lot of inserts in a SQLite DB. Performance was ok, but it wasn't great either. Turning on WAL greatly sped up performance. In both cases just a single thread was writing to the database.
With WAL turned off (the default rollback-journal mode), when you start a transaction SQLite copies the original pages into a separate journal file, modifies the database file in place, and deletes the journal when the transaction commits. With WAL turned on, it instead appends each change to a write-ahead log file and merges that into the main database periodically (by default once the log reaches 1000 pages), so far fewer writes hit the main file.
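Enabling WAL is a one-line pragma, and it sticks to the database file across connections; a minimal sketch (the database path and table are hypothetical):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical location
con = sqlite3.connect(path)

# journal_mode=WAL is a property of the file, not the connection,
# so it persists for every future connection to this database
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal

# Batch inserts in one transaction; WAL appends them to the -wal
# sidecar file and checkpoints into the main file later
with con:
    con.execute("CREATE TABLE samples (t REAL, v REAL)")
    con.executemany("INSERT INTO samples VALUES (?, ?)",
                    ((i * 0.1, float(i * i)) for i in range(1000)))
```

Note WAL can't be enabled on an in-memory database, which is why the sketch uses a temp file.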
Were you using an ORM? I ask because most people use database switching as a selling point for using an ORM. I'm rather indifferent on the matter, but I'm curious.
Ha! This selling point is just one in a miserable litany of poor reasons to use an ORM - you absolutely cannot simply switch between databases without doing some work to ensure that the data is migrated correctly and the SQL statements translate properly.
That's true in theory, but unfortunately you can still run into issues when different databases support different parts of SQL, and the db you're migrating from has different features than the one you're migrating to.
There are a huge number of differences in the SQL supported by different engines. You can't just switch from one to another, unless you're only using a small subset of SQL to begin with.
I'm glad something like SQLite exists. It's awesome if you want to store data in your applications without falling back to a full SQL database. I am also thankful for client applications like "DB Browser for SQLite" [1]. It provides nearly the same functionality as SQL Server Management Studio, which is quite an achievement.
The only gripe I have with SQLite is with full text search, FTS5. The last time I tried to store and retrieve text with associated metadata from another table, it was really confusing. It kind of worked in the end, but then I also realized that you can't do a contains query on your text, just starts-with and ends-with [2]. This is a major flaw in my book. So many times I just want to dump tons of text into a table and query single words naturally, but I can't do that without resorting to hacks or unoptimized LIKE queries. And if I need to take into account typos or similarity, I'd have no idea how to do that with SQLite. Does someone have experience regarding this?
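To be fair to FTS5, whole-word matching anywhere in the text does work; what it can't do is match a substring in the middle of a token. A minimal sketch, assuming your SQLite build has FTS5 compiled in (most do):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany("INSERT INTO docs VALUES (?)",
                [("the quick brown fox",), ("lazy dogs sleep",)])

# MATCH finds whole tokens anywhere in the text; 'qui*' is a
# prefix query. Suffix or infix matching within a token is what
# FTS5 does not support.
hits = con.execute("SELECT body FROM docs WHERE docs MATCH 'qui*'").fetchall()
print(hits)  # [('the quick brown fox',)]
```

For typo tolerance, there's nothing built in; people typically pair FTS5 with an external trigram or spellfix approach.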
SQLite is a brilliant lightweight database which is available everywhere. All mobile OSs support it and it's a breeze to run on even embedded Linux. I just wish web browsers would also support it and make webSQL a standard, it would make programming for the web so much easier.
Once I discovered SQLite, I never went back. It’s so easy. I’m still shocked when I see docker compose files that spin up entire MySQL dbs when a simple SQLite connector would be so much easier!
Actually, it was about to become a standard. Some browsers implemented it, but then they decided not to go ahead because all browsers would have shared the same SQLite implementation.
It is weird in hindsight, as with just WebKit (which Blink still mostly is) and Gecko, there are only two engines left. And if Gecko goes away (it keeps losing market share), the decision will seem even more shortsighted. SQL is a standard; they could have standardized on that rather than on an implementation.
SQL is a standard, but SQLite doesn't implement "standard SQL", it implements "SQLite SQL".
For better or worse, the web has always been driven by specifications, rather than implementations. Properly standardizing all of SQLite in a spec is not trivial, and as much I think SQLite is great software, I don't think that relying on SQLite the library specifically would be a good idea. What if the latest version of SQLite does something that's not in the spec or prohibited by the spec?
Either way, there's IndexedDB now, which is supported by almost everything, including IE.
> For better or worse, the web has always been driven by specifications, rather than implementations.
Surely you're joking? Having things specified first has been extremely rare; pretty much everything followed the pattern of "write the specification after the fact to match whatever browser X has done and shipped". Protocols, markup, CSS, JavaScript... the whole thing was driven by implementation.
But IndexedDB has been struggling very hard from day 1 to be as fast and performant as WebSQL. You can do the benchmark yourself [1]. Try inserting 10000 records. WebSQL is by orders of magnitude faster than IndexedDB.
The people who rejected WebSQL proposal had a lack of decisiveness and technically-focused vision. They also interpreted everything very politically.
> For better or worse, the web has always been driven by specifications, rather than implementations.
What? The specifications document what browsers already do. Have you seen the HTML5 spec? It's mostly formalization of error-recovery, showing how to interpret previously technically-broken documents in the way modern browsers have converged on. It's not what anyone would design, but there's no one driving.
Sure, but the requirement is there should be at least two independent implementations (as there is for the error recovery in the HTML5 spec). This was never going to happen with SQLite.
Something I see some developers do, which I don't recommend, is using SQLite for testing environments while running PostgreSQL in production. SQLite3 doesn't support a lot of the types that PostgreSQL does, and it doesn't support things like full outer joins. Save yourself the headaches of debugging in production and spin up a Docker PostgreSQL instance, since testing environments should emulate production environments as closely as possible.
Yes, I often see Java developers using an embedded H2 database [1] for development and test, and a "real" database in production, and there are always unexpected issues of false-positive tests and functionality due to this. Terrible anti-pattern. I am glad "Dev/prod parity" is recognised in The Twelve-Factor App manifesto [2].
I do what you suggest: run dev/test against a real PostgreSQL (of the same version) running in a container. In fact, I take this further and run builds against an actual PostgreSQL, since I like to generate code based on the DB schema using jOOQ [3].
The build proceeds like so:
1. Spin up a fresh PostgreSQL container
2. Run my schema migrations against it
3. Run the jOOQ code generator against it
4. Compile code
This provides some additional guarantees, including:
1. My queries are fully type-safe from my code down to the database schema level
2. My schema migrations can be run against an empty database
I know some people with a certain generation of MacBook or macOS version have had bad luck running Docker: it makes their fans go nuts and drains resources, so they prefer to avoid Docker when possible.
The micro-ORM I use in C# works around this well enough; the only thing in SQLite that's really hard to work with is time, since there's no real storage type defined for it.
>... they realized that if this is a critical part of their infrastructure, they needed to make sure my business was sustainable.
>So companies which are typically large companies that really depend on SQLite as part of their product, they just pay us an annual fee.
>... we are a 100% engineering shop.
An intriguing option for anyone sitting on an original infrastructure solution, and looking to turn it into a livelihood, with quality of life as first priority.
The interview transcript is much appreciated! I hate it when a random two-hour podcast has the 15-minute interview I want to listen to and I have to go hunting for it.
Which is a shame because most podcast apps have chaptering systems where particular interviews and segments can be easily bookmarked. Even without that feature show notes can include links and timestamps to particular segments too but many shows don't bother because it's extra work.
I came to SQLite recently having used big iron enterprise tools for my analytical work previously (namely Oracle), and wow, what a lovely elegant tool. It isn't just super useful for web and embedded applications, it is quite a powerful and convenient analytical tool for quickly ingesting huge flat files and querying them.
If the people who did the original Macintosh "resource fork" updater had been serious about maintaining consistency of the data structure, that approach would have lasted longer. That was a really good idea, badly implemented because they had to cram it into 128KB and use a floppy. Unfortunately, it wasn't rewritten when the Mac line got more memory and hard disks.
Somebody had to get the open source world past using text files for everything.
AFAIK nomadlist.com by @levelsio is using SQLite in production. He has a bunch of data, plenty of users and his site is very fast. Proof that you can use SQLite in production on a successful project.
Serious question: how in the heck did the podcasters get their transcript? Did they pay someone to edit a machine transcription? That is one of the coolest things I have never seen for any podcast!
I found this fascinating; as students we are taught to use SQLite and then told to use PostgreSQL for production. I am told that a large number of native mobile apps use SQLite. Does anyone have any stats on this? It seems like a nice choice, but the searches I have done are all over the place.
We used paged feeds[1] for our RSS, but not all podcast clients support it. I might end up going back to a full feed, but remove the show notes, etc from older episodes to keep the file size smaller...
I use Huffduffer[1] either by searching for someone who has added it already[2] or adding it directly with details and a link to the mp3 myself. Then add your huffduff feed to Overcast (can add based on tags also).
I've worked on a few internal tools for the company I intern at, and I've always been told to use Postgres. Don't get me wrong, I love Postgres, but for an internal application that won't get any significant concurrent interaction, SQLite is the way to go in my opinion. Maybe I'll try to mention it at the next meeting.
And you'll be told their production is using PostgreSQL.
Having identical environment makes life easier. I never run anything on my Mac despite many Unix daemons being available but either run it remotely on Linux or locally via Linux VM with same distro.
Is there really any other file-based SQL alternative anymore?
Back in the day on Windows you would use the Jet (MS Access) db for this purpose, or maybe dBase before that in DOS. Many apps that needed to store data used them, since they required no server process, just a library.
I guess you have some NoSQL options like LevelDB, but it's really nice to have a full ACID SQL engine running in process.
I always thought it would be great if PostgreSQL could be separated into a library for the DB, with a server that uses the lib, or the option to use the lib directly in your process; but I would imagine the process model is baked heavily into the DB engine.
My app had been running on CodeBase (dBase clone from Sequiter) for twenty-five years. When we converted to macOS, CodeBase wouldn't compile for 64-bit. Panicked, I turned to SQLite. Miraculously, we were able to switch in a couple of weeks. It is indeed a pleasure to work with. Especially with tools like DB Browser to poke around with.
We are currently working on an IoT solution where we deploy many distributed computing units with leaf devices, and SQLite works flawlessly for keeping local state on the remote devices. There may be other solutions, but this was a no-brainer for us, and so far we haven't had any major issues with it.
Obviously because it's awesome in so many ways. The only thing I miss when using SQLite is implementations in high-level programming languages. In many cases (e.g. with .NET WinForms) using a native library is a pain, while the speed native code provides is not always necessary.
I personally cannot see any near term replacement for what SQLite offers. In my world, SQLite became the way in which I now manipulate a scope of structured data on a single logical disk. I mostly view it as a very reliable and friendly layer on top of basic file IO which also happens to be compatible with virtually every computer on earth. I can copy a SQLite database out of an embedded microcontroller environment and directly open it on my x86_64 workstation, edit some values, and then copy it back down without any fears of compatibility issues.
Is it simple, though? I mean, reading about how they test it[1] leads me to believe it is not simple. Maybe it's simple compared to Oracle, and maybe using it is simple, but the actual implementation is definitely not simple.
I don't care about the implementation and I congratulate the developers on abstracting all of that complexity away from me. That's what good software does.
I misinterpreted your comment. I thought you were saying SQLite was successful because SQlite itself (i.e. the implementation) was simple, not because it was simple to use.
For what it's worth, I use "simple" for talking about the complexity of the thing, and "easy" for the user experience. Quite frequently easy things have to be complex because the cognitive load gets shifted into the machinery.
E.g., compared with a Model T, modern cars are much easier but much more complex because the car itself understands things like adjusting the ignition timing and safely getting the motor turning, things that a Model T user had to understand if they wanted to avoid getting a broken arm before they got out of the driveway: https://www.fordmodelt.net/how-to-drive-ford-model-t.htm
> I mean, reading about how they test it[1] leads me to believe it is not simple.
They are unusually thorough in their testing. I see that as a good thing, not a sign it isn't (relatively) simple. In fact, I don't think any other database engine is tested as thoroughly. I can't even imagine how much effort it'd take to do so for Oracle or even PostgreSQL, except that it'd be prohibitive.
Most of the time, yes, if SQLite supports the filesystem you are using. If the filesystem doesn't have a particular feature SQLite needs, it can be difficult to get working the way you want it to.
Well, the issue I ran into was that I wanted a SQLite db to have one writer and many readers. However, what I was seeing was queries failing because the DB was locked at the time of reading. The solution is to switch the journaling mode to WAL[1].
However, the particular platform I was developing for was an embedded system using the JFFS2 filesystem. JFFS2 doesn't support shared mmap, which is a feature SQLite needs for WAL.
So basically, it was a big pain: I had to set up a tmpfs mount point where the WAL-enabled DB resided and then periodically sync it to flash.
Why didn't you just do more granular writes and let the different processes lock the file? SQLite is impressively fast; it can handle a lot before you need a full database.
Basically there is one process periodically dumping stats into a sqlitedb. There is an HTTP API that can retrieve said stats. How do I make it so the HTTP API doesn't occasionally fail when trying to read from the DB?
Retry until it gets the lock, and don't hold the lock longer than you need to write to the db.
Really though, if you are trying to read and write to a db concurrently and over the network, it sounds much more clear-cut for something like Postgres.
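The retry can also be delegated to SQLite itself via its busy handler, so the reader blocks briefly instead of failing with "database is locked"; a minimal sketch (the file name and table are hypothetical):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "stats.db")  # hypothetical path

# Writer process: periodically dumps stats
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE stats (k TEXT, v REAL)")
writer.commit()

# Reader (e.g. the HTTP API): timeout= maps to sqlite3_busy_timeout,
# so on SQLITE_BUSY the query retries internally for up to 5 seconds
# instead of raising immediately
reader = sqlite3.connect(path, timeout=5.0)
print(reader.execute("SELECT count(*) FROM stats").fetchone()[0])  # 0
```

It's still a retry under the hood, but keeping each write transaction short makes the window small enough that readers rarely notice.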
You asked the question and now you don't like the 'idea' of the answer. Use real numbers instead of feels, and again if you need real concurrency, use a real database.
"Retry" is not a solution; it's a workaround. Imagine if that's how opening files worked: "Oh yeah, if the file fails to open just retry a few times until it works."
There's a small chance that even retrying 3 times you could fail all 3 times in a row. WAL journaling is a solution.
Not exactly a filesystem, but it doesn't work on SMB shares, which is an enormous pain in the ass. And I'm not even talking about concurrent access, just one user. I get that SMB file locking is a mess, but making it a hard fail by default was a big mistake IMO, as most developers using the library don't make special cases to allow this.
The documentation is very clear that SQLite over a network isn't a good use case for reasonably sized data. Item 1 from the "Checklist for choosing the right database engine" [1]:
> 1. Is the data separated from the application by a network? → choose client/server
> Relational database engines act as bandwidth-reducing data filters. So it is best to keep the database engine and the data on the same physical device so that the high-bandwidth engine-to-disk link does not have to traverse the network, only the lower-bandwidth application-to-engine link.
> But SQLite is built into the application. So if the data is on a separate device from the application, it is required that the higher bandwidth engine-to-disk link be across the network. This works, but it is suboptimal. Hence, it is usually better to select a client/server database engine when the data is on a separate device from the application.
Thanks for sharing that link; it's one of the most valuable things in this thread for me! (I loved SQLite already, and one HN thread more or less isn't going to change that.)
I find it remarkable that it succeeded despite its complete and utter lack of security. It's trivial to overwrite its metatables to create ROP chains via SELECT, and its default FTS hook is insecure. It's basically a hack.
https://research.checkpoint.com/2019/select-code_execution-f...
They address this directly under the section entitled "Open-Source, not Open-Contribution":
SQLite is open-source, meaning that you can make as many copies of it as you want and do whatever you want with those copies, without limitation. But SQLite is not open-contribution. In order to keep SQLite in the public domain and ensure that the code does not become contaminated with proprietary or licensed content, the project does not accept patches from unknown persons.
In other words, the reasoning is that since the code is released to the public domain, they want to ensure they can continue doing so without encumbering or confusing future releases with tainted contributions. Quite admirable.
It's pretty normal for open source projects to refuse contributions from anyone who hasn't signed a CLA or something similar. The alternative is a legal nightmare.
Just to be clear: We do not have any support contracts with the US military, nor any other US government agency, nor any other government entity, either inside or outside the US. Not that we would turn down such work if it were available; it is just that it has never come up.
Until I read the transcript, just now, I had confused SQLite's origins (a General Dynamics contract with the US Navy) with its through-2050 support contract, which, it turns out, is with Airbus.
There are 3 people on the planet who can make changes to it, and one person who can work on their custom-made source control system. Single points of failure are not sustainable.
Well, it's been popular for 20 years, so that sounds fairly sustainable to me. A lot of projects with many more contributors have come and gone in that time period.
Either way, that doesn't make it "closed source" like you said in the other comment.
> There are 3 people on the planet who can make changes to it
Anyone on the planet can make changes to it and distribute those changes. The only thing those 3 can do that the rest of us cannot is get their changes into the copies distributed at sqlite.org.