As a non-dev intruder I have to say that I love SQLite. I do a lot of data analysis and it makes everything easy, from quick SQL wizardry to sharing the DB by just copying a file! How amazing is that?!
It must sound naive to some of you, but the first time I stumbled upon SQLite I was so excited!
I work in data analysis too, and SQLite is to me what MS Excel is to other people in the industry. I have a development background, so I never picked up MS Excel to the same degree as my colleagues from statistics backgrounds. In corporate organisations where DBAs monitor the use of their database with an eagle eye, SQLite is a fantastic option for analysing several million lines of data on your desktop. I find DB Browser for SQLite (DB4S)[0] invaluable when using SQLite.
If you like SQLite for data analysis, you might want to check out DuckDB https://github.com/cwida/duckdb which is billed as "SQLite for analytics."
SQLite is a row store, which is best for OLTP (point queries, inserting/updating/deleting one or a few rows at a time), while DuckDB is a column store, which means the data layout has values from the same column stored contiguously, making aggregation queries (GROUP BY) perform much better.
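To make the difference concrete, here is a minimal sketch (with a made-up sales table) of the kind of GROUP BY aggregation a column store accelerates; the SQL is the same in both engines, only the storage layout differs:

    import sqlite3

    # Hypothetical table: sales(region TEXT, amount REAL).
    con = sqlite3.connect("sales.db")
    con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")

    # An OLAP-style aggregation: a column store only scans the 'region' and
    # 'amount' columns, while a row store like SQLite reads every row in full.
    rows = con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()

    # Roughly the same query through DuckDB's Python API:
    # import duckdb
    # duckdb.connect("sales.duckdb").execute(
    #     "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()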
I was wondering the same--I guess because it's fairly new (2018) and it came out of a database research group at a European university, rather than a SV tech firm. Therefore, limited marketing budget.
By the way, here's a YT video of a talk given by one of the DuckDB implementers about why they made it, what it's for, and how it works: https://www.youtube.com/watch?v=PFUZlNQIndo
Thanks for this. I used to be a longtime user of HeidiSQL but drifted away because I ended up in an Oracle environment. At the time I drifted away, SQLite was not supported. I look forward to being re-acquainted with HeidiSQL.
Excel is nice as a simple sandbox, and Power Query is amazing IMO; the problem is that it's basically impossible to work with large datasets. If you didn't know about Power Query, I recommend playing with it a bit.
Most people don't realize that Excel (like PowerBI) has an in-memory, compressed, column store database inside of it.
Loading hundreds of millions of rows into it takes a while, but given a commensurate amount of RAM and a reasonable data model (single table or star schema), performing aggregations with a pivot table is pretty snappy.
Well, for me with my personal PC with a quad core and 24 GB of DDR3, it gets unusable at around 1 million rows, but it's true that most of my datasets have around 10 columns or so.
Does SQLiteStudio let you cancel an unexpectedly slow query part way through? DB Browser for SQLite doesn't seem to, and I don't really know what I'm doing so often accidentally set off a query that can't use indices without doing "EXPLAIN QUERY PLAN" first.
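For what it's worth, if you are driving the query from your own code rather than a GUI, the Python sqlite3 module can abort a running statement via Connection.interrupt(); a rough sketch (db and table names made up):

    import sqlite3, threading

    con = sqlite3.connect("big.db")

    # Ask SQLite to abort whatever is running on this connection after 2 s;
    # interrupt() is designed to be called from another thread.
    threading.Timer(2.0, con.interrupt).start()

    try:
        con.execute("SELECT count(*) FROM t1 JOIN t2").fetchall()
    except sqlite3.OperationalError as e:
        print("query cancelled:", e)   # typically "interrupted"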
The first beta release for 3.12.0 should be out next week. There's not much change in it though (mostly language string changes), as the alpha1 has turned out to be really stable. :)
I used to use this, but if you have a JetBrains sub you can open these files directly. Or even better, if it's a full subscription, DataGrip is fantastic, and they recently added MongoDB support. Which is not perfect but good enough!
> WARNING: this library is NOT ready for production. Use at your own risk!
How far off do you reckon it is from being "production" ready?
Asking because we've been adding useful SQLite extensions as optional extras in our (sqlitebrowser.org) installer. Can add yours too, if you reckon the code is reasonably cross platform and shouldn't cause (many) weird issues. :)
Basically--it's a struct containing a byte array, weight, scale, and sign, and all the arithmetic operations are implemented in software. So it's really slow, and each RDBMS has to implement this type from scratch, or find a vendor library that provides a suitable one, because the C/C++ standard libraries don't provide one.
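For illustration only, the shape described is roughly this (field names are mine, not any particular engine's actual layout):

    from dataclasses import dataclass

    @dataclass
    class Numeric:
        digits: bytes   # packed decimal digit groups
        weight: int     # position of the first digit group vs. the decimal point
        scale: int      # digits after the decimal point
        sign: int       # 0 = positive, 1 = negative

    # Every add/multiply/compare has to walk these digit arrays in software,
    # which is why it is so much slower than native floating point.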
My tools of choice are similar: sqlite, DB browser, Jupyter. I've found that DB browser struggles with analytical queries on tables over a few GB though. Still an invaluable exploratory tool.
Most of my web apps' databases are an SQLite file. It's more than enough for the amount of traffic they serve, and the db files are easy to set up and back up.
I also always wondered why sqlite isn't used more in websites. Especially if you split heavy write workloads to a separate database file it scales quite far.
There's no reasonable way to share a SQLite database between processes on separate machines or VMs. What website doesn't use at least 2 instances for HA?
How do you make sure you don't lose data in a SQLite DB?
I throw together plenty of simple sites intended for personal use or just a small group. I don't need nor want the added complexity of going for HA. I can easily spin up a cheap VPS with an SSD and scale amazingly far with one instance and a SQLite DB. When it's for personal use or small group it's not a problem to take it offline for maintenance/deploys. I don't even use a load balancer (tho I often throw nginx in front to do mTLS and compression). I can build a site like that in 30 minutes.
I deal with massive scale all day every day for work (Kubernetes/OpenShift/public clouds), so I find the simplicity amazingly refreshing.
I might also say the performance of my "single box" websites is also amazingly good. I've had even developer friends ask me how I get response times so fast, and they are often surprised when I tell them.
Many websites just run on a single machine somewhere, with maybe some caching in front or off-loaded static files. Even HN is as far as I know still just a single box.
Not sure about parent’s case, but I’ve used it for small blog-type applications where the only writes are from me (the admin) creating new entries. The rest is read only and behind caches, cdns, etc. The blog is effectively a singleton. Analytics and comments are handled by 3rd party providers.
You can use a regional GCP disk that replicates over all zones in a region (https://cloud.google.com/compute/docs/disks#repds). You cannot mount the same disk to many VMs, but you can force mount the disk to a new VM.
Right! But as I said, the low traffic volume doesn’t require more than this kind of architecture.
I do regular file backups and have a cron job restarting the web apps every 20 minutes; it's enough for now. I would love to have a more solid architecture, but it matches the traffic volume and it's good enough.
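One small upgrade over copying the raw file while the app is running: SQLite can take a consistent online backup. A sketch using the Python sqlite3 module's Connection.backup() (Python 3.7+; file names are placeholders):

    import sqlite3

    src = sqlite3.connect("app.db")
    dst = sqlite3.connect("app-backup.db")
    src.backup(dst)   # copies a consistent snapshot, even while the app writes
    dst.close()
    src.close()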
TBH, it'll work for a lot of multi-process apps, so long as the app runs on a single server and the app doesn't do anything foolish like write to the database on every page hit. For read-mostly applications -- like a blog, wiki, or even a small web forum -- SQLite is a surprisingly adequate database.
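If it helps anyone, this is roughly the connection setup I'd assume for that kind of read-mostly, single-server app (file name is a placeholder; both pragmas are standard SQLite):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA journal_mode=WAL")    # readers keep going during a write
    con.execute("PRAGMA busy_timeout=5000")   # writers wait (ms) instead of
                                              # failing with "database is locked"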
I can't even tell you how many times a dev came to me with some weird question that was very simply answered by "just use sqlite."
I've never thrown more than 20GB of data at it, but it always handles things admirably, and it often (to me) makes the right calls about the level of complexity required (it accepts column type defs but really just doesn't care if it maps to string).
I have done several multi-terabyte queries on it. One time I had a race with another dev to see who could get the right answer on a 3TB dataset. I was finished before his spark cluster had even spun up.
I'm a front-end dev who can wing backend from time to time, and I use SQLite as much as possible. On multiple projects now I've run into complications due to complexity or environments, and adding a simplified local development backend with SQLite kept downtime to a minimum.
Same. The last time I was this excited about an RDBMS in your pocket was when Berkeley DB released their Java database in a single JAR file. I am not sure when they initially released it, but I remember doing some hobby projects with it in 2003.
I do occasionally use Apache Derby[0] in similar manner to SQLite. Mostly when data types are critical. Works well when using DBeaver or any SQL editor that allows you to add JDBC connections.
At my job we started using H2, which much like SQLite has shareable databases. We now wish more databases were this simple. It's just so efficient to share a db in its entire current state, so that someone with a little more experience, or who doesn't have tunnel vision, can probe and resolve issues. Plus you can always go between broken and working states.
For our B2B application, we've been using SQLite as the exclusive means for reading and writing important bytes to/from disk for over 3 years now.
We still have not encountered a scenario that has caused us to consider switching to a different solution. Every discussion that has come up regarding high availability or horizontal scaling ended at "build a business-level abstraction for coordination between nodes, with each node owning an independent SQLite datastore". We have yet to go down this path, but we have a really good picture of how it will work for our application now.
For the single-node-only case, there is literally zero reason to use anything but SQLite if you have full autonomy over your data and do not have near-term plans to move to a massive Netflix-scale architecture. Performance is absolutely not an argument, as properly implemented SQLite will make localhost calls to Postgres, SQL Server, Oracle, et al. look like a joke. You cannot get much faster than an in-process database engine without losing certain durability guarantees (and you can even turn these off with SQLite if you dare to go faster).
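A sketch of the knobs being referred to (standard pragmas; values illustrative). The defaults are already durable; loosen them only if losing the last few transactions on power failure is acceptable:

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs than the default FULL
    # con.execute("PRAGMA synchronous=OFF")   # fastest, but not crash-safe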
+1 we use SQLite in our BI tool to keep information about user accounts, data sources (cubes) / reports configs, access control rules etc. SQLite works like a charm even in cloud version where we have > 6,000 registered user accounts.
A successful business, but not a demanding database: 6000 users accessing the database hundreds of times per day each, for small reads and small writes, are unlikely to need more than a few GB of space (the parts of the database in use should fit in the operating system's disk cache) and unlikely to exceed 20-30 small I/O operations per second.
But it accurately describes a lot of use cases where something like Postgres (or MSSQL or even Oracle!) have been deployed because it’s what people assume is needed. It’s great to see accounts detailing how simple things can be.
Perhaps the needs in the enterprise are slightly different: handling HA, backup, and security. In a typical deployment the entire DB is a single file within the application runtime; in a multi-user context this is a significant risk, because you are trusting each app's developers and admins to make the environment and code secure, as opposed to centralized DB admins doing it. It is easier to add one more DB to an existing DB service stack than to handle DBs distributed across the hundreds of applications typical in any organization.
Managed services for the major SQL variants are readily available and fairly affordable for applications that do not have access to such a shared service layer.
That does not mean SQLite has no place; I love using it in native applications and mobile apps. Multi-user web applications, however, are perhaps not one of them, even when concurrency is not a problem.
With current volume, our largest transactional datastore is ~50GB.
That said, if we had a situation where we were pushing such volume that the transactional store was 50 TB, and we were still within the scope of the overall project, I see no reason why this would be a problem for us. As I mentioned prior, the context of our applications is single process, single server. Obviously, this is not suitable for all edge cases, and you will potentially be better off with a hosted solution in such cases.
At the end of the day, there is no arbitrary number of bytes where B-Tree indices just stop working because you didn't shell out for a SQL Server license.
Sure, but things like horizontally partitioning your data across multiple hard drives via splitting files, supporting partition elimination in your queries, etc. are all things that I am pretty sure SQLite doesn't even want to bring to the table.
And I would hope that they do not bother to. The current offering is arguably perfect as-is.
I would prefer to implement things like horizontal partitioning myself using business keys and logic that make sense for my problem. I do not want a database to manage this sort of thing for me, because I want perfect visibility into exception cases so business logic can be developed around them.
For example, if one of the drives fails, but I was only storing a lower-priority business entity on that particular partition (e.g. user preferences), I could decide to just continue processing without it (e.g. reset to defaults). With a hosted database solution, where you do not have this degree of granularity, a catastrophic loss of part of the storage array would mean the entire database is down, even if the important data isn't directly impacted.
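A minimal sketch of that idea, assuming one SQLite file per business entity on different drives (paths and names are hypothetical):

    import sqlite3

    PARTITIONS = {
        "orders":      "/data/disk1/orders.db",
        "preferences": "/data/disk2/preferences.db",   # lower-priority data
    }

    def connect(entity):
        try:
            return sqlite3.connect(PARTITIONS[entity])
        except sqlite3.OperationalError:
            # e.g. the disk holding user preferences died: the caller can fall
            # back to defaults instead of taking the whole application down.
            return None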
Out of interest, what circles are you moving in? Are people filling their databases with videos or something?
I'm sure I just lack the imagination, but I can't imagine how there can be that many companies that generate terabytes of data they then want to shove into a database. The complete sales history of even a medium-sized company should still easily fit in a GB or two.
If you decide you don't ever want to throw anything away, you can grow pretty fast :)
Some of the huge ones absolutely do binary storage for HA/DR reasons, but payroll companies and real estate companies are the mainstay, and they have a tendency to store a lot of XML crap on top of the lovely normalized rows.
I often connect to production databases as a read-only user to do various data analysis. Is this something you can do with SQLite (besides maybe SSHing into the machine)? If not, how do you get around it (if it ever even comes up)?
One is to simply zip up the entire database and send it across the wire. This is most applicable for local development and QA testing scenarios.
Another is to have something in the business application and relevant tooling that allows for programmatic querying of the data we need to look into.
We also have some techniques where we do ETL of the data range we care about from one SQLite db to another, then pull down the consolidated db for analysis.
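The ETL step itself is mostly just ATTACH plus INSERT ... SELECT; a rough sketch (table and column names made up):

    import sqlite3

    con = sqlite3.connect("extract.db")
    con.execute("ATTACH DATABASE 'production.db' AS prod")
    # Pull only the slice we care about into the analysis db.
    con.execute("""
        CREATE TABLE events AS
        SELECT * FROM prod.events WHERE created_at >= '2020-01-01'
    """)
    con.commit()
    con.execute("DETACH DATABASE prod")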
While I love SQLite as much as the next person (and the performance and reliability is really quite remarkable), I can’t understand all the effusive praise when you can’t do basic things like dropping columns. How do people get around this? Do you just leave columns in forever? Or go through the dance of recreating tables every time you need to drop a column?
SQLite is for storing data in an environment where SQL is meaningful. Anyone wanting to do database admin tasks (like adjusting table schema) would be well advised to go with a Real Database.
SQLite really shines when you have a more-or-less final idea of what tables you want to build and don't have to worry about users inputting crazy data. Anything up to that and it is a model of simplicity. Anything beyond that and it is undercooked.
I just sucked the existing table into RAM and recreated the table. I did it on a transaction so there was no risk of data loss.
In my case the data was always 10s of MBs.
Remember, the point of SQLite is to be a replacement for inventing your own file format. Although it's a database, it lets us (developers) re-use our knowledge of databases when doing basic file I/O.
When do you need to drop a column in a production DB? Maybe my anecdotal bubble is about to burst, but I work in the public sector, and have for a while and on our 200 different production DBs behind around 300 systems we’ve never dropped a column.
Depends on the maturity of your schema - if it's all figured out based on your business domain it won't happen much. If you're still finding product-market fit (or equivalent) splitting a table into two happens sometimes.
"Splitting" a table usually means creating two new ones and dropping the old one after migrating its content with a complex migration script followed by thorough testing. Dropping columns is not only abnormal (adding columns is far more common: features tent to be added, not removed, over time) but also a very crude tool.
Well what I meant was: when you break one table out of another. The kind of thing that comes up when you learn that there's a one-to-many in the domain that you didn't know about when you started.
There are also operational concerns here. Dropping columns may require rebuilding indices, which can have a high cost that isn't worth paying for just to keep the schema clean.
Pretty sure they must: row-based storage on disk would practically require it, otherwise you would completely waste all of the space you've just gained from deleting the column by leaving a gap in every single row.
If adding a nullable column is free, it probably means that the DBMS is able to distinguish multiple layouts for the same table: existing rows in which the new column doesn't actually exist and is treated as NULL, and newly written rows in which there is space for the new column.
But dropping a column is different: even if the DBMS performs a similar smart trick (ignoring the value of the dropped column that is contained in old rows) space is still wasted, and it can only be reclaimed by rewriting old files.
Dropping a column in postgres is also instant, so yes, it uses the same trick.
Deleting a row is similar too - the row is not removed from the heap page and the database does not get smaller (though if that page gets rewritten the row is not kept). Last time I used innodb it didn't actually return free heap pages to the filesystem at all so no matter how much you deleted the database file never got smaller.
I agree with you. SQLite drove me nuts when it came to changing your database. This is one of the reasons I just use DB Browser for SQLite (DB4S). It takes care of all the complexity.
The general strategy is to create a new table, insert data from the old table,
drop the old table, rename the new table, and re-create the indexes:
    begin transaction;  -- wrap the whole swap so a failure can't lose data
    create table foo2 (
        col1 int,
        col2 text
    );
    insert into foo2 select col1, col2 from foo;
    drop table foo;
    alter table foo2 rename to foo;
    create index foo_col1 on foo(col1);  -- SQLite requires a name for the index
    commit;
As for the reason, see the next section on that link. It's not perfect, but it is what it is. SQLite wasn't designed to solve 100% of the use cases, and that's not a bad thing.
We use sqlite as a smaller cache of a much larger postgres db. The cache is generated by a job and yes is regenerated every time before being pushed to machines that need it.
Think of SQLite as a file format which happens to have a query interface, and not a database.
MySQL did DDL for years with full table rewrites behind the scenes. It's not particularly hard to emulate, though not entirely pleasant.
(Although I really raise an eyebrow when people talk about performance. Every time I write anything server-like in SQLite I end up regretting it, and get an enormous speed boost, not to mention actual concurrency, by switching to PostgreSQL.)
For data analysis workloads I just load in my raw source data and then develop a series of scripts that create new tables or views on top of those raw inputs.
For my use cases I've thusly never had to drop/alter a column... but I understand it could be very annoying.
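A tiny sketch of what that layering looks like in practice (schema and names are made up):

    import sqlite3

    con = sqlite3.connect("analysis.db")
    # Raw data stays untouched; derived results live in views on top of it.
    con.execute("""
        CREATE VIEW IF NOT EXISTS monthly_totals AS
        SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS total
        FROM raw_orders
        GROUP BY month
    """)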
One of the great things one can learn from SQLite is the degree to which they unit (and integration) test their source code. It's honestly the best unit test document I have read in my career to date: https://www.sqlite.org/testing.html.
IIRC, some company wanted to use SQLite on an airplane, so they paid the devs enough to bring the test suite up to FAA standards. IIRC, they have code coverage of every machine instruction.
That was my business plan: Do the intense testing required for avionics, then sell the test cases to aviation manufacturers. That plan didn't work out - I've never sold the tests to any aviation manufacturer; not one. But the TH3 test harness has had side benefits that I did not anticipate, not the least of which is that it allows us to maintain a complex code base that is run on billions of devices with just a few developers.
SQLite is great, but its decision not to have a standard datetime/timestamp datatype -- a standard in all other relational databases -- has always struck me as a surprising omission, though in retrospect I kind of understand why. Datetimes are undeniably difficult.
So SQLite leaves the datetime storage decision to the user: either TEXT, REAL or INTEGER [1]. This means certain datetime optimizations are not available, depending on what the user chooses. If one needs to ETL data with datetimes, a priori knowledge of the datetime type a file is encoded in is needed.
In that sense, sqlite really is a "file-format with a query language" rather than a "small database".
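For anyone unfamiliar with the three options, here is a small sketch using SQLite's built-in date functions (table name made up):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ev (t_text TEXT, t_real REAL, t_int INTEGER)")
    con.execute("""
        INSERT INTO ev VALUES (
            datetime('now'),        -- TEXT: ISO-8601 'YYYY-MM-DD HH:MM:SS'
            julianday('now'),       -- REAL: Julian day number
            strftime('%s', 'now')   -- INTEGER: Unix epoch seconds
        )
    """)
    # Whichever encoding you pick, the date functions can convert back:
    print(con.execute("SELECT datetime(t_int, 'unixepoch') FROM ev").fetchone())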
Simple. Bulk insert with a 999-parameter limit is just painful; if each entry has 9 columns, you can't even insert 112 rows at once. In practice distros already compile with a higher default; e.g. Debian compiles with -DSQLITE_MAX_VARIABLE_NUMBER=250000, still way higher than this new default.
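The usual workaround is to size each multi-row INSERT to the limit; a sketch (table name and the 999 figure are illustrative, since the real limit depends on how your SQLite was compiled):

    import sqlite3

    MAX_VARS = 999   # compile-time SQLITE_MAX_VARIABLE_NUMBER on older builds
    COLS = 9

    def bulk_insert(con, rows):
        per_batch = MAX_VARS // COLS   # 111 rows per statement
        placeholder = "(" + ",".join("?" * COLS) + ")"
        for i in range(0, len(rows), per_batch):
            chunk = rows[i:i + per_batch]
            sql = "INSERT INTO t VALUES " + ",".join([placeholder] * len(chunk))
            con.execute(sql, [v for row in chunk for v in row])
        con.commit()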
What's the point? Inserting batches of 1000 rows at once, or even 10k rows at once is hardly any faster overall than using batches of 100 rows, assuming there are no delays in presenting the batches to the DB.
Is it reasonable to assume that in most current deployments of PostgreSQL or MySQL, SQLite would be at least an equally good choice?
I was recently choosing a database for a medium-size website and SQLite seemed like an obvious choice. One thing I was worried about was that the database locks for each write - but this is apparently not true anymore with write-ahead log.
Depends on the environment. SQLite will scale out reasonably well so long as it's only needed on one machine. As soon as you need a network-accessible database, traditional database servers start looking like a better option.
I ran some performance/reliability benchmarks on the product I'm working on (which supports SQLite and PostgreSQL), and SQLite was about 30% faster than PostgreSQL.
This won't hold true for all use cases; one table now has 11 million rows, and I'm not sure how well SQLite would perform on that. The benchmark was very simple anyway, and it's mostly a read-only where users don't update/insert new stuff. Would be interesting to re-test all of this.
Yes, most WordPress or Joomla sites come to mind. There is typically only one application communicating with it, the user typically doesn't admin the database directly (and if they did, they'd want a file), medium traffic load (hundreds of requests per second), and most of the queries are reads, with the occasional content update.
As soon as you get into privilege levels or heavy loads, then those others make more sense.
I doubt it but we’re both guessing. Personally I’ve never worked on a professional project that had all readers/writers on a single computer. So in my bubble SQLite is not an option.
I’ve been using SQLite on GCP for a few small projects and it seems to work well.
I use docker volumes to write to disk. I pass the disk directory to my process via a CLI arg.
When running on a VM these disk writes are replicated between zones (this is default for regional GCP disks). So you get zero config high availability (if you can tolerate down time during a reboot).
Thanks I have seen this, but would prefer to use the data center provided replication at the disk level as I do not need to have real time failover (I just need to make sure I can recover data in case of a single zone failure). Also incremental disk snapshots are nice to have.
Are there resources on good practices for database schema design? I feel that what I make 'works', but I'd be curious what databases made by experienced people look like.
For example, I have an app that you upload files through. Files can be local to the server or on S3, and have metadata. I end up making a new table for each of the API endpoints: a table for listing files/directories, a table for local files, and a table for S3 files. Then a table for the metadata, and a table for the kind of file it is, etc. It works, but it feels like a heavy hammer.
A few nice little conveniences like IIF(). I like reading SQLite release notes because they seem good at avoiding adding cruft.
(The refusal to implement JSONB comes to mind.) Now if only I could get my shared web host to upgrade to a recent version...
If you have ssh access to your web host, you may be able to upgrade it yourself. I needed something more recent for django 2.2 and had to download the latest sqlite, compile it, put the lib in some folder and add the lib to .bashrc so that python3 could use it (ld_include_flags or something like that).
My Google skills are failing me here, can you provide any more details? I'm very interested in knowing tricks to upgrade the SQLite version used by Python.
Specifically, you'll need to recompile libsqlite3, put it somewhere, and then set LD_LIBRARY_PATH before invoking Python. You can do that globally in your shell by modifying your .bashrc or similar file.
Or if you're super brave, you just replace the libsqlite3.so that Python is pointing to (really depends on your use case).
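Either way, you can check which library Python actually picked up:

    import sqlite3

    # Version of the underlying C library (not the Python wrapper); this
    # should change after swapping libsqlite3 / setting LD_LIBRARY_PATH.
    print(sqlite3.sqlite_version)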
Recommendations for learning SQL with SQLite? I've recently started doing the Khan Academy videos, and am liking them, but I'd like more practice problems and explanatory text.
One possible disadvantage of SQLite is that it only allows one writer at a time (but writes don't block readers with the write-ahead log enabled). I'm really curious whether Postgres performs better at concurrent writing; I couldn't find any benchmarks. In theory, disk writes are always sequential, so I'm skeptical Postgres would do substantially better.
SQLite is widely known to be for single writer workloads, whereas PostgreSQL is similarly widely known for being extremely good in concurrent usage scenarios.
Those are the things they're each designed for. eg:
Well, I explained my rationale above - writes are sequential at the disk level regardless of the database. So PostgreSQL shouldn't have much of an advantage in concurrent simple writes (it could even be slower than SQLite). PostgreSQL should be faster in concurrent complex transactions, but the question is how complex and how much faster.
Good point. That's a reasonable place to start investigating from.
Personally, I can't be bothered (too much higher-priority stuff to do). But if someone else gets around to testing this specifically, I'd be interested in the answers as well, just to "make sure" the common understanding isn't wrong. :)
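For anyone who does want to poke at it, a rough starting point: N threads, each with its own connection, committing single-row inserts; run the same loop against Postgres for the comparison (numbers and schema are arbitrary):

    import sqlite3, threading, time

    DB = "bench.db"
    sqlite3.connect(DB).executescript(
        "PRAGMA journal_mode=WAL; CREATE TABLE IF NOT EXISTS t(x INTEGER);")

    def writer(n):
        con = sqlite3.connect(DB, timeout=30)   # wait for the write lock
        for i in range(n):
            con.execute("INSERT INTO t VALUES (?)", (i,))
            con.commit()
        con.close()

    start = time.time()
    threads = [threading.Thread(target=writer, args=(1000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(f"{4 * 1000} single-row commits in {time.time() - start:.2f}s")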
SQLite isn't a db server like most other mainstream databases. It's more of a db file; almost an Excel file. This means its use cases are quite different and perf comparisons don't make sense.
I have a SQLite-powered service for the free GeoNames gazetteer running in production. It's a read-only service, so it fits perfectly and provides really good performance.
I also use it to work with data coming in CSV format.
What a great piece of software!