Excellent question! I'll jump in - I am a part of the DuckDB team though, so if other users have thoughts it would be great to get other perspectives as well.
First things first - we really like quite a lot about the SQLite approach. DuckDB is similarly easy to install and is built without dependencies, just like SQLite. It also runs in the same process as your application just like SQLite does. SQLite is excellent as a transactional database - lots of very specific inserts, updates, and deletes (called OLTP workloads). DuckDB can also read directly out of SQLite files as well, so you can mix and match them! (https://github.com/duckdblabs/sqlitescanner)
DuckDB is much faster than SQLite when doing analytical queries (OLAP) like when calculating summaries or trends over time, or joining large tables together. It can use all of your CPU cores for sometimes ~100x speedup over SQLite.
DuckDB also has some enhancements with respect to data transfer in and out of it. It can natively read Pandas, R, and Julia dataframes, and can read parquet files directly also (meaning without inserting first!).
How does DuckDB compare in that aspect? Does it have the same kind of guarantees of robustness, incorruptibility and performance (especially reading/writing binary blobs) that SQLite does?
In any case: DuckDB looks great, nice work! Good to have more players in this space!
It is a goal of ours to become a standard multi-table storage format! However, today we are still in beta and have made some breaking changes in the last few releases. (Exporting from the old version, then reimporting your DB in the new allows you to upgrade!) Those should happen less often as we move forward (the storage format was genericized a bit and is more resilient to future enhancements now), and locking in our format amd guaranteeing backwards compatibility will occur when we go to 1.0!
Thanks, it does help! I understand SQLite might be better/ideal for OLTP (?) but would DuckDB also work for use cases where I query for specific records (e.g. based on primary key) or would I rather use SQLite for OLTP stuff and then read SQLite from DuckDB for analytical workloads?
Basically I'm wondering: if I go all in on DuckDB instead of SQLite would I notice? Do I have to keep anything in mind?
I know, probably difficult to answer without a concrete example of data, schema, queries and so on.
The SQL query features in the article seem really neat. Kudos @ shipping.
Good questions! You are correct that it depends. We do have indexes to help with point queries, but they are not going to be quite as fast as SQLite because DuckDB stores data in a columnar format. (Soon they will be persistent - see comments above!) That columnar format is really great for scanning many items, but not optimal for grabbing all of a single row.
With DuckDB, bulk inserts are your friend and are actually super fast.
one thing I love about DuckDB is that it supports Parquet files, which means you can get great compression on the data. Here's an examples getting a 1 million row CSV under 50mb and interactive querying in the browser:
https://observablehq.com/@observablehq/bandcamp-sales-data?c...
the other big thing is better native data types, especially dates. With SQLite if you want to work with timeseries you need to do your own date/time casting.
I haven't used DuckDB yet but the biggest differences I've discovered if you aren't working on huge datasets where the column/row thing makes a difference (you're probably not) are:
1. SQLite has a great GUI and is really really widely supported.
2. DuckDB is properly statically typed with a much wider range of types than SQLite, which is dynamically typed and only just added support for any kind of type checking at all.
> huge datasets where the column/row thing makes a difference (you're probably not)
For programmers, it’s a tossup.
For most people working in data (databases, data engineering, ML etc) the column vs row thing makes a difference for datasets as small as a few hundred k records.
I found https://duckdb.org/why_duckdb but I'm sure someone here can share some real world lessons learned?