Quirks, Caveats, and Gotchas in SQLite (sqlite.org)
124 points by thefilmore on Oct 20, 2022 | 89 comments


> The key point is that SQLite is very forgiving of the type of data that you put into the database. For example, if a column has a datatype of "INTEGER" and the application inserts a text string into that column, SQLite will first try to convert the text string into an integer, just like every other SQL database engine. Thus, if one inserts '1234' into an INTEGER column, that value is converted into an integer 1234 and stored. But, if you insert a non-numeric string like 'wxyz' into an INTEGER column, unlike other SQL databases, SQLite does not throw an error. Instead, SQLite stores the actual string value in the column.

wtf. who would ever want that?


Me!

I'm often exploring data where either there's no defined standard or the use of the data has drifted from the standard. Now, I could go over this line-by-line, but instead my go-to has been "Hey, let's throw this into SQLite, then run stats on it!" See what shakes out. SQLite kindly and obediently takes all of this mystery data, which ends up being nothing like what I was told, and just accepts it. Then I can begin prodding it and see what is actually going on.

This is something that has come up for me for at least a decade: chuck it in SQLite, then figure out what the real standard is.


It's reasonable to have an arbitrary/untyped data type but an integer data type should not accept non-integers.


Right, what's the point of declaring a column to have type INTEGER if it really means "this column MAY contain integers, but may contain other kinds of data as well"?


Or union types. Just tack "...or a string" onto any questionable column.


It's especially great for large important legacy datasets that similarly did no enforcement. You can then interact with the data and clean it up through some commands, without trying to work with massive datasets in Excel, where it sometimes just crashes the entire app.


I do exactly the same. You import from CSV, everything ends up as a string, and then you copy the data to a STRICT table so you can understand what's really going on.
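Roughly this workflow, with a made-up file and column names:

  .mode csv
  .import data.csv raw_data

  -- a strict target table; anything that can't be losslessly
  -- coerced into the declared type now fails loudly
  CREATE TABLE clean_data (
    id   INTEGER,
    name TEXT
  ) STRICT;

  INSERT INTO clean_data SELECT * FROM raw_data;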


Irreversible loss of information.


What do you mean? I don't think this process ends up with less info than the CSV file format itself.

Short of doing that, cleaning very dirty data has no satisfactory solution, I think. Optional typing is a nice middle ground between untyped and slow (R, Python) or strictly typed and tedious (all other DB engines).


Maybe I misunderstand, but if you take a data type that is a union of two datatypes (string and int) and push it into something that converts all the data to strings, you lose the information of whether a string that looks like an int was originally a string or an int.

Maybe instead you mean to use the database functionality to identify problems like that in the original source data. If somebody hands me a string "11101100101001" I would not attempt to interpret it by parsing it into a binary number, but I think that's because I really like strong, simple typing.


I can't answer for the person you're commenting to, but I use sqlite in a similar way and the loss of information isn't an issue. When you're exploring data, just trying to make sense of what someone has sent you (say, migrating data from an existing system to a new system) you probably have a pretty good idea if the column is supposed to have integers or strings in it. In my own use of it, it's very unlikely that the loss of information will cause a problem. Eventually the data will be type-checked -- not in sqlite (I haven't experimented with the new-ish "strict tables" feature yet), but in my case the data will eventually flow through a typed programming language.

Anyway, different ways to use the tool, for sure! And in some cases one would definitely need to be attuned to the issue you're raising. In the kind of situation I'm thinking of the data is usually so dirty anyway, a bit of string->integer conversion won't hurt (probably).


I get it, you don't have typing control over your data exchange. Somebody might send you an unquoted-string CSV with multiple data types in the same column:

  id,location,name
  1,90210,Tori Spelling
  2,Schitt's Creek,Eugene Levy

and you use sqlite to quickly explore the loaded data to check the set of types used in a column, and maybe even glean what the meaning is.


Exactly. The thing I'm talking about is necessarily a semi-manual process (data exploration, usually prior to migrating data from one system to another).

In the example you give, when you've done all the exploration you need, there's a program interpreting that CSV that ensures the location column values are strings. At least, that's how I do it!


> Maybe instead you mean to use the database functionality to identify problems like that in the original source data.

Yes, that's exactly my use case.


Like I said earlier, I don't know what I am getting. Strong, simple typing is premature at this juncture.


Got it, it's data exploration, not formal ETL pipeline code.


I've done it for formal ETL pipeline code, too. I make it tighter, but not too tight. Sadly, the spec given to me was very ... optimistic? Bright-eyed? Many things that were not supposed to happen did happen, data that shouldn't have gotten near me did, and so on. SQLite swallowed it and kept on truckin', because when it comes to ETL, one dumb field entry shouldn't stop the show. Catch it, yes, deal with it when you note it in the logs, absolutely, but one must continue, and for that SQLite is most excellent.


Were SQLite not such a stellar piece of software, I would complain, and ask that this kind of flexibility be disabled by default, with an option to explicitly enable it if necessary.


While I doubt it will ever be enabled by default due to backwards compatibility, there is a strict tables[1] feature now.

[1] https://www.sqlite.org/stricttables.html
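For anyone curious, the gist of it (error text approximate):

  CREATE TABLE t (n INTEGER) STRICT;
  INSERT INTO t VALUES ('1234');  -- ok: losslessly converted to integer 1234
  INSERT INTO t VALUES ('wxyz');  -- error: cannot store TEXT value in INTEGER column t.n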


There should be an option to have strict tables, foreign key checks, and other modern features automatically enforced.

E.g. a command line argument when creating a DB, which then taints this database as modern. Something like what HTML did with "<!DOCTYPE html>".

Or simply accept that the time has come to make a version 4 of sqlite.


> There should be an option to have strict tables, foreign key checks, and other modern features automatically enforced.

I know to enable foreign key checking, and I learned strict tables from this thread. What are the "other modern features" you were referring to?

EDIT: In TFA, section 8 is relevant here I guess: SQLite accepts double-quoted strings as a historical misfeature, which can be disabled with the appropriate C API functions. This is one of the "other modern features" I guess; TIL.


That might have to be version 5 of sqlite. Version 4 came and went:

https://sqlite.org/src4/doc/trunk/www/index.wiki

> SQLite4 was an experimental rewrite of SQLite that was active from 2012 through 2014.


The STRICT keyword, per that doc, already taints a table for pre-3.37.0 (2021-11-27) libraries. (There is a way around that for pre-3.37 per the doc, but it's mainly enabling the PRAGMA used for db recovery operations, which treats the STRICT at the end as an error and ignores it, to "facilitate recovery".)

So the solution you're after would be some additional calls when initializing the database to enable the FK checks (alongside any other app-related PRAGMA calls, like checking the data_version or configuring the journal or page size), and ensuring any tables created have STRICT applied.
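E.g., roughly, on every connection open (values illustrative):

  PRAGMA foreign_keys = ON;   -- per-connection, must be repeated each time
  PRAGMA journal_mode = WAL;  -- persistent, stored in the database file itself
  PRAGMA page_size = 8192;    -- only effective on a new database or after VACUUM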


Unfortunately I found strict mode not very usable, because it only allows the six basic datatype names and doesn’t do column affinity determination <https://www.sqlite.org/datatype3.html#determination_of_colum...>. This thwarts one of the more practical forms of strictness in querying libraries, automatic column type mapping (e.g. mapping DATE (which has NUMERIC affinity) to your language’s date type, or conceivably some other convention like mapping TEXT_Foo to your own Foo type). I’m perplexed as to the rationale for this restriction which feels just gratuitous and unnecessary.


That feels like a bit of a hack, tbh. Playing off a mechanism that, I think, was made to make porting from other db syntaxes easier?

Is sqlite converting an iso8601 to a timestamp, or would it just store as given when type coercion fails due to dashes and spaces? From memory, it's the latter.

It would be slightly more brittle, but surely this metadata should be at app/orm layer rather than the database?


> surely this metadata should be at app/orm layer rather than the database?

I cannot fathom why you would say this, because the way I see it of course it belongs in the database as part of the table definition, as it’s obviously part of the logical schema. Sure, you can’t actually enforce invariants for individual types so that they would be just conventional aliases for the underlying affinities, but that doesn’t mean you should avoid specifying meaningful types.

Which of these would you prefer:

  CREATE TABLE example (
     id      BLOB PRIMARY KEY,
     created INTEGER,
     data    TEXT
  );

  CREATE TABLE example (
     id      uuid PRIMARY KEY,
     created timestamp,
     data    json
  );
I know that I want the latter, because it makes my life much easier when reading the schema, and lets code automatically use the right types based on inspecting the schema (though you will need to define a mapping of SQL type names to your programming language’s types, since they are still only informational rather than structural like in most SQL databases). Ideally you might be able to define your own datatype affinities (SQL even defines a suitable syntax: CREATE TYPE uuid AS BLOB), but it’s not so bad leaning into the built-in rules with BLOB_uuid, INTEGER_timestamp and TEXT_json (… though on reflection I admit this is rather perilous due to the precise affinity determination rules, shown in the example “TEXT_point” which would be INTEGER due to containing “int”, so maybe it is actually better that strict mode doesn’t blithely use the current affinity determination rules on the expressed type).

(Actually on the DATE thing I was forgetting and thinking that was a regular feature but it’s actually just the fallback affinity where a type is specified but not matched by any other rule, NUMERIC. Strike out that example as a canonical definition or anything, then. But the rest of the point still stands.)


> Were SQLite not such a stellar piece of software,

Why? Complaining about a rational and logical thing is orthogonal to something's popularity or how excellent it is otherwise.

State your opinions freely.


well, it seems like you should chuck it into an untyped table first. If there are types on a column I'd... really want them to do something.


> wtf. who would ever want that?

Each and every biostatistician on this planet. Especially those touching clinical data. Personally, I was saddened to learn that DuckDB did not include dynamically typed, or at least untyped, columns. Happily my data loads are usually small enough for a row-oriented data store.

CHECK constraints, now in conjunction with STRICT tables, are the best invention since sliced bread! If I could improve on one thing, it would be to remove any type notion from non-STRICT tables.


Fwiw the ANY type is valid in strict tables, and does what it says.

Slightly more so than in non-strict tables, in fact: it will store exactly what you give it without trying to reinterpret it. E.g. in non-strict, a quoted literal composed of digits will be parsed and stored as an integer; in strict, as a string.

Strict mode also doesn’t influence the parser, so e.g. inserting a `true` in a strict table will work, because `true` is interpreted as an integer literal.
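A quick illustration of the difference (table names made up):

  CREATE TABLE loose (v ANY);
  CREATE TABLE strict_t (v ANY) STRICT;

  INSERT INTO loose VALUES ('1234');
  INSERT INTO strict_t VALUES ('1234');

  SELECT typeof(v) FROM loose;     -- integer: NUMERIC affinity reinterprets the digits
  SELECT typeof(v) FROM strict_t;  -- text: stored exactly as given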


We have a PR in the works that is adding support for sum types to DuckDB [1]. Not quite fully dynamic, but perhaps still useful :)

[1] https://github.com/duckdb/duckdb/pull/4966


This always put me off a bit too. I have memories of mySQL supporters back in the early days defending it storing dates like 2022-02-31 and other such type/range/other issues with the standard refrain of “but SELECT * is blazing fast!”…

As of last year there is an option to make things more strict (https://www.sqlite.org/stricttables.html), though as SQLite doesn't have real date types, unless you are using one of the integer storage options, code could insert invalid dates like mysql used to allow.

--

EDIT: having actually read the linked article, it explicitly mentions the date type issue also.


Even in the absence of strict tables, you can also add a CHECK constraint whenever limits are needed on the value types. Here is one example from a recent discussion in SQLite forum threads: `check(datetime(x, '+0 seconds') IS x)`.
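In context, that looks something like this (table and column names made up):

  CREATE TABLE events (
    x TEXT CHECK (datetime(x, '+0 seconds') IS x)
  );
  INSERT INTO events VALUES ('2022-02-28 12:00:00');  -- ok: round-trips unchanged
  INSERT INTO events VALUES ('2022-02-31 12:00:00');  -- fails: datetime() normalizes it
                                                      -- to 2022-03-03, so IS is false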


An important proviso on the strict tables is that they will not be observed on older SQLite versions lacking this feature.

SQLite has been rock-solid since attaining DO-178B, and commonly isn't ever upgraded in many installations.

CentOS 7 is using 3.7.17. Since the v3 database format is standardized, the older version can utilize a STRICT database file, but will not have the capability to alter datatype behavior.

For these cases, implementing both STRICT and the relevant CHECK constraints is advisable.


If you use an integer Unix timestamp as the datatype for dates, you technically cannot insert an invalid date.
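For example (schema made up):

  CREATE TABLE events (created INTEGER);              -- seconds since the epoch
  INSERT INTO events VALUES (strftime('%s', 'now'));
  SELECT datetime(created, 'unixepoch') FROM events;  -- readable when needed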


Hence "unless you use one of the integer storage options". I suspect many don't because the values being directly human readable could be useful.


It's even more fun in a boolean column, where some databases accept 't'/'f', 'true'/'false', etc. SQLite accepts some of those, but treats other ones as text.
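A quick demo of what SQLite does with those (BOOLEAN just gets NUMERIC affinity):

  CREATE TABLE b (v BOOLEAN);
  INSERT INTO b VALUES (true), (1), ('true');
  SELECT v, typeof(v) FROM b;
  -- 1|integer
  -- 1|integer
  -- true|text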


I do, for JSON columns. I store UTF8 strings in SQLite, so that it is easy to see JSON values with a plain `SELECT *`. And I load blobs, because I code in Swift and the standard JSON decoder eats raw UTF8 memory buffers, not strings.

This avoids two useless conversions:

- first from a C string loaded from SQLite to a Swift Unicode string (with UTF8 validation).

- next from this Swift Unicode string to a UTF8 memory buffer (so that JSONDecoder can do its job).

SQLite is smart enough to strip the trailing \0 when you load a blob from a string stored in the database :-)


SQLite is an excellent app database for local things. It doesn't compete with PostgreSQL or MySQL for a huge number of reasons. So, to make it fit in the competition model, perhaps a bunch of things are glossed over.

Does anyone see a massive pro-sqlite movement going on? Sort of like what happens in the JS ecosystem. Everyone is bandwagoning on it. Criticism of SQLite is much welcomed, specifically exemplifying what its role is and which use cases it serves really well.


SQLite needs this, because changing column types is such a pain in the ass. The answer to "I want to move from INT primary keys to TEXT/UUID, how do I change the column type?" is "just insert the data into the column because SQLite allows everything".

And the reason changing column types is so hard is because, uh, SQLite stores its schema as human readable plaintext (easier to keep compatibility between versions) and not normalized tables like other databases.
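For reference, the usual rebuild dance, with hypothetical names (SQLite has no ALTER COLUMN):

  BEGIN;
  CREATE TABLE example_new (id TEXT PRIMARY KEY, data TEXT);
  INSERT INTO example_new SELECT CAST(id AS TEXT), data FROM example;
  DROP TABLE example;
  ALTER TABLE example_new RENAME TO example;
  COMMIT;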


What does it mean for a database to have primary keys of different types? In particular, primary keys are frequently used for clustering and other performance enhancements and I would expect that they would need a total ordering that made sense.


If you're talking about clustering you're probably also beyond the use case of sqlite


Not multi-machine clustering, index clustering: https://www.sqlite.org/withoutrowid.html

As much as I love sqlite, its table model is really confusing to me coming from a postgres mentality: "WITHOUT ROWID is found only in SQLite and is not compatible with any other SQL database engine, as far as we know. In an elegant system, all tables would behave as WITHOUT ROWID tables even without the WITHOUT ROWID keyword. However, when SQLite was first designed, it used only integer rowids for row keys to simplify the implementation. This approach worked well for many years. But as the demands on SQLite grew, the need for tables in which the PRIMARY KEY really did correspond to the underlying row key grew more acute. The WITHOUT ROWID concept was added in order to meet that need without breaking backwards compatibility with the billions of SQLite databases already in use at the time (circa 2013)."
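For reference, opting in to the clustered layout looks like:

  -- the PRIMARY KEY is the real row key; no hidden rowid
  CREATE TABLE kv (
    k TEXT PRIMARY KEY,
    v TEXT
  ) WITHOUT ROWID;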


> wtf. who would ever want that?

Have you met non-computer scientists?

Anything and everything is fair game and it is probably for the best.

Job security anyway


Setting aside some of the technical choices, I wish we saw more product builders publish transparent perspectives on the 'shortcomings' of their product. SQLite also has a list of reasons why you would or wouldn't want to use SQLite that I've always enjoyed: https://www.sqlite.org/whentouse.html


I'd say it's a lot easier to do when you've already got mass adoption.


I hit annoyances with sqlite when dealing with some corner cases. I wanted to implement a lock and share some small amount of data (who did what when) between two different linux users on the same machine.

I figured "sqlite is better than fopen, let's use that!", but between directory permissions, all the WAL files, probably the sqlite3 python lib and Diskcache (https://pypi.org/project/diskcache/) not helping things, it was a real pain, where regularly under different race conditions, we would get permission denied errors. I managed to paper it over with retries on each side, but I still wonder if there was a missing option or setting I should have used.


> I managed to paper it over with retries on each side

You'll hit that elsewhere as well. “Papering over with retries” is standard MS advice for Azure SQL and a number of other services: https://learn.microsoft.com/en-us/azure/architecture/best-pr...


I feel you! I had an ASP.NET Core Web API using a SQLite file for data storage. It worked exceptionally well in development. When we released it, as soon as more than 10 users were throwing requests at it, it got deadly slow. Obviously caching and some tweaking helped a lot. But it was far from easy to find, understand and test (!) all the required options and optimizations.

Other than that I still love SQLite.


Was WAL enabled?


If I remember correctly, we set journaling to OFF, as it was a read-only database.


> Foreign Key Enforcement Is Off By Default

That is by far my biggest annoyance with sqlite.

Not only that, but the FK enforcement must be enabled on a per-connection basis. Unlike the WAL, you can’t just set it once and know that your FKs will be checked, every client must set the pragma on every connection. Such a pain in the ass.

The error reporting on FK constraint errors is also not great (at least when using implicit constraints via REFERENCES sqlite just reports that an FK constraint failed, no mention of which or why, good luck, have fun).

More generally, I find sqlite to have worse error messages than postgres when feeding it invalid SQL.


There is a compile-time option (SQLITE_DEFAULT_FOREIGN_KEYS) to always enable foreign-key constraints.


Can't you just wrap the connection handling code with some logic that automatically executes the pragma?


Sure.

The problem is that you have to carefully remember to do that in every project, and, if other applications need write access, make sure that they do so as well.


Can anyone share their experience with the somewhat new STRICT mode? Does it help? I tend to use Postgres when available, primarily for the added strictness, but I'd surely prefer SQLite in more scenarios as it's easier to operate.


I use strict tables. I'm now realising it did help when I migrated tables from one data type to another, because I'd missed updating some of the code.

I didn’t realise because it was just what I’d expected, but without strict tables I’d have had to debug strange errors on retrieval rather than the type error on insertion I got.


We have extremely heavy SQLite usage throughout. Strict mode would only cause trouble for us. Most of our SQLite access is managed through a domain-specific C# library we developed. It handles all of the parsing & conversion internally. We don't touch the databases directly in most cases.


I've run into situations where I need a db, but all I have is Python, so the embedded SQLite is my best option. It would be ideal if the interface was fully compatible with postgres (or a practical subset), even if SQLite was handling storage under the hood.


It does not really help in our scenario. I really hope that this feature will evolve more in future versions.


What's missing for your scenario?


- It is annoying to have it per table and not per database

- No ALTER TABLE (to convert existing tables)

- No backward support (maybe through a connection string or something)


Like Oracle, SQLite lacks boolean (bit) columns, but it does support bitwise operators, so multiple boolean values can be packed into an integer with powers of 2.

Setting a value with an update is a bitwise or, and checking a value is a bitwise and.

  $ sqlite3
  SQLite version 3.36.0 2021-06-18 18:36:39
  Enter ".help" for usage hints.
  Connected to a transient in-memory database.
  Use ".open FILENAME" to reopen on a persistent database.
  sqlite> select 2 | 1;
  3
  sqlite> select 3 & 1;
  1
Oracle only has a "bitand" function, but "bitor" has been posted on Oracle user sites:

  create or replace function bitor(p_dec1 number, p_dec2 number) return number is
  begin if p_dec1 is null then return p_dec2;
        else return p_dec1-bitand(p_dec1,p_dec2)+p_dec2;
        end if;
  end;
  /
That isn't necessary in SQLite.
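So in SQLite the whole pattern is just (hypothetical flags column):

  CREATE TABLE users (id INTEGER PRIMARY KEY, flags INTEGER DEFAULT 0);
  UPDATE users SET flags = flags | 2 WHERE id = 1;   -- set the second bit
  SELECT id FROM users WHERE flags & 2 != 0;         -- test the second bit
  UPDATE users SET flags = flags & ~2 WHERE id = 1;  -- clear it again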

Searches on these will likely require full table scans, a definite performance disadvantage.


The group by behavior is useful. I wish Postgres did that.


Most DBs don't (and won't ever) support that because the results become undefined. You could get different values out for the same query in different versions of the DB (due to query planner changes) or even within the same version and the same database over time (as the relative sizes of your tables change so the planner choses a different route, or index options are changed, so the planner is forced to go with a different route or has the option of a better one).

Though in a world where "eventual consistency" is accepted, maybe "eeny, meeny, miny, moe" will be more generally considered OK at some point :)

At least SQLite tries to be consistent where that is possible (taking the value from the row where the aggregate found its value), which it often isn't; mySQL, which also allows non-grouped, non-aggregated columns in the projection list, does not even try.
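SQLite documents that guarantee only for a lone min()/max() aggregate. A quick demo, with the schema made up to match the example elsewhere in the thread:

  CREATE TABLE employee (first_name TEXT, last_name TEXT, salary INT);
  INSERT INTO employee VALUES ('Tori', 'Spelling', 100), ('Eugene', 'Levy', 200);
  SELECT max(salary), first_name, last_name FROM employee;
  -- 200|Eugene|Levy  (the bare columns come from the max() row)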


Use of that group by behavior is a bug, nearly every time I've seen it. (Though usually I'm seeing it in MySQL.)

Columns that are not part of the group, an aggregate of the group, or a functional dependency of the group aren't stable — there are multiple rows that could supply the value, and they needn't have the same value.


Yeah, I've seen that bug in a monitoring graph that was showing data for say every 5 minutes but the (MySQL) database table had data per minute; so you'd see a spike in the graph, but refresh a minute later and voila no spike anymore.


You can do that with window functions in Postgres:

  SELECT MAX(salary) OVER (), first_name, last_name FROM employee;


Exactly. SQL being a declarative language, you should always prefer explicit over implicit.

SQL bugs are very hard to detect when the query returns a result that looks right, and because the language is declarative it's easy to make those mistakes.


Number 6 was really surprising:

  SELECT max(salary), first_name, last_name FROM employee;

This returns one row! AFAIK all other databases would return one row per record in the table, where first_name, last_name would be from each row while max(salary) would be the value from the row with the max salary. Is this ANSI SQL compatible?


Important bit to pay attention to:

    <SQLite's behaviour is> "Be liberal in what you accept". This used to be considered good design - that a system would accept dodgy inputs and try to do the best it could without complaining too much. But lately, people have come to realize that it is sometimes better to be strict in what you accept, so as to more easily find errors in the input.


I encountered a case a few years ago where a query failed because the database had been opened read only but it needed to recover a journal. The solution was to just never open the database read only... but this seems less than optimal. Anyone else seen this?


The lack of a datetime data type is a significant burden on applications.


Once I worked with a codebase that used SQLite to mock MySQL for tests. It mostly worked, but they're different enough that things slipped by around nasty edge cases.


Was the codebase supposed to be generic, or did someone want more weird challenges in their life?


> NUL Characters Are Allowed In Text Strings

Any raw byte sequence is allowed in text strings.


Which seems logical to me.

Though there is an issue that some of sqlite's own functions are unaware of this and will end early if a NUL is encountered: https://www.sqlite.org/nulinstr.html
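The example from that page, roughly from memory:

  -- the text value is 'a', NUL, 'b' (three bytes)
  SELECT length(CAST(x'610062' AS TEXT));  -- 1, not 3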


> Which seems logical to me.

I expect text to be text (i.e., Unicode, these days).


\u0000 (NUL) is a perfectly valid unicode character.

Though I can understand people from a mostly C/C-like background, where NUL termination is the norm for strings, being uncomfortable with that.


I think it was against "Any raw byte sequence is allowed in text strings.", and not against NUL.


But aren't there extended ASCII encodings that have 256 8-bit characters?


There are other encodings, yes, where any byte string is valid within that encoding. The Unicode encodings (UTF-*) don't have that property, however.

The SQLite docs say this about text values or columns, though it's a bit muddy which is which. (But it doesn't really matter.)

> TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).

But it's the best reference I have for "what are the set of values that a `text` value can have".

E.g.,

  sqlite> PRAGMA encoding = 'UTF-8';
  sqlite> CREATE TABLE test (a text);
  sqlite> INSERT INTO test (a) VALUES ('a' || X'FF' || 'a');
  sqlite> SELECT typeof(a), a from test;
  text|a�a
Here we have a table storing a "text" item, whose value appears to be the byte sequence `b"a\xffa"`¹. That's not valid UTF-8, or any other Unicode. The replacement character here ("�") is the terminal saying "I can't display this".

Presumably for this reason, the Rust bindings I use have the Text value carrying a [u8], i.e., raw bytes. It's easy enough to turn that into a String, in practice. (But it is fallible, and the example above would fail. In practice, it gets rolled into all the other "it's not what I expect" cases, like if the value was an int. But a text type that's more or less just a bytestring is still regrettable.)

¹borrowing Rust or Python's syntax for byte strings.


That seems odd, I normally think of text strings as explicitly disallowing control characters, and instead, consider this a binary blob.


"Similarly, SQLite allows you to store a 2000-character string into a column of type VARCHAR(50). Other SQL implementations would either throw an error or truncate the string. SQLite stores the entire 2000-character string with no loss of information and without complaint."

That's not flexible typing, that's user-hostile behavior.


There is nothing called "VARCHAR(n)" in SQLite. It is mapped to `CHAR(ACTER) => TEXT` which is limited to 2^31 - 1 characters.

[1] https://www.sqlite.org/datatype3.html#3.1.1

[2] https://www.sqlite.org/limits.html#max_length


That's user-hostile behavior: VARCHAR is a critical optimization for large databases.


In PostgreSQL a varchar is also a text type column but with an added length check. So it's actually just slower to insert and not more space efficient, as far as I know. So SQLite is not alone in this.


What part of varchar is a “critical optimisation” exactly?

If it’s ensuring that the data is never larger than X, then you can use a CHECK constraint. It has no impact on storage in sqlite, or in postgres for that matter.
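e.g. something like:

  -- behaves like VARCHAR(50) minus the silent truncation
  CREATE TABLE example (
    name TEXT CHECK (length(name) <= 50)
  );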


These are the native SQLite datatypes, that are also used as column affinities.

TEXT - text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).

BLOB - blob of data, stored exactly as it was input.

NUMERIC - generic number, attempts to devolve to integer or real.

INTEGER - signed, stored in 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

REAL - floating point value, stored as an 8-byte IEEE-754 format.

Any CHAR variant is really text. If you really need it to be 50 characters, then a suitable CHECK constraint must be in place.

You can also create tables with unknown types:

  CREATE TABLE foo(bar razzamataz);

With some gymnastics, you can see the affinity assigned to this column:

  create table bar as select * from foo where 1=0;

  sqlite> .dump bar
  PRAGMA foreign_keys=OFF;
  BEGIN TRANSACTION;
  CREATE TABLE bar(
    bar NUM
  );
  COMMIT;



