The problem is precisely that NULL is not some sort of Maybe monad, but people keep trying to use it as such. It's a lot like using NaN as a sentinel value for floats - sure, you can do that, but when something goes wrong, instead of an error at the point where the problem is, you end up dealing with a mysterious NULL somewhere way down the line. And that's the best case - the worst is that you get wrong query results because of the way NULL comparisons work.
An empty string is better as a sentinel value because at least this doesn't have the weird "unknown value" semantics that NULL does. But if you really want the same level of explicitness and safety as an option type, the theoretically proper way to do this in relational model is to put the strings themselves in a separate table in a 1:N (where N is 0 or 1) relationship with the primary table.
It looks to me like using empty string would not have prevented the bug in the article. If their language had maybes, they might be able to prevent this bug by having a function type signature where uri is concrete. And most langs with maybes will automatically turn external nulls into maybes.
Joins are cheap. Wide tables are often a sign that a data-model is a bit too CRUDdy. Foreign key relationships often do a much better job modeling optionality/cardinality in relational systems.
In this case, a `user_uris` table with non-nullable columns and a unique constraint on `user_id` is the first option that comes to mind.
At scale ... it is better to use CQRS ... sou you have a transaction model which is fully normalized and a read nly model which is wide, if you really want to use a RDBMS.
I had a situation where I'm not really sure I could have used something else than null: I need a value in one of two columns exactly (meaning one is NULL and the other not).
You can build a constraint to check that if it's in the same table, but across tables it seems to be a bit more complex right ?
You could have the first column indicate the type of value and the second column the value. If you now have your columns named "a" and "b" you could have the first column named "type" only allowing values "a" or "b" and the second column would be "value" where the actual value is stored.
In this specific case, all the local users could have had URLs in the database instead of NULL (or empty string), which would have prevented them from merging.
Yes! NULL is relational for “don’t know”, and SQL is (mostly, with varying degrees of success) designed to treat is as such. That’s why NULL=anything is NULL and not e.g. false (and IMO it’s a bit of a misfeature that queries that branch on a NULL don’t crash, although it’s still better than the IEEE 754 NaN=anything outright evaluating to false). If the value is funky but you do know it, then store a funky value, not NULL.
I’m not sure what to say here. ... Yes? If you’re referring to the Oracle behaviour where '' IS NULL, well, the rude way of putting it is that Oracle is doing a stupid. The more polite way of putting it is that Oracle is absolutely ancient and these parts probably existed long before people had the theory developed well enough to recognize which things make sense and which don’t, and now Oracle’s backward-compat-driven livelihood depends on not recognizing that it’s making no sense there. Either way, if this matters to you, you’re stuck and will have to work around this particular wart.
IMO the problem (at least in this case) is not NULL in the DB, but NULL at the application level.
If NULL is some sort of Maybe monad and you're forced to deal with it, well, you're forced to deal with it, think about it, etc.
Empty string, whatever NULL string is in your language of choice, or some sort of sigil value you invent... not much of a difference.