Indexing existed 15 years ago. The article never mentions why indexing didn't so...

ericbarrett · on Nov 7, 2021

Early InnoDB* had pretty strict limits on varchar indexes and was not the most efficient. I don't remember the details but it's entirely possible the single-table format Rachel described ran head on into those limitations.

Also remember indexes take space, and if you index all your text columns you'll balloon your DB size; and this was 2002, when that mattered a lot more even for text. Indexes also add write and compute burden for inserts/updates as now the DB engine has to compute and insert new index entries in addition to the row itself.

Finally, normalizing your data while using the file-per-table setting (also not the default back then) can additionally provide better write & mixed throughout due to the writes being spread across multiple file descriptors and not fighting over as many shared locks. (The locking semantics for InnoDB have also improved massively in the last 19 years.)

* Assuming the author used InnoDB and not MyISAM. The latter is a garbage engine that doesn't even provide basic crash/reboot durability, and was the default for years; using MyISAM was considered the #1 newbie MySQL mistake back then, and it happened all the time.

wruza · on Nov 7, 2021

indexes take space, and if you index all your text columns you'll balloon your DB … also add write and compute burden for inserts/updates as now the DB engine has to compute and insert new index entries in addition to the row itself.

Indexes didn’t go away with these extra tables, they live in the ‘id’ field of each one. They also probably had UNIQUE constraint on the ‘value’ field, spending time on what you describe in the second half of my citation.

I mean, that should have saved some space for non-unique strings, but all other machinery is still there. And there are 4 extra unique constraints (also an index) in addition to 4 primary keys. Unless these strings are long, the space savings may turn out to be pretty marginal.

WorldMaker · on Nov 8, 2021

Right, indexes don't "go away" as you normalize, but the amount of data indexed drastically shifts and four unique indexes are pretty much guaranteed to be smaller in size than one massive combinatorial index of (all, four, possible, values). Not just in the case where any duplicates exist, but just alone in terms of the database's overhead in things like field delimiters and page metadata. While the specifics will vary a lot with specific DB implementations, generally with varchars involved you can usually assume a worst-case in terms of field delimiting (and from there sometimes a worst-case in average page size and how that impacts B-Tree balancing).

In general, Indexes alone can't save you from a normalization problems. From certain points of view an Index is itself another form of denormalization and while it is very handy form of denormalization, heavily relying on big composite indexes (and the "natural keys" they represent) makes normalization worse and you have to keep such trade-offs in mind just as you would any other normalization planning.

It is theoretically possible a database could do something much smarter and semi-normalized with for instance Trie-based "search" indexes, but specifically in 2002 so far as I'm aware of most of the databases at the time such indexes were never the default and didn't have great multi-column support and would have been expensive to compute if you did turn them own. Even such "smart" Indexes would likely still have suggested you normalize first.

noduerme · on Nov 7, 2021

To me it seems that spreading it over four tables would lead to a lot more potential read locks while the big combined table is waiting for a join on each of the others, and some process is trying to insert on the others and link them to the main. This is assuming they were using foreign keys and the main table 4-column index was unique.

ericbarrett · on Nov 7, 2021

Old best practice for InnoDB performance was actually to never use foreign keys because of these locking issues. Not sure if that's the case in 2021.

noduerme · on Nov 8, 2021

InnoDB uses row-level locking, and foreign keys are (usually) a great feature to ensure data integrity. But using multiple foreign keys from tables `a`,`b` as a composite index for table `x` can cause deadlock if both are being updated in rapid succession, because an update on `a` gets a lock on `x`, which needs a read lock on `b` which is waiting for the lock from `a` to be released. I try to never structure multiple foreign keys as a composite index.

bt1a · on Nov 8, 2021

This is fascinating, thanks for sharing.

cogman10 · on Nov 7, 2021

> Also remember indexes take space

Indexes take space, except for the clustered index.

An important distinction.

If the point of this table is always selecting based on IP, m_from, m_to, then clustering on those columns in that order would make sense.

Of course, if it's expected that a lot of other query patterns will exist then it might make sense to only cluster on one of those columns and build indexes on the others.

roywashere · on Nov 8, 2021

But IP addresses are actually _numbers_, and you can also save them as bytes 32 bits for ip4, 128 bits for ip6 addresses. (Postgresql has a native datatype 'inet' which supports ip6 since 7.4, if you're using postgres and save an IP address in a character string, you're doing it wrong) If you would save the IP as number and put an index on it, all your queries would be blazing fast!

teh_klev · on Nov 8, 2021

> using MyISAM was considered the #1 newbie MySQL mistake back then, and it happened all the time.

The author mentions that these events took place in 2002. At that time the site was likely still running MySQL 3 and InnoDB was very new and considered experimental. Sure MySQL 4.0 had shipped in 2002 and InnoDB was a first class citizen, but back then upgrades from one version of MySQL to another weren't trivial tasks. You also tended to wait until the .1 release before making that leap.

So in fairness for the folks who originally set up that database MyISAM was likely their only realistic option.

I looked after and managed a fleet of MySQL servers from back then and for many years afterwards and even then it wasn't until MySQL 5 that we fully put our trust in InnoDB.

quickthrower2 · on Nov 7, 2021

Surely more than 15 years! 50 maybe?