Notion seems like an interesting data storage problem. The vast majority of the data is in the `block` entity, which are organized into a sort of nested set, written/updated individually (user edits one at a time) but read in ranged chunks (a doc).
Offhand this seems like almost a worst case for PG. Since updates to blocks can contain large values (causing rows to be moved often) and everything lives in one big table, the blocks for a single Notion document will likely end up non-contiguous on disk, so reading them back out requires a lot of IO and thrashes memory. PG doesn't have a way to tell it how to organize data on disk, so there's no good way around this (CLUSTER doesn't count; it's unusable in most use cases).
Armchair engineering of course, but my first thought would be to find another storage system for blocks that better fits the use case and leave the rest in PG. That introduces other problems, but storing data like this in PG just feels like a bad fit. Maybe storing an entire doc's worth of block entities in a jsonb column would avoid a lot of this?
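A minimal sketch of what that jsonb idea could look like (table and column names here are invented for illustration, not Notion's actual schema): keep one row per document and store its blocks as an ordered jsonb array, so rendering a doc is a single-row read.

```sql
-- Hypothetical schema: one row per document, blocks stored inline as jsonb.
CREATE TABLE docs (
    id     uuid  PRIMARY KEY,
    blocks jsonb NOT NULL DEFAULT '[]'  -- ordered array of block objects
);

-- Rendering a doc becomes a single-row lookup instead of N scattered block reads.
SELECT blocks FROM docs WHERE id = $1;

-- The trade-off: editing one block rewrites the whole jsonb value.
-- jsonb_set can at least target a single element by array index.
UPDATE docs
SET blocks = jsonb_set(blocks, '{3,text}', '"updated text"')
WHERE id = $1;
```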
KV means key-value. Essentially the suggestion is to organize your data as one big key-value structure (hashmap-like).
The term usually implies the data is not normalized/separated the way it is in regular SQL.
For instance, in SQL:
A User table with columns: id, email
An Article table with columns: id, text, user_id
The KV/NoSQL equivalent:
An Article document with properties: id, text, user_email
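Spelled out as schemas (these names are just the toy example above, nothing from Notion's actual model): the normalized SQL form keeps users and articles in separate tables joined by a foreign key, while the denormalized document copies the user's email into each article so a single key lookup returns everything.

```sql
-- Normalized SQL: two tables, joined at read time.
CREATE TABLE users (
    id    bigint PRIMARY KEY,
    email text   NOT NULL
);

CREATE TABLE articles (
    id      bigint PRIMARY KEY,
    text    text   NOT NULL,
    user_id bigint NOT NULL REFERENCES users (id)
);

-- Denormalized KV/document equivalent: one self-contained value per key,
-- shown here as a jsonb blob keyed by article id.
CREATE TABLE articles_kv (
    id  bigint PRIMARY KEY,
    doc jsonb  NOT NULL
);

INSERT INTO articles_kv VALUES
(1, '{"id": 1, "text": "hello world", "user_email": "alice@example.com"}');
```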
> PG doesn't have a way to tell it how to organize data on disk so there is no good way around this (CLUSTER doesn't count, it's unusable in most use cases).
Tablespaces allow you to store tables in defined directories on the file system.
What I was talking about is controlling the ordering of rows within a table on disk. If you are going to read some group of rows together often, ideally you want those rows to be contiguous on disk, since a sequential read of a range is much faster than bouncing around to dozens of locations to collect the needed rows. This matters more as tables get very large. Imagine a 5TB `blocks` table where you need to read 50 blocks to render a given Notion doc, but those blocks are scattered all over the place on disk; that's a lot more work and it thrashes the page cache.
PG doesn't normally make any guarantees about how rows are ordered on disk, and it may move rows around when updates are made. It does have a CLUSTER operation, which re-orders rows based on the order of an index you give it, but it's a one-time operation and it locks the table while running. That makes it functionally useless for large tables that are accessed and updated frequently.
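For reference, a rough sketch of what that looks like in Postgres (the index, table, and column names are made up for the example):

```sql
-- One-off physical re-ordering by (doc_id, position); CLUSTER takes an
-- ACCESS EXCLUSIVE lock on the table for the duration.
CREATE INDEX blocks_doc_idx ON blocks (doc_id, position);
CLUSTER blocks USING blocks_doc_idx;

-- New inserts and updates are NOT kept in this order; you'd have to
-- re-run CLUSTER periodically to restore it.
```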
Some other databases do give you control over disk ordering. SQL Server, for example, has a `CLUSTERED INDEX` that you can apply to a table, and it keeps the data on disk ordered by the index, even for new insertions and updates. It costs a bit more on the write side to maintain, but it can be worth it in some cases.
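The SQL Server equivalent, again with made-up names for the same hypothetical `blocks` table, would look something like:

```sql
-- T-SQL: the clustered index defines the physical order of the table itself,
-- so blocks for the same doc_id stay adjacent even as rows are added.
CREATE CLUSTERED INDEX ix_blocks_doc
    ON blocks (doc_id, position);
```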