I'm curious whether Notion has any plans to make the "type" property user-extensible. Given the current data structure, which decouples a block's data from the way it's rendered through the type property, a user would only have to define one template for rendering an arrangement of UI components (boxes, bullets, etc.), titles, and children. Extension could even operate at the level of derivation, where users extend the current base types with custom styling (color, font, size, border, etc.) and child layout. As a bonus, derivation would allow blocks to be shared, with a fallback to the default rendering if users don't share their custom types. Given how many different ways Notion is used (work, personal projects, life management, etc.), domain-specific types (grocery list, monthly budget table, contact card) would be a useful tool for semantically separating blocks by their presentation.
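A rough sketch of what derived types with a fallback could look like. Everything here is hypothetical (`BlockData`, `CustomType`, `BASE_STYLES`, `resolveStyle` are not Notion's API). The key assumption is that each block always stores a built-in base type alongside an optional custom type name, so a viewer who hasn't been given the custom definition can still render it.

```typescript
// Built-in block types every viewer knows how to render.
type BaseType = "text" | "bulleted_list" | "to_do" | "table";

interface Style {
  color?: string;
  font?: string;
  fontSize?: number;
  border?: string;
  childLayout?: "vertical" | "horizontal" | "grid";
}

// Hypothetical block shape: the base type is always stored, so a viewer
// without the custom type definition can still render the block.
interface BlockData {
  id: string;
  type: BaseType;
  customType?: string; // e.g. "grocery_list"
  title: string;
  children: string[];
}

// A user-defined type derives from a base type and overrides parts of its style.
interface CustomType {
  name: string;
  base: BaseType;
  style: Style;
}

type Registry = Map<string, CustomType>;

const BASE_STYLES: Record<BaseType, Style> = {
  text: { font: "sans-serif", fontSize: 16, childLayout: "vertical" },
  bulleted_list: { font: "sans-serif", fontSize: 16, childLayout: "vertical" },
  to_do: { font: "sans-serif", fontSize: 16, childLayout: "vertical" },
  table: { font: "sans-serif", fontSize: 14, border: "1px solid", childLayout: "grid" },
};

function resolveStyle(block: BlockData, registry: Registry): Style {
  const base = BASE_STYLES[block.type];
  const custom = block.customType ? registry.get(block.customType) : undefined;
  // Fall back to the default rendering if the custom type wasn't shared with this
  // viewer, or if it derives from a different base type than the block declares.
  if (!custom || custom.base !== block.type) return { ...base };
  return { ...base, ...custom.style };
}

const groceryList: CustomType = {
  name: "grocery_list",
  base: "to_do",
  style: { color: "green", childLayout: "grid" },
};

const block: BlockData = {
  id: "b1",
  type: "to_do",
  customType: "grocery_list",
  title: "Groceries",
  children: [],
};

const owner: Registry = new Map([[groceryList.name, groceryList]]);
const guest: Registry = new Map(); // page shared without the custom type

console.log(resolveStyle(block, owner)); // to_do style, overridden with green + grid layout
console.log(resolveStyle(block, guest)); // plain to_do style
```

Storing the base type on the block itself, rather than only in the type definition, is what makes the fallback possible without an extra lookup.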
What’s the reasoning behind Notion not having partial text selection across blocks? I originally thought it was because each block is its own separate contenteditable, but if you wrap all of the contenteditable blocks in another contenteditable div, partial text selection works (at least in Chrome).
I'm wondering: when a user requests a page that has a full hierarchy of blocks, how are the DB queries done?
Do you first query the root to see its content blocks, then make additional queries to load the root's child blocks, then more queries to get those blocks' children (i.e., recursively) until there are no more children?
Does that result in too many database queries? Or do you have other ways to optimize it?
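To make the question concrete, here is a sketch comparing the naive approach with a level-by-level batched one against a hypothetical in-memory store (`BlockStore.fetchBlocks` stands in for one database round trip; none of this is Notion's code). The naive version issues one query per block, while the batched version fetches each level of the tree in a single `id = ANY(...)`-style query, so round trips grow with the depth of the page rather than its size.

```typescript
interface Block {
  id: string;
  content: string[]; // ordered child block IDs
}

// Hypothetical stand-in for a database table keyed by block ID.
class BlockStore {
  queries = 0;
  private readonly rows: Map<string, Block>;

  constructor(rows: Map<string, Block>) {
    this.rows = rows;
  }

  // Each call counts as one round trip, like `SELECT ... WHERE id = ANY($1)`.
  fetchBlocks(ids: string[]): Block[] {
    this.queries++;
    return ids.flatMap((id) => {
      const row = this.rows.get(id);
      return row ? [row] : [];
    });
  }
}

// Naive: one query per block, recursing into children (assumes an acyclic tree).
function loadNaive(store: BlockStore, id: string, out: Map<string, Block>): void {
  const [block] = store.fetchBlocks([id]);
  if (!block) return;
  out.set(block.id, block);
  for (const childId of block.content) loadNaive(store, childId, out);
}

// Batched: one query per level of the tree (breadth-first).
function loadBatched(store: BlockStore, rootId: string): Map<string, Block> {
  const out = new Map<string, Block>();
  let frontier = [rootId];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const b of store.fetchBlocks(frontier)) {
      if (out.has(b.id)) continue; // skip blocks already loaded
      out.set(b.id, b);
      next.push(...b.content);
    }
    frontier = next.filter((id) => !out.has(id));
  }
  return out;
}

const rows = new Map<string, Block>([
  ["root", { id: "root", content: ["a", "b"] }],
  ["a", { id: "a", content: ["a1", "a2"] }],
  ["b", { id: "b", content: ["b1"] }],
  ["a1", { id: "a1", content: [] }],
  ["a2", { id: "a2", content: [] }],
  ["b1", { id: "b1", content: [] }],
]);

const naiveStore = new BlockStore(rows);
const naive = new Map<string, Block>();
loadNaive(naiveStore, "root", naive);

const batchedStore = new BlockStore(rows);
const batched = loadBatched(batchedStore, "root");

console.log(naiveStore.queries, batchedStore.queries); // 6 3
```

A recursive SQL query would push the same traversal into the database and make it a single round trip; the batched loop is one way to get most of the benefit from application code.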
It appears that Notion uses Postgres, which supports the recursive/hierarchical queries that are part of standard SQL [1]. While I don't know for sure that Notion uses this, it seems likely.
Our source-of-truth data store is Postgres, with a Memcached cache on top.
Most of our queries are "pointer chasing" - we follow a reference from one record in memory to fetch another record from the data store. To optimize recursive pointer-chasing queries, we cache the set of visited pointers in Memcached.
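One plausible reading of "cache the set of visited pointers", sketched with a `Map` standing in for Memcached and another for Postgres (`loadReachable`, `dbGetMany`, and the `visited:<root>` key format are hypothetical). The first traversal chases pointers one record at a time and then caches the IDs it visited; later traversals from the same root skip the chasing and load every record in one batched read.

```typescript
interface Rec {
  id: string;
  pointers: string[]; // IDs of records this one references (e.g. child blocks)
}

// Stand-ins: `db` for Postgres (source of truth), `cache` for Memcached.
const db = new Map<string, Rec>();
const cache = new Map<string, string[]>();
let dbRoundTrips = 0;

function dbGetMany(ids: string[]): Rec[] {
  dbRoundTrips++; // one round trip regardless of batch size
  return ids.flatMap((id) => {
    const r = db.get(id);
    return r ? [r] : [];
  });
}

// Follow pointers from `rootId`. On a cache hit for the visited set,
// load every record in a single batch instead of chasing pointers again.
function loadReachable(rootId: string): Rec[] {
  const key = `visited:${rootId}`;
  const cachedIds = cache.get(key);
  if (cachedIds) return dbGetMany(cachedIds);

  const visited = new Set<string>();
  const result: Rec[] = [];
  const stack = [rootId];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (visited.has(id)) continue;
    visited.add(id);
    const [rec] = dbGetMany([id]); // pointer chase: one round trip per record
    if (!rec) continue;
    result.push(rec);
    stack.push(...rec.pointers);
  }
  // A real system would also have to invalidate this entry when pointers change.
  cache.set(key, [...visited]);
  return result;
}

db.set("page", { id: "page", pointers: ["b1", "b2"] });
db.set("b1", { id: "b1", pointers: ["b3"] });
db.set("b2", { id: "b2", pointers: [] });
db.set("b3", { id: "b3", pointers: [] });

loadReachable("page");
const cold = dbRoundTrips; // one round trip per record
loadReachable("page");
const warm = dbRoundTrips - cold; // a single batched fetch
console.log(cold, warm); // 4 1
```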
We use Elasticsearch for search features like QuickFind.
> Also, are you adding presentational tables any time soon? :)
Why not extend this to create forms/surveys in your application as well? You could render a checkboxInput block type as an <input type="checkbox" /> in a form. Your backend would match the form's POST parameter names to the associated input names, and that would be your "Submission".
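A minimal sketch of that idea, assuming hypothetical `checkboxInput` and `textInput` blocks that each carry a `name` used as the form field name (`InputBlock`, `renderInput`, and `toSubmission` are invented for illustration). One detail the backend has to handle: browsers leave unchecked checkboxes out of the POST body entirely, and a checked box without an explicit `value` attribute is sent as `on`, so a missing field means `false`.

```typescript
// Hypothetical block types that render as form controls.
type InputBlock =
  | { id: string; type: "checkboxInput"; name: string; label: string }
  | { id: string; type: "textInput"; name: string; label: string };

type Submission = Record<string, string | boolean>;

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// Render a block as an HTML form control.
function renderInput(block: InputBlock): string {
  const name = escapeHtml(block.name);
  const label = escapeHtml(block.label);
  switch (block.type) {
    case "checkboxInput":
      return `<label><input type="checkbox" name="${name}" /> ${label}</label>`;
    case "textInput":
      return `<label>${label} <input type="text" name="${name}" /></label>`;
  }
}

// Match POST parameter names against the page's input blocks.
// Parameters that don't correspond to any block are ignored.
function toSubmission(blocks: InputBlock[], body: string): Submission {
  const params = new URLSearchParams(body); // application/x-www-form-urlencoded
  const submission: Submission = {};
  for (const block of blocks) {
    if (block.type === "checkboxInput") {
      // Unchecked boxes are simply absent from the POST body.
      submission[block.name] = params.has(block.name);
    } else {
      submission[block.name] = params.get(block.name) ?? "";
    }
  }
  return submission;
}

const form: InputBlock[] = [
  { id: "1", type: "textInput", name: "email", label: "Email" },
  { id: "2", type: "checkboxInput", name: "newsletter", label: "Subscribe" },
  { id: "3", type: "checkboxInput", name: "terms", label: "Accept terms" },
];

console.log(form.map(renderInput).join("\n"));
console.log(toSubmission(form, "email=a%40example.com&terms=on&extra=1"));
```

Driving the submission from the blocks (rather than from whatever parameters arrive) means stray fields are dropped and unchecked boxes still show up as explicit `false` values.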
I'm going to be writing a blog post for my engineering team soon and I'm really impressed by the storytelling and presentation here. How did you plan out your blog post? Were there any resources or examples you found useful during the writing process? Thank you!
I'm glad you enjoyed it! It took a long time to put together. I started out by writing down how the entire system works with enough explanation for a technically-minded non-engineer, and enough detail to satisfy a newly hired infra engineer. That rough draft was about 10,000 words.
From there we started looking for a narrative. We extracted the sections you see in the final post and removed a lot of the superfluous technical detail so we didn't end up with technology buzzword soup; for example, we cut discussion of Postgres, Memcached, etc., and how we host the web servers: the kind of details that don't actually matter to the narrative.
This is really the first engineering blog post we've put out, so there was a fair amount of figuring-out-how-to-do-it going on. Now that we've had the experience, we're starting to write up our playbook internally.
Ltree is interesting, but if I understand correctly, to move a parent block, I'd also need to update the path column in all the child blocks -- at our scale such write amplification is a non-starter.
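To illustrate the write-amplification point, here is a toy comparison (hypothetical `PathRow`/`ParentRow` shapes, not Postgres's `ltree` type itself): with materialized paths, moving a block rewrites the path of the block and every descendant, while with a parent pointer the move touches a single row.

```typescript
// Materialized-path encoding (the idea behind ltree): each row stores its full ancestor path.
interface PathRow { id: string; path: string } // e.g. "root.a.a1"
// Adjacency encoding: each row stores only its parent.
interface ParentRow { id: string; parentId: string | null }

// Move `id` under `newParentPath`; returns how many rows had to be rewritten.
function movePath(rows: PathRow[], id: string, newParentPath: string): number {
  const moved = rows.find((r) => r.id === id);
  if (!moved) throw new Error(`unknown block ${id}`);
  const oldPrefix = moved.path;
  const newPrefix = `${newParentPath}.${id}`;
  let written = 0;
  for (const r of rows) {
    // The moved block and all of its descendants share the old prefix.
    if (r.path === oldPrefix || r.path.startsWith(oldPrefix + ".")) {
      r.path = newPrefix + r.path.slice(oldPrefix.length);
      written++;
    }
  }
  return written;
}

function moveParent(rows: ParentRow[], id: string, newParentId: string): number {
  const moved = rows.find((r) => r.id === id);
  if (!moved) throw new Error(`unknown block ${id}`);
  moved.parentId = newParentId; // descendants still point at `id`, so they are untouched
  return 1;
}

const pathRows: PathRow[] = [
  { id: "root", path: "root" },
  { id: "a", path: "root.a" },
  { id: "a1", path: "root.a.a1" },
  { id: "a2", path: "root.a.a2" },
  { id: "a1x", path: "root.a.a1.a1x" },
  { id: "b", path: "root.b" },
];
const parentRows: ParentRow[] = [
  { id: "root", parentId: null },
  { id: "a", parentId: "root" },
  { id: "a1", parentId: "a" },
  { id: "a2", parentId: "a" },
  { id: "a1x", parentId: "a1" },
  { id: "b", parentId: "root" },
];

const pathWrites = movePath(pathRows, "a", "root.b");
const parentWrites = moveParent(parentRows, "a", "b");
console.log(pathWrites, parentWrites); // 4 1
```

The trade-off runs the other way for reads: a materialized path can fetch a whole subtree with one prefix query, whereas parent pointers need a recursive query or repeated lookups.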
I don’t actually see a graph represented anywhere in the article; the author mentions wanting a graph at the start, but all I’m seeing described is trees of nested blocks. Even the properties list seems to be a grab-bag of KV pairs that gets permanently attached to a block once initialized, to support round-tripping.
Which is pretty much the ideal scenario for a document store. The article describes Notion as being very strictly hierarchical.
A document store is basically optimized for hierarchical data: a tree. The data structure you’re describing, and what the article describes, is precisely that: a tree.
When comparing a document store with an RDBMS in terms of suitability, the distinction is primarily a tree versus an arbitrary graph (by which I mean that an RDBMS is more powerful and more general, but not inherently as optimal in performance, “scalability”, or UX in the places where a document store makes sense).
More specifically, the way the article describes it, you’re not interested in “give me every block of type X” — you’re only interested in “given block Y, what type is it?”.
That is, the question only goes one way, and fits cleanly into the hierarchical format of a document store.
The only question posed that operates in the reverse direction is permissions, though even that’s a little odd, since it seems to me it should only go “downwards” as well: a block’s permission scope is the sum of all of its ancestors, and you can store it on the block as you iterate.
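The "store it as you iterate" idea can be sketched as a single top-down walk that hands each child the scope accumulated from its ancestors, so no block ever has to look upward. The `TreeBlock` shape, the per-block `grants` list, and the union-of-grants rule are assumptions for illustration; Notion's actual permission model isn't described in the article.

```typescript
interface TreeBlock {
  id: string;
  grants: string[]; // users granted access directly on this block
  children: TreeBlock[];
}

// Walk downward once, passing each child the union of grants from all of its ancestors.
function computeScopes(
  block: TreeBlock,
  inherited: ReadonlySet<string> = new Set<string>(),
  out: Map<string, Set<string>> = new Map<string, Set<string>>(),
): Map<string, Set<string>> {
  const scope = new Set<string>([...inherited, ...block.grants]);
  out.set(block.id, scope);
  for (const child of block.children) computeScopes(child, scope, out);
  return out;
}

const workspace: TreeBlock = {
  id: "workspace",
  grants: ["alice"],
  children: [
    {
      id: "team-page",
      grants: ["bob"],
      children: [{ id: "notes", grants: [], children: [] }],
    },
    { id: "private", grants: [], children: [] },
  ],
};

const scopes = computeScopes(workspace);
console.log([...scopes.get("notes")!].join(", ")); // alice, bob
console.log([...scopes.get("private")!].join(", ")); // alice
```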
> The underlying persisted data doesn't necessarily have to be a bag of KV pairs.
It doesn’t have to be... but it can be, and appears to be.
> A block is related to its parent and descendant blocks.
Right; the singular parent, and the multiple children. A tree.
> In graph theory, a tree is an undirected, connected and acyclic graph.
When discussing trees and graphs, I think it’s obvious that a distinction is being made between a graph that forms a tree and a graph that forms a not-tree (something more complex than a tree). When I say that a square is easier to encode than a rectangle, I do not mean that a square is not a rectangle, but that a rectangle is not necessarily a square: a square’s more specific properties give us the opportunity to simplify or optimize (I only need to store one length to represent it).
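In the same spirit as the square/rectangle analogy, a tree needs only one parent pointer per node, where a general graph needs an edge list. A small check that a parent-pointer map really forms a tree (exactly one root, and every node reaches it without a cycle or a dangling reference) might look like this; `ParentMap` and `isTree` are illustrative names.

```typescript
// Each node stores exactly one parent (null for the root): the "single length" of the square.
type ParentMap = Map<string, string | null>;

function isTree(parents: ParentMap): boolean {
  const roots = [...parents].filter(([, p]) => p === null);
  if (roots.length !== 1) return false;

  for (const start of parents.keys()) {
    const seen = new Set<string>();
    let current: string | null = start;
    // Walk up toward the root; revisiting a node means there is a cycle.
    while (current !== null) {
      if (seen.has(current)) return false;
      seen.add(current);
      const parent = parents.get(current);
      if (parent === undefined) return false; // dangling parent reference
      current = parent;
    }
  }
  return true;
}

const doc: ParentMap = new Map([
  ["page", null],
  ["heading", "page"],
  ["list", "page"],
  ["item", "list"],
]);
console.log(isTree(doc)); // true

const cyclic: ParentMap = new Map([
  ["page", null],
  ["x", "y"],
  ["y", "x"],
]);
console.log(isTree(cyclic)); // false
```

Walking up from every node is O(n × depth), which is fine for a sketch; the point is that the single-parent encoding alone already rules out most non-tree shapes.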
A database can encode a tree just fine, but that doesn’t mean it’s the best tool to do so.
There are other properties to a document store I don’t care for, and I don’t like them in general (like the implicit schema, and total lack of data consistency validation by the data store, and the fact that you often don’t truly have a tree), but representing a tree is what’s been described, and it’s exactly what they’re specialized for.
If you want to argue against it, you need to specify why you think this isn’t a tree, because I feel it’s quite obvious it is.
We don't use an ORM. Notion's back-end codebase is much more functional than object-oriented, in the sense that we have much more code that looks like `transformTheData(theData, theChangeToMake): ResultingData` than we have classes or methods.
We do lean very heavily on the TypeScript type system and try to make invalid states unrepresentable.
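A small sketch of that style, with hypothetical types (`transformTheData` is the placeholder name from the comment above; `Block` and `Change` are invented). Data is plain immutable values, changes are a discriminated union, and the transform is a pure function. "Invalid states unrepresentable" here means, for example, that only a to-do block has a `checked` field, so a checked text block can't even be written down.

```typescript
// Only to-do blocks carry `checked`; a text block with `checked` won't type-check.
type Block =
  | { readonly kind: "text"; readonly id: string; readonly title: string }
  | { readonly kind: "to_do"; readonly id: string; readonly title: string; readonly checked: boolean };

type Change =
  | { readonly op: "setTitle"; readonly title: string }
  | { readonly op: "toggle" }; // only meaningful for to-do blocks

// Pure function: returns a new value and never mutates its input.
function transformTheData(theData: Block, theChangeToMake: Change): Block {
  switch (theChangeToMake.op) {
    case "setTitle":
      return { ...theData, title: theChangeToMake.title };
    case "toggle":
      // Toggling a text block is a no-op rather than an invalid state.
      return theData.kind === "to_do" ? { ...theData, checked: !theData.checked } : theData;
  }
}

const before: Block = { kind: "to_do", id: "b1", title: "Buy milk", checked: false };
const after = transformTheData(before, { op: "toggle" });
console.log(before.kind === "to_do" && before.checked); // false: input untouched
console.log(after.kind === "to_do" && after.checked); // true
```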
Have you tried "data-last" FP like `transformTheData(theChangeToMake, theData): ResultingData` instead? I learned this from Ramda.js; it makes it way easier to leverage currying, e.g. `change = transformTheData(theChangeToMake); change(theData)`.
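The data-last style can be expressed in plain TypeScript with a curried function. This is a hypothetical sketch with no Ramda dependency (`transformTheData`, `Doc`, `Change`, and the small `pipe` helper are placeholder names). Fixing the change first yields a reusable `Doc => Doc` function that drops straight into `map` or a pipeline.

```typescript
interface Doc {
  readonly title: string;
  readonly tags: readonly string[];
}

type Change =
  | { readonly op: "rename"; readonly title: string }
  | { readonly op: "tag"; readonly tag: string };

// Data-last and curried: supply the change now, the data later.
const transformTheData =
  (theChangeToMake: Change) =>
  (theData: Doc): Doc => {
    switch (theChangeToMake.op) {
      case "rename":
        return { ...theData, title: theChangeToMake.title };
      case "tag":
        return { ...theData, tags: [...theData.tags, theChangeToMake.tag] };
    }
  };

// Left-to-right composition of unary functions (a tiny stand-in for Ramda's `pipe`).
const pipe =
  <T>(...fns: Array<(x: T) => T>) =>
  (x: T): T =>
    fns.reduce((acc, fn) => fn(acc), x);

const change = transformTheData({ op: "tag", tag: "draft" });
const docs: Doc[] = [
  { title: "Plan", tags: [] },
  { title: "Notes", tags: ["wip"] },
];

const tagged = docs.map(change); // the partially applied change works directly with map
const publish = pipe(transformTheData({ op: "rename", title: "Plan v2" }), change);
console.log(tagged);
console.log(publish(docs[0]));
```

Data-first arguably reads more naturally at a single call site; data-last pays off when the same change is applied to many values or composed with others.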