Author here. I'm happy to answer any questions you might have.

johnknowles · on May 18, 2021

I'm curious if Notion has any plans to make the "type" property user-extensible. Given the current data structure, which decouples the block data from the way it's rendered through the type property, a user would have to define only one template for rendering arrangements of UI components (boxes, bullets, etc.), titles and children. Extension could operate even at the level of derivation, where users could extend current base types with custom styling (color, font, size, border, etc.) and child layout. As a plus, derivation would allow for blocks to be shared, with a fallback default rendering if users don't share custom types. Given the multi-dimensional nature of the uses of Notion (for work, personal projects, life management, etc.), having types that were specific to their domain (grocery list, monthly budget table, contact card) would be a useful tool to semantically separate blocks by their presentation.
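
A minimal sketch of what derived types with a fallback rendering could look like. Everything here is hypothetical: the `BlockRecord` shape, the `baseType` field, and the renderer registries are illustrative assumptions, not Notion's actual data model or API.

```typescript
// Hypothetical block shape: the data says nothing about how it is drawn.
interface BlockRecord {
  id: string;
  type: string;     // may be a user-defined type such as "grocery_item"
  baseType: string; // the built-in type it derives from, e.g. "to_do"
  title: string;
}

type RenderFn = (block: BlockRecord) => string;

// Built-in renderers that every client is assumed to have.
const builtInRenderers = new Map<string, RenderFn>([
  ["text", (b: BlockRecord) => b.title],
  ["to_do", (b: BlockRecord) => `[ ] ${b.title}`],
]);

// Look up a custom renderer first, then fall back to the base type,
// and finally to plain text if the base type is unknown as well.
function renderBlock(
  block: BlockRecord,
  customRenderers: Map<string, RenderFn>,
): string {
  const render =
    customRenderers.get(block.type) ??
    builtInRenderers.get(block.baseType) ??
    ((b: BlockRecord) => b.title);
  return render(block);
}

const groceryItem: BlockRecord = {
  id: "b1",
  type: "grocery_item",
  baseType: "to_do",
  title: "Oat milk",
};

// The author has a custom renderer; a recipient without it does not.
const authorRenderers = new Map<string, RenderFn>([
  ["grocery_item", (b: BlockRecord) => `(buy) ${b.title}`],
]);

console.log(renderBlock(groceryItem, authorRenderers)); // custom rendering
console.log(renderBlock(groceryItem, new Map()));       // falls back to to_do
```

Storing the base type on the block itself is what lets a shared block degrade gracefully when the recipient has never seen the custom type.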

euroderf · on May 19, 2021

This sounds a bit like customized structured data types with style inheritance, à la DITA.

jitl · on May 18, 2021

This kind of idea is very interesting, but I won't comment on future plans.

bhl · on May 19, 2021

What’s the reason Notion doesn’t have partial text selection across blocks? I originally thought it was an issue with each block being its own separate contenteditable, but if you wrap all of the contenteditable blocks in another contenteditable div, partial text selection works (at least on Chrome).

gg2222 · on May 19, 2021

I'm wondering ... when a user requests a page which will have a full hierarchy of blocks, how are the db queries done?

Do you first query the root to see its content blocks, then make additional queries to load the root's child blocks, then make additional queries to get those blocks' child blocks (i.e. recursively) until there are no more children?

Does that result in too many database queries? Or do you have other ways to optimize it?
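
One common way to keep the query count down is to load the tree level by level, batching all IDs of a level into a single query, so the number of round trips follows the depth of the tree rather than the number of blocks. The sketch below illustrates that idea with an in-memory `Map` standing in for the database; the `StoredBlock` shape and the function names are assumptions for illustration, and nothing here is confirmed to be how Notion does it.

```typescript
interface StoredBlock {
  id: string;
  content: string[]; // IDs of child blocks, in order
}

// In-memory stand-in for a database table (hypothetical data).
const blockTable = new Map<string, StoredBlock>([
  ["page", { id: "page", content: ["a", "b"] }],
  ["a", { id: "a", content: ["a1", "a2"] }],
  ["b", { id: "b", content: [] }],
  ["a1", { id: "a1", content: [] }],
  ["a2", { id: "a2", content: [] }],
]);

let roundTrips = 0;

// Stand-in for one batched query such as `SELECT ... WHERE id = ANY($1)`.
function fetchBlocksByIds(ids: string[]): StoredBlock[] {
  roundTrips += 1;
  const found: StoredBlock[] = [];
  for (const id of ids) {
    const block = blockTable.get(id);
    if (block) found.push(block);
  }
  return found;
}

// Breadth-first load: one batched fetch per level of the tree.
function loadSubtree(rootId: string): Map<string, StoredBlock> {
  const loaded = new Map<string, StoredBlock>();
  let frontier = [rootId];
  while (frontier.length > 0) {
    const blocks = fetchBlocksByIds(frontier);
    const nextFrontier: string[] = [];
    for (const block of blocks) {
      loaded.set(block.id, block);
      for (const childId of block.content) {
        if (!loaded.has(childId)) nextFrontier.push(childId);
      }
    }
    frontier = nextFrontier;
  }
  return loaded;
}

const pageBlocks = loadSubtree("page");
console.log(pageBlocks.size, roundTrips); // 5 blocks in 3 round trips
```

With one query per block the same page would take five round trips; the gap grows quickly for wide pages.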

andyjohnson0 · on May 19, 2021

It appears that Notion uses Postgres, which supports the recursive/hierarchical queries that are part of standard SQL [1]. While I don't know for sure that Notion uses this, it seems likely.

[1] https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que...

burlesona · on May 18, 2021

Very interested in your data store. How do you store, query, and search across documents?

Also, are you adding presentational tables any time soon? :)

jitl · on May 18, 2021

Our source-of-truth data store is Postgres, with a Memcached cache on top.

Most of our queries are "pointer chasing" - we follow a reference from one record in memory to fetch another record from the data store. To optimize recursive pointer-chasing queries, we cache the set of visited pointers in Memcached.
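
For readers wondering what caching the set of visited pointers might look like, here is one possible reading, sketched with plain in-memory maps standing in for Postgres and Memcached. The record shape, the function names, and the missing cache invalidation are all assumptions; this is not Notion's implementation.

```typescript
interface PointerRecord {
  id: string;
  pointers: string[]; // IDs of the records this one references
}

// Stand-in for the source-of-truth store (hypothetical data).
const recordStore = new Map<string, PointerRecord>([
  ["root", { id: "root", pointers: ["x", "y"] }],
  ["x", { id: "x", pointers: ["z"] }],
  ["y", { id: "y", pointers: [] }],
  ["z", { id: "z", pointers: [] }],
]);

// Stand-in for the cache: root ID -> every ID reached from that root.
const visitedSetCache = new Map<string, string[]>();

let storeReads = 0;

// One batched read against the store.
function readRecords(ids: string[]): PointerRecord[] {
  storeReads += 1;
  const found: PointerRecord[] = [];
  for (const id of ids) {
    const record = recordStore.get(id);
    if (record) found.push(record);
  }
  return found;
}

function loadReachable(rootId: string): PointerRecord[] {
  // Warm path: the visited set is known, so one batched read is enough.
  const cachedIds = visitedSetCache.get(rootId);
  if (cachedIds) return readRecords(cachedIds);

  // Cold path: chase pointers level by level and remember what we saw.
  const visited = new Set<string>([rootId]);
  const result: PointerRecord[] = [];
  let frontier = [rootId];
  while (frontier.length > 0) {
    const records = readRecords(frontier);
    frontier = [];
    for (const record of records) {
      result.push(record);
      for (const target of record.pointers) {
        if (!visited.has(target)) {
          visited.add(target);
          frontier.push(target);
        }
      }
    }
  }
  // A real system would also have to invalidate this entry on writes.
  visitedSetCache.set(rootId, Array.from(visited));
  return result;
}

const coldResult = loadReachable("root");
const coldReads = storeReads;             // 3 reads, one per level
const warmResult = loadReachable("root");
const warmReads = storeReads - coldReads; // 1 read
console.log(coldResult.length, coldReads, warmResult.length, warmReads);
```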

We use Elasticsearch for search features like QuickFind.

> Also, are you adding presentational tables any time soon? :)

Sorry, can't talk about future plans like that :)

RyanGoosling · on May 18, 2021

Why not extend this to create forms/surveys as well in your application? You can render a checkboxInput block type as a <input type="checkbox" /> in a form. Your backend would match up the form POST parameter names with the associated form input names and that would be your "Submission".
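
A rough sketch of that suggestion, assuming hypothetical `checkboxInput` and `textInput` block types with an `inputName` field (none of these exist in Notion as far as the thread says). It renders blocks to HTML inputs and matches POST parameters back to them by name; HTML escaping is left out for brevity.

```typescript
interface FormInputBlock {
  id: string;
  type: "checkboxInput" | "textInput";
  inputName: string; // becomes the HTML `name` attribute
  label: string;
}

function renderFormInput(block: FormInputBlock): string {
  const htmlType = block.type === "checkboxInput" ? "checkbox" : "text";
  // No escaping here; real code must escape label and inputName.
  return `<label>${block.label} <input type="${htmlType}" name="${block.inputName}" /></label>`;
}

type SubmissionValue = string | boolean;

// Match POST parameter names against the input names of the form's blocks.
function buildSubmission(
  blocks: FormInputBlock[],
  postParams: Record<string, string | undefined>,
): Record<string, SubmissionValue> {
  const submission: Record<string, SubmissionValue> = {};
  for (const block of blocks) {
    const raw = postParams[block.inputName];
    if (block.type === "checkboxInput") {
      // Browsers omit unchecked checkboxes from the POST body entirely.
      submission[block.inputName] = raw !== undefined;
    } else {
      submission[block.inputName] = raw ?? "";
    }
  }
  return submission;
}

const formBlocks: FormInputBlock[] = [
  { id: "f1", type: "checkboxInput", inputName: "subscribe", label: "Subscribe" },
  { id: "f2", type: "textInput", inputName: "email", label: "Email" },
];

console.log(formBlocks.map(renderFormInput).join("\n"));
console.log(buildSubmission(formBlocks, { subscribe: "on", email: "a@example.com" }));
```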

DarraghBurke · on May 18, 2021

I'm going to be writing a blog post for my engineering team soon and I'm really impressed by the storytelling and presentation here. How did you plan out your blog? Were there any resources or examples you found useful during the writing process? Thank you!

jitl · on May 18, 2021

I'm glad you enjoyed it! It took a long time to put together. I started out by writing down how the entire system works with enough explanation for a technically-minded non-engineer, and enough detail to satisfy a newly hired infra engineer. That rough draft was about 10,000 words.

From there we started looking for a narrative. We extracted the sections you see in the final post and removed a lot of the superfluous technical detail so we didn't end up with technology buzzword soup. For example, we cut discussion of Postgres, Memcached, how we host the web servers, and so on: the kind of details that don't actually matter to the narrative.

The illustrations were in the post from the beginning as Mermaid diagrams (https://mermaid-js.github.io/mermaid-live-editor/). As we got close to publication we polished them up in Figma.

This is really the first engineering blog post we've put out, so there was a fair amount of figuring-out-how-to-do-it going on. Now that we've had the experience, we're starting to write up our playbook internally.

mritchie712 · on May 19, 2021

I'd read a follow up with the Postgres part. I did a cmd+f for Postgres guessing that's what you used and was disappointed not to find it.

deadbyte · on May 19, 2021

Would love to read the Director's Cut for more behind-the-scenes on storage, caching, infrastructure!

euroderf · on May 19, 2021

Did you... dogfood it?

Madeindjs · on May 18, 2021

Really interesting.

This seems to be a really good use case for a NoSQL database. Am I wrong?

thawab · on May 18, 2021

PostgreSQL has Ltree

http://patshaughnessy.net/2017/12/13/saving-a-tree-in-postgr...

jitl · on May 18, 2021

Ltree is interesting, but if I understand correctly, to move a parent block, I'd also need to update the path column in all the child blocks -- at our scale such write amplification is a non-starter.
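
To make the write amplification concrete, here is a small in-memory comparison of the two layouts: ltree-style materialized paths versus a single parent pointer per row. The row shapes and function names are illustrative assumptions; in Postgres the path rewrite would be one UPDATE statement, but it would still touch every descendant row.

```typescript
interface PathRow {
  id: string;
  path: string; // ltree-style label path, e.g. "root.a.a1"
}

interface PointerRow {
  id: string;
  parentId: string | null;
}

// Moving a subtree rewrites the path of the moved row and every descendant.
function moveWithPaths(rows: PathRow[], oldPrefix: string, newPrefix: string): number {
  let rowsWritten = 0;
  for (const row of rows) {
    const isMoved = row.path === oldPrefix;
    const isDescendant = row.path.indexOf(oldPrefix + ".") === 0;
    if (isMoved || isDescendant) {
      row.path = newPrefix + row.path.slice(oldPrefix.length);
      rowsWritten += 1;
    }
  }
  return rowsWritten;
}

// Moving a subtree only rewrites the parent pointer of the moved row.
function moveWithPointers(rows: PointerRow[], id: string, newParentId: string): number {
  let rowsWritten = 0;
  for (const row of rows) {
    if (row.id === id) {
      row.parentId = newParentId;
      rowsWritten += 1;
    }
  }
  return rowsWritten;
}

const pathRows: PathRow[] = [
  { id: "root", path: "root" },
  { id: "a", path: "root.a" },
  { id: "a1", path: "root.a.a1" },
  { id: "a2", path: "root.a.a2" },
  { id: "b", path: "root.b" },
];

const pointerRows: PointerRow[] = [
  { id: "root", parentId: null },
  { id: "a", parentId: "root" },
  { id: "a1", parentId: "a" },
  { id: "a2", parentId: "a" },
  { id: "b", parentId: "root" },
];

// Move block "a" (with its two children) underneath block "b".
const pathWrites = moveWithPaths(pathRows, "root.a", "root.b.a");
const pointerWrites = moveWithPointers(pointerRows, "a", "b");
console.log(pathWrites, pointerWrites); // 3 rows rewritten vs 1
```

The trade-off runs the other way for reads: with paths, fetching a whole subtree is a single prefix match, while parent pointers require chasing references.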

RyanGoosling · on May 18, 2021

Yes, you're wrong. You're wrong because you need to JOIN a massive tree of blocks, to form the graph the author is referring to.

You can break out the "block" model into several tables and represent it in a relational database that way.

NoSQL = NO JOIN?

Hope that helps.

setr · on May 18, 2021

I don’t actually see a graph represented anywhere in the article; the author references wanting a graph at the start, but the only thing I’m seeing described are trees of nested blocks. Even the properties list seems to be a grab-bag of KV pairs that gets permanently attached to a block once initialized, to support roundtripping

Which is pretty much the ideal scenario for a document store. The article describes Notion as being very strictly hierarchical.

RyanGoosling · on May 18, 2021

A block has many properties. A property has a name, and a value.

The underlying persisted data doesn't necessarily have to be a bag of KV pairs.

A block is related to its parent and descendant blocks.

These relations are suitably represented in a relational database, not a document store.

EDIT: In graph theory, a tree is an undirected, connected and acyclic graph.

setr · on May 19, 2021

A document store is basically optimized for specifically hierarchical data situations: a tree. The data structure you’re describing, and what the article describes, is precisely that: a tree.

When comparing a document store with an RDBMS in terms of suitability and appropriateness, the distinction is primarily along the lines of a tree versus an arbitrary graph (by which I mean that an RDBMS is more powerful and more general, but not inherently as optimal in performance, “scalability”, or UX in the places where a document store makes sense).

More specifically, the way the article describes it, you’re not interested in “give me every block of type X” — you’re only interested in “given block Y, what type is it?”.

That is, the question is one-way, and fits cleanly into the hierarchical format of a document store.

The only question posed that operates in the reverse direction is permissions, though even that’s a little odd, since it seems to me it should only go “downwards” as well — a block’s permission scope is the sum of all of its parents, and you can store it there upon iteration.

> The underlying persisted data doesn't necessarily have to be a bag of KV pairs.

It doesn’t have to be... but it can be, and appears to be.

> A block is related to its parent and descendant blocks.

Right; the singular parent, and the multiple children. A tree.

> In graph theory, a tree is an undirected, connected and acyclic graph.

When discussing trees and graphs, I think it’s obvious a distinction is being made between a graph forming a tree and a graph forming a not-tree (more complex than a tree). When I say that a square is easier to encode than a rectangle, I do not mean that a square is not a rectangle, but that a rectangle is not a square, and that a square’s more specific properties give us the opportunity to simplify/optimize (I only need to store one length to represent it).

A database can encode a tree just fine, but that doesn’t mean it’s the best tool to do so.

There are other properties to a document store I don’t care for, and I don’t like them in general (like the implicit schema, and total lack of data consistency validation by the data store, and the fact that you often don’t truly have a tree), but representing a tree is what’s been described, and it’s exactly what they’re specialized for.

If you want to argue against it, you need to specify why you think this isn’t a tree, because I feel it’s quite obvious it is.

systoll · on May 19, 2021

The behaviour demonstrated at

https://www.notion.so/Tree-breaking-3a90e2bcd2154f4fab06a3c7...

Breaks the tree model, as 'Complete Task' would need to have both 'Subtasks' and the page itself as its direct parent.

That said… it's mostly a tree, and there may be merit to optimising for that access pattern.

ellimilial · on May 18, 2021

@setr explained it really well. A side note, NoSQL also includes graph databases, dedicated to this type of node/relationship traversal.

jitl · on May 18, 2021

We don't use JOIN for the content tree; I don't think I've seen one in any of our queries.

RyanGoosling · on May 18, 2021

What do your queries look like? Are you using an ORM?

jitl · on May 18, 2021

We don't use an ORM. Notion's codebase on the back-end is much more functional than object-oriented, in the sense that we have much more code that looks like `transformTheData(theData, theChangeToMake): ResultingData` than we have classes or methods.

We do lean very heavily on the TypeScript type system and try to make invalid states unrepresentable.
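
As a small illustration of both points (plain transform functions, and types that rule out invalid states), here is a sketch using a discriminated union. The `ContentBlock` and `BlockChange` types are invented for this example and are not taken from Notion's codebase.

```typescript
// Only to-do blocks can carry a `checked` flag, so a "checked text block"
// cannot be constructed at all.
type ContentBlock =
  | { type: "text"; id: string; title: string }
  | { type: "to_do"; id: string; title: string; checked: boolean };

type BlockChange =
  | { kind: "rename"; title: string }
  | { kind: "toggle" };

// A pure transform: takes data and a change, returns new data,
// and never mutates its input.
function applyChange(block: ContentBlock, change: BlockChange): ContentBlock {
  if (change.kind === "rename") {
    return { ...block, title: change.title };
  }
  // change.kind is "toggle" here; it only means something for to-do blocks.
  return block.type === "to_do"
    ? { ...block, checked: !block.checked }
    : block;
}

const todoBlock: ContentBlock = { type: "to_do", id: "t1", title: "Ship", checked: false };
const toggledBlock = applyChange(todoBlock, { kind: "toggle" });
const renamedBlock = applyChange(toggledBlock, { kind: "rename", title: "Ship it" });
console.log(toggledBlock, renamedBlock);
```

The compiler rejects `{ type: "text", id: "x", title: "y", checked: true }` as a `ContentBlock`, which is the "unrepresentable" part.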

bionhoward · on May 19, 2021

Have you tried "data last" FP like `transformTheData(theChangeToMake, theData): ResultingData` instead? I learned this from Ramda; it makes it way easier to leverage currying, e.g. `change = transformTheData(theChangeToMake); change(theData)`
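
A short sketch of the difference, with the currying written by hand so it has no dependencies (Ramda's `R.curry` can produce the same shape). `PageData` and `TitleChange` are made-up types for illustration.

```typescript
interface PageData {
  title: string;
}

interface TitleChange {
  title: string;
}

// Data-first: data comes first, so partial application is awkward.
function transformDataFirst(data: PageData, change: TitleChange): PageData {
  return { ...data, ...change };
}

// Data-last and curried: fix the change first, supply the data later.
const transformDataLast =
  (change: TitleChange) =>
  (data: PageData): PageData => ({ ...data, ...change });

const renameToRoadmap = transformDataLast({ title: "Roadmap" });

const pages: PageData[] = [{ title: "Untitled" }, { title: "Draft" }];

// The partially applied function drops straight into map.
const renamedPages = pages.map(renameToRoadmap);

// The data-first version needs a wrapper lambda to do the same thing.
const renamedPagesDataFirst = pages.map((page) =>
  transformDataFirst(page, { title: "Roadmap" }),
);

console.log(renamedPages, renamedPagesDataFirst);
```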