Hacker News new | past | comments | ask | show | jobs | submit login
Using sqlite3 as a notekeeping document graph (epilys.github.io)
303 points by harporoeder on July 3, 2021 | hide | past | favorite | 41 comments



This ranks anywhere from "nerdy humble brag" to "absurdly impractical" on usage, but what an innovative way to get extremely familiar using sqlite3. I like it.

Sick of people reading your journal? Do your worst, MOM.


Thank you :) It was just a demo on how I found sqlite's indexing useful for something I thought would require an extension or external tools. I don't keep notes myself, I was working with full text search because I was indexing my PDFs. I hope it's useful for people that do keep notes, however.


It's very cool, I enjoyed reading the post describing it!

Personally I use Emacs, org-mode, and org-roam for Zettel-style links between documents. Org-roam uses sqlite to store its links and index, but documents remain in plain text .org files.


Indeed, thank for the post!

+1 on org-roam, and may I add org-roam-server, which visualises that graph in the browser very nicely, in an interactive way: <https://github.com/org-roam/org-roam-server>


No, thank you.

I had no idea indexing could be used this way. Very cool find.


I run a self hosted notes application[1] on raspberry pi at home. I have been thinking about moving from .md files to sql. Both seem to have their own pros/cons. With SQLLite, I get easy search, tagging while with .md files I get easy editing and viewing by mapping networked drive.

[1]: https://github.com/quaintdev/pinotes


I suggest having both.

My TILs site runs uses a GitHub repository where the notes live in markdown: https://github.com/simonw/til

Plus a build script running in a GitHub actions workflow that compiles the notes into a SQLite file using my markdown-to-sqlite tool and publishes the resulting SQLite file using Datasette to https://til.simonwillison.net - which gives me search and an Atom feed and suchlike.

The site has custom templates so it Durant look like regular Datasette, but you can run custom queries against it at https://til.simonwillison.net/tils


this is very interesting. Would you still use this architecture if u were building something from scratch today ?

I can see the markdown format being powerful. But how do you parse it and create a document graph ? (or rather...what do u use to persist the document graph)

I have been trying to use Firebase on a side project of mine for this markdown -> graph problem


I don’t see the difficulty here — with SQLite, you’d just store the set of links as an adjacency list; for the major job of linking things, you don’t even need to traverse the graph — you only need to know 1-level of relationships to generate the links. But when you do traverse, you can use recursive WITH.

Presumably for each build, you’d parse the names of each doc to produce the list of nodes (and their paths), and then as you encounter the markdown reference, update it accordingly.

If you start afresh each time, handling updates would be trivial, but you’re at risk of destroying existing URLs when changing doc titles/reorganizing — so you probably want a canonical name (to generate the final URL for), and the symbolic names (to refer by in markdown).

And then eventually you realize you want multiple ways to organize your docs, so you start tagging, and then eventually you realize you want ways to reference groups of notes, so you’ll go ahead and implement hierarchal tagging, and you’ll finally have be able to treat your graph as a graph, instead of a forest (list of trees)

And finally you can simplify the whole thing to only reference tags (or parent tags), which may happen to link to a single doc, or multiple. Probably in the special case of 1 doc, the link directly takes you to the doc instead of some collection page.

A document store seems like a terrible idea, because you’re really not modeling a tree, and that’s probably the majority of your modeling problem — I imagine you’re trying to currently stuff a graph into a tree.


I've been exploring this technique more over the past year and I really like it - https://datasette.io (code at https://github.com/simonw/datasette.io ) is a more recent and much more complicated example.

Extracting links from markdown and using them to populate some additional columns or tables at build time would be pretty straight forward.


Try Fossil, the version control system by the SQLote author which is built on SQLite and includes features like blogging, message board, bug tracker, files upload, and markdown formatting.


Could elasticsearch be a no compromise solution for this case? Or maybe meilisearch as a lightweight alternative


Elasticsearch is probably not needed, but a full-text index based search engine - plain vanilla Java Lucene or CLucene - could be very useful, and indeed be more appropriate given that the notes are unstructured text (I actually don't see the need for SQL unless you need to perform very complex operations on the structured meta-data).

The advantage is an easy-to-learn search syntax that is flexible and similar to Google: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html In addition to a document field for your notes, you can manage and query any number of meta-data fields e.g.

  Tom M* birthday:July
retrieves all notes that mention Tom with a last name starting with "M" as long as the notes also have a meta-data field called birthday where the value must be July. (Try compare doing this with the SQL equivalent version of the query!)


HN loves notes applications and I always click through and am almost always disappointed. Am I unusual here in that my notes are not all text? If you look in my notebooks (digital or physical) there are sketches and images on almost every page.


> Am I unusual here in that my notes are not all text?

Maybe unusual but not completely alone.

I use a mix but none are complete:

- Notes on iPad lets me start (or continue) a note by tapping the Pencil on the locked screen. Brilliant. Hard to link notes though.

- Pencil planner is totally brilliant It integrates calendar and lets me write on top of it and in a close to magic way it shows day notes in week view and the other way around. It is easy to use but not dumbed down. Lacks photos for now though but I can live with that for now.

- Joplin. For anything that doesn't contain handwritten or images.


Definitely no, I wrote this article and I don't keep notes. I use a skill cheap Huion drawing tablet to write and sketch on xournalpp: recommended for a good Foss experience.

by the way, bibliothecula markdown works with embedding image links. You add them as a file on the note document and attach it's UUID in the markdown text.


@epilys Could you please tell me which Huion Tablet do you use to take notes? I have wacom tablet. But it does not have a screen, so I am unable to use it as I am unable to coordinate my eyes and hand as I can't see what is getting written on the tablet (what is written shows up in the laptop). Any tips?


Huion H950P, no screen like yours. How much have you tried getting used to it? It felt weird at first like you say but now it feels quite natural like writing on paper.


Same here. I use OneNote and it does everything I need in that regard, and you can link sections to each other if needed.


I suppose you could always paste in a base64 image URL and make a UI for it?


Here is the direct link to the Bibliothecula project (it might be just me but it was not so easy to find): https://github.com/epilys/bibliothecula#tooling


FTS is great but it's a solution that's designed to work mostly with the English language. For other languages, with accents and so on, you would probably want to normalise the FTS data first by removing the accents.

For Chinese, Arabic and so on you'll need to a custom tokenizer which may or may not be available on your target platform.


How far behind do other big languages (Chinese, Arabic, Spanish, Hindi) lag behind English when it comes to full text search?


The problem here is unicode normalisation, a standard procedure of replacing equivalent glyphs (the "character units" a user sees) until only a specific set of graphemes (unicode character points split into groups that form glyphs) remain. you can do this with libicu before you index text or send text for a query to sqlite3. There's also an official ICU extension for sqlite3 that does this.

Other than that, there's also tokenizing (splitting text into words) that's also unicode defined and stemming (reducing tokens to a base stem like "likes"->"lik-" in English)


I believe there are tokenizers for Chinese language but haven't tried them since they aren't available by default in Android or iOS for example. In our app we've ended up having two modes - one for alphabetical languages and one for Thai, Chinese, Hebrew and Arabic, but in this case it's not actually using FTS but a plain SQL query.


Slightly off-topic: where/when does it make sense to "spend" the 5-6 bits in a version 4 (random) UUID to indicate which version and variant it is?


According to the RFC, even v4 requires the version bits set. Is it worth it? Depends on what you want. For stuff like the original post, even an AUTOINCREMENT integer primary key or even a title/slug text key would do just fine. For interfacing with external stuff that expects UUIDs it's mandatory for compliance. (I hope I didn't confuse many people by mentioning UUIDs)


Suppose I want a universally unique identifier, which can be generated in a decentralized manner and there is a very small chance that someone else chooses to use the same number in the same system, resulting in a collision. I decide to use a pseudorandom generator for this purpose, picking numbers out of a sufficiently large space, where the chances of an undesired coincidence are precisely dialed in.

What good rises out of including extra paperwork with each number, essentially saying "attention please, a random number", or "this is a timestamp, a monotonic counter and a MAC address", like if it was a newfangled smartphone? The RFC also says that there is no mechanism for validating a UUID, save for checking if its timestamp part is in the future, for versions that employ such a portion. Why can't any 128 bit value be accepted on the receiving end?


I suppose it further reduces the chance of a collision. If you're in a totally random space, there's always a chance someone will generate a collision if you wait long enough. But if you follow the timestamp and MAC address rules, then for a collision to happen it basically needs to be the same computer hitting the same number twice on the RNG within one second, which is less likely.

But, since that only holds true if people follow the rules, it's worth standardizing on either v1 or v4, and so it's good to be able to distinguish between the two for validation purposes.


Wow, thanks for documenting the idea, this could be a nice alternative to my tiddlywiki markdown setup. I could do all kinds of queries on my knowledge base. Still I could use grep as well for document counts and lists. It's neat that it's all in one compressed sqlite file - but I can tar ball my markdown directory as well. The only benefit I see is that I can use sql to query my documents and it seems like a step up from org-mode tables by using database tables.

It would be nice if there was cgi script that could serve a sorted and paged index with links to the html, md or gmi to a web or/and gemini browser.


O wow, I didn't notice there's FUSE file system.


No write support but should be easy :) it doesn't handle filename conflicts too. But searching and reading files works great.


Bear Notes, the popular note taking app for Mac, stores data in sqlite for many of the same reasons the author mentions here.

Bear organizes with "live folders" based on tags added to the document, powered by sqlite.


btw you can use Etherpad for this too, bonus points is that it has real-time collabo and a bunch of other powerful tools! :)


This is massive overkill for notes. Simple grep or org mode will do. You don’t need an indexing and querying engine at this scale.


I certainly know what you mean, but this article is a tech demo and not a recommendation for a note workflow.


IMO https://roamresearch.com/ offers a more practical way of maintaining notes using graphs.


This looks like a proprietary clone of TiddlyWiki[1] with a slower, clunkier interface.

[1] https://tiddlywiki.com/


I love TiddlyWiki. It’s the most useful software I’ve found in the recent years.


Ummm I've been personally using Roam for over an year. The interface is neither slow nor clunky IMO (maybe Tiddly is faster, but I don't want to solve problems I don't have). I have a graph with > 1000 nodes that I use to organize my reading. Not sure where the hate is coming from.


A good open source alternative is logseq: https://logseq.github.io/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: