Show HN: Promnesia – an attempt to fix broken web history

rosstex · on June 28, 2020

As a PhD student, your post reads like a beautiful research paper. Motivation, prior work, contributions, technical details, example use cases, self-references, future work, even a system design chart. You've certainly sold me on the extension, great work!

newman314 · on June 28, 2020

This sounds like it might be close to meeting my use case.

I have bad memory and hence try to write down everything I can. But often throughout a single day/week, I do research on a topic and have a bunch of tabs open that I intend to come back to. Or I read an article and several days later cannot recall where I read it (HN, Twitter, etc.). This usually leads to a frantic search until I can find what I’m looking for as well as having a ton of tabs open.

Manually grouping topics together is too hard. What would be great is a tool that knows where I’ve been, discards bad information (google search result, followed by near immediate close) and makes some sort of attempt at topic autoclassification (SAP, storage, backup etc.) that gives me the confidence to close tabs knowing that I can get back to a particular topic at a later date.

soulofmischief · on June 28, 2020

Bruh I've got tabs open from years ago. Hundreds across multiple VMs. I have tabs open that I migrated from my last computer. Someone please help.

karlicoss · on June 28, 2020

I've struggled for a while with this kind of overload and ended up with a system that makes it manageable:

1. make it as easy to 'bookmark' stuff as possible -- with a single hotkey

2. make it as easy to search over bookmarks as possible -- also ideally with a single hotkey or as quick as you could do google search

My way of achieving this is using org-mode files for 'bookmarks' [0] and using emacs/ripgrep to search over it [1]. Additional benefit of org-mode is that it's very easy to add notes, priorities, refile bookmarks, so the most interesting stuff propagates through my notes, and I don't feel bad about missing out on information that I don't have time to process because I can always quickly find it when I need.

[0] https://github.com/karlicoss/grasp#readme

[1] https://beepb00p.xyz/pkm-search.html#personal_information

riverlong · on June 29, 2020

I'm curious -- Roam Research appears to have won over a lot of folks with this kind of need recently. You didn't list Roam among the prior art -- do you think it's not really relevant? I can certainly see Roam eventually including cross-platform bookmarking/archiving, etc.

By the way, I think you should also take a look at Archive Box, which is very much in this direction: https://archivebox.io/

karlicoss · on June 29, 2020

You mean, I didn't include it as prior art for Promnesia or Grasp project?

For Promnesia, the goals of Roam and Promnesia are pretty different at the moment (although Roam data can be used with Promnesia, as I mentioned). In addition, I can't personally bet on a closed source tool.

For Grasp, simply because Roam wasn't known (or possibly didn't even exist!) when I wrote it. But even now that I've tried Roam, I don't think I can go back from using plaintext files, it's just so much snappier and more hackable.

Thanks, I used archivebox! Still need to set up proper automatic archiving, and integrating Promnesia with personal web archives is also in my plans!

leephillips · on June 29, 2020

He mentions Roam twice in the article, and links to a demo with interop between Roam and his project.

soulofmischief · on June 29, 2020

But why bookmarks? I recognize that ultimately they are ephemeral. At some point in the next few years I'll get really drunk and just delete them all and be left with the useful bookmarks. I can't go back to bookmark overload... For the really important things, I use Zotero.

dcposch · on June 29, 2020

As the Buddhists say,

attachment is suffering

scrollaway · on June 28, 2020

Hey karlicoss, I'm in love with your writeup.

You should know about Timeliner; Matt Holt's attempt at solving some of the data sourcing / data silo problems. https://github.com/mholt/timeliner

He also points out Perkeep: https://perkeep.org/

Anyway. Data liberation is a huge driver for me. Making it a primary goal of my next app (primarily for bookkeeping/financial data, but I want to allow users to connect to the third party services I integrate with, eg. Uber, Amazon, etc, and be able to download their own data / play with it via an API).

Feel free to email me / tweet me (@Adys) if you ever want to chat.

karlicoss · on June 29, 2020

Thanks! :)

Oh, nice, I bookmarked Timeliner recently, but haven't tried it yet. Looks promising, I expect it to integrate well with Promnesia, and vice versa, my helper HPI package to integrate easily with Timeliner.

For Perkeep -- I tried it briefly, but haven't exactly understood the problem they are solving. I was planning to try again and write up my experience with it to spark a discussion!

kemonocode · on June 28, 2020

This sounds like something that would be close to meeting my needs. I, too, end up leaving far too many tabs open and I feel the need to have something in between a bookmark I'll never look at again and may have little context as to why I may have created it to begin with, and a tab just eternally polluting my browser and that might just end up getting sent to OneTab and thus as a "lesser" bookmark. I know Firefox (and probably Chrome as well) lets you leave tags on bookmarks, but these always seem like they're hardly enough. And that's without even mentioning all the different pseudo-bookmarks scattered over many different services!

CWuestefeld · on June 28, 2020

I've been frustrated by many of the same things, and have recently been playing with Memex.

The solution outlined here leaves me with a couple of questions, though.

1. Since there's a local app acting as a service, it's not clear to me how this would run on a mobile device.

2. Once it is running on my mobile device (and home computer, and work computer, and chromebook, and various other machines I use), how do I aggregate all of the data? I'd like to be doing work-related research at home in the evening, and be able to see the fruits of it from the office.

I suspect that the answer to this is the same thing: that rather than a locally-running server, I could put something on my home server or on a cloud-based server, and direct all my various devices to communicate with that rather than localhost?

karlicoss · on June 28, 2020

Someone was asking that before, perhaps I should add to FAQ! https://github.com/karlicoss/promnesia/issues/114

Yep, you could use a VPS or something and host it behind a reverse proxy, that's what I've been doing so far.

Also for mobile specifically, on Android it works under Termux (haven't personally tried yet, but can't see why not, and the person in the issue I linked claims it works).

For data aggregation: it depends on the data source, but the easiest seems to be to make sure your data ends up on a single computer, index it there, and after that you get an sqlite database which you can simply sync with Dropbox/Syncthing or anything else you prefer.

indentit · on June 28, 2020

Nice description of what you're trying to solve - it certainly resonates with me so I plan to try it out!

I've recently started trying Shiori[1] to manage my "bookmarks" and preserve offline copies locally without relying on The Internet Archive, however it still doesn't really help with private content (i.e. pages only accessible as an authenticated and authorized user) so it'd be great if Promnesia caters for that. Plus the whole data silo thing...

I was a little surprised to see no mention of the "tree style tabs" extension which can help with "where did I get to this link from?" style questions

[1]: https://github.com/go-shiori/shiori

idm · on June 28, 2020

You've convinced me to try it out.

My personal knowledge management project, Gthnk (gthnk.com), would appear to plug in easily as a Source - without any special plugin necessary. I really like what you've made!

spurgu · on June 28, 2020

This might be more suitable as a Github issue but since you're here, I'm simply getting an error using Brave: "ERROR: Failed to fetch" (shown in the extension popup when clicking the eye, which is always red)

Another thing: Have you considered adding annotation capability directly into the extension? This is something I've thought about creating an extension for, since I don't use anything like Instapaper.

karlicoss · on June 28, 2020

Very unlikely I'll be adding support for annotations -- the idea is using the existing tools and integrating with their data. Otherwise I end up reimplementing yet another annotation tool :).

If you're looking for something similar to Instapaper, but local only, your best bet is probably Worldbrain Memex. And as I mentioned in the post, I was thinking of potentially integrating with them tighter anyway.

rosstex · on June 28, 2020

You have to run the local Python server by following the next instructions.

karlicoss · on June 28, 2020

Yep! I guess I should make the error more clear in the extension and point to the readme.

In theory, I could make it defensive too and allow using without the local backend (only with local browser history), but not sure if there is much value in this.

rosstex · on June 28, 2020

I think the aspect of knowing where you browsed to a page from, and visualizing a hierarchy of pages within a site that I've visited, are the most interesting parts for me, and those certainly apply to the browser history alone.
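
To make the "hierarchy of pages within a site" idea concrete, here is a minimal sketch that groups visited URLs into a nested tree by host and path segment. The function name `build_site_tree` and the input list are made up for illustration; this is not how Promnesia does it, just one plausible way to derive such a view from plain browser history.

```python
from urllib.parse import urlsplit


def build_site_tree(urls):
    """Group URLs into a nested dict keyed by host, then by path segment.

    Hypothetical helper for illustration only; not part of Promnesia.
    """
    tree = {}
    for url in urls:
        parts = urlsplit(url)
        # Top level of the tree is the host, e.g. "example.com".
        node = tree.setdefault(parts.netloc, {})
        # Walk down the path, creating one level per non-empty segment.
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
    return tree


history = [
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/about",
]
tree = build_site_tree(history)
```

Query strings and fragments are ignored here on purpose, so two visits that differ only in tracking parameters end up in the same node.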

karlicoss · on June 28, 2020

Fair enough! Created an issue https://github.com/karlicoss/promnesia/issues/120

spurgu · on June 28, 2020

Cheers guys!

ybbond · on June 28, 2020

I have been following this post too, I mean since the first time you published it. I am using Worldbrain's Memex 2, and when I saw this post reposted, I checked the "Memex 2" section.

There is an update! Maybe I will look into Promnesia and StorexHub integration next weekend. Thank you for your effort with Promnesia!

contravariant · on June 29, 2020

Regarding the cleaning of URLs, are you aware of the ClearURLs [1] extension? It seems to achieve much of what you're trying to do.

[1]: https://gitlab.com/KevinRoebert/ClearUrls
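
For readers wondering what URL cleaning amounts to in practice, a rough sketch using only the Python standard library follows. The parameter list (`utm_*`, `fbclid`, `gclid`) is an assumption chosen for illustration; it is not the rule set used by ClearURLs or by Promnesia, both of which are more thorough.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, illustrative tracking parameters; real rule sets are much larger.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL, keeping scheme, host, path and fragment untouched.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


cleaned = strip_tracking("https://example.com/post?id=42&utm_source=hn&fbclid=abc#top")
```

With a cleanup step like this, the same page reached via a newsletter link and via a direct link can be matched as one history entry.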

karlicoss · on June 29, 2020

Oh nice, didn't know of it, thanks! Indeed, looks like there is a lot of opportunity to collaborate with privacy enhancement extensions.

m-localhost · on June 28, 2020

Great write up for a problem I'm thinking about myself a lot (https://marcus-obst.de/wiki/Notetaking)

Thanks also for using the Yak Shaving - for one, I got curious what was first, the term or the Ren & Stimpy episode illustrating the term and second, I found a description of most of my modus operandi.

j88439h84 · on June 28, 2020

Have you thought about using SingleFile/SingleFileZ [1] to download archives of the pages instead of using links to wayback?

[1] https://chrome.google.com/webstore/detail/singlefilez/offkdf...

hansvm · on June 29, 2020

Haha, I love reading other people's code :)

  # TODO fuck. why doesn't that work???

Seriously though, this project looks great. I've been tossing around building something similar for awhile, and frankly I'm glad somebody else did it first (and from the looks of it, probably better)

infogulch · on June 28, 2020

The motivations and analysis of current problems resonate with me deeply, thank you for the writeup!

Perkeep is another project that might be interesting to analyze in this context. https://perkeep.org/

stavros · on June 28, 2020

I've been thinking about this problem a lot myself too, and I'm currently rewriting www.historio.us to attack the problem more efficiently. I've been considering various new features, and this writeup is very useful, thank you.

dpacmittal · on June 28, 2020

This is awesome! I've definitely wanted this for as long as you have. I have this idea noted down exactly as you have described somewhere in evernote. Well done! Looking forward to contributing to it.

an4rchy · on June 28, 2020

This is awesome! I just started using the WorldBrain Memex and was trying to solve the issue of accessing other data sources, so perfect timing -- thanks!

Looking forward to trying it out.

mirimir · on June 29, 2020

It does seem very useful.

And I'm disappointed :( Given the title, I was hoping for a way to fix Google's broken web history. So it goes.

zingermc · on June 28, 2020

Does promnesia server run a local HTTP server? How do you prevent a website from slurping up the entire database?

karlicoss · on June 28, 2020

Yep, it's a local HTTP server by default. It's also possible to expose it via reverse proxy, and you can set basic auth password in the extension's settings.

What do you mean by slurping here? Security-wise, a random website shouldn't be able to query a localhost because of CORS policies.

zingermc · on June 28, 2020

Unfortunately, CORS isn't a magic bullet. Suppose a site named evil.example adds a script tag pointing to http://localhost:1234/promnesia.js and a victim loads evil.example. If your JS updates a DOM element with info from the database, evil.example's JS can read that DOM element and report it back to the server, without violating CORS.

karlicoss · on June 28, 2020

Ah I see, thanks! Good point, and I guess basic auth would protect against such sort of attack. So it seems it makes sense to use a token even if it's running as localhost, I could add an option, so it doesn't require setting up a separate proxy.

Either way, I hope I've been fairly reasonable about security so far, but I've mostly been concentrating on the 'plugging in the data' bit, so it's possible I've overlooked something (also I'm not a security specialist!). There is an open issue in case people have any specific concerns or spot something, happy to receive feedback! https://github.com/karlicoss/promnesia/issues/14

pvg · on June 29, 2020

I think it's becoming clear that the whole 'local web server to do system things for a browser extension' approach is probably too fraught and should be abandoned for a better IPC mechanism that browsers support. I don't think this is some 'drop everything and rewrite stuff' thing but it's worth reading up on and planning for.

karlicoss · on June 29, 2020

Yeah, possibly. Chrome actually has something called "native messaging" https://developer.chrome.com/apps/nativeMessaging which seems like a potentially more secure (and faster?) alternative, but I haven't had time to play with it yet.

pvg · on June 29, 2020

Yep, that's one of the things I had in mind when mumbling about 'better IPC'. Safari already only supports that type of model. I think the day is not far when automated scans/app stores/etc start flagging the local http server thing as high risk/potential malware vector. It's an architectural dead-end.

On the other hand, some of the other stuff may not be fully baked:

https://news.ycombinator.com/item?id=23173724

zingermc · on June 28, 2020

Awesome! Unguessable auth is the answer. You could even have the server generate a uuid token and have the user paste it into the browser extension.

zingermc · on June 28, 2020

To follow up: the solution is that the localhost server needs to make sure each API call is authorized (if you aren't already). This means there must be a login/setup step.

An API call can't be considered authorized just because it came from localhost :)
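
A minimal sketch of that idea, assuming a bearer token sent in an `Authorization` header: the server generates a random token once, the user pastes it into the extension, and every request is checked against it. The function names and the header scheme are hypothetical and not Promnesia's actual API.

```python
import hmac
import secrets


def generate_token() -> str:
    """Create an unguessable token to show the user once at setup time."""
    return secrets.token_urlsafe(32)


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Check a request's Authorization header against the expected token.

    Hypothetical helper: being on localhost is never treated as proof.
    """
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    # Constant-time comparison avoids leaking the token via timing.
    return hmac.compare_digest(
        supplied[len(prefix):].encode("utf-8"),
        expected_token.encode("utf-8"),
    )


token = generate_token()
ok = is_authorized({"Authorization": "Bearer " + token}, token)
```

A page on evil.example cannot read the token out of the extension's settings, so its requests to the local server would fail this check.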

karlicoss · on June 28, 2020

Thanks! Created an issue https://github.com/karlicoss/promnesia/issues/115

pkamb · on June 29, 2020

How about just an option for new tabs to retain a “Back” history to their parent.

karlicoss · on June 29, 2020

Chrome actually keeps it in the database. However it only works within a single browser and breaks as soon as you're leaving for a native app, note in your todo list, etc. So I feel like correlating timestamps is the way to go here, simple enough and agnostic of specific implementations.
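
As a rough illustration of what correlating timestamps could look like: given visits sorted by time, find everything that happened within a few minutes of a given moment (say, when a note was created). The function `visits_near`, the five-minute window and the sample data are assumptions for this sketch, not Promnesia's implementation.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def visits_near(visits, moment, window=timedelta(minutes=5)):
    """Return URLs visited within `window` of `moment`.

    `visits` is a list of (datetime, url) pairs sorted by time.
    Hypothetical helper for illustration only.
    """
    times = [visited_at for visited_at, _ in visits]
    # Binary search for the first and last visit inside the window.
    start = bisect_left(times, moment - window)
    end = bisect_right(times, moment + window)
    return [url for _, url in visits[start:end]]


visits = [
    (datetime(2020, 6, 28, 10, 0), "https://news.ycombinator.com/"),
    (datetime(2020, 6, 28, 10, 3), "https://beepb00p.xyz/"),
    (datetime(2020, 6, 28, 11, 0), "https://example.com/"),
]
nearby = visits_near(visits, datetime(2020, 6, 28, 10, 2))
```

Because it only needs timestamps, the same lookup works whether the event came from a browser, a native app or a todo list.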

pkamb · on June 29, 2020

I'm talking mostly about normal tabs in normal desktop browsers. When you open a new tab, it should keep the "Back" history of its parent.

Safari for iOS, too, has a feature where you can temporarily go "Back" from new tabs. But like you said it breaks if you do anything else. Kind of a hack for mobile convenience rather than a true feature.

owenshen24 · on June 28, 2020

Justifications are very well-reasoned; a good read in and of itself.

mongojunction · on June 28, 2020

Well done. The write-up really brings together some concepts and creates some clarity on things I've been feeling for a while.

Is the author aware of my history-based, fully interactive offline archiver? https://github.com/dosyago/22120

karlicoss · on June 28, 2020

Author here, thanks!

Haven't seen your tool in particular, thanks for the link, I'll check it out. I only used https://github.com/pirate/ArchiveBox before, but haven't set up an automatic archival pipeline (yet)!

Also, integrating with local web archives is on my Promnesia todolist! I expect them to be very useful for indirect history retrieval, e.g. "I haven't visited that page, but it's within one link". Having local web archives makes it possible to implement such functionality in an efficient way.

mongojunction · on June 29, 2020

You have a really interesting way of thinking about all this stuff and have synthesized a lot of different ideas that I believe point to a future for the web. Very cool to come across your work. Do you have a blog?

karlicoss · on June 29, 2020

My blog is literally the link I posted ;)

Perhaps this page https://beepb00p.xyz/blog-graph.html would be a good start if you want to explore

gpm · on June 28, 2020

Do you know of a reason that this can't work with firefox, or is it probably just a matter of someone putting in the work?

mongojunction · on June 29, 2020

I think because it's based on the DevTools protocol which is only partly supported (I think) in Moz.

Although you're right with enough work someone could engineer a way to achieve the same, even without DevTools protocol. I picked the protocol because it made it easy to achieve.

I think in future FF plans to support DevTools, or the standardised version which I think is called WebDriver protocol or something.

m0zg · on June 28, 2020

Also, since the various web archives are getting shut down soon, it'd be great if such extensions could locally and securely preserve pages much like an archival crawler does it, or better yet create a distributed archive that's impossible to shut down or censor. Better yet still if there's local, language-aware index over such pages so that I could search them easily, without Google deciding what I should and should not see.