I made something like this because I was tired of how asymmetric data collection on the Internet is. It's still not where I'd like it to be, but it's been really nice to treat my browsing history like any other log that I can query. Tools like Dogsheep are nice, but they tend to rely on the platform allowing you to export your data. This bypasses those limits by just doing the collection on the client.
This lets me create dashboards to see usage for certain topics. For example, I have a "Dev Browser" which tracks the latest sites I've visited that are related to development topics [1]. I similarly have a few for all the online reading I do: one for blogs, one for fanfiction, and one for webfiction in general.
I've talked about my first iteration on here before [2].
My second iteration ended up as a userscript which sends data about the sites I visit to a Vector instance (no affiliation; [3]). Vector is in there because for certain sites (i.e. those behind draconian Cloudflare configurations), I want to save a local copy of the page. Vector can pop that field, save it to a local MinIO instance, and at the same time push the rest of the record to something like Grafana Loki and Postgres, all while being very fast.
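To make the pop-and-fan-out step concrete, here's a rough Python sketch of the idea. It is not Vector configuration (Vector does this with its own transforms and sinks), and the field name `html`, the `snapshot_key` field, and the function name are all made up for illustration.

```python
import hashlib
import json


def split_record(record: dict, body_field: str = "html"):
    """Pop the page body out of a visit record.

    Returns (blob_key, blob, rest): the blob would go to object storage,
    and the rest of the record would go to the log/database sinks.
    """
    rest = dict(record)  # shallow copy so the caller's record is untouched
    body = rest.pop(body_field, None)
    if body is None:
        return None, None, rest  # nothing to snapshot for this visit
    blob = body.encode("utf-8")
    # Content-addressed key (an assumption): identical snapshots share a key.
    blob_key = hashlib.sha256(blob).hexdigest() + ".html"
    rest["snapshot_key"] = blob_key  # lets the log line point at the saved copy
    return blob_key, blob, rest


if __name__ == "__main__":
    visit = {"url": "https://example.com/", "title": "Example", "html": "<html>hi</html>"}
    key, blob, rest = split_record(visit)
    print(key, len(blob), json.dumps(rest))
```

The point of splitting early is that the heavy part (the page body) never reaches the log store, so queries over the visit records stay cheap.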
I've started looking into a third iteration using mitmproxy. It helps a lot with saving local copies since that happens outside of the browser, so I don't feel a hitch when a page is inordinately heavy for whatever reason. It's also very nice that it would work with all browsers just by setting a proxy, which means I could set it up for my phone either as a normal proxy or as a WireGuard "transparent" proxy. I only need to set up certificates for it to work.
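For a sense of what the proxy side could look like, here's a minimal sketch of a mitmproxy addon that appends one JSON line per HTML response. This isn't my actual setup: the class name, the `visits.jsonl` path, and the record fields are placeholders, and a real version would ship records to a collector instead of a local file.

```python
import json
import time


class VisitLogger:
    """Hypothetical addon: log one JSON line per HTML response."""

    def __init__(self, path="visits.jsonl"):
        self.path = path  # placeholder output file

    def response(self, flow):
        # mitmproxy calls this hook once a full HTTP response has been read.
        content_type = flow.response.headers.get("content-type", "")
        if "text/html" not in content_type:
            return  # skip images, scripts, API calls, etc.
        body = flow.response.content or b""
        record = {
            "ts": time.time(),
            "method": flow.request.method,
            "url": flow.request.pretty_url,
            "status": flow.response.status_code,
            "bytes": len(body),
        }
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")


# mitmproxy picks up addons from this module-level list.
addons = [VisitLogger()]
```

Saved as, say, `visit_logger.py`, it can be loaded with `mitmdump -s visit_logger.py`. Since the hook runs in the proxy process, the browser never waits on the logging itself.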
---
[1] https://raw.githubusercontent.com/zamu-flowerpot/zamu-flower...
[2] https://news.ycombinator.com/item?id=31429221
[3] http://vector.dev