Your catchword is "share", but you don't want us to share. You want to keep us within your walled gardens. That's why you've been removing RSS links from webpages, hiding them deep on your website, or removed feeds entirely, replacing it with crippled or demented proprietary API. FUCK YOU.
You're not social when you hamper sharing by removing feeds. You're happy to have customers creating content for your ecosystem, but you don't want this content out - content you do not even own. Google Takeout is just a gimmick. We want our data to flow, we want RSS or Atom feeds.
We want to share with friends, using open protocols: RSS, Atom, XMPP, whatever. Because no one wants to be force-fed your service, your applications and your API. Friends must be free to choose whatever software and service they want.
We are rebuilding bridges you have willfully destroyed.
Get your shit together: Put RSS/Atom back in. <<<
Not gonna happen but I fully agree with the sentiment.
Thanks
(spoiler: I'm the author of this text. I'm also the author of the first version of RSS-Bridge, which has since been beautifully expanded by the community.)
Unfortunately, the Facebook bridge is very broken and no longer maintained. Of the public pages I've tried, it works only intermittently, and on at most half of them.
I don't blame them for not maintaining it though. I grabbed the code to see if I could get it working and realized Facebook has made their newer pages fiendishly difficult to scrape. Assuming one does put in the effort to make it work, how long will it last till it's broken again?
I wish there was a viable alternative platform for organizations where "public" content is actually accessible publicly.
"... Facebook has made their newer pages fiendishly difficult to scrape."
This has not been my experience. It is still as easy as ever using the mobile sites.
However I suspect one person's notion of "scraping" is not always the same as another's. I prefer to work on the command line, in text mode. Thus when I check Facebook I want text only, no graphics. First I extract all the story.php URLs and sort them by date. Then I retrieve the contents stored at these URLs and dump them into a text-only format so I can read through comments. If there is something that looks interesting I can save the story URL and check out the photos later on a computer that has a graphics layer loaded.
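Purely as an illustration of that extraction step, here is a rough Python equivalent (the commenter uses plain UNIX text utilities, and the story.php href pattern below is an assumption about mbasic's markup, not a documented interface):

    # Rough sketch: pull story.php permalinks out of a saved mbasic.facebook.com
    # page (read from stdin) and print them deduplicated. The href pattern is an
    # assumption about how mbasic pages link to individual stories.
    import html
    import re
    import sys

    STORY_RE = re.compile(r'href="(/story\.php\?[^"]+)"')

    def story_urls(markup):
        seen = set()
        for match in STORY_RE.finditer(markup):
            # Unescape &amp; etc. and drop any fragment after the first '#'
            url = html.unescape(match.group(1)).split("#", 1)[0]
            if url not in seen:
                seen.add(url)
                yield "https://mbasic.facebook.com" + url

    if __name__ == "__main__":
        for url in story_urls(sys.stdin.read()):
            print(url)

Sorting by date would rely on the timestamps that sit near each link in the page source; that part is left out of the sketch.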
TBH, with Facebook I mainly just check messages and notifications. Saving story URLs in chronological order is useful for me because it makes it easy to go back and find items from the past. But if I were really serious about monitoring a "feed" I could create one myself, better than Facebook's, by retrieving the profile page of each friend and extracting story URLs from its source. That beats relying on Facebook's manipulative algorithms, which deliberately hide stories and re-order what they do show, non-chronologically, in a way that suits Facebook's interests over the user's.
I do not use any fancy software, just a local TLS proxy, netcat (or equivalent) and ubiquitous base userland UNIX text processing utilities. I use a text-only browser (not lynx) for reading HTML and a pager (less) for reading formatted text.
I use the m.facebook.com or mbasic.facebook.com sites, not www.facebook.com because the site on the www subdomain uses GraphQL instead of hyperlinks. Not to mention the mbasic subdomain has no ads.
The way websites use GraphQL instead of simple hyperlinks is unnecessarily complex, reminiscent of a Rube Goldberg cartoon.
Interesting. I'm surprised to hear that you've been so successful with that technique. My statement was slightly inaccurate: I have no idea how difficult it is to extract the content from the page once you're able to get it. What was more immediately challenging was getting FB to serve you the page at all. See https://github.com/RSS-Bridge/rss-bridge/issues/2047
There were a handful of pages on Facebook that I used to keep tabs on that must have been public, as I do not have an account, but now I can't get FB to let me see some of them without a login (for example https://www.facebook.com/MadisonAudubon). I have no idea if my IP has been flagged, which they appear to do aggressively, or the pages were made private, or FB introduced some additional settings and the pages are now configured differently, or what.
I use a login. (Because I am primarily checking messages and notifications.) I cannot access that MadisonAudubon page at all without signing up or logging in. Apologies as I should have realised that the discussion was about retrieving "public" profiles/pages without being logged in. I seem to recall a time when that was universally possible but at some point, quite some time ago if I remember correctly, it started to become limited. I would imagine the reason public pages are created on Facebook is to gain access to Facebook users, who are more heavily surveilled than non-users. Public websites like https://madisonaudubon.org could mirror the content they post on Facebook. In practice, this never happens. Calling Facebook a "walled garden" is misleading. It may be "walled" but there's nothing garden-like about it.
In this instance it is to enable use of any TCP client, namely ones that do not support TLS, e.g., original netcat, tcpclient or an early version of a text-only browser. I use a variety of clients and with only one exception I only trust the proxy to make remote TLS connections. If one uses a client with acceptable TLS support, then there is no need for a TLS proxy.
Note I am referring to a local proxy, listening on the loopback device and under the control of the user, not one listening on a network interface connected to the internet and certainly not one operated by a third party as a "service".
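For illustration only, here is a minimal sketch of that kind of loopback proxy in Python: it listens on 127.0.0.1 and, for each plaintext connection, opens a TLS connection to a fixed upstream host and shuttles bytes both ways. The hard-coded upstream host and local port are assumptions for the example; the commenter's actual setup uses a dedicated TLS proxy, not this script.

    # Minimal local TLS-terminating forward proxy sketch. A plain TCP client
    # (e.g. original netcat) connects to 127.0.0.1:8080 and the proxy relays
    # the bytes over TLS to the upstream server. Hard-coded values are
    # assumptions for the example.
    import socket
    import ssl
    import threading

    LISTEN_ADDR = ("127.0.0.1", 8080)          # loopback only, never an internet-facing interface
    UPSTREAM = ("mbasic.facebook.com", 443)    # assumed upstream host for this example

    def pipe(src, dst):
        # Copy bytes from src to dst until EOF, then close dst's write side.
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        finally:
            try:
                dst.shutdown(socket.SHUT_WR)
            except OSError:
                pass

    def handle(client):
        ctx = ssl.create_default_context()
        upstream = ctx.wrap_socket(socket.create_connection(UPSTREAM),
                                   server_hostname=UPSTREAM[0])
        t = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
        t.start()
        pipe(upstream, client)
        t.join()
        client.close()
        upstream.close()

    def main():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(LISTEN_ADDR)
        srv.listen(5)
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

    if __name__ == "__main__":
        main()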
Shameless plug: see autossl.so [1]. It is not a proxy per se, but an LD_PRELOAD-able lib which upgrades plain socket connections to TLS for applications that do not support TLS themselves (or use SSL versions that are too old).
I have been wondering if, instead of bothering to try to scrape Facebook via DOM, it might be worth pursuing a basic image matching approach to divide a Facebook page up into regular "chunks" that are just saved as images.
I'd personally be happy with an RSS feed of Facebook as just a series of images in a feed, but I guess once you've gone that far you could relatively simply run it through OCR to get most of the text.
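A rough sketch of the OCR half of that idea, assuming a full-page screenshot already saved to disk; the fixed chunk height, the file name and the use of pillow/pytesseract are illustrative choices, not anything RSS-Bridge actually does:

    # Sketch: slice a saved page screenshot into fixed-height chunks and OCR
    # each one. Assumes pillow and pytesseract (plus a Tesseract binary) are
    # installed; "page.png" and the 600px chunk height are arbitrary choices.
    from PIL import Image
    import pytesseract

    CHUNK_HEIGHT = 600  # pixels per slice; a real splitter would detect visual
                        # post boundaries instead of using a fixed height

    def ocr_chunks(path):
        page = Image.open(path)
        width, height = page.size
        for top in range(0, height, CHUNK_HEIGHT):
            chunk = page.crop((0, top, width, min(top + CHUNK_HEIGHT, height)))
            text = pytesseract.image_to_string(chunk)
            if text.strip():
                yield top, text.strip()

    if __name__ == "__main__":
        for offset, text in ocr_chunks("page.png"):
            print("--- chunk at y=%d ---" % offset)
            print(text)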
Unfortunately the Facebook bridge doesn't work and also seems to be unmaintained, which is surprising given that it appears to be one of the "core" bridges.
Data archival, retention and accessibility are absolutely fundamental, and it's unfortunate that so many companies are hell-bent on stopping individuals from using open means to access such data (although I am not surprised, and from a rational perspective I can understand their reasons for making it difficult).
> Unfortunately, the Facebook bridge is very broken and no longer maintained.
True. There is no maintainer for FacebookBridge.
As of now, the only way I see to fix FacebookBridge is:
1. using an existing Facebook account
2. fetching pages with Selenium (simply speaking, a real web browser); a rough sketch follows below
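For illustration only, a minimal Selenium sketch of that approach in Python (RSS-Bridge itself is PHP; the login-form field names, the mbasic.facebook.com URLs and the example credentials are assumptions, not how FacebookBridge works today):

    # Illustrative sketch: log in with an existing account, then fetch a page's
    # HTML through a real (headless) browser. Field names ("email", "pass",
    # "login") are assumptions about Facebook's mobile login form.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options

    def fetch_page_html(user, password, page_url):
        opts = Options()
        opts.add_argument("-headless")
        driver = webdriver.Firefox(options=opts)
        try:
            driver.get("https://mbasic.facebook.com/login.php")
            driver.find_element(By.NAME, "email").send_keys(user)
            driver.find_element(By.NAME, "pass").send_keys(password)
            driver.find_element(By.NAME, "login").click()
            driver.get(page_url)
            return driver.page_source
        finally:
            driver.quit()

    if __name__ == "__main__":
        html = fetch_page_html("me@example.com", "hunter2",
                               "https://mbasic.facebook.com/MadisonAudubon")
        print(len(html), "bytes of HTML fetched")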
I say that instead of maintaining that bridge, we ask whoever is still using Facebook to move somewhere else. That would be easier than maintaining the bridge, because Facebook will actively try to patch any holes in their walled garden.
Back when I showed my tool to generate RSS feeds [0], someone recommended an extensive list of similar tools [1]. In case this doesn't exactly meet your needs (e.g. needing to self-host), you might find a useful alternative there.
[0] Show HN: RSS feeds for arbitrary websites using CSS selectors
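As a rough illustration of the general idea behind selector-based feed generation (not the linked tool's actual code; the URL, selectors and field mapping below are made up):

    # Sketch of "RSS feeds via CSS selectors": fetch a page, select item, title
    # and link nodes, and emit a minimal RSS 2.0 document. The URL and selectors
    # are placeholders, not the linked tool's configuration.
    import requests
    from bs4 import BeautifulSoup
    from xml.sax.saxutils import escape

    def build_feed(page_url, item_sel, title_sel, link_sel):
        soup = BeautifulSoup(requests.get(page_url, timeout=30).text, "html.parser")
        items = []
        for node in soup.select(item_sel):
            title = node.select_one(title_sel)
            link = node.select_one(link_sel)
            if not (title and link and link.get("href")):
                continue
            items.append("<item><title>%s</title><link>%s</link></item>"
                         % (escape(title.get_text(strip=True)), escape(link["href"])))
        return ('<?xml version="1.0"?><rss version="2.0"><channel>'
                "<title>%s</title><link>%s</link>%s</channel></rss>"
                % (escape(page_url), escape(page_url), "".join(items)))

    if __name__ == "__main__":
        print(build_feed("https://example.com/blog", "article", "h2", "a"))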
Some RSS feeds only include a snippet of the full item text. One nice thing about rss-bridge is that bridges can expand an existing feed to include the entire item content.
In essence the project is a scraper that generates web feeds.
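A minimal sketch of that kind of full-text expansion, assuming the feedparser and readability-lxml libraries (this illustrates the general technique; rss-bridge itself is written in PHP):

    # Sketch: take a snippet-only feed, fetch each entry's page, and swap in the
    # extracted article body. Uses feedparser and readability-lxml purely as an
    # illustration of the technique.
    import feedparser
    import requests
    from readability import Document

    def expand_feed(feed_url):
        feed = feedparser.parse(feed_url)
        for entry in feed.entries:
            page = requests.get(entry.link, timeout=30)
            doc = Document(page.text)
            yield {
                "title": entry.get("title", doc.short_title()),
                "link": entry.link,
                "content": doc.summary(),  # cleaned-up article HTML
            }

    if __name__ == "__main__":
        for item in expand_feed("https://example.com/feed.xml"):
            print(item["title"], "-", len(item["content"]), "chars of full content")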
ttrss has an (official?) plugin to do that (fetch the full text from title-only RSS feeds), and I'm sure other feed readers can do it out of the box or via plugins.
The official plugin uses a PHP port of Mozilla's Readability, which is what powers Firefox Reader Mode. There is also the third-party FeedIron plugin, which is more configurable.
Thank you Eugene for your work, I really like these kinds of projects for an open web! I have a question: do you have a list of hosted instances, as Invidious has [0], or are you planning to add one in the future, so people like me can use it without running their own instance?