Ah! But crontab.guru is such a beautiful UI/UX! I'd argue it offers the user everything they need to do what they need to do.
I will painfully admit I struggle with converting PST/PDT --> UTC more than I should, but after a while it was pretty easy to commit the offset (+7 or +8 hours, depending on daylight saving time) to memory.
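For what it's worth, you can also just ask the computer instead of memorizing the offset. A quick Python sketch (the dates and the 9:00 job time are just examples):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# When should a "9:00 Pacific" job fire in UTC? The answer depends on
# the date, because the offset flips between -8 (PST) and -7 (PDT).
for date in ("2024-01-15", "2024-07-15"):
    local = datetime.fromisoformat(f"{date} 09:00").replace(
        tzinfo=ZoneInfo("America/Los_Angeles")
    )
    print(local, "->", local.astimezone(ZoneInfo("UTC")))
# 2024-01-15 09:00:00-08:00 -> 2024-01-15 17:00:00+00:00
# 2024-07-15 09:00:00-07:00 -> 2024-07-15 16:00:00+00:00
```

Some cron implementations (cronie, for example) also honor a CRON_TZ variable at the top of a crontab, which sidesteps the conversion entirely, though support varies.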
I know of someone who set their cron server to a weird time zone (not representative of the office or the computer's physical location, but set ahead of local time) so that when someone told them "it needs to run at 9:00", they could punch that straight into the crontab and the job would actually fire early, leaving enough slack that the results would be ready by 9:00 even if there was a transient error or something.
The time between midnight UTC and 9 AM Pacific is a magical, horrible window during which billions of dollars' worth of servers throughout the world heat up producing mundane reports.
Yes! Unfortunately it is no longer mainstream, since there are fewer monetization opportunities. But a lot of tech-savvy people still prefer it, for many reasons:
1. You get a personalized view of what you have and have not read.
2. You can scan over a lot of posts very quickly, and pick out what you want to read.
3. You can aggregate a lot of different websites in a single place. No need to visit each website individually.
4. Increased privacy.
5. Less tracking.
6. Increased control.
7. Fewer ads.
There are probably more, but these are the main reasons I still prefer RSS.
> What if we don’t have control over the frequency of requests (e.g. using a service like Feedly)? Do those happen often enough that we’d need to host the app ourselves?
I know the wording is a bit vague, and I know that most services don't let you customize this. I added that notice after I suddenly started receiving tons of traffic from what I suspect was a single user, who was deliberately fetching feeds multiple times a minute.
Anyhow, if you aren't actively trying to abuse the service, you should be good. Some RSS readers have "boost" features to fetch feeds more frequently (often a paid feature).
Once I am able to add some good caching, I may be able to remove that notice. But right now the service is somewhat overloaded, which is why some of the integrations (Twitter and Instagram in particular) may give you errors at the moment.
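To give a sense of what I mean by caching: everyone polling the same feed should share one upstream request per window, instead of each poll hitting the rate-limited API. A minimal in-memory sketch (the 15-minute TTL and the function names are placeholders of mine, not how RSSBox is actually structured):

```python
import time

CACHE_TTL = 15 * 60  # seconds; placeholder value
_cache = {}  # url -> (fetched_at, body)

def fetch_cached(url, fetch):
    """Return a cached body if it is fresh enough, otherwise call
    fetch(url) once and remember the result for everyone else."""
    now = time.time()
    hit = _cache.get(url)
    if hit and now - hit[0] < CACHE_TTL:
        return hit[1]
    body = fetch(url)  # the expensive, rate-limited upstream call
    _cache[url] = (now, body)
    return body
```

With something like this, one over-eager user polling every few seconds costs a single upstream request per TTL window instead of hundreds. A real deployment would probably want a shared store such as Redis or memcached rather than a per-process dict, especially on a small dyno that restarts regularly.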
RSSBox used to have Facebook support (but only for public pages, no personal content), but when Facebook started cordoning off their API two years ago, I had to turn it off since I was unable to get my application approved. The code is still there, but I doubt it would work even if you managed to get a valid API key. I think the best option now may be to scrape the web content, unfortunately.
I have assumed for a while that the only way to convert FB -> RSS would be to scrape the home page, but from what I recall the HTML and DOM are all kinds of messed up: intentionally obfuscated to defeat ad blocking. From a quick look just now, it does seem like it would be a nightmare to try to parse as-is, and I would guess FB changes a lot of the output regularly anyway to defeat ad blockers, making any effort to keep up pretty challenging.
It almost sounds like a problem best solved with OCR, rather than scraping per se. Build a simple model to recognize “posts” from screenshots and output the rectangular viewport regions of their inner content. Then build some GIS-like layered 2D interval tree of all the DOM regions, such that you could ask Puppeteer et al. to filter for every DOM node with visible overlap with a given post region. Extract every single Unicode grapheme cluster within those nodes separately, annotated with its viewport XY position. Finally, use the same kind of model that PDF readers use to let you highlight “text” (i.e. arbitrary bags of absolutely positioned graphemes) to “un-render” the nodes’ bag of positioned graphemes back into a stream of space/line/paragraph-segmented text.
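To make the middle step a bit more concrete, here is a rough sketch of the “filter DOM nodes by viewport overlap” part, using Playwright's Python bindings in place of Puppeteer. The post rectangle is a made-up placeholder standing in for the screenshot model's output, and the grapheme-level un-rendering is left out:

```python
from playwright.sync_api import sync_playwright

# Hypothetical region the screenshot model claims contains one post.
POST = {"x": 300, "y": 800, "width": 500, "height": 400}

def overlaps(a, b):
    # Standard axis-aligned rectangle intersection test.
    return (a["x"] < b["x"] + b["width"] and b["x"] < a["x"] + a["width"]
            and a["y"] < b["y"] + b["height"] and b["y"] < a["y"] + a["height"])

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/")  # stand-in for the real page

    pieces = []
    for el in page.query_selector_all("body *"):
        if el.evaluate("e => e.childElementCount > 0"):
            continue  # only leaf nodes, to avoid duplicating nested text
        box = el.bounding_box()  # None for invisible nodes
        if box and overlaps(box, POST):
            text = el.inner_text().strip()
            if text:
                pieces.append((box["y"], box["x"], text))

    # Crude "un-rendering": sort fragments into reading order.
    for _, _, text in sorted(pieces):
        print(text)
    browser.close()
```

Trusting leaf-node boundaries like this is exactly what obfuscated markup tends to break, which is why the grapheme-by-grapheme pass described above would still be needed in practice.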
You are correct in that it is somewhat starved of resources. The public instance that I host runs on a free Heroku dyno (512 MB RAM). I do not currently have a good caching solution, which is why Twitter and Instagram are almost always returning errors now. I suspect a single person is responsible for most of the issues (see GitHub issue #38). It's actually amazing how well it runs considering how much traffic is thrown at it.
At some point I hope to get enough time to implement a caching solution, which should resolve most of these issues.
Instagram used to have an open API, but that has been closed down. The app currently uses some private-ish endpoints, but they are rate limited. More people have started using the app recently, and I have not yet had time to add the caching it needs.