I wonder how this would work with more and more sites behind Cloudflare and the like. Websites really don't want this, since in today's economy wasting a human's time is paramount (they call it "engagement"). A computer, even one working on behalf of a human, is not enough.
Note that this is only needed if the website has no RSS feed whatsoever. If the website has a partial/truncated RSS feed that only contains headlines/partial text, you can use the "Article CSS selector on original website" feature.
That retrieves the list of items via RSS, but fetches each article's content by requesting its URL and grabbing the HTML element you specified (like "article.post").
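Conceptually it's something like the following rough Python sketch (not FreshRSS's actual code; the feed URL, the "article.post" selector, and the library choices are all just placeholders for illustration):

```python
# Rough sketch of the "partial feed + CSS selector" idea, not FreshRSS's code.
# Assumes feedparser, requests and beautifulsoup4 are installed; the feed URL
# and the "article.post" selector are placeholders.
import feedparser
import requests
from bs4 import BeautifulSoup

feed = feedparser.parse("https://example.com/feed.xml")  # truncated feed: titles + links only

for entry in feed.entries:
    html = requests.get(entry.link, timeout=10).text      # fetch the original page
    soup = BeautifulSoup(html, "html.parser")
    article = soup.select_one("article.post")             # grab the element the user configured
    full_text = article.get_text(strip=True) if article else entry.get("summary", "")
    print(entry.title, "->", len(full_text), "chars")
```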
As my use case for RSS is "seeing a blurb for the 99% of things I don't wish to read, once and only once", I'd call a "partial/truncated RSS feed" that only contains headlines/partial text an "unbloated feed" instead.
I tried to dig it out of the PHP source, but without a local checkout it was non-obvious: does any such "synthetic RSS feed" system honor the ETag, Last-Modified, or cache headers of the target page, or does every feed refresh unconditionally load the upstream page only to throw away 90% of the generated HTML?
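For what it's worth, honoring those headers just means replaying them as conditional request headers on the next refresh. A minimal sketch (assuming the Python requests library, and that the upstream server actually emits ETag/Last-Modified; the cache dict stands in for whatever storage a feed refresher would really use):

```python
# Minimal conditional-GET sketch: re-use ETag / Last-Modified from the previous
# fetch so an unchanged upstream page costs a 304 instead of a full download.
import requests

cache = {}  # url -> {"etag": ..., "last_modified": ..., "body": ...}

def fetch(url):
    headers = {}
    prev = cache.get(url)
    if prev:
        if prev.get("etag"):
            headers["If-None-Match"] = prev["etag"]
        if prev.get("last_modified"):
            headers["If-Modified-Since"] = prev["last_modified"]

    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and prev:
        return prev["body"]  # upstream unchanged, nothing to re-scrape

    cache[url] = {
        "etag": resp.headers.get("ETag"),
        "last_modified": resp.headers.get("Last-Modified"),
        "body": resp.text,
    }
    return resp.text
```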
And that's not even getting into the raging tire fire that is Akamai / Cloudflare / whatever anti-bot technologies. I did see support for HTTP proxies, but it wasn't clear whether that was something one could set on a per-xpath-feed basis or whether the whole system had to run under one proxy (potentially $$$).
Proxies won't help - most of the aforementioned providers now do TCP, TLS, and browser fingerprinting and score requests heuristically. You need to present a consistent fingerprint across all of those layers to pass.
Proxies are actually pretty useless; to be able to fake those fingerprints you need to go one layer lower. What you need instead is a VPN (by VPN I mean an IP-level tunnel, not a public VPN provider - those IPs are already blacklisted, and often for good reason).
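For illustration of the "consistent fingerprint" point (one tool among several, not an endorsement): libraries like curl_cffi can present a browser-like TLS/HTTP2 fingerprint, which is exactly what a plain HTTP proxy can't do for you. A tiny sketch, assuming curl_cffi is installed and that "chrome110" is an available impersonation target:

```python
# Sketch: a plain requests.get() sends Python's own TLS/HTTP fingerprint, which
# anti-bot vendors flag. curl_cffi can impersonate a real browser's TLS and
# HTTP/2 fingerprint instead. "chrome110" is an assumed target name.
from curl_cffi import requests as curl_requests

resp = curl_requests.get(
    "https://example.com/blog",
    impersonate="chrome110",  # consistent browser-like TLS + HTTP/2 fingerprint
    timeout=10,
)
print(resp.status_code, len(resp.text))
```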
XPath still rules at web scraping, though CSS selectors are often the better choice. With CSS selectors it's much harder for the user to shoot themselves in the foot with their selector design, and most web devs already know them. So really it's best to mix both: CSS for most cases, with a fallback to XPath for more complex operations.
For example, the xpath selector used in the article `//li[@class="blog__post-preview"]` would break if one more class is added, which happens very often in the real world, while the CSS selector `li.blog__post-preview` wouldn't. (The correct xpath here would be `//li[contains(@class,"blog__post-preview")]` or, even more accurately, `//li[contains(concat(" ", normalize-space(@class), " "), " blog__post-preview ")]` - yeah, it's really ugly).
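A quick way to see the difference is a standalone sketch using lxml (the HTML snippet is made up; lxml's `cssselect()` needs the cssselect package installed):

```python
# Demonstrates the class-matching pitfall: an extra class breaks the exact
# @class= XPath, but not the CSS selector or the contains/concat XPath.
from lxml import html

doc = html.fromstring('<ul><li class="blog__post-preview featured">Post</li></ul>')

print(doc.xpath('//li[@class="blog__post-preview"]'))                     # [] - exact match fails
print(doc.xpath('//li[contains(concat(" ", normalize-space(@class), " "),'
                ' " blog__post-preview ")]'))                             # [<Element li>]
print(doc.cssselect('li.blog__post-preview'))                             # [<Element li>]
```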
Either way, both CSS and XPath selectors are really easy to learn, and it would be great if more tools adopted them the way FreshRSS did! I made some interactive cheatsheets with all of the edge cases if anyone is interested in all of the web scraping parsing weirdness :) https://scrapfly.io/blog/css-selector-cheatsheet/ and https://scrapfly.io/blog/xpath-cheatsheet/