I work on a project that helps you produce RSS feeds for web pages that don't offer their own. It takes the page URL as input, plus simple selectors that identify the page elements to use in the feed.
A major problem with Reuters' RSS feeds while they lasted is that Reuters pushes new URLs with updates to existing stories and kills or redirects the previous URLs. So for a major developing story you'd see the same article in your feed 5+ times, since the feed was just a dumb push of every URL added to whatever category you were subscribed to. Still better than nothing, I guess.
The issue with this solution is that there doesn't seem to be any way to specialize it at all. I was subscribed to Reuters' politics feed [1] specifically, and I got other types of news from other sources. But I don't see any way to do that with this method. The articles unfortunately do not have the category in the URL.
Printing each story exactly once is tough. The RSS feed serial number thing never worked. Sites with multiple RSS servers and a load balancer would return a different serial number. I ended up taking the MD5 of the title and description fields, with HTML markup deleted, and discarding new feed items with a duplicated MD5.
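A minimal sketch of that MD5 approach, in Python. The function names and the dict-shaped feed items are made up for illustration, and the entity unescaping and whitespace collapsing are extra normalization steps not mentioned above:

```python
import hashlib
import html
import re

TAG_RE = re.compile(r"<[^>]+>")


def item_fingerprint(title: str, description: str) -> str:
    """MD5 of title + description with HTML markup deleted."""
    text = f"{title}\n{description}"
    text = TAG_RE.sub("", text)    # delete HTML tags
    text = html.unescape(text)     # assumption: also decode entities such as &amp;
    text = " ".join(text.split())  # assumption: also collapse runs of whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def new_items(items, seen):
    """Yield only items whose fingerprint is not yet in `seen`; adds to `seen`."""
    for item in items:
        digest = item_fingerprint(item["title"], item["description"])
        if digest in seen:
            continue  # same title and description already printed
        seen.add(digest)
        yield item
```

Keeping `seen` across polls (in memory or on disk) is what makes a story print once even when different servers behind a load balancer return the same item with different serial numbers.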
Artem, do you know if there is a parameter for ordering the Google News RSS results by time? This is very helpful.
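Nothing in this thread confirms a server-side sort parameter, so one possible workaround is to sort on the client using each item's `pubDate`. A rough sketch, assuming a standard RSS 2.0 document; the function name is made up for illustration:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def items_newest_first(rss_xml: str):
    """Return (datetime, title, link) tuples sorted newest first."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if not pub:
            continue  # skip items without a date; they cannot be ordered
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        # Assumes every pubDate carries a time zone (e.g. "GMT"); mixing
        # zone-less and zoned dates would make the sort raise TypeError.
        items.append((parsedate_to_datetime(pub), title, link))
    items.sort(key=lambda entry: entry[0], reverse=True)
    return items
```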
Finally, I pay $2000+ annually for a competitor to NewsCatcherAPI. It may be worth connecting. I signed up for a trial earlier and the two issues for me would be (i) the range and depth of publications and (ii) not being able to track mentions / references in the body of the article.
I may not be your target audience but one of your competitors is pulling in 10m articles with the full article content per day. I use the API for timely alerts for PR monitoring -- my priority is that I pick up mentions of companies / individuals wherever they happen quickly.
Your pricing is a lot more competitive. I just wonder whether you are looking to move in the direction of range and depth in the future, or whether you're targeting a different market segment.
Obviously you could build an aggregator with an API that gives you a stream of news. People have been building niche news aggregators on Wordpress for almost two decades by plugging in some RSS feed urls.
It's just the kind of obvious, lame idea that stops me from seeing more compelling uses of this sort of cool API.
Short answer: it seems to be OK as long as you do not resell the full body text.
Long answer: Google does the same thing every day. Each country has its own laws. Usually, news falls into a special category that gets weaker copyright protection.
I worked for a company doing this in the early 2000s. Perfectly legal to save website text for private use (a search for example). It doesn't need to be news either.
I was worried they had removed it completely. I'm using this for a simple web page which shows RSS feeds from multiple news sources across the political spectrum, and the Reuters feeds have been the essential pivot for me. It is the only source I significantly trust to be neutral in these trying times.
Thank you so incredibly much for this very simple solution. I was worried I'd have to spend many hours on some complicated fix. I'm glad I no longer have to (unless Google kills off their news RSS).
I have an antique Teletype set up to print news from the Reuters news feed, and it's stopped working because of this. So I tried this new approach via Google. All you get is the RSS feed titles, not the content. The "description" is just a link to content elsewhere, with the title as link text. The real Reuters RSS feed had a few sentences of copy for each story, roughly what radio stations would read.
Associated Press seems to have dropped their RSS feeds too, or hidden them well.
CHINA SLAMS TRUMP OVER UIGHUR LAW AMID BOLTON ACCUSATIONS
(JUNE 18TH, 7:16 PM)
A NEW LAW AIMED AT PUNISHING CHINESE OFFICIALS INVOLVED IN MASS
INTERNMENTS OF UIGHURS AND OTHER MINORITIES IN XINJIANG CAME AS
JOHN BOLTON ACCUSED PRESIDENT TRUMP OF SUPPORTING BEIJING?S
CRACKDOWN.
NYT's RSS feed may not be a good source if you want updates through the day. I left the program running and the RSS feed hasn't changed in hours. Maybe it changes all at once when the next edition comes out.
I read RSS feeds with an app on Android called Aggregator. Reuters feeds included a short description of the story inside the feed, but Google News doesn't have them. The descriptions in the feed allowed me to precisely filter and label the entries based on keywords.
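To show why the missing descriptions matter, here is a rough Python sketch of keyword-based labeling. It is not how Aggregator works internally; the label names and keywords are made up for illustration:

```python
import re

# Hypothetical label -> keyword map; pick your own.
LABELS = {
    "politics": ["election", "senate", "parliament"],
    "markets": ["stocks", "bond", "inflation"],
}


def labels_for(title, description, label_keywords=LABELS):
    """Return every label with at least one whole-word keyword match."""
    text = f"{title} {description}".lower()
    matched = []
    for label, keywords in label_keywords.items():
        if any(re.search(rf"\b{re.escape(k.lower())}\b", text) for k in keywords):
            matched.append(label)
    return matched
```

With a title-only feed, `labels_for("Headline", "")` has nothing to match against, while the same entry with a few sentences of description can still be labeled.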
Anyway, I also use the Google News RSS trick described in the article, as replacement for now. Not sure how long it will last, however.
Grey area, lived in it for a while and made some money there, but at the end of the day it depends on how the original creator feels about your derivative work - or you personally :)
Yep. It works for most news sites, it's just filtering the recent news from Google News based off the URL. Oddly, it doesn't work for CNN. Never understood why. Maybe "cnn" is a stop word to this search engine and ignored. Dunno.
And, of course, it works today. It's a Google product, so enjoy it while it lasts.
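For anyone scripting this, a small Python sketch that builds such a search URL. The `when:` and `allinurl:` operators and the `hl`/`gl`/`ceid` parameters follow the commonly shared form of the trick; they are assumptions here, not a documented API, and could stop working at any time:

```python
from urllib.parse import urlencode


def google_news_rss_url(site, window="24h", hl="en-US", gl="US", ceid="US:en"):
    """Build a Google News RSS search URL limited to one site and time window."""
    # Assumed query operators: "when:" limits recency, "allinurl:" filters by URL.
    query = f"when:{window} allinurl:{site}"
    params = {"q": query, "hl": hl, "gl": gl, "ceid": ceid}
    return "https://news.google.com/rss/search?" + urlencode(params)


print(google_news_rss_url("reuters.com"))
```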
> Did it work? Consider subscribing to my newsletter to get more useful content like that. It’s free: (...) I am a co-founder of NewsCatcherAPI — ultra-fast API to find news articles by any topic, country, language, website, or keyword. ...
It looks like an unfortunate, automatically placed ad, making me look for a continuation of the actual content (a direct answer to the question) that never came.
Here's what it can produce for the Reuters World News mobile page:
https://createfeed.fivefilters.org/index.php?url=https%3A%2F...
Here's what it can do for the main site:
http://createfeed.fivefilters.org/index.php?url=https%3A%2F%...
The downside is that certain changes to the HTML structure (e.g., the site renaming or removing class attribute values used as selectors) could cause the feeds to break.
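To make the idea and its fragility concrete, here is a toy Python sketch of selector-based feed generation. It is not the project's actual code: it only matches `<a>` elements by a single class name, and every name in it is made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) for <a> tags that carry a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.css_class in classes and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def build_feed(page_url, html_text, css_class, feed_title):
    """Return an RSS 2.0 string with one item per matching link."""
    parser = LinkCollector(css_class)
    parser.feed(html_text)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = f"Generated from {page_url}"
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = urljoin(page_url, href)
    return ET.tostring(rss, encoding="unicode")
```

If the site renames the class passed as `css_class`, the parser matches nothing and the feed comes back with no items, which is exactly the breakage described above.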