I've also been thinking about processing news with LMs, but from a different angle.
One big complaint I have about reading news through RSS is that there's no natural hierarchy or priority to the news. There's no front page, no headline, no article size in RSS feeds. Given the way news agencies generate those feeds, there are _tons_ of repetition, tiny updates, and insignificant one-liner interviews about significant events. Not to mention the "no update at this point" updates. Entries that are not informative look exactly the same as the informative ones, but often outnumber them.
An ideal news feed processor, to me, would be one that reads through the last week's RSS feeds, merges all those tiny updates into coherent articles, and ranks them by the significance of the event. Sort of turning a newspaper into a journal.
The merging and reflow should be well within an LM's capability. However, I'm not sure whether OpenAI's API can swallow an entire week's worth of RSS, or produce multiple full-sized articles, but this is something I'd like to try when I get a free weekend.
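One way around the context-window question is to not feed the whole week in at once: greedily pack entries into batches that each fit a rough token budget, summarize each batch, then merge the summaries. A minimal sketch of the batching step, assuming entries are plain strings and using the common ~4-characters-per-token rule of thumb (the budget value is a placeholder, not a real model limit):

```python
def batch_entries(entries, budget_tokens=100_000):
    """Greedily pack RSS entries into batches that each fit a rough
    token budget, estimating ~4 characters per token."""
    batches, current, used = [], [], 0
    for entry in entries:
        cost = len(entry) // 4 + 1  # crude token estimate
        if current and used + cost > budget_tokens:
            batches.append(current)  # batch is full; start a new one
            current, used = [], 0
        current.append(entry)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch would then go to the model as one summarization call, and a final call merges the per-batch summaries into ranked articles.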
A related idea that I've had is to present a time-lagged newsfeed and use AI to link to any follow-up stories.
“hey, remember how everyone was panicking about the price of eggs a few months ago? Well, prices are normal now but only one person wrote about it so you probably didn’t hear that”
People get the impression that mostly bad things are happening because "this just got much worse" is newsworthy but "this isn't as bad as it was 5 years ago" isn't, and improvements tend to happen slowly whereas disasters can happen quickly.
I had nearly the same idea as you around surfacing what is "important" vs just a large list of RSS articles.
There are two main differences compared to what you're thinking. One, for the `significance of the event`, I've used the number of publishers talking about it. So more publishers == more important. Two, I've done this daily instead of as a weekly report.
I can also confirm that the LM can handle at least a day's worth (2500+) of articles. I would doubt its ability to produce an entire article, but it does a great job at a short summary.
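The publisher-count heuristic is simple to sketch. Assuming articles have already been grouped into events (the clustering step isn't shown here), ranking is just counting distinct publishers per event:

```python
from collections import defaultdict

def rank_events(articles):
    """articles: iterable of (event_id, publisher) pairs.
    Returns event ids sorted by number of distinct publishers, descending."""
    publishers = defaultdict(set)
    for event_id, publisher in articles:
        publishers[event_id].add(publisher)  # set dedupes repeat coverage
    return sorted(publishers, key=lambda e: len(publishers[e]), reverse=True)
```

Using a set means a single outlet posting five follow-ups to the same event counts once, which already damps some of the repetition problem upthread.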
Are you curating the list of publishers somehow? I would imagine the AP/AFP newswire repost circuit/echo chamber would result in overestimating the importance of a lot of crapola and underestimating the importance of investigative pieces, for example.
This was an issue when I first started. With minimal sources, a lot of the time the top collections were low-quality SEO articles.
After adding a sufficient number of sources, I've noticed a decent reduction in the echo chamber. Although, since importance is ranked by the most-talked-about topic, it is always going to have some sort of echo chamber.
Adding both left- and right-leaning publishers, for instance, has helped. Although one might say something is good and the other that it's bad, the embeddings pick them up as the same topic.
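That behavior falls out of how embeddings tend to work: topical content dominates the vector, while stance contributes comparatively little, so opposite takes on the same event still land close together. A toy illustration with hypothetical hand-made vectors (real ones would come from an embedding API, which isn't called here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings: first two dims encode topic, last encodes stance.
pro   = [0.9, 0.4,  0.1]   # "policy X is working"
anti  = [0.9, 0.4, -0.1]   # "policy X is failing"
other = [0.1, -0.8, 0.0]   # unrelated story

# Same topic with opposite stance is far more similar than a different topic,
# so a similarity-threshold clustering would group pro and anti together.
```

This is also why the publisher-count ranking stays usable across outlets with different editorial slants.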