Ask HN: I created a news shortening algorithm and am not sure how to utilize it
42 points by superdario on June 17, 2022 | 63 comments
I've been developing my own algorithm for news shortening. Basically, it takes a news article from a site, does some calculations, and spits out up to 5 of the most important sentences. A demo can be seen here: https://excerptdaily.com

I created it because I read news and I hate reading Bible-size articles full of unnecessary information just to find the main point

It has a purpose, it really does solve a problem:

* Save people's time
* Inform you as fast as possible
* Give you the main point of an article in 5 sentences
* Save you from clickbait or half-clickbait titles

Starting my own news website without any connections or audience doesn't make sense, and I'm also bad at marketing. I firmly believe this is a very good solution. I just don't know yet how to utilize it.

Should I offer the power of the algorithm to some podcast that has an audience and its own news website, or should I offer it to someone who wants to build a news website?




Can you articulate why this is different to other text summarising solutions?

It may be that yours is easier to integrate than using AWS APIs[1], or performs better than what's available on say, npm[2]. It may be that your algorithm is designed specifically for news articles.

If you can articulate where this fits into the market of other solutions - that will help inform how best to utilize it.

[1] https://aws.amazon.com/blogs/machine-learning/part-1-set-up-...

[2] https://www.npmjs.com/package/text-summary


Word 2007 had an "AutoSummarize" feature which was later removed.[1] I wonder how well it would hold up today.

[1] https://youtu.be/o30nPCgdq0I?t=100


Why is it different? Hard to tell, I don't know how others work. I was focusing strictly on news articles


If it's hard to tell the difference, it'll be hard to sell the difference. (Cheesy rhyme intentional)

But seriously, you could probably do some research to work out your solution's strengths, relative to existing solutions.

When you know what its strengths are, try to find people who want those strengths.

Ultimately, you're asking the question 'is there a market for what I've built?' - but you've phrased that question differently.


> If it's hard to tell the difference, it'll be hard to sell the difference

That's going into my quotes list. It rhymes, is short, and gets an important point across. Bravo.


I’m not sure how relevant it is. Dropbox entered into a market with around 10 competitors, and trounced them all. In general it’s a mistake to worry about the competition.


Comparisons to competition are just a way to map out the market, and where you may fit.


Everyone else is focusing on news articles as well, so that's not a problem; however without comparison to other approaches it's impossible to tell if the solution is 'powerful' or even any good at all.

This here - http://nlpprogress.com/english/summarization.html - is an overview of which algorithms are currently considered 'good' and what results they achieve on some datasets commonly used for evaluation and comparison. It would be interesting to run your solution on that data and see what you get.
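To make "run your solution on that data" concrete: summarization benchmarks are usually scored with ROUGE, which counts word overlap between a generated summary and a human-written reference. Below is a minimal sketch of ROUGE-1 only, with simplified tokenization. The names `tokenize` and `rouge1` are made up for illustration, and the numbers it produces will not match an official ROUGE implementation exactly (no stemming, no ROUGE-2 or ROUGE-L).

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(candidate, reference):
    """Unigram-overlap precision, recall and F1 between two texts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat sat", "the cat sat on the mat"))
```

Averaging these scores over a whole evaluation set is what would give a number comparable in spirit to the ones listed on that page.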


Wow awesome, I'll try that


I would try to optimize it for one specific use case, not have a general API that does this. I think it's much easier to sell something more niche.

Find a job that requires reading long text. Let's say something in healthcare where they need to read a lot of journal articles. Now you're not a "summarization API", you're a way to reduce your time spent reading medical journals, time that could be better spent saving lives (your new tagline: "Less reading, more saving lives" -- half kidding). You can also optimize your tool to summarize journal articles, which are written in a very specific way. When you sell, you can set yourself apart by being made especially for them.

I wouldn't sell to podcasters or anyone in media because they have no money (just look at how much writers are paid, look at media company valuations...).


Media companies not having money?

Hmm medical journals, it never crossed my mind


Every newsletter that links to articles


Extractive summarisation of news isn't very hard, I guess it doesn't hurt to put it in an API wrapper and have a pay per use model, but don't expect this to sell.

Also FYI, depending on the News outlet the important info is usually at the top - it's maybe the first thing they teach you in Journalism (don't bury the lede). You don't need to read the "Bible-size" article if you read the first paragraph and it's well written.

However, if you did abstractive summarisation instead, that might be more interesting esp for financial news - you might have buyers.
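For readers unfamiliar with the term, extractive summarisation means picking existing sentences rather than writing new ones. A rough sketch of the classic word-frequency approach follows. This is not OP's algorithm (which isn't described in the thread); the stopword list, the scoring rule, and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, far from complete).
STOPWORDS = {
    "the", "a", "an", "is", "are", "was", "were", "of", "to", "in", "on",
    "and", "or", "it", "i", "had", "that", "this", "for", "with",
}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=5):
    """Return up to max_sentences original sentences, in article order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average document-level frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    demo = "Solar power is growing. Solar power costs fell again. I had lunch."
    print(summarize(demo, max_sentences=2))
```

Because the output is copied verbatim from the input, an extractive approach like this can't paraphrase or merge facts, which is the gap abstractive summarisation is meant to fill.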


I have reservations about this because in a lot of the news today the actual details, the important ones, are buried so deep in the body of the article that they probably won't be surfaced given all the fluff.

EDIT: this is to say, garbage in, garbage out. The product you have built is probably great.


It's worth noting that this is such an old problem that traditional news-writing styles incorporate a solution: they require that the author cram the most important info into the first few sentences, then expand on it for the rest of the article.

I'm not sure how well it's followed these days, and I'm sure that few places other than very traditional news publishers enforce it at all. Old archival newspapers (think 19th and early 20th century) tend to be entirely in this style, though.


This is called the "inverted pyramid" structure [0]. It's definitely less common in the era of online media.

[0] https://en.wikipedia.org/wiki/Inverted_pyramid_(journalism)
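The inverted pyramid is also why "just take the first few sentences" is a common baseline in summarization evaluations (often called lead-3). A minimal sketch, assuming a naive regex sentence splitter and a made-up function name `lead_n`:

```python
import re


def lead_n(article, n=3):
    """Baseline summary: the first n sentences of the article."""
    # Naive splitter (an assumption): break on whitespace after ., ! or ?
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", article.strip())
        if s.strip()
    ]
    return sentences[:n]


if __name__ == "__main__":
    print(lead_n("First point. Second point. Third point. Fourth point.", n=2))
```

If a summarizer can't beat this on articles that follow the inverted pyramid, that is worth knowing before trying to sell it.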


A good bet is to create an API for it and sell it on Rapidapi or some site, with a possible free tier.

I think a lot of devs would benefit from a good text summarisation algorithm (I haven't tested yours, this is just general advice), and since you're good at programming and not marketing, make some npm modules, composer packages, gems, and the whole shebang.

Soon you'll be making a few $k a month, depending on how good and fast your API is. The free tier will help you get some search engine traffic. An on-page demo is also very useful in this regard.

This also has a chance of landing some big co with deep pockets who just finds your product a good fit and couldn't be bothered to hire a dev to do this just yet. So make sure you have a $$$ unlimited plan. Good luck.


I was thinking about APIs, but then I'm limited to devs only


Don't underestimate the devs. It's a huge market with less competition (that's why I suggested npm too). One of my sites makes very good money for doing something very trivial, something that almost every dev could do if they spent like 5 days on it, yet I have people paying $47 / mo for over 3 years just because I give them a super easy way to do it with an API.

Most devs generally have good budget or power to make purchases under $50 / mo without consulting their manager and also you're mostly dealing with very nice people.

The only issue I've ever faced when dealing with devs (and it could be just my own myopia) is that they are most often perfectionists. So whatever service you provide, better make sure it's the absolute best of the best - which it should be anyway.


Use the technology to identify which articles talk about the same topic. Then create a site where it is possible to get the full picture of an event by easily accessing the different sources.
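One simple way to approximate "which articles talk about the same topic" is bag-of-words cosine similarity with greedy grouping. The sketch below is an illustration under stated assumptions: the 0.5 threshold is arbitrary, each article is compared only against the first member of every existing group, and the function names are hypothetical. A real system would likely use TF-IDF weighting or embeddings.

```python
import math
import re
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts over lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def group_by_topic(texts, threshold=0.5):
    """Greedy grouping: returns lists of indices into texts."""
    vectors = [vectorize(t) for t in texts]
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            # Compare against the group's first article only (a simplification).
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    headlines = [
        "Mercedes reveals new electric sedan",
        "New electric sedan revealed by Mercedes",
        "Central bank raises interest rates",
    ]
    print(group_by_topic(headlines))
```

Running it on the summaries rather than the full articles might work better, since the fluff is already stripped out.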


Do people want this problem solved?

> Yes. Saving time is good.

> No, as a news reader I prefer accuracy to speed. As a news reader I care about information coming directly from a source rather than summarised. As a reader I want easy access to a source of information and not have to go through the hoop of using another website/app. As a reader I want to read on mobile/laptop interchangeably and with the same interface. As a reader I want to take in deep knowledge and not summaries. As a reader I either a) read high-density articles deeply and do not want a summary or b) read low-density articles quickly and do not need a summary.

If so, are there current solutions?

> Yes, see other comments.

If so, are you able to do this cheaper or to a higher quality than the current solutions?

> Find the cost of other services and compare that to the cost at which you can deliver this solution at scale to actual users.

> Find a metric to compare your service's accuracy/speed to other people's solutions.

> X-axis: quality. Y-axis: cost. Plot all the solutions. Is yours cheaper for some level of accuracy than a competitor? That's promising!

If so, how can you get this packaged to users?

> mobile, web

> premium sources and pay walls

> reader apps

Just a few stray thoughts. A friend and I worked on something very very similar. Ultimately we stopped as we found that no one really wants to use this and pay for it. It’s “cool engineering” but people like reading just fine. Also tweets exist! You can find summaries easily that are human created and better synthesis.


" A friend and I worked on something very very similar. Ultimately we stopped as we found that no one really wants to use this and pay for it. It’s “cool engineering” but people like reading just fine. " This tells a lot


This sounds like a very useful feature for a personal RSS reader, like the text-equivalent of a thumbnail, so you can decide if the article is worth reading.


Depending on the quality, you could offer it as a tool for lawyers to access all the paperwork they have to process.

> If dispute resolution is the social function of the law, what we have is far from the most efficient way to reach fair or reasonable resolutions. Instead, modern litigation can be understood as a massive, socially unnecessary arms race, wherein lawyers subject each other to torturous amounts of labor just because they can. In older times, the limits of technology and a kind of professionalism created a natural limit to such arms races, but today neither side can stand down, lest it put itself at a competitive disadvantage.

https://www.newyorker.com/news/daily-comment/you-really-dont...

https://news.ycombinator.com/item?id=31787599


The website exactly the way you have it seems pretty great to me. You could add a non intrusive ad for revenue (I personally dislike ads, but what can you do) and then market it to the best of your ability. I think you've already made something really cool here.


That is the problem: to market it


One bad example may not be useful, but FWIW the first blurb I read on your demo got the salient facts of the article badly wrong:

"What Range Anxiety? The Mercedes-Benz EQS 580, Reviewed Mercedes-Benz first gave us a glimpse at its electrification strategy in 33, with its first battery-electric vehiclethe EQC 23 crossovergoing on sale in Europe in 21. Sporting a range of around 2770 miles, 354 km the 402 hp 296 kW SUV never made it over to this side of the Atlantic."

I was quite interested in an EV with a 2770-mile range!


I don't understand what you mean


"Sporting a range of around 2770 miles" is not a valid excerpt.


Uff I need to check what went wrong


actual article: 'Sporting a range of around 220 miles' excerpt is: 'Sporting a range of around 2770 miles'
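A cheap guard against this class of bug: since the excerpts are supposed to be sentences lifted from the article, every number in an excerpt should also appear in the source text. A minimal sketch of such a check, with a hypothetical function name and a deliberately simple number pattern:

```python
import re

# Digit runs, optionally with . or , between digit groups (e.g. 2,770 or 3.5).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(excerpt, article):
    """Return numbers that appear in the excerpt but nowhere in the article."""
    source_numbers = set(NUMBER.findall(article))
    return [n for n in NUMBER.findall(excerpt) if n not in source_numbers]


if __name__ == "__main__":
    article = "Sporting a range of around 220 miles"
    excerpt = "Sporting a range of around 2770 miles"
    print(unsupported_numbers(excerpt, article))
```

A non-empty result means the excerpt should be rejected or regenerated before it is published.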


Maybe it could be useful to summarise YouTube videos. If you can extract the captions and display a summary, it would be great (to summarise tutorials, travel guides, and other videos).

macOS also has a built-in summary feature (in System Preferences > Keyboard > Shortcuts > Services > Summarise), you can use that to summarise news articles in Safari and other apps, but it doesn't work on videos, only text.


It might be interesting to expose it as a browser extension that provides a summary without leaving a site. Many non-news sites link out to news articles as references. Reducing the amount of time reading the references would be nice, but I'm unsure how it'd work in practice. It might be worth experimenting with.


Hmm browser extension... Seems nice


Man, that's awesome. How does it work for recipes? I know there are a few solutions out there for this, but I absolutely cannot stand the 10 pages of narrative and prose before getting to the actual list of ingredients/steps.

I'd be curious to know if you've tested it against recipes.


Also a few months ago demoed here on HN: https://news.ycombinator.com/item?id=29795482


Most recipe sites include a microformat for recipes, so are trivial to strip of the fluff. Lots of recipe apps can do it for you (I use Paprika 3—browse to the page in the app's webview, hit "add recipe", works like a charm 99% of the time, no further effort) and I bet there are browser extensions for it, too.


For recipes. Ufff I don't know, I've never tried. What recipes?


This space already has a few solutions:

https://tldrthis.com/

http://autotldr.io/

It might make sense to look at their monetization strategies.


What's awesome is that there is always room for another. What commonly keeps folks from even trying to enter a space is that they don't think they can contribute. What's great is that you can do exceedingly well just being a B-level player in a field of A's: in many spaces, the economics of distributed B-level effort can out-perform A-level effort in one space.

There is ALWAYS enough room for one more contributor here. Often it's how innovation truly occurs and drives a particular thing further than it otherwise would have gone.

:)


No, what's awesome is that people on HN tell you other people who are doing the thing you want to do, so you can see how they're doing it, how well they're doing it, and if there actually is any room for you, which is in no way a given.


Is one of these the reddit bot? I've seen it frequently on reddit comments and it always seems to be highly upvoted, so it must do a pretty good job. I want to say it's the autotldr_bot but not sure.


Came here to say this. I feel like text autosummarization has been commoditized by now.


That is cool, currently I'm not looking to monetize, I just want to help people, honestly just to contribute


Can we see a demo? Best I've seen in this space is https://smmry.com, but probably a customized GPT-3 model would be even better.


Do people actually want news summarized? It's entertainment, parading as something intellectual. I don't want my Netflix summarized, for example.


> I don't want my Netflix summarized, for example.

There exists an entire YouTube subgenre of videos summarizing feature-length movies in a few minutes.


TIL


Definitely! I generally get summaries in the form of 5-10 minute daily news podcasts though, when reading I like whole articles that go into some depth.


Your algorithm seems similar to the premise of Axios.com. All their news is brief (albeit longer than your summaries), and follows a very tight format.


Well look at that, it is very similar


This looks great. Care to share some details of your approach? Opensource some bits of it maybe to attract attention of fellow developers?


Do a comparison website which takes articles about the same topics and shows their main points. It could be useful for spotting obvious biases.


Seems like something Feedly would be interested in.


Birth of the buy-out win. :) Yes .. indeed they would.


Nice, thanks


cool but I would like to control my own news feed. Could this just be an API where you zip it a url and it gives you a summary?


Open source it, put it on your resume.


>I created a news shortening algorithm and am not sure how to utilize it

I'd use it to shorten the news


Please contact me herve76@gmail.com I am a developer and I would love to try it.


I don't have an API exposed


Herve here might be your motivation to work on it, as per your initial question :)


Is this funded by Jeff Bezos or is half the news actually negative articles about Elon Musk currently?



