Hacker News

Co-founder here, thanks for posting!

Let me add some context:

Many people add "Reddit" to their search queries to find authentic product reviews. We fine-tuned a BERT model to extract product mentions from over 4 million Reddit comments and posts with Named Entity Recognition (NER). The result is a list of the most popular products across many subreddits.
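The post doesn't say which label scheme the fine-tuned model uses or how mentions are tallied. As a rough sketch only, assuming a token classifier that emits BIO tags with a hypothetical `PRODUCT` label, turning per-token predictions into product names and a per-comment popularity count could look like this:

```python
from collections import Counter


def decode_bio(tokens, labels, entity="PRODUCT"):
    """Group BIO-tagged tokens into entity strings (label names are assumptions)."""
    entities, current = [], []
    for token, label in zip(tokens, labels):
        if label == f"B-{entity}":
            # A new mention starts; close any mention in progress.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif label == f"I-{entity}" and current:
            # Continue the mention in progress.
            current.append(token)
        else:
            # "O" (or a stray I- tag with nothing open) ends any mention in progress.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def count_mentions(tagged_comments):
    """Count how many comments mention each product.

    tagged_comments: iterable of (tokens, labels) pairs, one per comment.
    A product named several times in one comment is counted once.
    """
    counts = Counter()
    for tokens, labels in tagged_comments:
        counts.update(set(decode_bio(tokens, labels)))
    return counts
```

Counting each product at most once per comment is a design choice here, meant to keep one enthusiastic (or spammy) comment from inflating a product's total; a real pipeline would also need to normalize spelling variants of the same product.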

No platform (including Reddit) is resistant to fake reviews and spam, but we think it's happening less frequently on Reddit for various reasons:

- Redditors and other forum members are more interested in boosting their ego by showing their depth of knowledge on the topic (and correcting others), whereas corporate websites are more interested in raking in profit by displaying (potentially) dishonest information.

- Enthusiasts in subreddits are pretty good at spotting dishonest or fake content, which results in immediate downvotes. The whole karma system helps with trustworthiness.

- Most subs are moderated well, and spam gets removed quite quickly.

That being said, good fake reviews are technically almost impossible to detect, even with sophisticated network analysis of the reviewer's profile.

Looria is still in Beta and we're working on improving our classification, summarization, and sentiment analysis. Let me know what you think!



I should be happy that you have automated a process that I manually perform frequently, but instead it gives me anxiety.

The set of incentives means the more successful you are, the more you will directly be attacked in the form of bad-faith posts. Reddit is not better because it is a better system or because it has better moderation. Reddit is better because there isn't an entire industry (SEO) devoted to corrupting Reddit posts yet. Even Google hasn't done a stellar job of winning that arms race, and they have vastly more resources.

You add value by performing the manual labor of finding and "reading" through posts, but the core value is generated by Reddit's abuse prevention team. If you were to be acquired by Reddit, my trust would go to zero. If you took outside investment, my trust would go to zero. My trust is already a little low because Wirecutter specifically avoided items they couldn't get affiliate links for, and now they are completely corrupted by new ownership. Wirecutter was the last review site that I used to make a purchase decision.

More than anything, I am worried that the dark mirror of this website, the "Reddit optimization" industry, is probably just as advanced, and is validated by efforts like this.


Goodhart's Law in action. Any "product quality" metric will be gamed to the point of uselessness as soon as it gains traction. In other words, services like this one can only work as long as nobody knows about them.

The only stable solution is for each person to curate and maintain their own set of sources, so that there are no high-value metrics for marketers to target. Exactly the opposite of what this service is trying to do.


This was literally my first thought. If this works at all, it will quickly be ruined.


You are obviously correct that the more value this tool has, the more work bad actors will do to take advantage of it, reducing the quality of the results. However, I do think Reddit is more resistant to this than some other platforms, due to the large amount of distributed work done by its participants. Assuming that a given subreddit is moderated so that the vast majority of participants are individuals interested in high-quality products, the Reddit voting system does help to keep other stuff out.


I'm going to be honest...

I hate that it is in my interest for this to never become popular. It's a situation where my faith in the listed products is inversely proportional to the popularity of your site.


If the site becomes influential, brands will be motivated to pollute the source data, ruining both the usefulness of Reddit for reviews and Reddit communities as a whole.


There is also the fact that as you become more influential the value of your influence increases proportionately. The offers for "selling out" just keep growing bigger as you grow more popular. It also becomes easier to convince yourself that you're not really "selling out" if you just take money for this or that. And, like any business, of course you do need to monetize somewhere.

It's just an unfortunate alignment that goes against the core thesis of the (great!) idea.


Aren't Reddit reviews already influential? Isn't that why the site exists in the first place? If you're right, then it follows that this is probably already happening :)


That was the first thing I thought of. I know the owner means well, but if Reddit becomes a bigger target than it already is for mass marketers, then the reviews there become even less valuable as marketers dilute them, and it becomes a spiral into uselessness. Maybe it could be mitigated by adding things like "commenter history" and giving commenters who have been around for a while, with more comments, a heavier weight than new accounts that seem to post only in one or two subreddits a couple of times and then disappear. Maybe something like

weight ~= max(num_months_existence * num_comments * comment_score, 0), set to 0 if the account hasn't been active in the past year (or some such, obviously adjusted).
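A minimal sketch of the commenter-weighting idea, taking it literally as account age times comment count times comment score, floored at zero and zeroed for accounts inactive in the past year. The `Account` fields are hypothetical names chosen for illustration, not any real API:

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical fields, made up for illustration.
    months_since_creation: int
    num_comments: int
    comment_score: int  # net comment karma, which can be negative
    active_past_year: bool


def commenter_weight(acct: Account) -> float:
    """Proposed heuristic: age * activity * score, floored at 0,
    and 0 for accounts that haven't been active in the past year."""
    if not acct.active_past_year:
        return 0.0
    raw = acct.months_since_creation * acct.num_comments * acct.comment_score
    return float(max(raw, 0))
```

Taken literally, the raw product spans many orders of magnitude (a brand-new account might score 15, an old active one hundreds of millions), so "obviously adjusted" would likely mean something like a logarithm or a cap before these weights are used to combine reviews.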


Looria's ranking algorithm has no effect on what happens to the actual quality of content for Reddit users. Volunteer moderators cannot be expected to keep up with paid bot programmers. Paying objective moderators is essentially impossible, and the whole business model is built on harvesting review data for free anyway.


Maybe you can join a niche subreddit that does the same exact thing. It needs more users: https://www.reddit.com/r/shouldibuythisproduct/


Reminds me of the old fivestar.io, which aggregated Amazon reviews back when they were somewhat reliable. It would bucket them by price category (e.g., the highest-rated products in the $100-$250 category).
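The price-bucketing idea is simple to sketch. This is not how fivestar actually worked (that isn't described here); the price bands and the `(name, price, avg_rating)` input shape are assumptions:

```python
from collections import defaultdict

# Hypothetical price bands in dollars, each [low, high).
PRICE_BANDS = [(0, 50), (50, 100), (100, 250), (250, 500), (500, float("inf"))]


def band_for(price):
    """Return the (low, high) band that contains price."""
    for low, high in PRICE_BANDS:
        if low <= price < high:
            return (low, high)
    raise ValueError(f"price out of range: {price}")


def top_rated_by_band(products, top_n=3):
    """products: iterable of (name, price, avg_rating).

    Returns {band: [names of the top_n highest-rated products in that band]}.
    """
    buckets = defaultdict(list)
    for name, price, rating in products:
        buckets[band_for(price)].append((rating, name))
    return {
        band: [name for _, name in sorted(items, key=lambda x: x[0], reverse=True)[:top_n]]
        for band, items in buckets.items()
    }
```

Ranking on the average rating alone favors products with only a handful of reviews, which is exactly the kind of gap that gamed reviews exploit; a real version would at least require a minimum review count per product.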

Back when searching "best [product] 2014" was actually a way to find good stuff on the internet, fivestar was able to generate those answers on the fly. I'd spend an hour or so researching something like a router, plug "router" into fivestar, and find my pick at the top of the list. Those days are gone, for the same reasons others have raised about your site: fivestar worked until reviews started to get gamed.

This is an awesome website and I see myself using it. I hope it doesn't meet the same fate!


It seems Amazon is your go-to whenever I drill into a specific product. It's clear to anyone, though, that an REI tent is not going to show up on Amazon (nor is a Specialized bicycle, to give another example). Maybe you can broaden your outbound link pool and use an algorithm to determine where to send the click.
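One possible "where to send the click" rule, sketched under loose assumptions: each product has a list of retailer offers (the `url`, `in_stock` and `price` keys are hypothetical), the click goes to the cheapest in-stock offer, and it falls back to the brand's own store when no retailer has it:

```python
from typing import Optional


def choose_outbound_link(offers, brand_store: Optional[str] = None) -> Optional[str]:
    """Pick where to send a click for one product.

    offers: list of dicts with hypothetical keys "url", "in_stock", "price",
    one per retailer that lists the product (Amazon or otherwise).
    Returns the cheapest in-stock offer's URL; if nothing is in stock,
    falls back to the brand's own store (e.g. brands not sold on Amazon).
    """
    in_stock = [offer for offer in offers if offer["in_stock"]]
    if in_stock:
        return min(in_stock, key=lambda offer: offer["price"])["url"]
    return brand_store
```

Cheapest-in-stock is only one possible policy; preferring the brand's own store, or a retailer with a better return policy, would be equally valid choices.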


> We're running sentiment analysis to identify the emotional tone behind the mentions. Handling multiple mentions in a single sentence and filtering out things like questions that shouldn't count as an opinion requires some effort. In addition, a minimum sample size is required to get statistically relevant results.
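The two preprocessing problems in the quote (skipping questions and separating several mentions in one sentence) can be approximated with simple heuristics. This is a sketch of one such approach, not Looria's actual method: sentences ending in "?" are skipped, sentences are split on a few contrastive words, and only clauses naming exactly one product are kept for sentiment scoring:

```python
import re

# Hypothetical list of contrastive conjunctions used to split a sentence.
CONTRAST = re.compile(r",?\s+\b(?:but|while|whereas)\b\s+", re.IGNORECASE)


def opinion_units(text, products):
    """Yield (product, clause) pairs that could be sent to a sentiment model.

    Heuristics (assumptions for illustration):
    - sentences ending in '?' are treated as questions and skipped;
    - each sentence is split on contrastive words like "but";
    - clauses that name zero or several products are dropped.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        if not sentence or sentence.endswith("?"):
            continue
        for clause in CONTRAST.split(sentence):
            named = [p for p in products if p.lower() in clause.lower()]
            if len(named) == 1:
                yield named[0], clause
```

For "Is the Kobo Clara any good? The Kindle Paperwhite is great, but the Kobo Clara feels cheap.", the question is dropped and each product gets its own clause. Plain substring matching and splitting on "while" will misfire on some sentences, which is part of why this "requires some effort".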

That's great that you are taking that into account, but I'm not totally sure how to interpret the value bars. So I presume a high value means lots of positive engagement. But how does a lot of negative engagement show up? I think it's valuable to know when there is a lot of chatter about something and it's mostly negative. Or when it's controversial.

Maybe something like a statistical distribution graph or weighted color gradient, so you could tell at a glance the density/quality/depth of discussion, and also the distribution of sentiment.
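A sketch of the kind of summary that could back such a display: positive and negative shares plus a simple controversy ratio, returned only once a product has enough mentions. The `pos`/`neg`/`neu` labels and the threshold of 30 are hypothetical; the quoted reply does not state its actual minimum sample size:

```python
from collections import Counter
from typing import Optional

MIN_SAMPLE = 30  # hypothetical threshold


def sentiment_summary(labels, min_sample=MIN_SAMPLE) -> Optional[dict]:
    """labels: iterable of 'pos', 'neg' or 'neu', one per mention of a product."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total < min_sample:
        return None  # too few mentions to say anything
    pos, neg = counts["pos"], counts["neg"]
    return {
        "mentions": total,
        "pos_share": pos / total,
        "neg_share": neg / total,
        # 1.0 when opinions split evenly, 0.0 when they are one-sided
        "controversy": (min(pos, neg) / max(pos, neg)) if (pos + neg) else 0.0,
    }
```

With separate positive and negative shares, "lots of chatter, mostly negative" and "evenly split" become distinguishable at a glance, which a single value bar cannot show.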


Great project! Can't wait until /r/thinkpad is supported and we find out which model is their favourite!


I bet it’s T420 or T490


Agree. And while I don't agree that the T420 is a worthwhile recommendation, the Reddit product recommendation groupthink on that item hasn't changed in nearly 10 years.


Funny, the demographics who post/comment on Reddit are just about the last people I want product advice from.

If there was a "salty old mfer posting on a forum in 2010" search I'd be much more inclined to use that. Google used to do ok for finding those but the SEO arms race killed it.


Doesn't work with all subreddits; it returns a 500 Internal Server Error for an unknown subreddit: https://looria.com/reddit/eink/products


We don't cover all subreddits yet, but we'll improve the handling for unknown subreddits. Thanks for reporting!
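One common way to handle unknown subreddits is to check coverage up front and answer with a 404 and a readable message, rather than letting a failed lookup surface as a 500. A framework-agnostic sketch; the handler shape and the `supported` and `load_products` parameters are hypothetical:

```python
from http import HTTPStatus


def products_page(subreddit, supported, load_products):
    """Return (status, body) for a hypothetical /reddit/<subreddit>/products handler.

    supported: set of lowercase subreddit names that have been analyzed.
    load_products: callable that fetches the product list for a known subreddit.
    """
    name = subreddit.lower()
    if name not in supported:
        # Unknown subreddit: a client-side "not found", not a server error.
        return HTTPStatus.NOT_FOUND, f"r/{subreddit} isn't covered yet."
    return HTTPStatus.OK, load_products(name)
```

The 404 page could also double as a place to collect requests for new subreddits, like the ones made further down this thread.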


> extract product mentions from over 4 million Reddit comments

How far back are you searching? I'm noticing products on the top of a lot of these lists that were popular 10 years ago.


There is an FAQ on each page with the question "What's the time frame for this analysis?", and the answer is posts and comments from within the last two years.


As someone who has been doing this for over a decade with Twitter data in a specific niche (https://reviewsignal.com, for web hosting), it's interesting to see this getting so much attention. I played with Reddit data years ago as well but never moved forward with it. The volume really wasn't there for my needs. The volume mattered for exactly the reason you are touching on: fake reviews. You need an overwhelmingly large sample size to drown out fake reviews.
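The sample-size argument is simple arithmetic: a fixed batch of fake reviews shifts an average by a lot when genuine reviews are few, and by very little when they are many. The numbers below are purely illustrative:

```python
def shifted_average(genuine_mean, genuine_n, fake_rating, fake_n):
    """Mean rating after fake_n fake reviews are mixed into genuine_n genuine ones."""
    return (genuine_mean * genuine_n + fake_rating * fake_n) / (genuine_n + fake_n)


# Illustrative numbers only: 50 fake 5-star reviews against a true mean of 3.0.
for n in (100, 1_000, 10_000):
    print(n, round(shifted_average(3.0, n, 5.0, 50), 3))
```

With 100 genuine reviews, the 50 fakes lift the mean from 3.0 to about 3.67; with 10,000 genuine reviews, they move it by roughly 0.01.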

I am also curious what sort of spam filtering mechanisms you have in place. In my data, the spam filters alone removed 98% of the content before it ever hit sentiment analysis or relevancy analysis. I imagine Reddit is better than Twitter, but there is still going to be spam. What measures do you have in place, and how do you determine them?
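To make "spam filters before sentiment analysis" concrete, here is a sketch of two cheap pre-filter heuristics: drop link-heavy posts, and drop text that many distinct accounts post almost verbatim. Neither heuristic nor the thresholds come from either commenter; a production filter would combine many more signals:

```python
import re

URL_RE = re.compile(r"https?://\S+")


def normalize(text):
    """Lowercase, strip URLs and punctuation so copy-pasted spam collapses together."""
    text = URL_RE.sub(" ", text.lower())
    return " ".join(re.findall(r"[a-z0-9]+", text))


def prefilter(posts, max_links=2, max_dupes=3):
    """posts: list of (author, text).

    Drops posts with more than max_links URLs, and texts posted (after
    normalization) by more than max_dupes distinct authors.
    The thresholds are placeholders.
    """
    authors_per_text = {}
    for author, text in posts:
        authors_per_text.setdefault(normalize(text), set()).add(author)
    kept = []
    for author, text in posts:
        if len(URL_RE.findall(text)) > max_links:
            continue
        if len(authors_per_text[normalize(text)]) > max_dupes:
            continue
        kept.append((author, text))
    return kept
```

Running the cheap filters first means the more expensive NER and sentiment models only see what survives.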

Do you take upvotes into account when weighting reviews? That was (and is) a concern I had when working with Reddit data. I used retweets as a proxy for sorting by popularity, but didn't apply any other weighting.
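One way to weight reviews by upvotes without letting a single viral comment dominate is a log-dampened weight. This is an illustrative choice, not something either commenter describes; sentiment is assumed to be a score in [-1, 1]:

```python
import math


def weighted_sentiment(mentions):
    """mentions: list of (sentiment, score), with sentiment in [-1, 1] and
    score the comment's net upvotes.

    Each weight is 1 + log1p(max(score, 0)), so a 1000-point comment counts
    more than a 1-point one, but not 1000x more, and downvoted comments
    still count at the base weight of 1.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for sentiment, score in mentions:
        weight = 1.0 + math.log1p(max(score, 0))
        weighted_sum += weight * sentiment
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0
```

Whether downvoted comments should count at all is a judgment call; upvotes on Reddit measure agreement within one subreddit's audience, which is not quite the same thing as review quality.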

I'm definitely interested in this segment as I've been doing it for a long time, if you want to talk please reach out my contact is in my profile.


Can you please add CarAV, budgetaudiophile, and headphones to your list of subreddits?


r/buyitforlife too please!


Awesome initiative! Are you accounting for unpopular products currently? I expect unpopular products might also see a lot of comments.


With no ill assumption and no aggression intended: what is your business model and idea of giving back to the communities you are distilling the value from? And how much did you think about the incentives this would place on spammer and SEO artists to pollute your data source?


The way we use it in our apps is to take a pool of identified experts for a search query, augment it with relevant news from other sites about the topic, sprinkle a little bit of quantitative KPI data over it, calculate various NLP scores, and then shake it well and voila, you get a nice aggregate of socially validated data. Works well and it reduces the noise significantly.


It would be an awesome feature if you could enter your Reddit handle and have it automatically filter to the subreddits you subscribe to, rather than the full list.


Love it! Pasting my top-level comment here: This looks really neat! Do you define "Products" anywhere? It would be cool to have categories like 'Most discussed cars|phones|computers|movies|websites|etc'. The nerd that I am would love 'Most discussed programming languages on *'.

P.S. Are you raising money from angels? Drop me a note if you are. breck7@gmail.com. I really need something like this!


Interesting, probably going to use it in the future.

Feedback for fine-tuning your analysis: Cyberpunk 2077 is probably at the top of PC gaming for the wrong reasons.


I think it's neat. There's lots of negative feedback on here, which I would encourage you not to get down about. As always on the net, folks are probably far more likely to leave a message if they have some objection or disagree with some aspect, rather than if they just plain like it.

So yeah, I like it.


Cool product, could use some work. My first search revealed:

https://imgur.com/d0a8Qcc.jpg

The picture shows it does have armrests though. It would be great if you could have some button there to report incorrect information for each product.


I am amused that the top-recommended "bug killing gear" is... a video game.

https://looria.com/search?category.subcategory=Bug-Killing+G...


"Most discussed" makes it sound like you highlight polarizing products, not necessarily just good ones. You may want to highlight your sentiment analysis on the front page; it took me some digging to find it.


Fun product, I can see myself using it.

A feature request - I’d like to be able to adjust the time frame. Things like basketball shoes move a lot year to year, and it can be hard to find ones from 18 months ago.
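The FAQ mentioned above says the analysis currently covers the last two years. A user-adjustable window could be a simple date cutoff applied before counting. A sketch, assuming each mention carries a timezone-aware creation time and approximating a month as 30 days:

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def mentions_in_window(mentions, months, now=None):
    """Count product mentions newer than roughly `months` months.

    mentions: iterable of (product, created) with timezone-aware datetimes.
    A month is approximated as 30 days.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=30 * months)
    return Counter(product for product, created in mentions if created >= cutoff)
```

A 6-month window would surface this season's basketball shoes, while the fixed two-year window blends in models that may no longer be easy to find.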


This includes subreddits which are run and moderated by corporate accounts like /r/outlier.


Please add a firearms & ammo section :)



