Show HN: My recommendation engine for Hacker News

samwho · on June 20, 2023

Aww, thank you for using my Memory Allocation post as the placeholder text. <3

FieryTransition · on June 20, 2023

I often wish I could sort Hacker News into two categories. Actual software/tech/STEM and everything else. I think both are interesting, but often, the niche tech stuff gets drowned out fast. So this is great for that :-)

julien040 · on June 19, 2023

I just released a new update, thanks to everyone's feedback. Now, you can sort results by relevancy, age, or score using the select.

moritzwarhier · on June 19, 2023

This is a joy to use, and it also fits very nicely with the other highly ranked post by Nielsen group. Kudos!

febed · on June 20, 2023

Which other post? There is so much churn on HN that it’s hard to know which post you are referring to

SkyArrow · on June 20, 2023

Likely https://news.ycombinator.com/item?id=36394569

moritzwarhier · on June 20, 2023

Yes that was it, sorry, should have included the link

dpe82 · on June 19, 2023

This is great. I often come across some HN post on a topic I am interested in and then want to go look at other posts in the same topic cluster to expand my exposure. This looks awesome for that.

I don't know if it would be useful or even work, but is it possible to let the user adjust the vector distance threshold and then apply the other sorting parameters to the results? Eg. if I want to go broader, but then sort by high score or something so I see popular posts within an expanded (but still relevant) cluster?
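
A minimal sketch of that idea, assuming each result carries a `distance` to the query and an HN `score`. The `Story` class, the `broaden_then_sort` helper, and the sample values are hypothetical illustrations, not the project's actual API:

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    score: int       # HN points
    distance: float  # distance between the story embedding and the query


def broaden_then_sort(stories, max_distance):
    """Keep stories within max_distance, then order them by score (highest first)."""
    nearby = [s for s in stories if s.distance <= max_distance]
    return sorted(nearby, key=lambda s: s.score, reverse=True)


# Made-up sample data for illustration only.
stories = [
    Story("A", 120, 0.20),
    Story("B", 900, 0.45),
    Story("C", 300, 0.80),
]

narrow = broaden_then_sort(stories, 0.30)  # tight cluster
broad = broaden_then_sort(stories, 0.50)   # wider cluster, still sorted by score
```

Raising `max_distance` widens the cluster; the sort afterwards only reorders what passed the filter.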

lettergram · on June 19, 2023

Check out https://askhn.ai

The content is ranked by how people discuss the topics and who discusses them

If you just do embeddings on posts you might miss relevant content. When people who have knowledge of AMD discuss intel and believe that content is relevant to AMD, the content will be ranked

julien040 · on June 19, 2023

I thought about an algorithm with weight adjustable by the user. Now, the API returns a field with the distance between the post and the query (the square of the Euclidean distance). It's used by the interface to rank results by relevance.

Perhaps I can compute a score for each story, where each field has a weight and rank the results using this score. For example, the score could be 0.2 x score + 0.1 x comments + 1/distance - timestamp/ 10^9. The stories with the highest rank would be shown first, and the weight (0.2, 0.1, 10^9) could be adjusted by the user, as some might prefer recency while others prefer popularity.
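
A rough sketch of that scoring formula, with the weights exposed as parameters. The function name, the parameter names, the `eps` guard against a zero distance, and the sample stories are assumptions for illustration. Note that with the sign as written, subtracting timestamp / 10^9 lowers the score of newer stories; a positive `w_time` would favor recency instead.

```python
def rank_score(points, comments, distance, timestamp,
               w_points=0.2, w_comments=0.1, w_time=-1.0,
               time_scale=1e9, eps=1e-9):
    """0.2 * score + 0.1 * comments + 1/distance - timestamp / 10^9 by default."""
    relevance = 1.0 / max(distance, eps)  # eps avoids dividing by zero
    return (w_points * points
            + w_comments * comments
            + relevance
            + w_time * timestamp / time_scale)


# Made-up stories for illustration only.
stories = [
    {"title": "new and close", "points": 120, "comments": 40,
     "distance": 0.25, "timestamp": 1_687_000_000},
    {"title": "old but popular", "points": 500, "comments": 200,
     "distance": 0.5, "timestamp": 1_300_000_000},
]

ranked = sorted(
    stories,
    key=lambda s: rank_score(s["points"], s["comments"],
                             s["distance"], s["timestamp"]),
    reverse=True,
)
```

Because the terms live on very different scales (points in the hundreds, 1/distance in single digits), the default weights let popularity dominate; a user-facing control would probably need normalized inputs.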

juliusgeo · on June 20, 2023

It might be useful to pose this problem in terms of a precision vs. recall curve.
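
As a small illustration, assuming a set of results has been hand-labeled as relevant or not, precision and recall can be computed for each candidate distance threshold. The helper name, the labels, and the distances below are made up:

```python
def precision_recall(distances, relevant, threshold):
    """Precision and recall when every result with distance <= threshold is returned."""
    retrieved = [rel for d, rel in zip(distances, relevant) if d <= threshold]
    true_positives = sum(retrieved)
    total_relevant = sum(relevant)
    # Convention (an assumption): precision is 1.0 when nothing is retrieved.
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical hand-labeled results, ordered by distance to the query.
distances = [0.1, 0.2, 0.4, 0.6, 0.9]
relevant = [True, True, False, True, False]

tight = precision_recall(distances, relevant, 0.3)  # few results, all relevant
loose = precision_recall(distances, relevant, 1.0)  # everything returned
```

Sweeping the threshold and plotting the pairs gives the curve: a tight threshold favors precision, a loose one favors recall.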

benzible · on June 19, 2023

Hmm I tried searching "elixir" and found nothing related to the language. HN Algolia gives me exactly what I want. On what basis do you say it's "not helpful"?

julien040 · on June 19, 2023

Yes, the search doesn't work very well for a single word. Try entering a URL about Elixir, like this: https://hn-recommend.julienc.me/?q=https%3A%2F%2Fnews.ycombi...

I may have used the incorrect term. HN Algolia is effective for searching for a particular story. However, I am unable to utilize it to find related posts on the same topic that do not contain the same words.

Solvency · on June 20, 2023

Out of curiosity about the word vectorization algorithm... why does a single word not perform as well? What's the cause/rationale?

julien040 · on June 22, 2023

It's pure speculation, but article embeddings are computed using 512 tokens, which is roughly equivalent to 400 words. I think that using only one word does not give the model enough context to fully understand the query.

danvayn · on June 19, 2023

hey Julien. I love the product but the search doesn't seem to be doing the best for me. For example, I looked up Tailwind and got plenty of results but none of them actually involved Tailwind.

Maybe a tagging solution is the way? if you determine a set amount of popular keywords for a topic and filter around those, you can offer more relevant results. With some sort of public tagging system you can also have SEO friendly pages around tags and get people browsing stuff they wouldn't normally search for.

julien040 · on June 19, 2023

At first, the website concept focused on getting posts similar to a URL. Querying with text didn't yield relevant results.

Your solution appears better suited for this use case. Thank you.

wseqyrku · on June 20, 2023

What I really need for HN (and any other news feed for that matter) is something like "google discover" i.e. a content-based recommendation system with some sort of feedback mechanism.

So I would get information relevant to me (I can skip, visit, like, dislike) whether or not it's popular. That last point is important because the HN home page doesn't give you that, and most posts can get lost in oblivion just because the first few folks did not find them interesting.

akomtu · on June 19, 2023

HN needs a simple feature: a weekly digest view that shows the top 30 most commented posts (it should completely ignore flags and votes).

ColinWright · on June 19, 2023

You mean like the one that's emailed to me every week?

https://hackernewsletter.com

balder1991 · on June 20, 2023

Thanks, I was considering something like this as I used IFTTT to send me weekly top threads from certain subreddits, but now with Reddit going south…

dxbydt · on June 19, 2023

Pls sort by recency. Otherwise you see 13-year-old articles, most of them obsolete/irrelevant to the current situation.

julien040 · on June 19, 2023

I was worried that sorting by recency would give less relevant results. Perhaps I should add a threshold to exclude posts that are too old.

julien040 · on June 19, 2023

You can now sort by recency. I hope this helps.

dxbydt · on June 19, 2023

Very fast turnaround. Kudos! Works very nicely now.

Try this: https://hn-recommend.julienc.me/?q=sf%20crime With the Newest filter vs the Oldest filter. ( btw the default Relevance filter gives only tangentially relevant results for this query. Whereas the Newest & Oldest are on point. )

RileyJames · on June 21, 2023

Love it.

This response is very reactive-heavy, whereas it’s Elixir I’m more interested in.

But well done on the execution. It does exactly what it states.

I’ve bookmarked.

I often search HN for additional articles and discussions based on something I’ve just read. Next time I’ll use this tool.

fewald_net · on June 19, 2023

Great project. I learned about the faiss library. Out of curiosity, did you also try it with doc2vec?

julien040 · on June 19, 2023

I didn't try Doc2Vec. I wanted a hosted solution because I wouldn't have been able to compute all this locally (more than 100,000 posts).

If you tried it, did you get good results with it? I may use it in future projects.

fewald_net · on June 20, 2023

Yes, I am using it on a not so small dataset (roughly 1 million docs) and the output is a fairly efficient model. I am using gensim with pre-trained word vectors. New docs can be inferred via .infer_vector().

Overall my approach is less automated than what I have seen in your codebase so it’s likely a bigger investment. I am happy to share more.

julien040 · on June 20, 2023

It's very interesting. I may try it in the future.

jimmySixDOF · on June 20, 2023

The blog post linked on GitHub was a nice walkthrough of your method, and I was interested in what you think the hit rate was for getting usable text for embeddings from TFA links. 100K is a good-sized corpus, but I'm wondering how many got skipped due to paywalls, 404 links, or other problems?

julien040 · on June 20, 2023

Thank you for reading it.

The hit rate is low. I've only tried to get embeddings for stories with a score greater than 100. SQL Query "SELECT count(*) FROM story WHERE score > 100;" gives me 155,228 stories and the corpus size is 108,477 stories.

108,477 / 155,228 ≈ 0.699, so roughly 70% of those stories made it into the corpus.

The main problems were 404 links and posts that weren't articles (such as tweets).

sogen · on June 19, 2023

A comment about search results: "design system" is related to design, "system design" relates to computing

It seems search takes the two inputs as the same.

Also, search doesn't seem to work when using just 1 word.

julien040 · on June 19, 2023

Yes it's an issue. Sadly, I can't fix it. I'm using the closed source "text-embedding-ada-002" model from OpenAI.

As I can see, the longer the input, the more accurate the results. Perhaps you can try something longer, like "What is a design system for UI?"

sogen · on June 28, 2023

Yes, adding context helps.

Thanks!

sukki07 · on June 20, 2023

This is amazing, thank you for this. Makes finding stuff a lot easier

swyx · on June 19, 2023

i like the idea of this but won't remember it because my muscle memory is tuned to news.ycombinator.com. perhaps i can recommend a chrome extension instead of a website?

julien040 · on June 19, 2023

Thank you for suggesting this.

The API is already made and can be found at https://github.com/julien040/hn-recommendation-api. I don't think it would be too difficult to build a Chrome extension that fetches it.

TechBro8615 · on June 19, 2023

An iOS share widget would be cool too. Since you support putting the input text in the URL, then maybe someone can make a Workflow for it and share it here.

julien040 · on June 19, 2023

Are iOS share widgets using Apple Shortcuts? I wish to learn more about this technology, so it would be a pleasure to try building it.

TechBro8615 · on June 19, 2023

Yeah, my mistake, the app is called "Shortcuts" now. I get confused because it was an app called Workflow that was acquired by Apple [0].

You can use the app itself to make some surprisingly powerful shortcuts, and then share them in some kind of text based serialized form (don't remember the details). I'm sure there are also ways to make them programmatically, but I doubt it would be necessary for this use case.

Seems like you basically want to extract the URL of the HN page, store it in a variable and then append that variable to the URL of your recommendation engine. There are probably more fancy variants of "extract text" that you could use, too - I'm not sure of the details.

[0] https://en.wikipedia.org/wiki/Shortcuts_(app)

julien040 · on June 19, 2023

Here is the v1 of the shortcut: https://www.icloud.com/shortcuts/a7e9d236b35342c5aed1d022801...

For now, it only pushes the shared URL to the recommendation engine. If I have more time, I'll try to find a way to extract the URL from the HN page.
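
For anyone scripting the same step outside Shortcuts, here is a minimal sketch of building the link. It assumes only the `?q=` parameter seen in the URLs shared in this thread; the helper name is hypothetical:

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://hn-recommend.julienc.me/"


def recommendation_url(query):
    """Percent-encode a URL or free text and append it as the q parameter."""
    # quote_via=quote encodes spaces as %20 (the default quote_plus would use "+").
    return BASE_URL + "?" + urlencode({"q": query}, quote_via=quote)


link = recommendation_url("https://news.ycombinator.com/item?id=36394569")
text_link = recommendation_url("sf crime")
```

Encoding the whole input matters because the shared URL contains its own `?` and `=`, which would otherwise be read as part of the outer query string.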

TechBro8615 · on June 19, 2023

Wow, very fast turnaround time!! I just added it and it works :) Nice job!

2h · on June 20, 2023

This URL fails

https://hn-recommend.julienc.me/?q=Go

julien040 · on June 20, 2023

Oops, on the API side, there is a check to ensure the text is long enough (5 characters), but I forgot to add this check client-side. Thank you for pointing out the issue.
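
The missing check amounts to something like the sketch below. It is written in Python for illustration (the real client presumably runs in the browser), and the names and the whitespace trimming are assumptions; only the 5-character minimum comes from the API behavior described above.

```python
MIN_QUERY_LENGTH = 5  # minimum length enforced by the API, per the comment above


def is_query_long_enough(text):
    """Return True when the trimmed query meets the minimum length."""
    # Trimming whitespace first is an assumption about the desired behavior.
    return len(text.strip()) >= MIN_QUERY_LENGTH


go_ok = is_query_long_enough("Go")
golang_ok = is_query_long_enough("Golang")
```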

Try this https://hn-recommend.julienc.me/?q=Golang if you want stories related to Go.

Edit: add link

rjrobben · on June 21, 2023

I didn't expect embeddings to have such a simple yet useful application, thanks!

passion__desire · on June 19, 2023

One feature I would like a recommender system to have is the explicit ability to jump in and out of filter bubbles or research rabbit holes. Another example would be putting yourself in someone else's shoes, e.g. what content do game developers generally like? Apart from general gamedev content, what do they like, where do they take inspiration from, etc.

I remember there was a project built on instagram which allowed a person to view instagram as it looked like to a particular celebrity.

julien040 · on June 19, 2023

I'm a bit divided on this feature. On one hand, I would like to have this feature; it would be awesome to see the recommendation of people from different jobs. On the other hand, I'm a bit concerned about privacy. The system must ensure that each group is big enough to avoid the leak of someone's recommendations. I don't want anyone to know exactly what I'm liking and what I'm watching.

If I recall correctly, myCANAL (the French Netflix) used to have a similar feature. You could access the recommendations of personalities of the channel, but it was curated manually.

4hEn · on June 19, 2023

I searched for a URL I know was posted and it doesn't show up. It shows unrelated articles instead.

julien040 · on June 19, 2023

The data is a few weeks old. Do you know when the URL was published?

4hEn · on June 19, 2023

It's 10 years old.

This search query https://hn-recommend.julienc.me/?q=paul%20graham returns articles that are missing both words of the query

julien040 · on June 19, 2023

The website features only stories with a score greater than 100 but I don't think that is the problem.

Unlike HN Algolia, it doesn't match words; it uses embeddings so stories are matched by their similar meaning rather than similar words. To find it, you might try to be more specific, such as "Paul Graham Y Combinator <facts of the article>". I'm sorry HN Recommend doesn't match your use case
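
A toy sketch of what matching by meaning looks like: stories are ranked by the squared Euclidean distance between their vectors and the query vector, so a result can rank high without sharing any words with the query. The 3-dimensional vectors and titles below are invented for illustration; real embeddings come from the model and have far more dimensions.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query_vec, corpus, k=2):
    """Return the titles of the k stories closest to the query vector."""
    ranked = sorted(corpus, key=lambda item: squared_euclidean(query_vec, item[1]))
    return [title for title, _ in ranked[:k]]


# Invented vectors: the first two stories are "about" the same topic.
corpus = [
    ("Memory allocators explained", [0.0, 0.0, 1.0]),
    ("Essay on startups", [1.0, 0.0, 0.0]),
    ("Funding advice for founders", [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedding of a query

closest = nearest(query_vec, corpus)
```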

rounakdatta · on June 19, 2023

Nit:

> Resources to learn about distributed systems

I thought Murat Buffalo's blog would come up at the top. It's gold, and I'm confident that it was shared on HN as well (maybe a year or two back).

Otherwise neat and useful!

balder1991 · on June 20, 2023

The layout is currently buggy on Firefox.

julien040 · on June 20, 2023

Hi, are you talking about a problem like this one? https://cln.sh/MFG3DPZn+

balder1991 · on June 20, 2023

Yeah, when there’s no thumbnail.

julien040 · on June 23, 2023

It's fixed now. Thank you for reporting it.

lfkdev · on June 19, 2023

A time filter is needed