Past threads. Others? *How the Soviets Put Out Oil Well Fires by Using Nuclear B...

simonebrunozzi · on Sept 22, 2021

Hey dang, I always find these mentions of past threads particularly useful.

Should it become a small feature to add to HN? Perhaps a sticky comment on top of the others?

aasasd · on Sept 22, 2021

There's already the ‘past’ link.

dang · on Sept 22, 2021

It will turn into a feature eventually, yes!

yitchelle · on Sept 22, 2021

While it might slow down the site, is it possible for a list of previous threads to be added by a bot?

dang · on Sept 22, 2021

In principle yes, but it's good to have someone look over the previous thread to make sure it's interesting. Also, there are often previous related threads which don't have similar titles or URLs and yet relate to the same story—hard for a bot to find those.

I think maybe coming up with an initial list mechanically and then having a way for community members to curate it may be the sweet spot.

dredmorbius · on Sept 22, 2021

"Statistically improbable phrases" are one way to branch such searches.

So, the bot looks for the URL and title matches, then looks for tuples within those sets (2-3 word chains seem to be a sweet spot), and checks which of those cluster on those particular articles and comments, but not on a tremendous number of others.

"Operation Trojan Shield" would be a good match for the An0m sting being discussed elsewhere. "The FBI" or "First Amendment", though not highly prevalent, are still sufficiently used elsewhere that they probably would not be.

Someone would have to keep tally of the tuples, though.

(Tiptoe Through the Tuples...)
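
A rough sketch of the tuple-tallying idea above, in Python. Everything here is an assumption for illustration: the function names, the thresholds, and the sample texts are made up, and "improbable" is approximated crudely as "appears in most of the matched threads but in few of the others". It is not how HN does (or would do) anything.

```python
import re
from collections import Counter


def ngrams(text, n_min=2, n_max=3):
    """Yield lowercase word chains of length n_min..n_max from text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def document_frequency(docs):
    """Tally how many documents each phrase appears in (once per document)."""
    tally = Counter()
    for doc in docs:
        tally.update(set(ngrams(doc)))
    return tally


def improbable_phrases(seed_docs, background_docs,
                       min_seed_share=0.5, max_background_share=0.05):
    """Phrases that cluster on the seed docs but are rare in the background.

    seed_docs: texts already matched by URL/title (hypothetical input).
    background_docs: a sample of unrelated texts (hypothetical input).
    """
    seed_df = document_frequency(seed_docs)
    background_df = document_frequency(background_docs)
    keep = []
    for phrase, count in seed_df.items():
        seed_share = count / len(seed_docs)
        if background_docs:
            background_share = background_df[phrase] / len(background_docs)
        else:
            background_share = 0.0
        if seed_share >= min_seed_share and background_share <= max_background_share:
            keep.append(phrase)
    return sorted(keep)


# Made-up sample texts, only to show the shape of the result.
seed = [
    "The FBI ran Operation Trojan Shield for years",
    "Operation Trojan Shield used the An0m app, the FBI said",
]
background = [
    "The FBI issued a statement today",
    "The FBI cited the First Amendment",
    "A long thread about Rust compilers",
    "Notes on the First Amendment and the FBI",
]
phrases = improbable_phrases(seed, background, min_seed_share=1.0)
print(phrases)
```

With these sample texts, "operation trojan shield" survives while "the fbi" is dropped, because the latter shows up in most of the background texts too. The surviving phrases could then be used as extra search terms to find related threads whose titles and URLs do not match.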

dang · on Sept 22, 2021

We've worked on things like this in the past without success. Even just saving the html of the web pages that get submitted to HN is a nontrivial problem, and extracting text from them for similarity searches even more so. If people wanted to work on this as an open-source thing, we'd be open to supporting it somehow, but it'll be quite a while before we get back to working on the problem at this level ourselves. I think relying on the community to co-curate related-links lists (and duplicates) is likely a better and easier strategy.
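
For anyone who wants to experiment with this as an open-source thing, here is a toy sketch of the two steps mentioned above: pulling text out of saved HTML and comparing two pages for similarity. It uses only the Python standard library. The class and function names are hypothetical, and word-pair Jaccard overlap is simply one easy similarity measure chosen for illustration. Real pages (JavaScript-rendered content, navigation boilerplate, paywalls) are exactly where this simple approach breaks down.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def shingles(text, n=2):
    """Return the set of n-word chains in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(text_a, text_b, n=2):
    """Jaccard overlap of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


page = ("<html><head><style>p{}</style></head><body>"
        "<p>Hello <b>world</b></p><script>var x=1;</script></body></html>")
print(html_to_text(page))
```

Pairs of saved pages scoring above some threshold on `jaccard(html_to_text(a), html_to_text(b))` could feed the initial mechanical list, with the community curating from there.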