
I bet language models could help here, yeah. Perhaps something like (1) get the content of feed items as they come in, (2) embed the content, (3) use those embeddings to group items. Probably not that difficult, to be honest.



Yep. That's a classic k-means problem. Embed them all with a standard embedding model, like the OpenAI API embeddings, run k-means from scikit-learn, then convert the result into a list of lists: one RSS item (containing a list of title-URLs) per cluster.
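A rough sketch of that pipeline, in case it helps. It assumes the embeddings are already in a NumPy array; the random blobs below are stand-in data, not real embeddings, and `group_items` and the example.com URLs are made-up names for illustration.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans


def group_items(items, embeddings, k, seed=0):
    """Cluster items by embedding; return one list of (title, url) per cluster."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embeddings)
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[int(label)].append(item)
    # List of lists: one inner list per cluster, ordered by cluster label.
    return [groups[label] for label in sorted(groups)]


# Stand-in data (assumption): three well-separated blobs instead of real embeddings.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
embeddings = np.vstack([c + 0.1 * rng.standard_normal((4, 3)) for c in centers])
items = [(f"title {i}", f"https://example.com/{i}") for i in range(len(embeddings))]

clusters = group_items(items, embeddings, k=3)
for cluster in clusters:
    print([title for title, _ in cluster])
```

With real text embeddings you would probably L2-normalize the vectors first, since k-means works on Euclidean distance while embedding similarity is usually measured by cosine.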


The problem with this approach is determining what k should be for k-means. But again, we could use the “elbow” technique to find a reasonable k and then start grouping them together. I wonder if there are more sophisticated clustering algorithms that pick the number of clusters automatically?
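For what it's worth, the elbow idea can be sketched like this: fit k-means for a range of k, record the inertia (within-cluster sum of squares), and take the k where the curve bends most. The "largest second difference" rule in `pick_elbow` is just one simple heuristic I'm assuming here, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(embeddings, k_values, seed=0):
    """Fit k-means once per k and return the inertia for each fit."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings).inertia_
        for k in k_values
    ]


def pick_elbow(k_values, inertias):
    """Heuristic (assumption): the elbow is the k with the largest second difference."""
    second_diff = np.diff(inertias, n=2)
    # second_diff[i] is centered on k_values[i + 1].
    return k_values[int(np.argmax(second_diff)) + 1]


# Stand-in data (assumption): three well-separated blobs instead of real embeddings.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
embeddings = np.vstack([c + 0.1 * rng.standard_normal((20, 3)) for c in centers])

k_values = list(range(1, 7))
inertias = inertia_curve(embeddings, k_values)
best_k = pick_elbow(k_values, inertias)
print(best_k)
```

On real feed data the curve is rarely this clean, so the heuristic is more of a starting point than an answer.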


Hierarchical clustering and DBSCAN don’t require knowing the number of clusters up front.
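A minimal sketch of both with scikit-learn, on synthetic stand-in data. Note that you trade k for other knobs: `eps` and `min_samples` for DBSCAN, and a `distance_threshold` for the hierarchical version. The values below are picked for this toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering

# Stand-in data (assumption): three well-separated blobs instead of real embeddings.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])
embeddings = np.vstack([c + 0.1 * rng.standard_normal((10, 3)) for c in centers])

# DBSCAN: clusters are dense regions; points labeled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(embeddings)
n_dbscan = len(set(dbscan_labels) - {-1})

# Hierarchical: keep merging until clusters are further apart than the threshold.
agglo = AgglomerativeClustering(
    n_clusters=None, distance_threshold=3.0, linkage="average"
)
agglo_labels = agglo.fit_predict(embeddings)
n_agglo = len(set(agglo_labels))

print(n_dbscan, n_agglo)
```

For feed items, DBSCAN's noise label is handy: one-off stories that don't belong to any group just stay ungrouped.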



