I'm currently building this sort of thing and it's a glorious feeling. Clustering is hard, like "search engines before the PageRank algorithm was published" hard. Sure, there were lots of options, some very good (I have fond memories of HotBot's advanced search; it was my go-to search engine for a few years), but tractability changed drastically after that paper. Now search engine theory, and building a basic search engine from scratch, are suitable material for students rather than postgraduates. It's fun to work on the bleeding edge.