
Search engines would benefit enormously from using data from social bookmarking sites. If you haven't used Delicious for search lately, I highly recommend it. The breadth of coverage isn't as far-reaching, but all of the results have essentially been pre-filtered by however many users have bookmarked them -- it's unlikely that many people would bookmark a spammy site or a site with crap content.

I think Google results could be vastly improved if they tied their normal rankings together with a "how many people have bookmarked this URL (and, to some extent, URLs from this domain)" metric. I've worked closely with Delicious' set of data, and it's nothing short of incredible. Billions of instances of people categorizing sites and vouching for their quality are going unused in the search space.




If black hat SEOs started to believe Google was paying specific attention to Delicious bookmarks, Delicious would be flooded with fake accounts faster than you can say "link farm".

History of search:

1. New company releases new search engine, better than the competition because it pays attention to a previously ignored signal.

2. SEOs work out how to game the signal.

3. New search engine is now as crappy as the search engines it replaced.

Really, internet search is like macroeconomics: soon after you understand {how to prevent recessions, how to find the best webpages}, the problem vanishes and is replaced by something even more complex and incomprehensible. Furthermore, this has already happened.


This has interesting implications, I think, in the business of search.

It's almost like a negative network effect: the more people use a search engine, the more people try to game it, and the less useful it becomes for finding what you want. This seems like it'd make it much easier for new competitors like DuckDuckGo to deliver a superior product... at least until they gained enough market share to be worth the SEOs' time to game.


You're quite right, but pretty much every strategy a search engine uses is susceptible to black hat SEO. I think there's something to be said for having the system use how others react to your content rather than just what your content is and who links to it. Yes, you could farm out Delicious accounts and have them bookmark your site, but from a search engine's perspective this is an easier problem to remedy (Delicious requiring captchas periodically, looking for accounts with an uncanny number of similar bookmarks in a short span of time, etc.) than it is to come up with strange rules about content that the black hat is free to change.
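
To make the "uncanny similarity" check concrete, here's a minimal sketch in Python. Everything here is hypothetical -- the bookmarks mapping (account -> set of URLs saved within some recent time window) and the thresholds are illustrative, not anything Delicious actually runs:

    from itertools import combinations

    def suspicious_pairs(bookmarks, overlap_threshold=0.8, min_size=10):
        # Flag pairs of accounts whose recent bookmark sets overlap almost
        # entirely -- a cheap signal of coordinated/farmed accounts.
        flagged = []
        for a, b in combinations(bookmarks, 2):
            smaller = min(len(bookmarks[a]), len(bookmarks[b]))
            if smaller < min_size:
                continue  # too few bookmarks to judge either way
            overlap = len(bookmarks[a] & bookmarks[b]) / smaller
            if overlap >= overlap_threshold:
                flagged.append((a, b, overlap))
        return flagged

Pairwise comparison is quadratic, so a real system would shard by URL or use minhashing, but the point stands: this kind of check lives entirely on the bookmarking site's side, where the black hat has far less room to maneuver.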

Judging by the votes, apparently it is more valuable to point out why a certain idea might not work, even if all of the alternatives have the same problem, than it is to suggest something with exciting possibilities. Nothing against you, personally -- your comment is quite right and I agree gaming is a major concern.


There has been a bunch of research on the use of Delicious tags in particular. See for instance: http://heymann.stanford.edu/improvewebsearch.html

If I recall correctly, Yahoo tested using this data a few years ago and found the signals not to be as useful as others they used at the time.

Of course, Bing and Google have been working to include more social signals in rankings: http://searchengineland.com/what-social-signals-do-google-bi...


Really cool study, thanks for sharing. I don't think the research proves that social media wouldn't be a worthwhile signal. They don't really look into how bookmarking data could be applied to results; they just look at how bookmarked sites relate to search results and the web at large. The strongest conclusion is that Delicious users only bookmark a sliver of the web's content, and while the researchers consider this a weakness, I would consider it a strength.

The way I see it, search engines are entirely inclusive, while bookmarking sites are selectively inclusive. For certain ultra-specific queries, I'd much prefer the search engine approach -- I want as much breadth as possible. For other, not-so-specific queries, I'd much prefer the result sites to be vouched for by people. Looking one step further, for a decent chunk of my search queries I'm probably OK if the number of sites being searched over is only about 10,000,000 (my estimate for how many sites on Delicious have been bookmarked by more than 20 users).

What I find exciting is that the two can be effectively combined. Take all the normal results that a search engine would show, but give each result a boost based on the log of how many times it's been bookmarked. It really is that simple.
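
In Python, the combination might look something like this -- base_score and bookmark_count are hypothetical stand-ins for the engine's normal relevance score and the Delicious data, and the weight is a tuning knob, not a known-good value:

    import math

    def combined_score(base_score, bookmark_count, weight=0.1):
        # log1p(0) == 0, so unbookmarked pages keep their normal score,
        # and the log keeps hugely popular pages from swamping relevance.
        return base_score * (1 + weight * math.log1p(bookmark_count))

For example, a page with a base score of 0.8 and 500 bookmarks would come out around 0.8 * (1 + 0.1 * ln(501)) ≈ 1.30, a modest but meaningful boost over an unbookmarked page.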


The problem is that normal humans don't USE social bookmarking sites. Nor do they tweet/share much of value on facebook. Google is VERY good at serving up great results for things that you could imagine the "linkerati" might link to. Start drifting into travel, home remodeling, etc., and it gets pretty grim.


This is a type of sentiment analysis. Google could also give better weight to links that are associated with text like "I love eBay" rather than "[example.com] is a fraud!", but they don't yet (AFAIK).
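
A toy version of that weighting, in Python -- the word lists and the crude lexicon scoring are stand-ins for a real trained classifier over the anchor text and its surrounding context:

    import re

    POSITIVE = {"love", "great", "excellent", "recommend"}
    NEGATIVE = {"fraud", "scam", "terrible", "avoid"}

    def sentiment(text):
        # Crude lexicon score clamped to [-1, 1].
        words = set(re.findall(r"[a-z']+", text.lower()))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        return max(-1.0, min(1.0, score / 2.0))

    def link_weight(anchor_context, base_weight=1.0):
        # Map sentiment [-1, 1] to a multiplier in [0, 1], so a link
        # surrounded by "is a fraud!" counts less than one near "I love".
        return base_weight * (1 + sentiment(anchor_context)) / 2

Here link_weight("I love eBay") comes out to 0.75 while link_weight("example.com is a fraud!") comes out to 0.25, which is roughly the distinction the parent is asking for.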



