tl;dr: Yes they do, and most of it is worthless.

There's data, and there's data. I distinctly remember that Google used to have an "Instant feed" on Google News (I may remember the name wrong), where it showed relevant results from Twitter in real time. They don't do that anymore, because Twitter no longer allows Google access to the real-time data. Google now has to crawl for it, which takes much longer and completely destroys any semblance of immediacy. They had to take the feature down (or, in any case, I can't find it after about 15 minutes of searching).

Facebook has always hidden as much data as they could. Searching for "site:facebook.com" in Google does, as you say, return about 7.5 billion results. Glancing through the first few pages, a majority seem to be from various organizations whose profiles are left completely open to the public.

I'm not contesting that Google has some data from Twitter and Facebook (though on rereading my post, it does sound like I am saying that). Rather, I am saying that Google no longer has accurate, up-to-date information on a majority of profiles from Twitter and Facebook.

If a majority of the data from a site is incomplete, then Google has no choice but to penalize those pages in its results, because it has no way of guaranteeing that those results are relevant. It used to be able to guarantee that with Twitter, but one side or the other let that deal fall through (I don't know which side is more responsible, but I heard, from here and other places, that Twitter was the primary culprit). It never had an information-sharing deal with Facebook.

What Twitter and Facebook are getting pissy about is that Google is refusing to send Twitter and Facebook pages to the top of the results page. They want this done, no questions asked. Google cannot in good conscience do this, because they do not have enough data to make accurate relevancy predictions. Google also cannot use Twitter and Facebook data in the Search Plus Your World results, because they have absolutely no access to Twitter and Facebook's social graphs.

So, yes. Google has incomplete and outdated data from several billion pages from each site. It does not have enough data to guarantee relevancy, and so chooses to penalize those pages it does have, thus making Twitter and Facebook upset.


