
I don't understand how you compute that estimate.

I doubt you store the history of every search ever made. People don't need a Google account to query the engine, others disable their history, etc.

Are you saying you still have every search ever made? You would need that to say a query hasn't been made before, wouldn't you?



Why would you not store every search ever? It's only a few petabytes, and you can find out all sorts of useful info from it.


I don't know how they did it, but I suspect it wouldn't be very hard to model the distribution by sampling a few million queries and extrapolating from that.
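A minimal sketch of what "sample and extrapolate" could look like, assuming a Good-Turing style estimator (my choice of technique, not anything Google has said it uses): the share of queries that appear exactly once in a random sample, divided by the sample size, estimates the probability that the next query is one the sample has never seen. The function name and the toy sample are hypothetical.

```python
from collections import Counter


def unseen_mass_estimate(queries):
    """Good-Turing estimate of the probability that the next query
    is one that does not appear anywhere in `queries`."""
    n = len(queries)
    if n == 0:
        raise ValueError("need at least one sampled query")
    counts = Counter(queries)
    # N1 = number of distinct queries seen exactly once in the sample.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Hypothetical toy sample; a real estimate would use millions of queries.
sample = ["weather", "news", "weather", "python bloom filter", "news", "cat videos"]
print(unseen_mass_estimate(sample))
```

Note that this estimates novelty relative to the sample, not relative to every search ever made, so it only approximates a claim like "15% of queries are new" under the assumption that the sample is large and representative.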


You'd only need to store the list of unique searches, but even so, if the 15% figure is accurate, that must be a huge amount of data.
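If the goal is only to answer "has this exact query been seen before?", one space-saving option is a Bloom filter instead of a full set of strings. This is a swapped-in technique for illustration, not a description of Google's storage; the class and parameters below are hypothetical. It never gives false negatives, but false positives would make some genuinely new queries look old, biasing a novelty count downward.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, tunable false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing: m bits and k hash functions for the target error rate.
        self.m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


seen = BloomFilter(capacity=1_000_000, error_rate=0.01)
seen.add("weather tomorrow")
print("weather tomorrow" in seen)
```

At a 1% error rate this needs roughly 9.6 bits per stored query, regardless of how long the query strings are.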




