Urs Hölzle on Google's first data center (plus.google.com)
170 points by cramforce on Feb 5, 2014 | 39 comments



The interesting quote here is:

You'll see a second line for bandwidth, that was a special deal for crawl bandwidth. Larry had convinced the sales person that they should give it to us for "cheap" because it's all incoming traffic, which didn't require any extra bandwidth for them because Exodus traffic was primarily outbound.

This shows that there's more to being a successful business than just good code; you need to know how to run a business, too. How many people working on web search ranking would ask for a special pricing exception from their bandwidth provider and get it? The answer is, apparently, one.


> How many people working on web search ranking would ask for a special pricing exception from their bandwidth provider and get it? The answer is, apparently, one.

Anyone who knows how peering and bandwidth pricing works.

Basically, bandwidth pricing amongst ISPs is a game of chicken, relative strength, and perceived benefits; it's one of those areas where being good at talking and having sufficient traffic can make the difference between you paying consumer rates and, in some circumstances, the other party paying you to host hardware with them.

(E.g. consider if you run a CDN: for smaller ISPs that pay high rates for bandwidth, it may save them a lot of money to get a PoP in their datacentres to reduce their bandwidth usage; and in some circumstances it may even pay for them to pay to get a big bandwidth hog to serve traffic destined for certain other ISPs on their network, to "tip the scales" in peering negotiations.)


Actually it is a pretty typical arrangement, provided you can deliver on your commitment to a certain ratio of inbound/outbound.


Maybe now it is, but back in 1998?


Yes.


Here's another anecdote (also sourced from Urs Hölzle) about how Google transferred 9TB of data to their new east coast datacentre via fibre, for free.

http://www.dodgycoder.net/2013/02/googles-fiber-leeching-cap...


really


I have heard that Microsoft's hardware cost to serve 1000 searches is > $1 while Google's equivalent cost is around $0.07. If true, this is definitely a competitive advantage.


Both figures seem unlikely, given that Google served ~2 trillion searches last year and did not spend $151 billion doing it. (The entire company spent on the order of $20 billion in 2013, and that's for everything.)

http://www.statisticbrain.com/google-searches/ https://investor.google.com/earnings/2013/Q4_google_earnings...


The grandparent mentioned $.07 per 1000 searches, not per search. I think that brings the figure down to something pretty reasonable. ($151 million.)


$140 million on search? I haven't a clue if that is high or low. Seems low if the company's total expenses were $20 billion.


The distinction here is that's for hardware cost only. I imagine salary costs are much higher.


Ah I knew I was missing something there. Thanks.


Estimates of "# of searches Google serves" are always fascinating to me. I don't see statisticbrain.com publishing a source for their estimate, although maybe I'm just missing something.

AFAIK, Google doesn't really publish a precise number. They often say things like "Over a billion searches a day". But then they keep that published estimate the same for years during which growth is obviously happening. At some point, they'll publish a step function upgrade to the estimate.

Off the top of my head, 2 trillion searches sounds reasonable, but I think there is a lot of fudge factor in that guess.


Where did you get your $151 billion figure from?

2 trillion searches at 0.07 cents per thousand is 2 * 10^12 / 10^3 * 0.07 = $140,000,000.


I think that's 2,161,530,000,000 searches * $0.07/search (he mistook it for $0.07 per search)

On the other hand, your post mistakenly reads 0.07 cents per thousand, though you did the calculation correctly.
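The three readings of the figure in this subthread can be checked side by side. What follows is a quick sketch rather than anything authoritative: it uses the 2,161,530,000,000 search count quoted above, and the function and variable names are made up for illustration.

```python
# Yearly search count quoted in the thread (an estimate, not an official figure).
SEARCHES = 2_161_530_000_000


def yearly_cost(searches, dollars_per_block, searches_per_block):
    """Total dollars, given a price for each block of `searches_per_block` searches."""
    return searches / searches_per_block * dollars_per_block


# Misreading: $0.07 for every single search.
per_search = yearly_cost(SEARCHES, 0.07, 1)

# The figure as originally stated: $0.07 per 1000 searches.
per_thousand = yearly_cost(SEARCHES, 0.07, 1000)

# The slip in wording: 0.07 cents ($0.0007) per 1000 searches.
cents_per_thousand = yearly_cost(SEARCHES, 0.0007, 1000)

for label, cost in [
    ("$0.07 per search", per_search),
    ("$0.07 per 1000 searches", per_thousand),
    ("0.07 cents per 1000 searches", cents_per_thousand),
]:
    print(f"{label}: ${cost:,.0f}")
```

The first reading lands at roughly $151 billion, the second at roughly $151 million, and the third at roughly $1.5 million, which is why the unit matters so much here.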


Math is hard, let's go shopping for data centers!


Given the phrase "cost to serve 1000 searches", are you sure I was mistaken?


He's saying that it is 0.07 dollars per 1000 searches instead of 0.07 cents (which would be 0.0007 dollars).


This would be more aptly titled: "Urs Hölzle on Google's first data center"


Hypothetical: If AWS was available at the time Google was just starting, would it have made sense for them to go with AWS?


Unequivocally no, because it would have been too expensive, probably at least 100x the cost.

AWS is an order of magnitude more expensive than owning your own hardware, and Google was leveraging economies of scale to operate much more cheaply than people who owned their own hardware (e.g. fault-tolerant software vs. "enterprise grade" hardware).

People didn't realize it at the time, but "search" wasn't Google's only core competency. It was also "warehouse scale computing".

Of course Inktomi under Eric Brewer was another search company that realized that search is a perfect application for clusters of commodity x86 hardware, or networks of workstations (which AFAIK came from academic projects like http://now.cs.berkeley.edu/ )

(Don't outsource core functions: http://www.joelonsoftware.com/articles/fog0000000007.html )


How early did Google start doing "warehouse scale computing"? I was always under the impression they started doing that as a byproduct of their large size.


Obviously it took them a while to get to warehouse scale, but I think they were thinking efficiently from the beginning, like using Legos and cheap racks to hold naked motherboards. http://infolab.stanford.edu/pub/voy/museum/pictures/display/... http://commons.wikimedia.org/wiki/File:Google%E2%80%99s_Firs...


Here's a photo set of the initial server rack, showing the use of 1/8'' cork sheets to separate circuit boards from contacting each other:

http://www.flickr.com/photos/nationalmuseumofamericanhistory...


In terms of the software, pretty much since the beginning. Even if Google only had 20 machines at first, the difference in mindset and software architecture is to treat each machine as expendable. The code has to be written very differently to account for that, and I think that was done starting fairly early.

In terms of the hardware, I guess it was a few years, whenever Google started building its own servers, building its own data centers, etc.

There are a lot of challenges as you scale, but it's safe to say that it wouldn't have made sense at any point to outsource/use AWS. New ground was being broken.


Definitely not. The Google product is not search or Gmail or any of that stuff. The Google product is dirt cheap computing. Everything else Google does is just an attempt to exploit that competitive advantage. Running it on anyone else's stuff would erase the advantage.

Even in the colocation facilities, Google was developing this skill, for example by installing their infamous corkboard servers with the drives attached by Velcro.


I meant the question more in the sense of starting off with AWS and moving to their own servers later, as many startups do nowadays. Though you make a good point about the importance of gaining the skills for a robust infrastructure early on.


It rarely makes economic sense to start with AWS. Very few people have bursty enough traffic for it. Until they're huge, most sites I've dealt with don't even have a day/night cycle that's pronounced enough to justify spinning up servers just for the daily peak.

A lot do have the delusional expectation that the 10x overnight growth they dream of justifies building on AWS, but that "just" justifies good caching and the ability to spin up frontends on AWS or similar if needed. Ironically, that makes you likely to spend less on your dedicated hosting: if your setup is ready to spin up cloud servers when the load gets too high, you can afford to run much closer to the wire before you add more dedicated hardware for your base load.

For most people, renting a dedicated server (there's a huge number of providers that charge month to month, with no commitment) will come out far cheaper for anything they use more than ~8 hours a day on average.
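The rule of thumb above comes down to a break-even calculation between a flat monthly fee and an hourly rate. A rough sketch of it, with the caveat that both prices are hypothetical and chosen only so the threshold lands on 8 hours; they are not quotes from any provider:

```python
def break_even_hours_per_day(dedicated_monthly, on_demand_hourly, days_per_month=30):
    """Daily usage (hours) above which a flat-rate server beats hourly billing."""
    return dedicated_monthly / (on_demand_hourly * days_per_month)


# Hypothetical prices for comparable machines (assumptions, not real quotes).
DEDICATED_MONTHLY = 60.0   # dollars per month, flat
ON_DEMAND_HOURLY = 0.25    # dollars per hour, billed only while running

threshold = break_even_hours_per_day(DEDICATED_MONTHLY, ON_DEMAND_HOURLY)
print(f"Dedicated is cheaper above {threshold:.1f} hours/day of average use")
```

With real prices plugged in, the threshold moves, but the shape of the argument stays the same: the more hours a day a machine is busy, the harder hourly billing is to justify.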

To me, if a startup puts everything on AWS, it's a sign they have poor cost control.


From experience at RBI: I had one tool that ran on AWS, and in hindsight it would have been cheaper to go with Linode, and that was a single small instance.

I noticed at Silicon Milkroundabout that pretty much all the startups were using AWS. I suspect that 90% of them would struggle if they had to set up their own colo.


No, for the same reason it still doesn't make sense to go with AWS. Basically you need a large number of cores that are tightly integrated with a large quantity of storage. What mattered was that all the machines could have a bunch of drives on them for "free" (low marginal cost). By interspersing the data and the compute you get maximum bandwidth to local assets.


What classes of applications benefit from that level of integration between processing and storage?

It strikes me that both AWS and various PaaS providers (especially Heroku) encourage strong separation between the application and its storage. Applications are supposed to treat all persistent storage as network-attached, while the VMs or containers running the application are ephemeral. I guess you're saying this is untenable for some kinds of application.


Isn't that how EC2 works?


If I understand correctly, the problem is that persistent storage (either S3 or EBS) is network-attached, and you pay by the gigabyte and by the number of requests or IOPS.


Well, AWS explicitly doesn't charge for incoming bandwidth (vs. the "special" deal they got on the 15Mbps line item) which was the majority of their costs. It may have actually made sense, depending on their computing needs.


No, the already small margins would have been consumed in a heartbeat.


They weren't profitable at all for several years (until AdWords), were they?


By small margins, do you mean at the time or currently?


At the time obviously. And by small, I suppose I mean negative haha.



