Given the introduction, I thought we were going to see a fantastic breakdown of cost estimates: capex and opex, 95th percentile bandwidth pricing vs. Amazon's fixed per-byte pricing.
A real analysis would have even dug into the question of sysadmins vs engineering staff, and the potential for "the cloud" to allow programmers to actually program infrastructure, eliminating internal sysadmin staff completely (and outsourcing all the remaining sysadmin work to Amazon).
Instead, we got bizarre platitudes about how they "love computers". How boring. Early in my career, I spent a lot of time in the hot/cold, blaring fan world of data centers -- sometimes late at night, when a hard drive failed or a switch's fans went out. Anyone who thinks it's pleasant to spend time in a data center is insane; whether or not it makes sense in terms of cost was the answer I'd hoped to get from this article.
> eliminating internal sysadmin staff completely (and outsourcing all the remaining sysadmin work to Amazon)
Man, this idea is one of the most insane things I hear. Amazon does not install your operating systems, Amazon does not install and configure your applications, Amazon does not troubleshoot anything related to the operation of the servers, Amazon does not analyze your traffic patterns and decide when you need more instances, and Amazon absolutely doesn't care about how well your product runs as much as your own staff does. In other words, there is no way you can outsource your WHOLE sysadmin staff. All Amazon does is provide the hardware and network infrastructure to run on. It is not a magic bullet that says I don't need to have these people around. SysAdmin != server monkey -- yes, that is part of our jobs, but quite honestly it is a very, very small part of it.
> Anyone that thinks that it's pleasant to spend time in a data center is insane
Call me insane, because I sure do love spending time in a datacenter. It's one of the few times working in IT that you really get to do something with your hands, and that does have a certain appeal to me.
You conveniently left out the first part of the quoted sentence, "allow programmers to actually program infrastructure". Almost everything you wrote can be programmed these days without a "traditional" sysadmin.
Granted, that process is still being refined, but with dedication you can achieve those goals.
Conversely, it again comes down to cost analysis: which is cheaper, a developer who can manage all that, or a sysadmin? I am sure there is a curve: at the small end a sysadmin wins, at the large end automation does.
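To make "program infrastructure" concrete, here is a minimal sketch using the modern boto3 SDK; the AMI ID, "role" tag scheme, and instance type are hypothetical placeholders, not anyone's real setup:

    # Minimal sketch: adjusting fleet size from code instead of filing a
    # ticket with ops. Assumes boto3 is installed and AWS credentials are
    # configured; the AMI ID and "role" tag below are hypothetical.
    import boto3

    ec2 = boto3.resource("ec2")

    def scale_web_tier(desired):
        running = list(ec2.instances.filter(Filters=[
            {"Name": "tag:role", "Values": ["web"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]))
        if len(running) < desired:
            n = desired - len(running)
            ec2.create_instances(
                ImageId="ami-00000000",  # hypothetical image
                InstanceType="m1.large",
                MinCount=n, MaxCount=n,
                TagSpecifications=[{
                    "ResourceType": "instance",
                    "Tags": [{"Key": "role", "Value": "web"}],
                }])
        else:
            for inst in running[desired:]:
                inst.terminate()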
> All Amazon does is provide the hardware and network infrastructure to run on. It is not a magic bullet that says I don't need to have these people around.
Even in this case, I would still like to excuse myself from the headache of dealing with real estate, electricity, air conditioning, etc. -- in other words, the supporting factors required to keep the data center up and running.
Unless you are going to need ridiculous numbers of servers, it's just not worth your time dealing with those support operations and their infrastructure. Hiring and managing full-time staff for this job is not easy and takes a lot of your time.
For some large companies it makes perfect sense to focus on data center infrastructure and support. But as a small company you are far better off letting someone else handle it and focusing on your primary work.
This comment doesn't make any sense. Companies who need to build their own data-center aren't going to get by on Heroku.
For everyone else there's co-location, where you don't worry about those things anyways. Unless you have a just terrible facility I guess. I mean sure, one time in the past three years I've complained to our colo folks that it seemed warmer than it should be while one of the A/C units was being serviced.
Other than that your argument doesn't make much sense. It makes me think you've either got no actual experience with this sort of thing, or you're co-locating at "Bob's BBQ & Server Emporium"...
> Call me insane, because I sure do love spending time in a datacenter. It's one of the few times working in IT that you really get to do something with your hands, and that does have a certain appeal to me.
We are both saying the same thing. You need to draw a line between "fun" and how feasible it is to afford that fun. Most people don't need a data center, but still want one.
All passion and fun talk aside, sometimes you need to look at it very pragmatically. That was my point.
It was a real analysis that went beyond just counting the beans. Because they are a privately held company, getting rid of passionate people doesn't make a stock price go up, increase anyone's bonus, or benefit their long term goals -- and blogging about their values certainly cannot hurt in attracting top notch talent.
If you remove any rational analysis based on productivity, developer time, or finances, then we're only left with a question of personal taste -- and it's impossible to argue against personal taste.
An evaluation based on personal taste is useless. If I say "I like orange because it's a great color" -- then either you already like orange, in which case my analysis does you no good, or you don't like orange, in which case my statement can simply be ignored.
The same applies here, in that the article could have been condensed to a single sentence: "We don't use the cloud because we don't want to. We like dorking around with hardware and data centers, just like someone else might like tinkering with their car or collecting bottle caps."
> I spent a lot of time in the hot/cold, blaring fan world of data centers -- sometimes late at night, when a hard drive failed or a switch's fans went out.
If a hard drive or a fan failing requires someone's attention at the data center in the middle of the night something is wrong (unless your company has such high uptime requirements that even redundant systems need to be fixed right away). Either your company can't afford to provide proper redundancy, or someone on the technical side has failed to implement redundancy or failover properly (which happens, but should be corrected). For example we had an entire router fail and this just caused a blip. We didn't have to do anything right away to fix it, just figure out what went wrong and bring it back up the next day. Also we don't really end up spending very much time in the data center itself (in fact I work remotely full time).
> We like dorking around with hardware and data centers, just like someone else might like tinkering with their car or collecting bottle caps
We do like to consider ourselves experts -- we don't just dork around and tinker (at least, not most of the time).
> If you remove any rational analysis based on productivity, developer time, or finances, then we're only left with a question of personal taste
In part this is a matter of taste. But as I said:
"This culture means when we hire technical staff, we hire people who share this passion. I believe that this passion translates into a better product. Whenever someone does a cost analysis of cloud vs self hosting there is no row in the spreadsheet for “Work Productivity Increase due to Passion.”
Just because there is no good method to put it on a spreadsheet doesn't mean it doesn't have value for the company. There are a lot of technical reasons we like to have control over the full stack. But I think it largely comes down to our culture. The culture of a company is very important. A culture could be all about how the numbers fall after you calculate your capital and operating expenses, etc. Somehow it seems to me, though, that many people don't seem very happy in those cultures -- and this will affect employee retention and the sort of talent you can attract.
For me, the best justification for “doing it ourselves” is that it’s what exceptional companies do. We dig under abstraction layers not knowing what we’ll find. We dig under Linq to SQL. We dig under Redis clients. We really never consider the vendor’s platform good enough.
The result of a million small optimizations is a platform that others can’t create.
But think of it another way: if you let someone else focus on the issues you don't necessarily need to handle, you can focus on the issues you actually need to work on.
But yes, it's all about personal preferences, and I respect yours. In my case I would let someone else do the job I don't necessarily have to do, and use that time to do well at my actual job.
Instead of 12 physical cores, 96GB of RAM and a 2TB SSD array pushing 1M IOPS on dedicated hardware for my PostgreSQL database servers, I'd need 1TB of RAM in an AWS box because I'll be lucky if I can even break 10K IOPS.
Does the price make sense then?
I have yet to see any significant AWS deployment that doesn't feel like it could be done better, more reliably, and much more cheaply as a co-located setup.
You can't really create a massive co-located setup on demand for big jobs, then tear it down ... use the right tools for whatever job you're doing. EC2 being more expensive for your Postgres deploy doesn't mean it is an expensive toy.
If you really have a need to set up and tear down a bunch of cores for the occasional large batch, nobody's questioning that the cloud is an economical way to do that.
But I've never had to do that. Not in a way that would make the development time involved economical anyway. Wait two hours for this once-in-a-blue-moon processing job to finish, or spend a day setting up a process to handle such jobs quickly in the future?
It'd really take something exceptional (again, with low IOPS demands) to have that make much sense, unless I was already hosting in the cloud and had invested money in making such a task quick and cheap.
> Instead of 12 physical cores, 96GB of RAM and a 2TB SSD array pushing 1M IOPS on dedicated hardware for my PostgreSQL database servers, I'd need 1TB of RAM in an AWS box because I'll be lucky if I can even break 10K IOPS.
> Does the price make sense then?
Does it? That's the point of analysis. It may be that you can scale up your organization on AWS, and then make the decision to scale something like a database vertically once you actually need it.
The choice should be based on a rational cost analysis, however. Not just money, but developer time, productivity, and a potential loss of focus on the core competency -- which should be your product, not your commodity infrastructure.
> I have yet to see any significant AWS deployment that doesn't feel like it could be done better, more reliably, and much more cheaply as a co-located setup.
That's a stretch. Especially the "cheaply" part. The operational costs involved in building and maintaining a significant co-located deployment are huge, not to mention the capital expenditure involved in enterprise networking hardware, servers, cages, racks, PDUs, etc.
I don't firmly fall on either side of the debate -- one must balance the requirements and costs, like anything else.
However, I do firmly believe that we should have programmers automating the entire software system administration job away, leaving only the question of hardware provisioning. That's why we have "devops" style teams nowadays, and I only expect that trend to grow.
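As a minimal sketch of what that automation looks like in practice (assuming a Debian-style host with systemd; the package and service names are just illustrative):

    # Minimal sketch of sysadmin work as code: an idempotent step that is
    # safe to re-run on every deploy. Package/service names are examples.
    import subprocess

    def ensure_service(package, service):
        # Install only if the package is missing.
        if subprocess.run(["dpkg", "-s", package],
                          capture_output=True).returncode != 0:
            subprocess.run(["apt-get", "install", "-y", package], check=True)
        # Start and enable the service only if it isn't already active.
        if subprocess.run(["systemctl", "is-active", "--quiet",
                           service]).returncode != 0:
            subprocess.run(["systemctl", "enable", "--now", service],
                           check=True)

    ensure_service("nginx", "nginx")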
No, it doesn't. You're throwing up a spectre. These are pretty easy numbers to come by. The point of this one example is that vertical scaling options are pretty constrained on a platform designed to really only scale cores and RAM effectively.
> That's a stretch. Especially the "cheaply" part.
All this is a straw man. I really have a hard time believing "I don't firmly fall on either side of the debate". In fact, I call BS.
Who actually has to wire their own PDUs unless they want to? Or is forced to buy cages, racks, etc?
If you want to, I suppose you can, but I haven't hit a Tier 1 data-center where that's even an option -- unless, perhaps, you were to buy an unfurnished cage on terms that let you hire your own people for the build-out.
Otherwise, for most deployments, even for something like Reddit, you're talking about stacking a couple 48 port switches, racking a few servers, and making sure you don't screw up airflow with bad cabling. Just spend $500 on an experienced cable guy to wire it up. Cabling isn't the most fun.
Staffing costs are a drop in the bucket. Racking a couple dozen systems, configuring your ports is pretty trivial.
If you have a business with dependable positive cash flow affording you the luxury of signing a three year hardware lease, it doesn't make cash sense to do anything else for 99% of deployments.
The "core competency" stuff is just a salve for developers who want to live in a homogenous environment. Life gets easier with a competent IT person. Not harder. You still have to monitor processes, setup syslog servers, archive logs, monitor disk space, load, available RAM, bandwidth, trace down abusers, setup mail servers for at least internal monitoring, create backup procedures, automate deployments, figure out how to compile some old library.
All of this stuff is where your IT staff effort goes. Not in the once-in-a-blue-moon-we-have-to-rack-some-servers tasks. You aren't better off because Amazon is handling your power requirements instead of a good colo. Those things are details you never have to think about either way. Saying so is a definite straw-man.
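To make that list concrete: a minimal sketch of a check that has to exist either way, colo or cloud (the thresholds are hypothetical, and the "page someone" step is left as a stub):

    # Minimal sketch: the monitoring work doesn't disappear because Amazon
    # owns the power feed. Thresholds here are hypothetical.
    import os
    import shutil

    def check_host(path="/", min_free_gb=10.0, max_load=8.0):
        alerts = []
        free_gb = shutil.disk_usage(path).free / 1e9
        if free_gb < min_free_gb:
            alerts.append(f"low disk: {free_gb:.1f} GB free on {path}")
        load1, _, _ = os.getloadavg()  # 1-minute load average (Unix)
        if load1 > max_load:
            alerts.append(f"high load: {load1:.2f}")
        return alerts  # stub: feed these to a pager or dashboard

    print(check_host())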
The problem is that it's touted as a great way to scale web apps. Since DB performance is usually the limiting factor in webapp scaling, this doesn't appear to stack up. Needing a lot of IOPS is pretty par for the course.
I guess my question is why go for reasonable when you can get incredible performance per dollar with RAID-10 SSDs? A few dedicated DB machines with SSDs and many cores can get you absolutely monstrous throughput without going for any of the more exotic DB solutions.
What a silly argument. "If you just want to use someone else’s computers, it means you don’t love computers — at least not every aspect to them."
Uh, don't they use Dell? If they love them so much, why not build their own? Why stop there? I love software and programming languages, but it doesn't mean I'm going to use my own compiler or run a company on my own libc.
There is always the question of degree and practicality. Taking it to libc and custom compilers is perhaps reductio ad absurdum -- there really wouldn't be any practical benefit in taking it that far for most companies. However, people who love programming languages probably often wish it were practical, and that they had an excuse to write a compiler at some point.
As far as Dell goes, I honestly have mixed feelings at this point. When I started at the company there were fewer than 10 people, so the idea of having centralized firmware updates etc. seemed practical. I'm not sure I really feel this way any more -- I go back and forth. But sticking with Dell seems to make sense at this point so we have more uniformity in hardware and management.
A friend of mine has the uncanny ability of re-formulating every problem he is given in such a way that the logical solution to the problem is always: "Let's write a parser". He loves writing them. Not quite 'writing your own compiler' but it's going that way.
The practicality argument doesn't apply when they already say they're doing more than necessary because they love computers, and that if you don't, you don't really love computers.
When someone says you don't love computers because you did what's most practical or cost effective for you, how can they claim to love computers when they themselves do or don't do something because it's most practical or cost effective for them?
If using Amazon rather than running my own data center saves me time, or is cheaper or easier, that doesn't mean I don't love computers. If they love computers so much that practicality is out the window, then why aren't they using their own operating systems and libcs?
Clearly there's no difference in effort between a day installing and configuring a Dell server and several thousand person years of writing a solid, scalable operating system, so your question is perfectly reasonable.
I understand that what I'm saying is a bit of a stretch, but when they claim people don't love computers for not doing what they're doing, they may as well take it to the extreme. People have written operating systems; maybe those people should say everybody else doesn't love computers.
I mean, obviously I don't expect anyone to build everything they use. My argument was aimed at the "you don't love computers" statement and not at the fact that they don't write their own OS.
You don't really love computers unless you're working out every bit of the quantum mechanics necessary to design and model its integrated circuits on a chalkboard.
Who is going to do a better job? A group of educated, passionate people who can troubleshoot their own problems in real time? Or "Peggy" at Cloud Vendor X, who deals with 400 other clients?
That depends on whether the fix that Peggy deploys for one client also fixes the same problem for the other 399.
In other words: It depends.
There's no general answer, because deciding where to draw the abstraction barrier between you and your vendors is an unsolved, difficult, and evolving problem.
The reason I'm still wary of the cloud is that the abstractions are still too leaky (to borrow Joel Spolsky's turn of phrase). When you start abstracting away core system calls (like fsync), things work great 99.99% of the time. But when that 0.01% bites you, it bites hard. We don't expect core system calls to fail. And when they fail, the fact that our code is two or three levels of abstraction higher means that we often have no way of fixing the issue. The cloud will be at a disadvantage to "real servers" as long as its abstractions are leaky enough to be distinguishable from real hardware.
Keep in mind that screwing with syscalls is an AWS thing, not a virtualisation thing. They make fsync weird for good reason, though, and there really isn't much else they can do.
Right. The question, however, was about working in the "cloud", which usually means that you just have access to the VMs, not to the underlying hardware. VMs are great, but there's nothing like root access to the physical computer to help you diagnose hardware issues.
Remember that the interface between the VM and the hypervisor is standardised, and that it is running many other functionally identical VMs. Also remember that the platform they run on is usually homogeneous, very well maintained, monitored, and understood.
Also keep in mind that with 'hardware issues' you can redeploy onto another machine in minutes, rather than having to wait for, say, a replacement part from Dell etc. Your standard procedures should come into play in both cases (failing over to a secondary server, bringing a tertiary server into standby, etc.).
Cloud or not, if you're writing serious software (and by that I mean software that depends on fsync succeeding) you'd better expect system calls to fail and have code to handle that.
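For example, a minimal sketch of a write path that refuses to report success unless fsync succeeds (error handling is simplified to the bare point):

    # Minimal sketch: treat fsync as fallible, virtualized or not.
    import os

    def durable_write(path, data):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, data)
            # os.fsync can raise OSError (e.g. EIO). Swallowing it here
            # would mean claiming durability the hardware never promised.
            os.fsync(fd)
        finally:
            os.close(fd)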
Because there's an extra virtual layer between you and the hard-disk, fsync() does not do what it says it does, which means that a database cannot actually guarantee that a transaction was successfully completed.
If your traffic is huge, this can cause big problems that are very hard to fix. And it did cause problems for Reddit, as EBS is notoriously awful in this regard (and others, like high latencies on access).
Basically when working at big scales, nothing beats having complete control over your infrastructure. It may be tough and time consuming, but at least you can identify the problem and fix it.
It is my opinion that something like Google's search cannot be built on top of Amazon's AWS.
Saying that because fsync() doesn't work on AWS it won't work 'in the cloud' is much like saying that because you can't take a cheap sedan into the bush, a 4WD can't possibly exist.
Google can't exist (well) on top of AWS because AWS is designed for the deployment of (quasi-)stateless applications. Saying that you can't build a data intensive application on top of it is like saying that your machine gun sucks at grating cheese.
OVM was originally designed to run a search engine (darkmatr, now defunct due to lack of comparative profitability), so you could quite easily build Google on top of it. It wouldn't surprise me if Google would work really well on top of that stack.
Most modern hard drives don't actually do what most people expect an fsync to do anyway. This is an interesting thread about Apple's F_FULLFSYNC ioctl that discusses the problem (this post is by Apple FS engineer Dominic Giampaolo): http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072...
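For what it's worth, the distinction Giampaolo describes is reachable from userland; a minimal sketch, assuming a Python build that exposes the Darwin-only constant:

    # Minimal sketch: on Darwin, fsync() only pushes data to the drive,
    # which may still hold it in its write cache; F_FULLFSYNC asks the
    # drive itself to flush. Elsewhere, plain fsync() is the best we get.
    import fcntl
    import os

    def full_fsync(fd):
        if hasattr(fcntl, "F_FULLFSYNC"):      # macOS only
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        else:
            os.fsync(fd)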
Poorly written argument full of red herrings. From what I can tell, their answer to the "why aren't you in the cloud" question is that they "love computers so much that they want to do it all themselves because it's so fun".
If I asked you why you were eating pizza and you responded "because I like pizza" would you still consider that a red herring argument?
All they are saying is that they are not in the cloud because they don't want to be in the cloud and because they quite like not being in the cloud. It works for them. Why is that expression at all controversial?
If they'd labeled their blog "Why being in the Cloud is a bad idea" or "Why being in the cloud would cost Stack Exchange a lot more" or "Why nobody should ever use the Cloud" then I could see the reason for the dispute. Then their arguments would be very unconvincing.
It is certainly valid for Stack Exchange to make a choice because it fits within their culture and then explain that that is why they are doing it. You might think it's a dumb decision and that's fine. But they clearly are not trying to convince you that they've made the best possible decision. They're just saying that good or bad, they've made the decision that they want to make. And it's clearly a decision that has been working well for them so far anyway. So why do people keep bugging them about it?
You're correct about alternative titles for the blog post. However, the assumption is that on a technical blog, readers would expect technical answers--at least I did. You're not going to tell your boss that the next product should be developed in Java because you like Java. They want concrete reasons.
Would you have preferred a gigantic stack of technical chaff designed to rationalize and obscure the fact that they're doing whatever seemed like fun at the time? Because I'm sure we can find you some examples. ;)
And taken on those terms -- not on the cost accounting of cloud vs local -- it makes sense to me. Is it a convincing "no matter what" argument against cloud hosting in all cases? I'd say no -- there will be cases where sheer scale, or fluctuations in scale, make cloud hosting more attractive, as the article admits.
I suspect that this is part of the problem with many big manufacturing companies these days. The reason Ford was a leader for so many years was that its leadership was passionate about engineering and manufacturing. Henry Ford was an engineer for the Edison Company before he was a manager, and is famous for obsessing over every detail of the manufacturing process. That spirit infected the company up through the 1960's and 70's, when the MBAs took over and everything was reduced to a column of numbers.
It appears that many early tech companies (TI, HP) have headed in the same direction. Maybe 30 - 40 years is the limit before the accountants take over.
A Ford bought today is less expensive, considerably safer, more efficient, and more reliable than a Ford from the days of Henry Ford.
Meanwhile, today's manufacturing companies routinely make objects – like the processor driving the machine that you are reading this on – that are orders of magnitude more complex than the most advanced technology of 1959.
And all of this was done by companies that employ lots of accountants. I'm not sure why you think that accountants and engineers don't routinely coexist, or that Henry Ford didn't employ plenty of accountants back in his day. Being able to manufacture good stuff in bulk at reasonable prices is a major exercise in accounting, and always has been.
One key reason I suspect is that you haven't been able to get SSDs in the cloud so their scaling up approach would be more difficult. Having your own servers means you can "NewEgg your way out of it" (http://news.ycombinator.com/item?id=3243133 :-).
I'm growing to understand "the cloud" not as AWS/Rackspace/Heroku/whatever, but as a mindset where:
1) you're not tied to hardware
2) your application scales horizontally
3) you can easily (and dynamically) add (computational) resources as needed, preferably managed by your application.
4) you build fault tolerance and redundancy into your application (see the sketch below)
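A minimal sketch of point 4 as application code; flaky_call is a hypothetical stand-in for a request to another node:

    # Minimal sketch: assume any node can vanish, and put the recovery
    # logic in the application itself.
    import random
    import time

    def with_retries(fn, attempts=3, base_delay=0.5):
        for i in range(attempts):
            try:
                return fn()
            except ConnectionError:
                if i == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** i)  # exponential backoff

    def flaky_call():  # hypothetical stand-in for an RPC
        if random.random() < 0.5:
            raise ConnectionError("node went away")
        return "ok"

    print(with_retries(flaky_call))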
This article basically takes the approach that "the cloud" is AWS.
OT: I've heard claims that dedicated servers beat cloud VPSes (i.e. AWS) in cost. But most of the dedicated servers I've seen cost $100 and up per month. Is there a comparison of cost vs. computing power, with benchmarks, for review?
What you care about at any reasonable scale (i.e. the point at which costs matter) is $/serviced request. That usually ends up being dominated either by $/cpu cycle or $/iop. The current problem is that most cloud platforms suck at IOPS, which makes them far more expensive than they really should be.
Cloud vs. dedi vs. colo in terms of raw prices though, depends entirely on your price of money and the price you can negotiate with your provider.
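As a back-of-envelope sketch of the $/serviced-request framing (the inputs below are hypothetical placeholders, not real quotes):

    # Minimal sketch: normalize any hosting option to dollars per request.
    def cost_per_request(monthly_cost_usd, requests_per_sec):
        seconds_per_month = 30 * 24 * 3600
        return monthly_cost_usd / (requests_per_sec * seconds_per_month)

    # e.g. a $500/month box sustaining 200 req/s (hypothetical numbers):
    print(f"${cost_per_request(500, 200):.8f} per request")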
Amazon has very good .NET support. So does Rackspace. So does Azure (obviously). In my personal experience the quality of cloud platforms has been comparable for both Linux and Windows environments. Is there any evidence that .NET cloud platforms are of lower quality? I'm genuinely interested. Please elaborate.
Personally, I doubt there is a real quality problem with those platforms. But running .NET on those platforms looks unnecessarily expensive to me. If I were married to Microsoft (as SO is) then I would avoid 'the cloud' for that reason.
Given that one wishes to use Windows/.NET, I feel the decision follows pretty naturally from an analysis of costs -- no vague argument about culture necessary.
I would say cost (referring to license fees) is more relevant in deciding whether or not to choose the .NET stack in the first place. You're not going to avoid license fees by avoiding the cloud.
Their culture argument is a funny one. But I have no beef with it. If owning and tinkering with the hardware makes them happy, more power to them.
As an aside, I've been using Azure lately for my .NET projects. MSFT has done a nice job with that platform and don't get enough credit for it IMHO. I especially like SQL Azure.
You're arguing that competition is sufficient for quality, not that it's necessary. The first vendor might choose to provide a high quality product regardless, which could, in itself, provide a barrier to entry preventing competition.
The real win of competition is that it allows a market to be commoditized, at which point it actually operates like we all learned in our Econ 101 class.
I think we are overanalyzing here. This is like a car company that maintains a garage to attract people who like to look under the hood. It's taking a strategic people decision when hard data isn't easily available. Everything about sys admin job descriptions seems like a digression.
Why is everyone attacking this article? So what if their reason is subjective? They want people who enjoy working on the whole stack. They have fun doing that, and that's all the reason they need.
They aren't saying that "people who use the cloud don't care as much". It's just their thing.
They're the masters of their own company, and as such, can do things however they like. One of the benefits of a private organization.