StackOverflow Update: 560M Pageviews a Month, 25 Servers

noelwelsh · on July 22, 2014

StackOverflow shows just how powerful using a fast language can be. Compare the TechEmpower benchmarks for, say, Java vs Rails on a very simple JSON serialization benchmark (we can assume .Net would achieve comparable performance to the JVM):

http://www.techempower.com/benchmarks/

The Java servers get around 900K requests/s on beefy hardware while Rails squeezes out 6K. That's a 150x difference! Any real application will be slower than this, and on cloud hardware you can expect to be a order of magnitude slower as well. You just don't have much headroom if you use languages like Ruby before you have to scale out. And once you scale out you have to worry about sys. admin. and all the problems of a distributed system.

It's a one off cost to learn an efficient language, but it pays returns forever.

Maarten88 · on July 22, 2014

> StackOverflow shows just how powerful using a fast language can be. Compare the TechEmpower benchmarks [...] (we can assume .Net would achieve comparable performance to the JVM)

Except those TechEmpower benchmarks show .NET is not nearly as fast as Java. I think StackExchange prove that the platform is NOT the most important: it's much more important to make performance a priority in all engineering decisions, to benchmark everything beforehand and develop new technology where there's no good standard solution available. (think Dapper, protobuf-net, StackExchange.Redis)

bhauer · on July 22, 2014

I'd like to add one caveat with respect to our C# tests: we did not run the tests on Windows in Round 9. With the help of contributors, we've recently revamped our benchmark toolset, and have yet to pull the Windows implementation of the toolset up to date. The C# data in Round 9 is exclusively from Mono on Linux.

To more accurately judge the request routing performance of C# tests on Windows, see Round 8 data. For example, see Round 8 limited to Java, C#, and Ruby tests on i7 hardware [1].

Other important notes on that data:

* The http-listener test is a low-level test implementation.

* The rack-jruby test is running on Torqbox (TorqueBox 4), which is based on Undertow and has very little Ruby code [2]. This test is mostly about Undertow/Torqbox, with a bit of JRuby.

Another interesting test is Fortunes on i7 in Round 8 [3]. Fortunes involves request routing, a database query, ORM translation of multiple rows, dynamic addition of rows, in-memory sorting, server-side templating, XSS countermeasures, and UTF-8 encoding. Here you will see Java frameworks at about 45K rps, ASP.NET on Windows at about 15K rps, and Ruby at about 2.5K rps.

[1] http://www.techempower.com/benchmarks/#section=data-r8&hw=i7...

[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[3] http://www.techempower.com/benchmarks/#section=data-r8&hw=i7...

noelwelsh · on July 22, 2014

I agree with your general point, but I believe you need to choose components that have a chance of being fast if you ever want the overall system to be fast.

I didn't notice .Net in the benchmarks -- no .Net languages are listed but .Net is there as a framework. It's still x20 faster than Ruby.

hudo · on July 22, 2014

Those test there are much more web server + platform tests, and can be hardly taken as java vs .net vs something

ZoFreX · on July 22, 2014

> That's a 150x difference! Any real application will be slower than this

How many applications are rate limited by the speed of the front-end language? Not that many; the speed of the backing store will usually be the bottleneck.

raverbashing · on July 22, 2014

Unless of course, the front-end processing takes so much time because it's complex and/or the language is slow.

One example, Django template processing is slow

idlewan · on July 22, 2014

    Java servers get around 900K requests/s [...] while Rails squeezes out 6K

Yes, but that's for the /json test, which serves a static object that the framework converts to json.

I'm not sure it is really representative of the overall speed you'll get with whatever framework you'll be using: how often do you have a static page that's best served by your framework instead of leaving nginx serve it directly?

I prefer watching the test results that actually involve the database (/db, /queries, /fortunes and /updates), as it shows less raw speed for serving static things and more overall speed for your dynamic pages.

With /queries for example, Java does ~11.3K, while php (on hhvm) does ~10.7K, Python is at ~7.7K and Dart ~12.8K (For the fastest framework of each language. Rails still does bad though).

noelwelsh · on July 23, 2014

The reduced variance of the /queries benchmark suggests to me that the database is the bottleneck in this setup. This would be the first thing you'd attack if you were optimising for speed. For instance checkout the section on StackOverflow's caching -- they have 5 levels of cache, including in-memory caches on the web servers. They are doing a lot of work to take the DB out of the equation.

I prefer the /json test because it gives me a performance ceiling. I know that's the max performance I can expect, and it's up to me to design my system to get as close to that as I can.

On the other hand the db tests don't really tell me much. Does my data and data access patterns match theirs? Probably not. So it is difficult to generalise from their results. If the systems you build are always stateless web servers talking to a relational db then I can see it might be more useful.

bhauer · on July 23, 2014

This is a good point. It's true that for the high-performance frameworks and platforms, the database is the bottleneck. Or, more accurately—considering the trivial query and small payload—the overhead of the database driver and the wire protocol are significant factors. That said, there remains a fairly broad distribution over the Single-query test and Fortunes tests. For the low and medium-performance frameworks, the overhead of their ORM and other factors are more significant than the database. I find it can be illuminating that in many cases, the ORM code necessary to marshal result-sets into usable objects is more costly than the underlying queries.

Meanwhile, the 20-query test is a bit pathological as it runs into a brick wall with the database wire protocol and efficiency of the database driver. Many otherwise high-performance frameworks and platforms become bottle-necked waiting on those 20 queries per request. But you and I agree, a 20-query-per-request scenario should be the exception and not the rule. When developing a smooth-running web-application, it's common to aim for zero queries per page load. (For those who find this to be crazy talk, note that I'm saying we aim for that ideal, and may not necessarily achieve it.)

I too particularly enjoy knowing the high-water mark set by the JSON and Plaintext tests. When we add the next test type (caching-enabled multiple queries), we should see some interesting results.

skndr · on July 22, 2014

Conversely, you might be able to iterate faster with a 'web' language and create a product people really use in a shorter time frame and worry about scaling later!

noelwelsh · on July 22, 2014

5 years ago it might have been true that RoR was that much more productive it was worth using, but that doesn't match my experience today using RoR and Scala.

Update: To try to avoid turning this into a language war, there are other good reasons to use something like RoR. Ease of hiring is one. Using known tools is another, if you're not building anything you expect to get load.

pjc50 · on July 22, 2014

Strange, to me Java and ASP.NET (as used by SE) are web languages.

brixon · on July 22, 2014

No, they are general purpose languages that can be used for the web.

Languages like ColdFusion and PHP are web languages that don't make much sense outside of the web.

jacques_chester · on July 22, 2014

> It's a one off cost to learn an efficient language, but it pays returns forever.

It also imposes costs forever.

Software is never finished.

BrandonM · on July 22, 2014

Java 8 + an IDE like NetBeans (which isn't even the best Java IDE) is actually a very pleasant development experience. I am a total convert from Python in vim to Java in NetBeans+jvi.

Refactoring in Python or Ruby or Javascript or whatever dynamic language is extremely painful compared to doing it in a static one. The "Software is never finished," argument can also work against dynamic languages.

tjdetwiler · on July 22, 2014

Have you tried IntelliJ?

BrandonM · on July 23, 2014

Yeah, I have. It seemed objectively better, but I started using NetBeans first, and it's what we all use at work. The workflows and keyboard shortcuts are ingrained enough at this point that switching doesn't really seem worth it right now.

noelwelsh · on July 22, 2014

Your point is that more efficient languages are slower to develop in? This hasn't been the case for a while, in my experience.

kbenson · on July 21, 2014

> The cost of inefficient code can be higher than you think. Efficient code stretches hardware further, reduces power usage, makes code easier for programmers to understand.

I'm curious what the reasoning is for "Efficient code ... makes code easier for programmers to understand". To my mind, efficient code (in this case, I assume coding to the hardware, as they mention elsewhere), has many benefits, but making it easier to understand is not one of them. A useful comment by the code may help, but that's not a result of efficient coding, that's a result of good commenting practice.

gambler · on July 22, 2014

In complex systems low-level efficiency is far less beneficial than high-level efficiency. To achieve high-level efficiency you need to have a clear understanding of how the system works as a whole. That is impossible is the code is unintelligible bag of tricks.

I've seen this a lot when working in PHP. I wrote some websites from ground-up (using something similar to my own framework). I optimized int multiple times to great results. The biggest benefits didn't come from making a particular function faster, they came from realizations that large chunks of complexity in templating, routing and permissions checking subsystems simply weren't necessary. It doesn't matter how clever those chunks were written, because I got rid of them completely.

I think this is what the author speaks about.

mark-r · on July 21, 2014

They may just be trying to express the thought that simpler code is both easier to understand and often more efficient. Don't forget Joel Spolsky's contributions to the effort; he's the one who coined the term "architecture astronaut".

lukeholder · on July 22, 2014

link[0] for those interested.

[0] http://www.joelonsoftware.com/articles/fog0000000018.html

hueving · on July 21, 2014

Definitely. Some of the most efficient C code I have seen is incomprehensible to anyone but a C expert and even then it takes several minutes to understand all of the nuances of what might only be a 4-line function.

mkr-hn · on July 21, 2014

I experienced exactly this when I cracked open the source to a GNU tool to see if I could figure it out.

dmitshur · on July 21, 2014

Sometimes "efficient" and "easy to understand and be sure is correct" don't have to be mutually exclusive; see this example [1] of Java and Go.

Note that the code is autogenerated, so it should be equally efficient. The Go version also happens to be very simple and no different than most humans would write by hand (without trying very hard to optimize).

[1] https://gist.github.com/shurcooL/9f94bbd021b4693cf584

nostrademons · on July 21, 2014

The code samples in your gist aren't doing the same thing. The Java version decodes the UTF-8 stored in the protobuf into Java's native UTF-16 Strings on first access, while Go strings are native UTF-8 and so only a nil check is necessary.

Or are you saying that languages should always use UTF-8 natively? I would agree with you on that, but disagree that this proves your point that "efficient" and "easy to understand and verify correctness" aren't mutually exclusive. pb.getSomeId().charAt(1000) runs in constant time in Java (albeit failing to give correct data if getSomeId() contains codepoints higher than \uFFFF), but pb.GetSomeId()[1000] will give you garbage data in Go if your field contains non-ASCII text. To get a valid codepoint in Go, you'd need to do string([]rune(pb.GetSomeId())[1000]), which runs in linear time and omits the check for valid UTF-8 bytes.

dmitshur · on July 21, 2014

That's a good point, it seems the Java code is checking if the entire string is valid UTF-8 while the Go version isn't. I wonder why the behavior of generated protobufs is different between the two.

kbenson · on July 21, 2014

I'm not disputing that efficient code can be easier to understand in some instances, just that the assertion that efficient code is easier to understand. Logically, disputing that just requires a single instance of efficient code that is hard to understand. I think that's a trivial enough example to be self evident. In reality, I'm just wondering what they were trying to convey with that statement. I doubt they would have written it without reason, so I'm curious to the reason.

dmitshur · on July 21, 2014

Fair point.

It's possible when they said "efficient", they did not mean "hardware efficient, runs fast, doesn't use a lot of resources" but "developer efficient, faster to write and easier to understand". But that entire phrase as is doesn't make sense, I agree with that.

It's also possible to be a simple mistake. Maybe they went overboard with marketing phrases and claimed something that isn't quite true.

sklivvz1971 · on July 21, 2014

We never said that. The closest I can think of is that simple code (short stack, no DI, layering, etc.) is simpler AND faster at the same time, and in my talk I give specific examples.

kbenson · on July 21, 2014

To me, that sounds like you are saying simple code is often efficient (or that efficient code can by simple), not that efficient code is simpler (which is how I interpret the original text).

To be very clear, I don't doubt the author (you?) had a good point to make, just that it was unclear what that was from the way it was presented, and to such a degree that it took me out of the flow of reading the post, and perhaps a less ambiguous way of expressing that concept (or omitting it, if it's expressed succinctly elsewhere) would be clearer and more effective.

sklivvz1971 · on July 21, 2014

Sorry, I'm not the author, I'm the guy at Stack Overflow that's being cited by the article (Marco C.).

kbenson · on July 21, 2014

All, true, which is why I outlined my assumptions and left it open for others to help try to figure out. :)

Unfortunately, if it is just poor communication on their part, we'll likely get no answers that can be considered more likely than others without original author or someone related (work-wise) piping up.

electrum · on July 21, 2014

The generated Java code is just bad. You can simplify it by adding imports and removing the redundant variable: https://gist.github.com/electrum/d2f7668f2a7eb95420a9

Of course, as others have mentioned, it is validating the UTF-8 but the Go version is not.

lttlrck · on July 21, 2014

It's not clear to me what argument the example is supporting.

The Go code is very concise and the Java code is very verbose.

The Go code does look more efficient but what is ImpressionData? Where is it checking the string is valid utf8?

kisielk · on July 21, 2014

Why would you send a string which is not valid utf8 over the wire in your system? That's the kind of validation that should probably be done prior to that. Or if it hasn't, you can do the validation manually on the receiver. There's no reason to incur the cost on each transmission.

itry · on July 21, 2014

    users=db.get("SELECT name FROM users WHERE group='admin'")

is more efficient and understandable then:

    users=[];
    tmp=db.get("SELECT name,group FROM users")
    for (i=0;i<tmp.count();i++)
    {
      if (tmp[i]['group']=='admin') users[]=tmp[i][name];
    }

abritishguy · on July 21, 2014

But who in their right mind would do the second one?

Obviously you can make code both less efficient and less readable. But starting with code that a competent programmer has written, I find it seldom makes it more readable when you make it more efficient.

michael_h · on July 21, 2014

  But who in their right mind would do the second one?

Oh abritishguy, I can assure you that the world is awash in such magical code snippets.

abritishguy · on July 21, 2014

> competent programmer

angersock · on July 21, 2014

It's amazing the sorts of things otherwise sharp programmers will write when they're tired, hurried, and devoting cycles to more interesting/complicated parts of an implementation.

mseebach · on July 22, 2014

> true scotsman

mtdewcmu · on July 21, 2014

>>But who in their right mind would do the second one?

Surely you jest. I don't know who, but their handiwork is everywhere.

pjc50 · on July 22, 2014

People using ORM can end up writing the second one without it being obvious.

autokad · on July 21, 2014

i tend to agree. take dijkstra's algorith implemented with a fibonacci heap or something very simple like

for i = 1 to 100 if x = this do that and something with i else something else with i

easy to read, less efficient than: if x = this for i = 1 to 100 do that and something with i else for i = 1 to 100 do something else with i

from my experience, compilers usually dont hoist very well

edit: sorry, spaces didnt come out well

CodeCube · on July 21, 2014

This is where a good abstraction layer can pay dividends. Of course, a bad abstraction layer (as evidenced by Spolsky's "leaky abstraction" post (http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...) on the opposite hand can add immeasurable cost.

edgyswingset · on July 21, 2014

Not to mention, you'd never write code like the second one in a .NET ecosystem, unless you were purposefully trying to write bad, unmaintainable code.

lnanek2 · on July 21, 2014

Really hate the separate sites thing. I use a half dozen of them and they are computer related, so it is a pain in the ass always having to register and not being allowed to comment/answer at first, etc.. They have a few improvements now like importing your profile from other sites, but they shouldn't even have so many separate sites in the first place, just a tag or a category or something.

mbesto · on July 21, 2014

There is a deliberate reason the sites are split. Spolsky talks about in depth here: http://vimeo.com/37309773 Since it's largely sociological, it's also the reason my money is on SE as opposed to Quora in this space.

bduerst · on July 21, 2014

Thanks - that's a great watch.

Essentially they're trying to keep SE from becoming a "chat" community, and keep it focused on being a Q&A community.

With Yahoo Answers, Usenets, even Subreddits, you have people having conversations within posts, which generates noise for people who were looking for answers to questions.

judk · on July 22, 2014

SE has chat forums now, integrated with the main Q&A forums.

recursive · on July 21, 2014

You get +100 rep bonus for linking your accounts, so you shouldn't have a problem being allowed to comment.

JelteF · on July 21, 2014

You do need to have more than 200 rep on one site though.

r00fus · on July 21, 2014

Which one? I'm a full member on all the StackExchange sites I care about for now with the +100 cross-membership bonus.

iancarroll · on July 22, 2014

In order to get the +100 bonus on all sites, you must have one profile on the network with >200 reputation.

rayshan · on July 21, 2014

How? I can't seem to find this option in account admin panels.

recursive · on July 22, 2014

It happens automatically when you use the same account on multiple SE sites.

zo1 · on July 21, 2014

It happens automatically, if I remember correctly.

ProAm · on July 21, 2014

I agree. Diffuses the quality of the related sites. Means you have to deal with twice as many moderators. I dont see why they couldnt have done a "subreddit' approach where you subscribe to the SE category you want and it shows on your homepage.

icebraining · on July 21, 2014

Hum, each subreddit has its own moderators as well. They also have a front page with questions from the different sites[1], though it's not filtered by the ones you have an account on.

[1] http://stackexchange.com/

dmitshur · on July 22, 2014

But you don't need to create new accounts with different passwords, and login/logout separately whenever you navigate between subreddits.

icebraining · on July 22, 2014

You don't need to create accounts with different passwords on SE sites either. In fact, SE has pushed for single sign-on like probably no other large site did.

ProAm · on July 22, 2014

But you still have to go to a different site to have any hope of getting the chance to ask your question.

  It'd be like going to Home Depot and needing information that is related to building supplies but the employees tell you you must go to a different store down the street to even ask your question.

icebraining · on July 22, 2014

On Reddit you have to post separately on different subreddits, there's no way to simply "post to Reddit" either.

I don't agree with your analogy; I think it's more akin to going to Home Depot to ask for vegetables. I think it makes perfect sense to have a separation between Cooking.SE, Photography.SE, Christianity.SE, etc.

There are a few edge cases, primarily in the tech sites, but I don't think those disprove the model.

ProAm · on July 22, 2014

Yeah Im speaking primarily about the technical SE's sites. I agree with the completely isolated SE (Photography, Puzzles, religion).

But when you get the related topics servers, programming, security, tools, web development, web apps... All these items are so closely related it's often disadvantageous to try and ask a vertically silo'd question.

I get they are trying to create a detail 'manual' for all questions, but you get to a point where it's just better to read the manual.

tomblomfield · on July 21, 2014

"One problem is not many tests. Tests aren’t needed because there’s a great community... If users find any problems with it they report the bugs that they’ve found."

I'm often surprised at the paucity of test-coverage in relatively large companies.

http://nerds.airbnb.com/testing-at-airbnb/

mseebach · on July 22, 2014

I think we're going trough a thesis/anti-thesis cycle on tests - in the beginning, there was militant testing, 100% coverage, testing getters and setters etc (as well as more complex stuff, obviously). Then some people started coming around to the idea that there are actually large swathes of code that is simple enough that testing doesn't actually add much value especially compared to the effort of writing them, then that probably got a bit out of hand (to what you're referring to). Maybe the pendulum will swing back and we'll find a good heuristic for just how much testing is the right amount that isn't all or nothing?

yazaddaruvala · on July 22, 2014

It really also does depend on the language, or rather compiler. Specifically, tests are far more useful when static analysis doesn't exist.

kasey_junk · on July 22, 2014

Actually, in the beginning there was no testing and even justifying having automated test infrastructure or spending developer resources on unit tests was a hard sell to management (what are they paying that QA dept. for after all).

aturek · on July 21, 2014

That bullet point surprised me. It comes right after "110K lines of code. A small number given what it does"; to me this reads like they didn't write tests because that would add complexity.

Can any SO devs give us more details on "not many tests"? Or how many bug reports get filed vs rate of change of software?

jc4p · on July 21, 2014

The things that obviously should have tests have tests. That means most of the things that touch money on our Careers product, and easily unit-testable features on the Core end (things with known inputs, e.g. flagging, our new top bar, etc), for most other things we just do a functionality test by hand and push it to our incubating site (formerly meta.stackoverflow, now meta.stackexchange).

You can look at reported bugs here: http://meta.stackexchange.com/questions/tagged/bug -- it's pretty frequent, at least one every couple of hours or so. The last commit in our "Tests" project was 14 days ago, where as the last commit in the code base was 4 minutes ago.

jacques_chester · on July 22, 2014

You should come down our way and do a tech talk about it.

If only for the wailing, gnashing of teeth and rending of garments.

jc4p · on July 22, 2014

Your people would literally murder me. Are you back from west coast / toronto?

jacques_chester · on July 22, 2014

I am, yeah. We do tech talks every Tuesday. I can promise free food that's not as good as yours.

mistermann · on July 22, 2014

Indeed, they seem to have achieved what many have said is literally not possible.

Also, some might remember when they had Uncle Bob on the podcast, the topic being unit testing, and iirc Bob preaching the gospel (~ you must write plentiful unit tests, or else) and Jeff and Joel (especially) more or less saying they don't quite get why.

(Am I remembering this story correctly? I swear that's how I remember it.)

jacques_chester · on July 22, 2014

It's been my experience that outside-in TDD seems like impossibly strict nonsense until you do it with someone who is good at it.

Then after a while it seems like an obvious way to work.

habosa · on July 21, 2014

> With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune.

I have next to zero experience with server administration, but 384GB seems like a lot to me. Is that common for production servers for popular web services? Do you need a customized OS to address that much memory? Seems like you'd really need to beef up the cache hierarchy make 0.38TB of RAM fast.

AaronFriel · on July 21, 2014

There are a few erroneous replies to your comment, re: Windows support for that much memory.

Memory support for versions of Windows:

                           	  User-mode process	   Kernel-mode
                           	    address space  	  address space
                           	  (process bit-ness)
                           	32-bit        64-bit	        all
  Server Operating Systems:	                    	
    Windows Server 2008 R2 	  4 GB*      2048 GB	    2048 GB
    Windows Server 2012    	  4 GB*      4096 GB	    4096 GB
    Windows Server 2012 R2 	  4 GB*    131072 GB	  131072 GB
  Client Operating Systems:	                    	
    Windows 7 (x86)        	  3 GB*          n/a	       2 GB
    Windows 7 (64-bit)     	  4 GB*       192 GB	     192 GB
    Windows 8 (64-bit)     	  4 GB*       512 GB	     512 GB
    Windows 8.1 (64-bit)   	  4 GB*    131072 GB	  131072 GB
  
  
  * requires IMAGE_FILE_LARGE_ADDRESS_AWARE flag, otherwise 2GB
  ** All units use binary exponents (GB = 2^30 bytes)

yuhong · on July 21, 2014

process address space != physical memory

claudius · on July 21, 2014

Linux (SuSE enterprise something, possibly with a custom kernel, I’m just a user, not admin) runs just fine on 5.2TB machine, though I can easily imagine you’d have problems if you tried it with, say, DOS :)

The system obviously behaves a bit different from a standard desktop machines, e.g. different areas of RAM are differently fast, depending on the core on which your current process runs, you need to disable individual CPU lines displayed in top etc., but apart from these, mostly everything seems “normal” to me.

tekacs · on July 22, 2014

> e.g. different areas of RAM are differently fast, depending on the core on which your current process runs, you need to disable individual CPU lines displayed in top etc.

this is NUMA, in case anyone 'new to sysadmin' or to architecture is trying to Google this

bradyd · on July 21, 2014

I don't believe that much RAM is uncommon for large scale database servers. 384GB RAM is only about $5000 from Dell. They also have a new server model coming out that supports up to 6TB of RAM [0].

[0] http://www.dell.com/us/business/p/poweredge-r920/pd

valarauca1 · on July 21, 2014

Most people over estimate hardware cost. Buying a 300-500GB ram server maybe very expensive. But buying a distributed data base expert likely costs at least an order of magnitude higher per year.

jaequery · on July 22, 2014

Same server in AWS, it'll cost $50k+/month for the same thing. SO just bought it one time fee for probably around $30k or so, I'll say they are doing something right!

kondro · on July 22, 2014

The don't have servers with that much RAM, but a 244GB RAM server with 8x800GB SSDs and 32 cores costs <$5k a month ad-hoc and about $1300/month with a 3 year commitment.

valarauca1 · on July 22, 2014

$1300/month w/ a 3 year commitment is $46.8k over the lifespan. Which is more then 4x more then the actual hardware cost. Your ROI on a physical server with the same stats at that price tag would be 8 months.

habosa · on July 22, 2014

Not saying its totally worth it, but the upside is that they take care of power costs, physical space costs, networking, and if the hardware fails they'll put your application on a new server. Basically you take all of the risk out the equation.

egwor · on July 22, 2014

Sure, but add power and cooling for these things plus physical space? (I remember when these big db's needed the floors in the data centre reinforcing since they were a > metric ton )

Still, I agree re. the difference in price. I'd buy too

Nick-Craver · on July 22, 2014

Worth noting: AWS also doesn't have Enterprise SQL available, so plan on buying that yourself. That's a not-so-insignificant piece of the overall cost to many people.

superuser2 · on July 22, 2014

How common is it to buy one extremely powerful database server? Wouldn't you need the high availability properties of something like a Galera cluster? Or are most people accepting the single point of failure at huge database server?

(I honestly don't know, it's just surprising.)

emilv · on July 22, 2014

While you do want HA you also want to keep things simple. You get great HA with just one backup machine. The times you lose two machines are so few that you might crash the site with software failures far more often anyway.

Databases have inherent locking problems that takes more resources to resolve with more machines, more so in some database systems than others. When scaling you often hit a point where you get down to macro logistics, so imagine a car highway: You can get more throughput by adding more lanes. But what that all cars are going to merge to one lane at some point in the middle of the trip (because of a tunnel, bridge or something: it's impossible to have more than one lane at that point)? Now you won't get any throughput benefits of the multiple lanes after all! Just more latency because you get queues up until the single lane and because of the queues all cars are going slow and need to accelerate on the single lane, so the average speed is low too. You are also having too much resources after the single lane because you can never fill all those lanes. It might be better to have one beefed-up single-lane road all the way that people can go fast on. Basically: Remove locks and you get better overall performance.

Yes, this is on the expense of HA. Yes, the costs of scaling up grows asymptotically faster than the costs of scaling out. So this is definitely a trade-off in some sense.

jtseng · on Aug 1, 2014

They do have a cluster: http://www.brentozar.com/archive/2012/09/microsoft-sql-serve...

They just run everything through one live database all the time under normal conditions. Personally, I think this is much simpler to manage than to be regularly spreading out queries over many queries.

EvanAnderson · on July 21, 2014

Windows Server handles that much RAM fine, and, so long as it fits, SQL Server will end up caching the entire database in RAM.

If I remember correctly they're using Dell R720's for their SQL Server machines. At current Dell retail pricing 384GB of RAM would be about $5,900.00.

At that price I'd vote for throwing money at raw hardware and reaping the benefits of caching the database in RAM, versus trying to pay programmers to come up with clever techniques to deal with having less RAM.

x1798DE · on July 21, 2014

It's not a server, but I've seen machines used in physics simulations have something on the order of 256 GB RAM in them. As far as I know they cost maybe $20k-$30k, so not absurdly expensive.

I don't think addressing that memory is a problem on any x64 architecture. Windows Server 2012 has a 4TB upper limit on RAM, apparently [1], and as far as I can tell there's no reason for this other than product differentiation between different "levels" of the operating system.

1. http://msdn.microsoft.com/en-us/library/windows/desktop/aa36...

mrb · on July 21, 2014

$4k is all you need to build a whitebox 256GB machine: dual-socket G34 motherboard with 16 DDR3 slots lke the ASUS KGPE-D16 ($500) + two Opteron 6320 (2 * $300) + sixteen 16GB registered ECC DDR3-1600 (16 * $160) + chassis/PSU ($500) = $4160.

Doing this on Amazon r3.8xlarge ($2.8/hr) would cost you $8100+ over 4 months.

A DIY server can make sense in some cases. Eg. you are a very early-stage startup with almost no funding, and you are good at assembling and troubleshooting computers, and are ok with mere warranty on parts as opposed to full vendor support from HP/Oracle/Dell, go for it. That's how many startups started (see Google and their makeshift half-donated half-loaned boxes in 1998).

ZachPruckowski · on July 21, 2014

Don't forget to factor in the costs of running, cooling, and maintaining that much hardware (also bandwidth costs). I'm not saying it's $2k/month, I'm just saying don't compare the physical box and AWS without factoring everything in. At the end of the day, if you don't need the support, you're probably right that a dedicated box makes sense there.

adventured · on July 21, 2014

AWS has among the highest bandwidth costs you're going to run into, if we're talking co-location, dedicated, or self-hosting. Their bandwidth costs are the sole reason I won't go anywhere near AWS yet, I consider their prices outrageously high.

10tb = roughly $1,000 per month

They're between 5 and 50 times more expensive on bandwidth than the options in dedicated / colo / self hosting.

dekhn · on July 22, 2014

You can negotiate with AWS.

liotier · on July 22, 2014

I don't believe that startups in their garage days have anywhere near the necessary leverage to negotiate anything with Amazon...

jaequery · on July 22, 2014

Agree, not many people actually realize just how slow and expensive AWS actually is. Kudos to SO for going with their own hardware.

dekhn · on July 22, 2014

they negotiate, still. I know several startups which went to amazon, said, "hey, your competitor is cheaper, what is your value prop" and suddenly had 50% discounts, etc. It's amazing how much you get simply asking!

lnanek2 · on July 21, 2014

Even before OS'es supported that much people were sticking that much RAM on expansion bus cards and plugging them in and treating them like fast disks. So it is going to happen either way. The performance boost for the DB is just too much compared to it thrashing disks constantly and there are quite a few benefits to having fewer machines compared to many DB servers as well.

daigoba66 · on July 21, 2014

384 GB is fairly easy to achieve today with 16 GB DIMMs x 24 slots (many general purpose servers have up to 24 DIMM slots). Beyond that you need to jump to 24 GB DIMMs which are fairly expensive.

buckbova · on July 21, 2014

SQL Server only reads data from memory. When something is requested it is first looked for in RAM (or virtual) and if not found is copied from disk to RAM.

The more RAM you have the better without exception here.

snowwrestler · on July 21, 2014

I believe the idea is to hold as much of the DB in memory as possible, for speed, so huge RAM totals on DB servers are not unusual. 384 GB should be well within modern OS reach. Once you're computing at 64 bits, the next theoretical limit on addresses would be somewhere around 16 million terabytes.

edit: wrong number

callesgg · on July 21, 2014

Not that much, we have 128 gb ram on our hardware. That is pretty standard for upper midle class rackmounted servers.

64 bit computers has a memmory limit so large that it is not understandeble.

The ram is still lightning fast compared to other stuff.

Sanddancer · on July 21, 2014

384 gigs is pretty common these days. With Windows, you need at least the server edition to address that much, but if you're running a server, you're not gonna be using the desktop versions of the OS anyways. Pretty much everything else, it's just plug and play. The RAM is quite a bit if you're doing a typical cloud deployment, but SO seems to be more in the model of making sure all your eggs are in a really strong basket. Besides, at the end of the day, RAM tends to be a lot cheaper than lag.

abritishguy · on July 21, 2014

Not true. Windows 8.1 64 bit would be happy with that much.

ddorian43 · on July 21, 2014

*everything else is plug and pay

dekhn · on July 22, 2014

The simplest rule of computing is, if you have a problem that can be solved economically by just throwing more RAM into a single machine at it, you should do it (instead of any other dimension, and assuming you don't have other constraints).

No, that's no common for production servers for popular web services, though- they tend to shard across other dimensions because the web services are often CPU or network-bound.

octo_t · on July 21, 2014

You can get servers with 384GB of RAM now from a vendor like Dell, without needing to call them for just under $14k.

junto · on July 21, 2014

Their SQL Server is Microsoft SQL Server running on (shock horror) Windows.

Says in the article:

Windows 2012 is used in New York but are upgrading to 2012 R2 (Oregon is already on it).

Nick-Craver · on July 21, 2014

To be clear, most servers are moving to 2012 R2 (most already have) but the machines our SQL clusters run on, specifically, will not.

This is because you have to effectively (or literally) rebuild the Windows cluster from scratch and we just don't get that level of benefit from the 2012 to 2012 R2 upgrade. There are quite a few improvements we care about: native NVMe, better dynamic quorum, better DSC support, better SMB, and such...but not enough to make the upgrade worth it.

Nick Craver - Stack Exchange Sysadmin & Developer

Just-A-Guest · on July 22, 2014

Do I get this right? You can not upgrade a server cluster to the next version of the operating system without complete new installation??? Could you please write something about these kind of things, this is very interesting!

Nick-Craver · on July 22, 2014

I'm not sure a full writeup would have much more detail, but yes. With Windows clustering (on top of which SQL 2012/2014 Availability Groups are based), you can only have a homogenous OS version across the board. Since they all must have the same version, you can't have something like a single 2012 R2 instance in an otherwise 2012 cluster...and since you can't do that, you can't upgrade one at a time. Yay! New cluster time instead!

Trust me, we're bitching about this as are most people and I think changes must be coming there. It doesn't matter how many fancy features you add to the OS if we can't upgrade to it, so they'll have to stop and address that problem.

chiph · on July 21, 2014

I've recently started using their micro-ORM Dapper, and I like it a lot. I get the performance of hand-coded SQL, but without the tedious mapping from SqlDataReader to my entity.

teh_klev · on July 21, 2014

I've been using Dapper for around three years, "it just works" and is very fast. I've been around the block a few times with ORM's - LINQ to SQL, NHibernate, Entity Framework, Massive, SubSonic. For small to medium size projects there's no contest, provided you don't mind hand crafting your SQL. Hat's off to Marc, Sam and team for building this.

Aaronontheweb · on July 21, 2014

Dapper is great - I promptly dumped NHibernate and Entity Framework for it. It's the only ORM I've found that makes it trivially easy (as it should be) to use stored procedures and views in queries.

atonse · on July 21, 2014

+1 for Dapper - I'm a big fan. It is very fast and we use it to serve our live (mostly read-only) traffic, while using something like Entity Framework for the backend Administrative UI, which has a lot of inserts and updates.

chiph · on July 21, 2014

There's a couple of user-contrib projects that do similar things for inserts & updates. Dapper Extensions is the one I'm using.

https://github.com/tmsmith/Dapper-Extensions

Aaronontheweb · on July 21, 2014

I wrote a Dapper extension for working with SQL Server's geospatial queries and types a couple of years ago - have they added anything like that yet? Otherwise I'd be happy to add it.

Nick-Craver · on July 22, 2014

Marc Gravell added in the ability to put in pretty much any custom type without adding any dependency weight about a month ago. You can see the commit here: https://github.com/StackExchange/dapper-dot-net/commit/e26ee...

Look towards the end at the tests for example of how to hook up a custom type (it's pretty simple).

Aaronontheweb · on July 22, 2014

Cool - the one I wrote was years ago. Looks like it's not necessary any more.

j_s · on July 22, 2014

Another library worth mentioning in this space is Massive, which uses .NET dynamics a lot (focusing slightly less on performance than ease of use). https://github.com/robconery/massive

pbz · on July 21, 2014

Dapper is nice, but if you want to have the compiler check your queries, and still run fast, I recommend LINQ to DB: https://github.com/linq2db/linq2db

(not to be confused with LINQ to SQL from Microsoft)

Nick-Craver · on July 22, 2014

To be fair, the compiler is checking against what you told it the database looks like. This can help with lots of things, but still has the fundamental disconnect problem you'll get at runtime with either approach.

A huge problem with the tradeoff we had (well, still have in some areas) is the generated SQL is nasty, and finding the original code it came from is often non-trivial. Lack of ability to hint queries, control parameterization, etc. is also a big issue when trying to optimize queries. For example we (and by "we" I mean, "I made Marc Gravell do it") added literal replacement to Dapper to help with query parameterization which allows you to use things like filtered indexes. In dapper we also intercept the SQL calls to dapper and add add exactly where it came from. Here's what that looks like, replicate for the other methods:

    public static IEnumerable<T> Query<T>(this DataContext db, SqlBuilder.Template template, bool buffered = true, int? commandTimeout = null, IDbTransaction transaction = null, [CallerFilePath]string fromFile = null, [CallerLineNumber]int onLine = 0, string comment = null)
    {
        using (db.Connection.EnsureOpen())
        {
            return SqlMapper.Query<T>(db.Connection, MarkSqlString(template.RawSql, fromFile, onLine, comment), template.Parameters as object, transaction ?? db.Transaction, buffered, commandTimeout);
        }
    }

And here's the our marking method tailored for our code, but you get the idea:

    private static string MarkSqlString(string sql, string path, int lineNumber, string comment)
    {
        if (path.IsNullOrEmpty() || lineNumber == 0)
        {
            return sql;
        }

        var commentWrap = " ";
        var i = sql.IndexOf(Environment.NewLine);

        // if we didn't find \n, or it was the very end, go to the first space method
        if (i < 0 || i == sql.Length - 1)
        {
            i = sql.IndexOf(' ');
            commentWrap = Environment.NewLine;
        }

        if (i < 0) return sql;

        // Grab one directory and the file name worth of the path
        //   this dodges problems with the build server using temp dirs
        //   but also gives us enough info to uniquely identify a queries location
        var split = path.LastIndexOf('\\') - 1;
        if (split < 0) return sql;
        split = path.LastIndexOf('\\', split);

        if (split < 0) return sql;

        split++; // just for Craver

        var sqlComment = " /* " + path.Substring(split) + "@" + lineNumber + (comment.HasValue() ? " - " + comment : "") + " */" + commentWrap;

        return sql.Substring(0, i) +
               sqlComment +
               sql.Substring(i);
    }

This results in a comment at the top of the query like this:

    /* Models\Post.LinkedQuestions.cs@105 */ 
    select top (@top) 
           p.Id 
         , p.Title
         , p.Score

It's such a simple approach using the caller member attributes, but it saves so much time tracking things down.

mistermann · on July 22, 2014

I'm curious if the comments above exist in the source code, or have been added here for the benefit of readers on HN?

kmontrose · on July 22, 2014

That's a straight copy/paste from the Stack Overflow codebase (Helpers\SqlMapper.cs:332 at time of writing, if you care). Now the typos and snark are out there for everyone to see.

Akkuma · on July 21, 2014

Another awesome one which I've used on a previous project is SqlFu, https://github.com/sapiens/SqlFu

MichaelGG · on July 21, 2014

So this is what, a 2000 request/sec peak? Over 11 servers, that's like 200 requests/sec peak per frontend?

The problem with scale-up is if you actually have to get a few times larger, it becomes super expensive. But fortunately hardware is increasing so much that you can probably just get away with it now. There's probably a crossover point we're rapidly approaching where even global-scale sites can just do all their transactions in RAM and keep it there (replicated). I know that's what VoltDB is counting on.

Nick-Craver · on July 22, 2014

Peak is more like 2600-3000 requests/sec on most weekdays. Remember that programming, being a profession, means our weekdays are significantly busier than weekends (as you can see here: https://www.quantcast.com/p-c1rF4kxgLUzNc).

It's almost all over 9 servers, because 10 and 11 are only for meta.stackexchange.com, meta.stackoverflow.com, and the development tier. Those servers also run around 10-20% CPU which means we have quite a bit of headroom available. Here's a screenshot of our dashboard taken just now: http://i.stack.imgur.com/HPdtl.png We can currently handle the full load of all sites (including Stack Overflow) on 2 servers...not 1 though, that ends badly with thread exhaustion.

We could add web servers pretty cheaply; these servers are approaching 4 years old and weren't even close to top-of-the-line back them. Even current generation replacements would be several times more powerful, if we needed to go that route.

Honestly the only scale-up problem we have is SSD space on the SQL boxes due to the growth pattern of reliability vs. space in the non-consumer space. By that I mean drives that have capacitors for power loss and such. I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

Nick Craver - Stack Exchange Sysadmin & Developer

voltagex_ · on July 22, 2014

>I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

I'm sure some DBAs and devs here would find it interesting.

bagels · on July 22, 2014

This is definitely the case. Every writeup from those in the trenches I read and share with coworkers.

cagenut · on July 22, 2014

  I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

Yes! please yes. The occasional real world gem like that is what makes scrolling past a thousand javascript-framework/obscure-language/alternative-datastore posts here worth it (for me).

zaroth · on July 22, 2014

I'm most impressed that all your traffic is passing through a single active HAproxy which is also terminating SSL?

TLS Session IDs and tickets of course are absolutely essential, but I'd be curious how many peak TPS (number of full handshakes / sec) you are seeing on HAproxy.

The alternative, fanning out your SSL termination to your IIS endpoints, unfortunately means running HAproxy at L2, so you lose all your nice logging.

Nick-Craver · on July 22, 2014

I'm not sure how many full handshakes/sec we're running - the logging isn't as readily available there because we're explicitly not logging the listeners involved on separate proc binds. We are logging the traffic on the port 80 front end they feed. We have tuned the performance there via best practices and taking into account very recent HAProxy changes for various SSL windows added by Willy, Emeric and help from Ilya Grigorik. They're all awesome and tremendously helpful. We are on a version slightly newer than 1.5.2 (we build straight from the repo).

Even with all that SSL we're only running around 15% CPU at peak so it's not having any trouble. Most of that CPU usage does come from the SSL termination though - it ran around 3-5% CPU before. We're also working on much larger infrastructure changes that mean SSL termination much more local to our users, which means the HAProxy load will drop to just about nothing again. I'm working on a followup to my previous SSL post: http://nickcraver.com/blog/2013/04/23/stackoverflow-com-the-... that will explain why it's not already on for all hits...I think hacker news will have fun with hat one.

All that being said, there's no reason we can't forward that syslog traffic from the listeners to our logging infrastructure to get at least a counter out of it. If you're curious how we log, I'll explain a bit.

We feed it into a logging display called Realog built by Kyle Brandt and Matt Jibson in Go. Here are a few snapshots of the dashboard: http://i.stack.imgur.com/OIqhm.png http://i.stack.imgur.com/iQOP9.png http://i.stack.imgur.com/bfpUb.png http://i.stack.imgur.com/JZNy6.png

This lets us forward on custom headers to logging that the web servers are sending back. It tells us how much time we spent in SQL, redis, etc. We can graph it (bottom of the first link) and identify the root cause of any issues faster. It also handles parsing of that traffic from the syslog entry already, so we use it as a single logging point from HAProxy and it handles the JSON generation to forward that traffic data into logstash (a 300 TB cluster we're working on setting up right now).

As soon as we get the logstash piece working well and dogfooded thoroughly, I'll poke Kyle to open source Realog so hopefully some others can get use from it.

Nick Craver - Stack Exchange Sysadmin & Developer

joshyeager · on July 22, 2014

My team would definitely be interested to read that. Please do!

qntmfred · on July 22, 2014

yes please

sklivvz1971 · on July 21, 2014

Yes, the idea is to scale up as much as economically makes sense and only then scale out, but due to Moore's law we're still scaling up.

Not everyone has google-like problems that are betters solved by a battery of cheap boxes.

threeseed · on July 21, 2014

Actually most of the websites today are better solved by a battery of cheap boxes.

You simply put a load balancer (Nginx/ELB/HAProxy etc) in front of a fleet of smaller web/application servers that dynamically scale depending on traffic. That way it is cost effective, far more reliable, easier to scale and you can tolerate DC outages better.

mwsherman · on July 22, 2014

As with most systems, the data store is the hard thing to scale out. That requires intelligence at the application level, and that might include CAP issues.

So for us, the SQL data store is the real “up” part of the equation. We have a fair amount of headroom there, so if we can keep sharding (and other data strategies) out of the codebase, so much the better.

(Load-balancing HTTP requests “out” is not a big deal and we are doing that.)

sklivvz1971 · on July 22, 2014

Marco/Stack Overflow here.

Simply put, this is not true at all. Most of the websites today run happily on a single box and the choice of architecture is frankly irrelevant because of the abysmal amount of traffic they get.

Large, and by large I mean top 100 sites, are large and typically complex systems. Each one is different and bespoke. Making broad generalizations doesn't really work, because typically they have parts that need to be scaled up, parts that need to be scaled out, etc.

Finally, if you read the article you would know that we already use a load balancer with multiple web servers. What we are discussing here is whether it's better to have 100s of cheap/cloud boxes versus a few big ones.

In our use case (we want fast response times) the second solution is demonstrably better.

mwfj · on July 21, 2014

Not quite sure how you derive a peak figure from a monthly average; but on average it's more like 560000000/(30 * 24 * 3600) = 216 req/s

MichaelGG · on July 21, 2014

I took 4 hours at 20 days to get the peak, figuring it probably won't be worse than that. The average over the month isn't a number you'll see in practice. Most perf engineering work seems to go to keep things working nicely while at peak. (And making sure you have good 95th percentile times.)

lemcoe9 · on July 21, 2014

Makes me wonder why smaller website teams need dozens of engineers to keep their infinitesimally smaller app running.

Swannie · on July 21, 2014

From my limited experience, it's a lack of the kind of systems thinking present in companies like Stack Exchange, and present on the High Scalability blog, that results in a poor architecture.

And this is compounded by people who have little ability to troubleshoot performance issues. It's quite easy to hunt down the cruddy SQL queries in a DB, or realize that you spinning rust is too slow. But when it starts to come down to things like a blocking network fabric, which has some big fat buffer between two servers, that is killing your transaction speed - many will just start to blame the devs.

Of course, the second point is compounded by the first - the less systems thinking that went into the design in the first place, the harder it is to produce accurate hypotheses about the system in order to troubleshoot.

Whilst there is a lot to be said for the "right tool for the job", if you watch an artisan crafting something, you'll realize that despite having a huge number of tools, they actually get by with relatively few. This is a generalization, but they only use the full range when doing something new, solving a particularly tricky problem, etc. "All the gear and no idea" is certainly applicable in many start ups.

daigoba66 · on July 21, 2014

I personally think it's due to overly complex applications and systems with many moving (and breaking) parts.

Also StackOverflow often practices "scale-up" instead of "scale-out".

theandrewbailey · on July 21, 2014

More specifically, I think it's due to long feature lists and multiple problem domains. I'm also thinking about in-house line-of-business software.

ugh123 · on July 21, 2014

Also due to political managers that want to increase their head-count to give the impression they have more power.

valarauca1 · on July 21, 2014

There are hundreds of reasons at nearly every level of a company that can contribute to this situation. Its best summed up by Sturgeons Law, "90% of everything is crap."

edgyswingset · on July 21, 2014

Rigid adherence to certain technologies could be a culprit. "We're a Ruby shop!" is fine for problems where Ruby is great, but the limitations of the technology are bound to creep up somewhere. Not using a polyglot approach would be bad.

nostrademons · on July 21, 2014

It can also be caused by too much embrace of polyglot tools. "We use the best tools for the job" can easily mean an architecture with Redis, Memcached, Mongo, MySQL, Angular, JQuery, Node, and several dozen third-party libraries, all glued together with Thrift or JSON-RPC. If all you're doing is a messaging app, you might be much better off with a single in-process Java app.

Usually you want your solution to be just right for the problem; solve the essential complexity but don't introduce any accidental complexity.

swah · on July 22, 2014

I wish more people considered this alternative when recommending the usage of queues etc. I always remember of this post: http://williamedwardscoder.tumblr.com/post/18065079081/cogs-...

nostrademons · on July 22, 2014

Hacker News also runs on a single box using only in-language hashtables and lists. ViaWeb did too, and PlentyOfFish.

MasterScrat · on July 21, 2014

Guess they don't have Jeff Atwood and Joel Spolsky onboard.

theandrewbailey · on July 21, 2014

As of about 2 years ago, Stack Exchange doesn't have Jeff onboard either.

http://blog.codinghorror.com/farewell-stack-exchange/

teh_klev · on July 21, 2014

Joel is very much still around and active. Jeff went off to build Discourse [0] although I think he still has a place at the co-founder table as invisible-benevolent-dictator.

[0]: http://www.discourse.org/

tshaddox · on July 21, 2014

Probably because the engineers are a lot worse.

sz4kerto · on July 21, 2014

Extremely interesting read, especially because it goes against many fashionable engineering practices. They see (correctly) that abstractions, DI have their own compromises. Also, they're not religious (about technology), that's also quite rare.

ternaryoperator · on July 21, 2014

"SO goes to great lengths to reduce garbage collection costs, skipping practices like TDD, avoiding layers of abstraction, and using static methods."

I don't understand this at all. What does TDD have to do with reducing garbage collection?

numbsafari · on July 21, 2014

My guess is that they feel that the layers of indirection and abstraction often needed to make TDD work result in an object creation pattern that results in heavy GC load during normal operation. The references to "using static methods" is probably related to this.

ps. That's my guess, but I'd encourage you to post your question to the meta site for SO.

sklivvz1971 · on July 22, 2014

Marco from Stack Overflow here. We try to measure everything, including GC operation. We know that layers of indirection increase GC pressure to the point of noticeable slowness :-)

It's not a theoretical risk we're trying to avoid here, it's a real problem that needed solving.

bjourne · on July 21, 2014

IME that always happens when you try to perform tdd in combination with Javaesque encapsulation. The good solution to the problem is to not be so afraid of classes seeing each others internals. The bad solution is to add, factory patterns, dependency injectors and other useless layers just to try and keep your design both well encapsulated and testable.

edwinnathaniel · on July 22, 2014

You do realize that most efficient DI framework only inject the dependency once.

Separating Controller, Repository, and Services are good practices as well and let's be honest, we're looking at 3 methods layer at most.

Here's what happened in Java:

1. When you deploy your WAR, the DI will inject necessary component _once_ (and these components only instantiated _once_ for the whole web-app so there you go, Singleton without hardcoding).

2. A request comes in, gets processed by the pumped-up Servlet (already DI-ed). If the servlet has not been instantiated, it will be instantiated _once_ and the instance of the Servlet is kept in memory.

3. Another request comes in, gets processed by the same pumped-up Servlet that is already instantiated and has already been injected with the component (no more injection, no more instantiation, no more Object to create...)

So I've got to ask this question: what GC problem we're trying to solve here?

Some of the static methods are understandable but if Component A requires Component B and both of them have already been instantiated _once_ and have been properly wired up together, we have 2 Objects for the lifetime of the application.

I'd pay for a wee bit extra hardware for the cost of maintainable code ;)

Discipline (and knowledge of object graph) definitely help to reach to that point.

bjourne · on July 23, 2014

You probably meant to respond Marco's comment? And afaik Stackoverflow is written in ASP.NET, not Java.

edwinnathaniel · on July 23, 2014

C# or Java, the whole request pipeline processing should be more or less the same. Unless one platform does things less efficient than the other.

Rails and Django do things differently as to my knowledge they do it by spawning processes instead of threads. There are app-server for Rails or Django that may use threads for efficiency/performance reason but I am under the impression the whole LAMP stack is still 1 request 1 process (even though they re-use those processes from a pool of already allocated processes).

mistermann · on July 22, 2014

> The bad solution is to add, factory patterns, dependency injectors and other useless layers just to try and keep your design both well encapsulated and testable.

Not doing this will cause much self-righteous snickering from some.

jacques_chester · on July 22, 2014

Insofar as this is your guess, it's down to the language. In Ruby or Javascript it's very easy to TDD. In Java, much less so. (I'm not sure where C# lies on the spectrum).

rjbwork · on July 22, 2014

Somewhere in the middle, but leans towards Java.

jacques_chester · on July 22, 2014

Sounds about right.

At the risk of exposing my slow transmutation into a hipster programmer, I and a colleague found that mocking in Go was much easier than we anticipated, thanks to the way interfaces work.

rakoo · on July 21, 2014

What I'd like to see is how much cache hit they add on each level of cache. I remember some presentation of the facebook images stack where less than 10% (if I remember well) of the requests actually hit the disks; it would be interesting to see the patterns for the whole SE galaxy.

skeletonjelly · on July 22, 2014

As someone mentioned above, SQL Server hits memory first, and as they have 384GB of memory, most of the requests would sit there. That's just on the DB server. In the linked article it all authenticated requests hit the DB, with anon users getting a cached copy.

sklivvz1971 · on July 21, 2014

It totally depends on what we are caching (e.g. the community bullettins or hot meta questions get a gazillion hits...)

jonhmchan · on July 21, 2014

Relatively new stack dev here. I came in on the other side of the fence of a lot of these technologies (bread and butter is Python in Flask with Mongo on Heroku on a mac) but since I started here, I've been constantly and pleasantly surprised by how performant everything here has been despite my biases. It's mighty fun.

AlisdairO · on July 21, 2014

Stack Overflow is the example I'm forever using when arguing against premature scale-out. For non-trivial applications scale-out has substantial complexity costs attached to it, and the overwhelming majority of applications will never truly need it.

It's frustrating to see time wasted obsessing over trying to maintain eventual consistency (or chasing bugs when you failed to do so) on systems that could quite happily run on a single, not that beefy machine.

samstave · on July 21, 2014

>For non-trivial applications scale-out has substantial complexity costs attached to it

Forgie me if I am misunderstanding you - but non-trivial applications can actually require scale out.

From my perspective, StackExchange is not techinically that complex. They have built a very efficient, cost-effective and performant stack for their singular application and that works very well for them, but the complexity of their forum is not an extraordinarily complex problem.

AlisdairO · on July 22, 2014

By nontrivial I mean 'application which, in your document database, requires multi-document updates to perform some individual logical operations'. This is a rather low bar :-).

The fact is, the vast majority of projects that programmers are working on are less computationally complex than stack overflow. That's not to say that forum software is all that complex, more that most problems are pretty simple. Of course there are real reasons to use scale out - I simply advocate thinking hard about whether your problem will ever truly need it before taking the substantial complexity hit of coding for it.

AlisdairO · on July 22, 2014

Er, I don't seem to be able to edit, but I guess I should specify that I'm really talking about scale-out of writes here. Read scalability is an obviously easier problem..

nmjohn · on July 21, 2014

> AWS would cost a fortune.

Isn't that comparing apples to oranges? Taking any application built for large servers and putting the same application onto a cloud-based architecture will be more expensive.

Cloud-based architecture requires ground up differences in how the application is built. Now whether or not it is better to use a cloud-based approach or traditional bare metal is highly subjective and isn't my point.

incision · on July 21, 2014

>'Isn't that comparing apples to oranges?'

On another site, in another context, it probably would be, but here it's really presented as a contrast of scale-up versus scale-out - something the regular audience of highscalability will certainly grok.

In context...

'Stack Overflow still uses a scale-up strategy. No clouds in site. With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune. The cloud would also slow them down, making it harder to optimize and troubleshoot system issues. Plus, SO doesn’t need a horizontal scaling strategy. Large peak loads, where scaling out makes sense, hasn’t been a problem because they’ve been quite successful at sizing their system correctly.'

It's an acknowledgement of their relatively unique strategy and the short list of caveats that make it possible.

fideloper · on July 21, 2014

I love that all the traffic is running through a single HAProxy box - a software load balancer (vs a hardware LB like F5).

And they've moved SSL termination to it.

That's a great quality product, and easy to setup.

Edit: I work I'm software that supports MySQL,PG and SqlServer. SqlServerr seems to be the most stable and consistent in performance - they're hard to kill! One of my few liked MS products :D

ck2 · on July 21, 2014

This is the fascinating part to me, their SSD have not failed:

Failures have not been a problem, even with hundreds of intel 2.5" SSDs in production, a single one hasn’t failed yet. One or more spare parts are kept for each model, but multiple drive failure hasn't been a concern.

Nick-Craver · on July 22, 2014

Yep, still true. We lost one Intel 910 drive (PCIe SSD), and that was very abnormal - died so soon it was almost DOA. We hooked up directly with Intel for them to The replacement is still going strong as is another 910 we have.

All of those 2.5" Intels though, still trucking along! We're looking at some P3700s PCIe NVMe drives now, blog post coming about that.

kmontrose · on July 22, 2014

There's a real difference between consumer and enterprise grade SSDs.

For comparison, for a while we nearly had a 1 to 1 dev to killed (consumer) SSD in work machines (see: http://blog.codinghorror.com/the-hot-crazy-solid-state-drive...). Granted, we did adopt them (see: http://blog.codinghorror.com/the-state-of-solid-state-hard-d...) for local dev use much earlier than we started putting them in servers.

I've killed 2 in my time here, though I do horrible big-ish data things that are either not SSD friendly or actively hostile.

dekhn · on July 22, 2014

I would agree that hundreds of drives is a large enough sample that you would expect to see some failures.

That said, all the SSDs I've had have taken repeated pounding with no complaints. I accidentally swapped a few terabytes to swap on an SSD, and was simply surprised that the job I was doing finished faster.

programminggeek · on July 21, 2014

One smart thing they are doing is putting different sites on different databases. It effectively as a kind of partitioning to allow for horizontal scaling if needed.

If nothing else, it keeps less data in each table, so queries should be faster due to smaller datasets.

Sounds like they know what they are doing.

beachstartup · on July 21, 2014

beware that you can over-optimized this.

for example, do not put every single customer of an saas solution into a separate database.

csharrison · on July 22, 2014

I'm curious, why not? If your saas was small enough and had few enough customers with large data needs, separating them each out into a separate database seems like a viable solution.

beachstartup · on July 24, 2014

yeah, if you have an experienced team, you can easily refactor your one-customer-per-db model into something that will handle 1,000 new customers signing up every month. given a good enough team, you can even do this while doing other sort-of important things like customer support, bug fixes, and new development.

however, inexperienced developers and teams are not good enough, or fast enough, to do that, and end up painting themselves into a corner when they have thousands of live customers to support and need to change their entire application architecture + database schema when their backup, replication, and housekeeping tasks choke on 1,000+ databases.

and oftentimes, one-customer-per-db also means one-instance-of-application-per-customer, another anti pattern to avoid.

we see this kind of stuff ALL the time. it happens a lot - people make terrible decisions and then are stuck with them 5 years down the line and are looking at a monumental cost to redo. not everyone is experience enough to work their way out of an awful situation like that.

ksec · on July 22, 2014

It is fascinating to see how well its server it holding out. With Plenty of headroom to spare!.

With Higher Capacity DDR4, and Haswell or even Broadwell Xeon, and PCIe SSD getting cheaper and faster. It is not hard to imagine they could handle 3 - 4 times the load when they upgrade their server two years down the line.

But i would love to see Joel's take on it though. Since he is a Ruby guy now, and I dont think you could even achieve 20% of SO performance with RoR.

farresito · on July 22, 2014

It would definitely be several orders of magnitude slower. That's the price you pay when you are using a language like ruby: it's very productive and fast to iterate in, but way less performant. At the end of the day, what you use depends on your needs.

ksec · on July 22, 2014

Arh... I actually meant Jeff, not Joel.

Hardware is cheap, Programmers are expensive! But i am sure there is a line where this crosses over though.

judk · on July 22, 2014

The point of this article is that it costs more programmers to maintain a cloud than to maintain a single master database

tjdetwiler · on July 21, 2014

Can anyone provide context on "Heavy usage of static classes and methods, for simplicity and better performance."

Is performance _really_ an issue here?

MichaelGG · on July 21, 2014

I find the opposition to static classes odd. Unless you need the data encapsulation of an object, why force people to create an object to call functions? I find too much C# code loves creating objects for no real purpose other than that's how it works.

For SO, if we guess they max out around 2000 req/sec, then if a bunch of objects are being allocated for each request, there could be added GC pressure. From my own experience developing much higher-scale managed code, allocating objects is a real pain and people will take a lot of effort to avoid it.

Griever · on July 21, 2014

I think the real issue (although it doesn't seem to be an issue for SO) is testability. AFAIK, static classes cannot be mocked, and you are pretty much boxing yourself into only doing integration tests for everything. Not necessarily a bad way to go, just lots more setup per test.

mattmanser · on July 22, 2014

It completely depends if you do this:

    public static decimal DiscountValue(Order order)

Or this:

    public static decimal DiscountValue(decimal orderValue, decimal orderDiscount)

Or even this, which would farm the work out to SQL:

    public static decimal DiscountValue(int orderId)

The first would require a mock, the 2nd wouldn't and the third isn't unit testable.

It's a bit of a contrived example, as you're more likely to do the third method if you're trying to get some sort of summary data about an order out which would mean you don't really need/want to load the whole order object.

MrBuddyCasino · on July 22, 2014

Yep, that used to be one of the main reasons to have DI in Java. But now that everything (even static methods) can be mocked, that reason is no longer valid. Is this not possible in .NET?

tjdetwiler · on July 21, 2014

A singleton object doesn't need to be re-allocated.