I get a bit annoyed by this sort of post, blanket statements like:
"With our code base, you won’t be able to easily do test driven development or similar practices that the industry seems to love."
Why does the industry seem to love testing? Does the industry love testing? Does this mean you do or don't love testing?
It also feels like they wrote this blog post to show off: we write our C# code as if it were C code because we are better than all of you.
The fact is that every application is different. They chose a language that, back in the 2000s, wasn't designed for performance, so they paid the performance price of using it and didn't want to pay the scale-out price.
Is their codebase testable? Yes, completely. Will adding unit tests to their code destroy their performance? No; there are always ways to make it testable and fast. Maybe in a few years we will see a "We thought testing was impossible, but we made it work and kept our codebase fast" post.
They're really not selling their codebase or practices to me, to be honest. If their system is built like that, it might have enough performance to run a top-x site on only a handful of bare-metal servers, but the article also implies there's a lot of "here be dragons" code: don't touch this, you are not clever enough to test this, and since we don't have tests, you might break it by touching it.
Which is just bad practice on the one hand, and elitist on the other.
I for one am thankful I don't have to work with those constraints, and I can adopt a better mindset - make it work, make it pretty, make it fast, for example. I think you cannot optimize a system if you cannot read and understand it.
Also note I explicitly said "system", not "codebase"; at a big enough scale, you can't look at performance as a factor of code, but of architecture and patterns. You need to be able to get figures from specific systems in the whole architecture, not individual lines of code. Make the architecture easy to analyze and optimize.
> the article also implies there's a lot of "here be dragons" code
Agreed. It's fine as long as enough of the core developers are around, but if the company merges, the app needs to be extended or altered significantly, or some of the underlying technology stack is deprecated or becomes unsupported, keeping things working will be very expensive.
> If there’s been enough organizational turnover or enough feature creep in a product, then maybe doing a rewrite is the best option so your team has a collective understanding of the code. You can’t expect people to be productive in something that was a culmination of [complicated "here be dragons" code] made by people who no longer work there. At that point your technical debt balloon has popped, you are in possession of a toxic asset.
I think that they are in a situation where it can't be altered significantly, which ironically is probably a good thing for Stack Overflow - they don't have reams of features.
> Which is just bad practice on the one hand, and elitist on the other.
I got exactly the same impression. Making a strawman out of the rest of the market and special-casing yourself always sets off a red flag. It indicates the possibility of a culture that is out of touch with reality and chooses to rationalize its poor decisions rather than face them. I wouldn't work at a firm with this kind of culture; it's a liability.
I once worked in a shop where a few teams made blanket rules for clubs of reviewers for different core technologies. There was a Python reviewers group, a Puppet reviewers group, etc.
The membership of these groups bore little resemblance to the technologies people used and had to work in day to day, but instead reflected the current group politics of a large engineering organization. As reviewer enforcement was wired into the deployment tool, appeasing these reviewers was required to be productive. In practice the reviewers would roadblock entire teams' initiatives.
The end result was a proliferation of technologies across the company, such that teams could practice team code review rather than adhere to the whims of the anointed reviewers.
I also found that the "or similar practices that the industry seems to love" line comes across as a bit vague and even arrogant.
On the other hand, I think that they have earned bragging rights - we're talking about a site that has massive scale and reach, and has a reputation for being a very efficient system. I appreciate that people like them share their views, even if they are contrarian and a touch arrogant.
> They chose a language that, back in the 2000s, wasn't designed for performance, so they paid the performance price of using it and didn't want to pay the scale-out price.
In terms of more performant languages at the time, what could they have picked? C++? It sounds like they would've lost a lot of the conveniences that a framework like .net gave them. Isn't the fact that they avoided paying the scale-out price remarkable?
It's important to remember that Joel Spolsky, one of the founders of Stack Overflow, is an old-school Microsoft guy who managed the Excel team before starting his own company. He has always been a bit of a Microsoft fanboy, and used to advocate a lot for VB as a serious language back before it got swallowed by .Net and turned into a less powerful syntax for C#. The fact they chose C# isn't surprising at all given that. It actually would have been far more surprising to see them pick anything but C#.
I think the previous poster was a bit off the mark, though. The language performance wasn't really the issue there; rather, it's the fact that they picked a language that at the time really only ran on Windows, and as a consequence they were forced into running their web servers on Windows. That choice then forces them to scale up rather than out, since each instance has license costs attached to it. For most companies running on Linux, it's trivial to scale out since your only costs are the compute cost (or the hardware cost in a non-cloud model), whereas it tends to be far more expensive to scale up, as more powerful hardware tends more towards geometric price increases rather than linear. These days the choice of C# wouldn't be such a big issue as .Net Core can easily run on Linux servers, but back in the 2000s using C# was putting a pretty big albatross around your neck by way of Windows licenses.
> it's the fact that they picked a language that at the time really only ran on Windows, and as a consequence they were forced into running their web servers on Windows.
If the founders and early employees were from Microsoft it might have been easier for them to use Windows Server since they were already pretty well versed in Windows development.
It's a pattern I constantly see: "Why did your startup use X instead of Y?"
"Ohh well X has this feature that Y lacks and so and so... ohh and the founder and his friend were pretty good at X and used it before."
I agree with you that it seems like a self-imposed limitation. At the same time, it makes one think about how such limitations can actually foster creativity and efficiency. They mention in the post - they constantly gloat about this - that they could run SO on one or two machines.
I'd imagine that said machine would need to be a behemoth and not a t3.micro, but intuitively I feel that this would be much cheaper than the average horizontally scaled web application. Or in other words, that they're hyperefficient, regardless of architecture.
Does anyone have any insight on whether this intuition is on the right track?
Eh, not sure that follows. Here's the thing, costs aren't linear. If you do what AWS does and create some artificial "compute units" as a sort of fungible measure of processing power, what you'll find is that the sweet spot for price per compute unit is a medium power system. The current mid-range processors tend to be slightly more expensive than the low-end processors, but significantly cheaper than the high-end processors.
So, hypothetically, let's say you can get one of four processors: a low-end one that gives you 75 units for $80, a mid-range processor that gives you 100 units for $100, a high-end processor that gives you 125 units for $150, and a top-of-the-line processor that gives you 150 units for $300. If you normalize those costs, your four processors get price-per-compute values of roughly $1.07, $1.00, $1.20, and $2.00. The best value is at the $1-per-compute-unit price point of the $100 processor. Logically, if you need 150 units of compute power you have two choices: two $100 processors, or one $300 processor. Clearly the better option is the two $100 processors. This would be scaling out. In the case of what SO did, though, they took that off the table, because their formula isn't just the cost of the processor (ignoring related things like RAM and storage) but also includes a per-instance license cost. Their math ends up looking more like ($100 CPU + $150 Windows license) × 2 = $500, vs. ($300 CPU + $150 Windows license) × 1 = $450, which ends up making the more expensive processor the cheaper option in terms of total cost.
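To make that concrete, a back-of-the-envelope sketch using the hypothetical prices above (the $150 per-instance Windows license is an assumed figure for illustration, not SO's actual cost):

```csharp
using System;

// Hypothetical numbers from the example above. The question: meet a
// 150-compute-unit requirement by scaling out (two mid-range boxes)
// or scaling up (one top-end box), when every instance also carries
// an assumed per-instance Windows license cost.
const decimal midRangeCpu = 100m; // 100 compute units each
const decimal topEndCpu = 300m;   // 150 compute units
const decimal license = 150m;     // assumed per-instance license

decimal scaleOut = 2 * (midRangeCpu + license); // two mid-range instances
decimal scaleUp = 1 * (topEndCpu + license);    // one big instance

Console.WriteLine($"Scale out: {scaleOut}"); // 500
Console.WriteLine($"Scale up:  {scaleUp}");  // 450

// Drop the license term and the comparison flips: 200 vs. 300, in favor of scaling out.
```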
> If you do what AWS does and create some artificial "compute units" as a sort of fungible measure of processing power, what you'll find is that the sweet spot for price per compute unit is a medium power system.
At least when I look here the costs are linear.
E.g. a c5.2xlarge costs double a c5.xlarge, and a c5.24xlarge costs 24 times as much as a c5.xlarge.
That's interesting; it didn't use to be linear, particularly on the very large instances. Oddly, when looking at the prices for RHEL, those are much closer to what all the prices used to look like (I hadn't actually looked at AWS pricing in a few years). I wonder if AWS's virtualization tech has just reached the point now where all processing is effectively fungible and it's all really just executing on clusters of mid-range CPUs no matter how large your virtual server is.
They were/are on bare metal, so they have real dedicated cores and no VM overhead. Two bare-metal cores are a hell of a lot faster than two virtual cores. I had been away from infrastructure during the rise of VMware and didn't realize that the core count was potentially fake. I was sparring with an admin over resources one day and mentioned the number of cores on the VM, and he just laughed: "Those are virtual, you don't really have four cores." Be careful in your assumptions about these VMs in the cloud.
He never managed the Excel team and he didn't write code for the product. He was a PM and wrote the spec for VBA. An accomplishment to be sure but not much different than all the other great individual contributors Excel has had over the years.
He did not have a management role. PMs write specs and talk to the customers. That time was also very much the "cowboy coding" era, so part of the job was convincing devs they should implement it how he spec'd, because he couldn't force them. I think that's part of why he was so popular with his blog aimed at programmers; he'd had lots of time to hone his persuasive technique.
> It sounds like they would've lost a lot of the conveniences that a framework like .net gave them.
Would they have? It sounds like they put a lot of effort into bypassing the garbage collector by creating buffers up front and then reusing them, never letting them go, so they never have to be garbage collected. Best practice in embedded C code is to allocate buffers up front and then reuse them. The terminology in the above two sentences is different, but the way the code is written is exactly the same.
Once you lose garbage collection does .net give you anything over C++? I don't know enough about .net to answer the question - even if I did, the answer probably depends on whether you are doing something where .net has a saner syntax, or run-of-the-mill code that looks almost identical in C++ and .net (C++ templates are legendarily bad for a reason; if .net is even slightly better, it doesn't take too many generic things to make .net the better choice).
> Once you lose garbage collection does .net give you anything over C++?
You don't need to lose garbage collection altogether. There's a middle ground here: if you can switch to manual buffer management for the most GC-heavy, speed-critical code, that still lets you use GC with impunity everywhere else.
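A minimal sketch of what that middle ground can look like in C#, assuming you use the standard ArrayPool from System.Buffers: the hot path rents and returns pooled buffers instead of allocating fresh ones, while the rest of the app keeps allocating normally and leaves the GC to do its job.

```csharp
using System;
using System.Buffers;

static class HotPath
{
    // The GC-heavy, speed-critical piece: rent a buffer from the shared
    // pool, use it, hand it back. No per-call allocations for the GC to
    // chase; the rest of the application allocates normally.
    public static int ProcessChunk(ReadOnlySpan<byte> input)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(input.Length); // may be larger than requested
        try
        {
            input.CopyTo(buffer);
            int checksum = 0;
            for (int i = 0; i < input.Length; i++)
                checksum += buffer[i]; // placeholder "work"
            return checksum;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}

// Usage: int sum = HotPath.ProcessChunk(new byte[] { 1, 2, 3 });
```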
> I don't know enough about .net to answer the question - even if I did, the answer probably depends on whether you are doing something where .net has a saner syntax, or run-of-the-mill code that looks almost identical in C++ and .net
I don't know enough about either C#/.net or C++ - I was merely speculating.
I know there are some web frameworks for C++, but I've never heard of any big websites being built with them. I am assuming that, even back then, .net provided a lot of modules for building websites: not only templating but also auth, email, managing static assets, an ORM (I know they ended up rolling their own, but maybe they used .net's in the very early stages), etc.
Adding test coverage might indeed destroy their performance, in that they might expose where their system's bottlenecks are and realize that they've been focusing on the wrong things. When it comes to performance optimizations, folks' intuitions are almost always wrong. It's important to actually test and measure stuff like that.
I think it's true in general that modern CPUs are insanely fast at executing code within a single process, and context switches are orders of magnitude slower due to their overhead. So, monolithic systems are going to be intrinsically "fast" compared to systems that pass messages through an operating system. They seemingly made a good decision for their project there, whether or not they made that decision rationally.
It's possible they invested a lot in infrastructure for runtime performance profiling and monitoring and therefore know exactly what pieces of code to optimize. Reading the article, though, they're seemingly sweating the overhead of polymorphic function calls. The overhead of instrumenting the code would likely be higher than the overhead of polymorphism in a language like C#, so if their code was actually that performance sensitive, they wouldn't have it instrumented in production. If they did have it instrumented, they would likely have realized that their code wouldn't be bottlenecked by a little bit of function call indirection. Also, all the talented folk I know that stress the importance of runtime instrumentation also stress the importance of regression testing, so I suspect it would be pretty rare for a team to value one but not the other.
I'd like to take this opportunity to encourage people to learn about sampling profilers. What they do is periodically record the program counter and call stack of a process, then aggregate these to get a picture of where the program was spending its time. With a low enough sampling frequency they can have negligible impact on the speed of the software being profiled, they require no changes to production code, and they operate with fine enough granularity that they can easily tell you if (for example) a particular polymorphic function call is adding significantly to the runtime of a hot loop.
Often all you need to do to find out what the real bottlenecks are is to point one of these profilers at a running program and look through the results. Super low effort, potentially big payoff.
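The aggregation step really is just frequency counting over the captured stacks. A toy illustration of that step in C# (the frame names are invented; collecting the samples themselves is the job of a real profiler such as perf or dotnet-trace):

```csharp
using System;
using System.Linq;

// Each sample is the call stack captured when the profiler fired,
// innermost frame first (frame names here are invented).
string[][] samples =
{
    new[] { "Markdown.Render", "Question.Show", "Main" },
    new[] { "Markdown.Render", "Question.Show", "Main" },
    new[] { "Sql.Execute",     "Question.Show", "Main" },
    new[] { "Markdown.Render", "Answer.Show",   "Main" },
};

// Tally the innermost frames: the counts approximate where time was spent.
foreach (var group in samples.GroupBy(stack => stack[0])
                             .OrderByDescending(g => g.Count()))
{
    Console.WriteLine($"{group.Key}: {group.Count()} of {samples.Length} samples");
}
// Markdown.Render: 3 of 4 samples  <- the hot spot
// Sql.Execute: 1 of 4 samples
```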
It's hilarious that you are skeptical of the competence of the SO core devs. You should do a little further reading, there is lots of content available online from the core devs who actually "do the work" there (ie, not the people who wrote this fluffy blog post).
I'm basing my opinion on reading the article. The claims that were made are consistent with what I've experienced working with folk with relatively limited software engineering skill. I don't think that's an unfair or mean thing to say. The whole premise of the article is to try and convince me that not following best practices can be a good thing. To me, the article seemingly reads as defensive, as if the author is actually trying to convince themselves or their coworkers that they aren't to blame for an accumulated decade of technical debt.
In reality, there's no blame to assign when tackling tech debt. Real projects are messy. Folk defer work for short-term gains at a cost to their future selves. Growing tech debt is just an eventual engineering problem to tackle like anything else. Best practices are distilled wisdom intended to keep those sorts of problems from growing out of control.
Re: the competency of the SO core devs, they very well could and probably are solid within their domain. Not knowing anything about them though, I think it's possible that they're a bit siloed off in their knowledge. If a lot of their core team were there from the beginning of the project and were relatively inexperienced at the time, they may not have ever invested their time into developing test tooling. In some ways it's very impressive that they're able to keep that large of a project going with so little testing infrastructure.
FYI, the first listed author of this post has the job title Principal Software Developer, which I assume makes that person one of the people who write the code.
Fair enough, thanks for the correction. I guess the tone of the post doesn't convey that they know what they are doing to some people, although it should be obvious that they didn't just sleepwalk into having one of the highest "power to weight" ratio web apps on the Internet.
I think you need both. Intuition most of the time to prevent the problem of a little bit of slow everywhere adding up, and then profiling to make sure there is no huge thing hiding in the blindspot of your intuition.
Lines like that[0] sound like the goal is to please the industry, without questioning why the industry loves these things. It's good to question established best-practices, but it's also good to understand why they are best-practices.
[0] "With our code base, you won’t be able to easily do test driven development or similar practices that the industry seems to love."
Something not mentioned, but a much bigger problem than performance for most, is that if you write too many tests, you may be unable to easily upgrade/update to newer versions without a significant amount of work to update all of the tests.
You’re right to call out performance as a reason for testing. Unfortunately, though, it’s usually not that easy. A problem not found in testing may be the result of inexperience or lack of foresight in design, such as the assumption that another adapter service in front of that enterprise cloud service that only exists in production isn’t a bottleneck, or that your code that had 100% unit test coverage turned out to leave nasty half-updated data due to non-transactional services and was deadlock-prone with memory leaks.
Isn't that a bit of a red herring? Either I have automated tests that will take time to update every once in a while, or I have manual regression tests that I have to run for every code change. The automated test path seems to be the much less time intensive path, depending on how often you make big changes.
On the other hand, I really believe 100% unit test coverage is a bit of a waste of time. Unit tests should be reserved for algorithmic/calculative code; more mundane logging or carting-data-from-A-to-B type code should be covered by automated black-box testing.
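For what it's worth, that split is also where unit tests are cheapest: purely calculative code needs no stubs at all. A hypothetical xUnit example (Slugify and its rules are made up for illustration):

```csharp
using Xunit;

public static class Slug
{
    // Hypothetical calculative function: lowercase a title and join words with dashes.
    public static string Slugify(string title) =>
        string.Join("-", title.ToLowerInvariant()
                              .Split(' ', System.StringSplitOptions.RemoveEmptyEntries));
}

public class SlugTests
{
    [Theory]
    [InlineData("Hello World", "hello-world")]
    [InlineData("  Spaces   everywhere ", "spaces-everywhere")]
    public void Slugify_produces_the_expected_slug(string input, string expected) =>
        Assert.Equal(expected, Slug.Slugify(input));
}
```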
That depends on the tests. I've seen tests asserting that the code is implemented like X. I've seen tests asserting that the code uses some API to do X. I've seen tests asserting that the program does X via some user interface.
What you want are tests asserting that the program does X, regardless of what computer it runs on or what the interfaces are. However, this is impossible long term. You MUST assume something that is subject to change in the future. Some assumptions are more subject to change than others.
Automated tests are an assertion that "this will never again change". You can make an educated guess as to what will and will not change, if you guess right then you don't need to rework your tests. When (not if!) you guess wrong you need to rework tests.
Thus automated test time + automated test maintenance is usually less than manual test time + manual test maintenance. Mostly because execution time for automated tests is so much shorter than manual execution time that the higher cost of automated test maintenance is recouped and then some.
The only way for framework/OS changes to impact that is if they happen often enough to make automated test maintenance "weigh" more than all the time saved by no longer having to manually run your test suite.
I don’t think we’re speaking the same language. I’m talking about how much time it takes to change the code in every unit test, etc. because you upgraded something that requires changing the test framework, and this library and that one, so you end up not only having to fix all of your production code, but all of the tests with their mocks, stubs, injections, annotations, xml config, etc. If you have thousands of tests, this can keep people from upgrading.
> you may be unable to easily upgrade/update to newer versions without a significant amount of work to update all of the tests.
That's a code smell. Tests should only be updated to reflect a new understanding of the requirements of the system. If you're writing tests that are dependent on the implementation of code and not the behavior of code, then you're coupling your tests too tightly to your code.
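To make the distinction concrete, here's a minimal, made-up xUnit example (Cart and its API are invented for illustration): the first test pins observable behavior and survives refactoring, while a test asserting which internal collaborators got called would be coupled to the implementation.

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class Cart
{
    private readonly List<decimal> _prices = new();
    public void Add(decimal price) => _prices.Add(price);
    public decimal Total() => _prices.Sum();
}

public class CartTests
{
    [Fact]
    public void Total_reflects_added_items() // behavior: survives internal refactors
    {
        var cart = new Cart();
        cart.Add(2.50m);
        cart.Add(1.25m);
        Assert.Equal(3.75m, cart.Total());
    }

    // An implementation-coupled test would instead assert that Total() called
    // some internal summing collaborator exactly once. Swap the List for a
    // running total updated in Add() and that test breaks, even though the
    // observable behavior is identical.
}
```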
Because test coverage is a typical way modern software engineers and managers show off an otherwise mediocre codebase.
>because we are better than all of you.
Well, at least they have something tangible to show for it. TDD elitism is much more ridiculous, because (as I've seen over and over) having high test coverage does not mean the code is bug-free or even properly modularized. If you want to brag about your tests, don't show me your coverage, show me how you have 10 times fewer bug reports than the next guy.
The industry loves testing because the industry (big handwave, encompassing a huge pile of heterogeneous software development houses) is in the business of risk management and scaling practices.
If SO needs to double its engineering base tomorrow, can they? A well-tested infrastructure means you can trust engineers less deeply familiar with the system to make improvements without breaking pieces they aren't familiar with. The tests provide a bit of a behavior-expectations guarantee (not foolproof, as there's good language theory indicating why unit testing can't be sufficient for proof of correctness, but better than nothing). Without it, you have less of a safety net if junior programmers change component A, failing to realize that component F, deeply removed from it, was assuming some behaviors of A.
One piece of this post worth noting:
> Currently, we’re trying to change this. We’re actively trying to write more tests and make our code more testable. It’s an engineering goal we aim to achieve, but the changes needed are significant.
... and that means if they have a reason to massively scale-up tomorrow, their codebase isn't starting in a good place to make that change. They've made a first-to-market risk tradeoff that they may have to pay for if they need to add a bunch of new engineers right now and have them operating independently by Friday.
> show me how you have 10 times fewer bug reports than the next guy.
You can't actually do that, because there are too many other differences. However, good tests do, in the long run, tend to result in higher-quality code.
If nothing else, because sometimes the obvious fix for one bug breaks something seemingly unrelated, where the obvious fix is to undo the first fix. I've seen that go on for 4 rounds over 8 years in one company, with different contractors doing the work every time (only by luck did one of the few full-time employees happen to review the code twice and think to check version control for the history of the "obvious" fix).
In general, tests make it more expensive to get an equivalent-quality version one out the door (if you have a 15-year development cycle, maybe not). However, by the time you are on version 3 you save yourself a lot of retesting of the same things to get the same quality.
Though the most obvious effect of tests is, as you say, a metric that management can brag about. It's also one of the few metrics where gaming the metric for more isn't entirely bad (though it can be a waste of money!).
They've basically gone with an overly expensive Microsoft stack which meant that from the beginning they were fighting the wrong fight. Instead of thinking about scale for best performance, they had to think about best cost for minimum scale.
Furthermore, they decided to bastardise their C# code into some fake low-level, untestable C dragon code, which makes matters even worse. They have created a huge mess which is probably extremely hard to maintain and too precious to touch, for a performance gain which probably still only scratches the surface of what a true low-level language could have achieved.
The best practice would have been to build the parts which matter most in a language which allows them to be blazingly fast and write nice testable and easy to understand code. Something like Rust or Go. But no, they stuck with C# for god knows what reason (maybe financial Microsoft incentives, who knows?) and had to throw away every sensible coding practice in order to make it work.
This is literally a blog post describing how not to run a large scale website.
> The best practice would have been to build the parts which matter most in a language which allows them to be blazingly fast and write nice testable and easy to understand code. Something like Rust or Go. But no, they stuck with C# for god knows what reason
Whilst for work I've done approaching two decades of C#, for my own stuff I have largely used Go for years now so I'm far from an MS apologist, but that impression of C# is wrong. To be fair, the article leads partly to your conclusion so that's understandable.
C# with .Net Core, almost by design and by default, massively encourages "nice testable and easy to understand code" - much more so than most languages out there.
And unless the codebase is a hacked together monstrosity of the type that can be created using any ecosystem, .Net Core and C# are massively performant (pre-Core .Net was just about up there with some competitors but Core is in another league over the old Framework).
It's a shame that it doesn't have the mindshare outside of enterprise that it deserves, as C# on .Net Core basically gives you (practically, not syntactically) type-safe Node in terms of async/await, Go-like performance, Java-like power, and Ruby-like simplicity. Though to be honest it can take a fair while to advance from "hello world" given the atrocious naming and versioning mistakes Microsoft have made and how that impacts documentation and research.
> You massively overestimate C#. Microsoft devdiv marketing is working.
No, I don't. And it isn't based on marketing.
I've done C# since around 2001 and Go since around 2015 (in both cases tools, sites, and APIs) so my perspective on relative performance is from first-hand knowledge.
As regards Java I don't think the comparison is controversial, and I've done Ruby (and some Rails) on and off for many years and whilst some of it may be down to familiarity, I still find C# (after acclimatising to the .Net/Core Framework) to be equally readable, flexible, and simple.
You're totally free to disagree of course, I'm just taking the opportunity to correct the impression that I'm "overestimating" C# based on marketing when in reality there's no estimation happening at all - I'm speaking from personal experience.
In my opinion, the idea that you can have well-tested code OR performance is tosh.
If you don’t have good testing that you run regularly, how can you refactor your application for speed when you can’t be sure you aren’t going to introduce a regression?
And I think the stackoverflow team know it:
> Currently, we’re trying to change this. We’re actively trying to write more tests and make our code more testable. It’s an engineering goal we aim to achieve, but the changes needed are significant.
This makes it sound like a lofty goal, but to me it just sounds more like: “Oh crap, we’re sitting on a mountain of tech debt.”
No, I know you can’t always expect to arrive at a project and find the sunny uplands of TDD and continuous delivery, but if you do end up with “optimised for speed” code (to me that’s a synonym for tech debt that’s slowing down delivery), all you can do is slowly add what should have been done in the first place: tests.
You're missing the development time dimension. The point of the SO post is that, given a limited development time, they chose to focus on performance and sacrifice testability. Many years later now, they are choosing the opposite.
This doesn't make their initial decision wrong. The site exists and has thrived, and has achieved all of its performance goals. So the decision was correct. If it has now become too hard to maintain, perhaps the decision to sacrifice some performance to achieve testability will be correct as well (since it seems they haven't finished yet, we can't judge the results).
It’s always going to be orders of magnitude more difficult/expensive to retrofit testability. I’d argue that they were successful despite their approach, not because of it.
I actually have a lot of time for their design choices (monolith, scale up as much as possible, etc.; not everything needs to be distributed microservices), but finding that they did it by sacrificing testing is surprising.
It is when time to market with a feature-complete product is your primary concern, and testability isn't shown to be helping that metric for the developer's specific case.
Until customers start caring a lot about development slowing down orders of magnitude down the line and hold their providers accountable, development will continue cutting corners where possible. Unfortunately, testability in particular is one aspect which doesn't have amazing support in regards to how much value it adds.
I think that the “we can go quicker by not testing” is a fallacy. If you do TDD right it will make you quicker, not slower in the short term. Not just the long term...
Emphasis on "I think". Unfortunately, your thinking hasn't had a whole lot of support, as studies have found both findings in favor of and arguments against TDD, both short term and long term.
Furthermore, I would ask you to name the contexts in which you believe TDD to be better in the short term. Things differ when a lot of critical and non-trivial data-modifying code exists, compared to an interface whose requirements change by the week, to name two extremes on opposite ends.
Things also change when the codebase grows beyond proportions the devs can actively manage in their heads, or when TDD starts serving as a way to document your codebase for new people coming in. There are plenty of situations where one can imagine these conditions not being present, and where a developer with a relatively high degree of correct output would slow themselves down with TDD, the same way some people need the overhead of pulling out their calculator and confirming the answer while someone else calculates it in their head without confirmation.
While I won't deny TDD can, or at least should, have benefits in the long term, it is fairly extreme to claim testing is almost always better without the proof to back it up.
Ok, my bad for “I think” - I should have said: in my 30 years of experience, I’ve yet to come across a system where good tests were a hindrance to performance. I’ve had many a time where complex performance problems were uncovered by systematic and automated tests.
If performance is a main goal, then surely TDD with performance in mind will give you a better advantage
Your comment was interpreted by many as "you can develop faster with testing than without" which is quite controversial. If you meant "quicker" as in "faster code", then that seems more plausible.
I wouldn't have thought "testing is always good" would be a controversial claim! I could make the claim that dogs shouldn't eat potting soil and most folk would probably agree with me. I don't have any proof that it's actually bad though. My dog gobbled up some potting soil today and he will probably be just fine. I still think he's a doofus, though.
I personally write significantly higher quality code, faster, when I spend a few hours thinking through the problem in a document beforehand. One of the things to think about is how to best test it, and most of the time that means a unit test. Knowing that when starting to write the code influences the interfaces and structures that I come up with.
Anecdotally, one recent project at work required writing a relatively small, 100% self-contained program with significantly less serious / more silly requirements than the typical type of code I write. I initially didn't plan on writing any unit tests, and about 500 lines of code later I had the program working 99% correctly. There was one bit of overly complex code that was a bottleneck, though, and it made the overall program 30x too slow for its purpose. The fix was to replace a conceptually simple iterative solver with some brain-melting closed-form equations. About 8 hours of iteration and manual testing later, I was pretty sure I had correctly handled all the edge cases. My brain was melted, though, and I couldn't even remember half of the edge cases I had encountered. It took me another 20 minutes to write a unit test that compared the output of the iterative solver with the output of the closed-form solver. Had I done that from the beginning, I probably would have saved 4 hours and not melted my brain.
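For anyone curious, that kind of oracle test can be tiny: keep the slow-but-trusted iterative version around and compare the two over a sweep of inputs. A rough sketch of the shape (the solvers below are stand-ins I made up, not the actual ones from that project):

```csharp
using System;
using Xunit;

public static class Solvers
{
    // Stand-in for the trusted-but-slow iterative solver (Newton's method for sqrt).
    public static double IterativeSqrt(double a)
    {
        double x = a;
        for (int i = 0; i < 100; i++)
            x = 0.5 * (x + a / x);
        return x;
    }

    // Stand-in for the brain-melting closed form being verified against it.
    public static double ClosedFormSqrt(double a) => Math.Sqrt(a);
}

public class SolverEquivalenceTests
{
    [Fact]
    public void Closed_form_matches_the_iterative_oracle()
    {
        // Sweep a range of inputs and demand agreement to 6 decimal places.
        for (double a = 0.1; a < 1000; a *= 1.7)
            Assert.Equal(Solvers.IterativeSqrt(a), Solvers.ClosedFormSqrt(a), 6);
    }
}
```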
The thing with writing code and testing is that, if you're in the habit of it, unit tests typically take a less-than-linear amount of effort vs. writing the actual code. Once you have tests, they decouple code complexity from development and maintenance, and you can therefore achieve a more sophisticated overall solution than otherwise with the same resources. If you don't develop testing as a skill, though, then those benefits don't materialize because you're inefficient at it. So, while it may be true that a given project or team wouldn't benefit from prioritizing test coverage, it's seemingly due to lack of expertise rather than a lack of obvious benefit. If I'm working on a home improvement project and my toddler wants to help tighten a screw, I wouldn't hand her my impact driver and expect her to successfully use it let alone not hurt herself, despite it being an unambiguously more efficient tool for driving screws than her little green plastic hammer.
That kind of unit test sounds great. I always reach for writing tests for, e.g., library functions. Though I tend to skip them when testing some app functionality which basically requires stubbing/mocking tons of services and feeding them the output to return for each input - in those cases, I don't find tests or TDD very helpful.
I find a lot of value in unit tests for isolated logic testing and in functional tests for end to end testing. Still that leaves a lot of code in a modern web application (like stackoverflow) where I find tests don't provide enough bang for buck.
It's not a model to follow because it was a way of working that fit their requirements at that point in time. Given a different company, project, team, ecosystem, etc, the decision may well be different.
Thus, you shouldn't follow their model unless you are identical in every way.
What you should follow is the way they arrived at a decision, not the decision itself.
> It’s always going to be orders of magnitude more difficult/expensive to retrofit testability.
I can't take this any other way than as a huge overstatement. I'm yet to find any example where retrofitting it is orders of magnitude harder.
Worst case it requires understanding the entire system again, and adding instrumentation here or there. That can be almost as hard as writing the entire system again, but usually it's much easier, because all the exploratory work has been done and tests are merely formal.
I'd say that correcting the bugs you'll find with those tests is way more work than adding the tests, and maybe they would be easier if they were discovered earlier. But again, the "orders of magnitude more work" is an overstatement.
Speaking from experience: if you have to go back after many years of development, where most of the team that made the original design choices has left, and you want to make changes without breaking the system, and you discover that half of the system has no tests, then it’s going to be difficult to make even the smallest of changes.
When you then decide that you need to retrofit good tests, it always becomes an exercise in trying to reverse engineer the requirements (because half the documentation has been lost, or is so hopelessly out of date it may as well be lost), and then yes, making something well tested is very hard.
Having said that, it is often worth doing, because how else are you going to have the confidence that you don’t break the system with every little change? But it very much is going to require much more effort than if it had been done in the first place.
I read that statement as orders of magnitude more difficult/expensive to retrofit testability as opposed to making the system testable at the start, not as opposed to creating the system.
It's the project management tradeoff; they chose to acquire technical debt in favor of short-term performance, which I'm sure at the time was the way to go (you don't want to be left behind), but now they're paying off the debt, and it sounds like it's a big one.
> The point of the SO post is that, given a limited development time, they chose to focus on performance and sacrifice testability.
For a long time I thought this was necessarily a choice. In fact, I only very recently started doing something close to TDD (not completely: there are things I will still write code-first, but I no longer leave them untested, and I also don't see a problem with writing contained parts of code and adding the tests afterwards when finishing that "contained part"). But this isn't necessarily a choice.
Tests help in many ways, and I learned this doing projects, large or small, over any amount of time longer than a month (or maybe two weeks). And sometimes it's not even the correctness of the program that matters most, in the sense that you can write code that is for the most part correct and legible without tests. What I've found really helps is when you have to tear down a prior specified behaviour/request/spec, or update it, or integrate it with other things.
Tests at least tell you that "all these assumptions you made and wrote in the past 3 months while writing this code still hold true", and this is important.
Of course they don't cover all things, and I don't think you should test to infinity. But the thing is, once they're written and not overly brittle (UIs, for instance, are complex to test correctly because it's easy to make the tests overly brittle if there's any churn going on there), when you find new bugs or overlooked assumptions you can add them to your test suite, so the tests keep accumulating value.
The other thing I noticed is very helpful is when you realise that you can't easily write tests for something. This is usually a sign that your code isn't as organised/designed as it should be, and has coupling and/or other accidental complexity that is not required (there are exceptions where the problem domain and the flows required are complex enough to make testing complex too, of course).
The time dimension and the leeway when writing code used to be my main gripes with testing regularly, but honestly, my code with tests gets better the more I practice it, and I can't see much (if any) slow-down compared to before when implementing things - and certainly the opposite after any significant time has passed or the codebase has grown large enough to cover different requirements.
Something people rarely mention with tests is that their value has a Pareto distribution. Most of the value of a test suite will be captured by a small number of tests. (And inversely, most of your tests will never discover any bugs).
The upshot of this is that you will get disproportionate value from the first few tests you add to a project. I think it's very rare to have a long-lived project in which the first 10 tests aren't worth writing. The next 100 tests? That's a much harder question.
Actually, I agree with you, and I also forgot to mention that tests themselves, being code, should be cleaned up just like regular code and simplified whenever possible.
I think the important question is what should be tested. I have written, even not that long ago, unit tests for a function that all passed in isolation, and actually forgot to write a simple test in the context where it would be called (which was very simple as well, but I just didn't); all the isolated tests were good, but the actual functionality was still broken.
I haven't been doing it extensively for a long time, but for instance, if you have a user action that should update a record and enqueue a job to be executed, then this should be tested. It's easy and makes very few assumptions. You don't need to test how the record is updated or the job inserted, just that it happens. You might want to test that all failures produce correct error messages, but that is not as important as at least having the baseline of: if this central action happens, both of those things happen as well.
> What I've found really helps is when you have to tear down a prior specified behaviour/request/spec, or update it, or integrate it with other things.
Honestly, in my experience, the opposite is true of unit tests. I've never been able to make a significant change to a piece of code that was thoroughly unit tested without having to more or less re-write the unit tests from scratch.
However, integration tests and whole-project automation do help tremendously in what you are describing.
I'm not completely convinced of exhaustive unit testing as an end in itself either. I also haven't been writing tests regularly for that long, but I did extensively test quite a large surface of a game I've been writing over the past months. There it wasn't exactly unit tests that I wrote, either; it was more verifying that actions produced the expected results and, where it was easy, making sure some obvious side effects were not part of the result.
Besides that, I also haven't been thrown into codebases where I had to refactor someone else's code or tests in anger; I imagine it can get very frustrating as well when overdone or done badly (and I'm not saying I wouldn't do it badly either - I think it can be easy to overdo unit testing without any relevant gains).
I guess what I'm leaning towards in my own practice, though, is thinking of unit tests as trash-able. That is, it's OK, if it fits your development style, to use them to drive the design and to verify along the way the small parts that make up whatever end API you're designing (in the library sense, not REST, even if it's just part of your "code" and not a library in itself); but once you have the API in place, the important thing is that the API itself is kept under test.
At that point, if the unit tests become a burden, they should be trash-able, as long as the API - I imagine that's what you mean by integration? - is kept tested. I don't know if this flies in practice if you're working in an environment that requires unit testing for everything and whatnot, or if it's actually better to lose the time and re-write them - annoying as it might be, they usually indicate that something is broken when they're failing, which can be better than being oblivious.
All of this, though, must also be regarded in terms of what the "unit" is that we're referring to, and I also think different languages and ways of organising programs can change a lot about the way one sees units, APIs, etc. In some languages you have more of a domain-like organisation where you set up APIs to "reach" down to the underlying stuff; that is probably what a unit means in those languages. Others, inching more towards a "lower level", generate what look like much smaller unit definitions, which would probably be very annoying to test extensively. Sometimes it looks like two people talking about unit testing where one is using feet and the other metres, without ever having defined the unit of measure.
Very, very few people know how to make cheap tests.
I have one guy who writes fixtures that couple all of his tests. When I need to add functionality, first I have to replace his fixtures with basic stubs, just so I can add another bit of functionality.
And we have two people who copy his coding style. Probably to fit in.
I think, based on my own experiences and working with people learning to write tests, that 'simple' tests often feel more like the pejorative meaning of the word. They get uncomfortable, like you're debasing yourself to write such boring code. Surely I can write more interesting tests.
Yes, you can write more interesting tests if your goal is for nobody to ever touch your code. Mine, mine, mine. With the best tests, I know where my new code went wrong simply by reading the test output (not the test). If I refactor, I should be able to maintain that coverage even if the order or structure of things gets completely switched around.
Complex tests beget complex tests. If your old test is 2-3 lines of code, I just delete it and write a new one that does the same thing.
I never got to understand testing. It's a thing that every developer is supposed to know and do well somehow.
I always end up writing tests for my programs, but with all the stubs and fixtures and weird stuff I get into hairy problems that I don't understand. Tests are supposed to be helpful, but writing good tests is ridiculously hard.
Or maybe it's just because I'm a perfectionist, and I should be happy with crappy tests that at least give me a little bit of information about regressions
It's one of those things where people who really believe in it are very vocal and will shout over anyone who dares to think there are any problems. Not everyone loves them.
But, as I'm a test-lover, I have to say they're pretty amazing if you can get everyone on-board and writing good tests.
As for how hard they are... It depends on how the code is structured. If the code is designed to be testable, it's pretty easy to write simple tests. Integrations get a little tougher, but as long as you know what the other side should return, it's pretty easy to stub them out.
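For example, a hand-rolled stub along those lines might look like this (IRateService and the numbers are invented for the sake of the example):

```csharp
using Xunit;

// The "other side" of an integration, reduced to an interface.
public interface IRateService
{
    decimal GetExchangeRate(string from, string to);
}

public class PriceConverter
{
    private readonly IRateService _rates;
    public PriceConverter(IRateService rates) => _rates = rates;

    public decimal Convert(decimal amount, string from, string to) =>
        amount * _rates.GetExchangeRate(from, to);
}

// We know what the other side should return, so the stub just returns it.
class FixedRateStub : IRateService
{
    public decimal GetExchangeRate(string from, string to) => 1.25m;
}

public class PriceConverterTests
{
    [Fact]
    public void Converts_using_the_rate_from_the_service() =>
        Assert.Equal(12.5m, new PriceConverter(new FixedRateStub()).Convert(10m, "USD", "EUR"));
}
```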
Testing existing code that was not designed to be testable is a lot harder. It typically will introduce a lot of side effects and won't be properly separated. It can be tested, but IMO just isn't worth the hassle. I've found that you end up spending as much time writing tests as you would have fixing the bugs that arise... And code doesn't get used forever. Modern dev culture insists that someone call for a rewrite eventually, no matter how good the code is, unless you're working in a system that already has decades of legacy code.
My advice would be to start a small project of your own (or wait until you get a new project at work that's from scratch) and write tests from the start. It'll teach you a ton about testing and help you get the system off the ground quickly and solidly. I did this for the last big server-side system I wrote and the tests helped tremendously. The bugs that QA found were generally things that I didn't consider or that they designed wrong. The tests helped me get everything working and keep it working while they changed all kinds of things.
Sadly, once that project ended up in the hands of other devs, the tests slowly died and I was the only one trying to fix them. I eventually went to another project for a while and came back to find the tests basically didn't work at all, and they died completely at that point. But they did their job at the start, and were totally worth it.
> I have to say they're pretty amazing if you can get everyone on-board and writing good tests
This is my main complaint about code coverage metrics. I want you to be good at test writing before you write the bulk of your tests, not after. It’s hard enough to get people to write good ones, let alone go back and fix the messy ones they wrote badly (recall I said good tests are easier to replace than bad ones).
If you write a bad enough test it can take longer to fix it than throw it out and start over. But since you didn’t clearly capture the constraints the first time, do you dare delete it? What aren’t you testing after?
I've seen a lot of people say not to design code to be testable - that instead tests should work around whatever shape the code is in "naturally". Writing this out, it seems stupid, but it's been pretty influential on how I code (for the past year or so; I'm a complete beginner).
To me, the shape the code should be is largely the same as easily-testable code anyhow.
You want functions that do small, self-contained things so that you can reason about them efficiently. That also makes them easier to test.
Yes, sometimes you end up with a monstrous function because of requirements... But then, that's how you have to test it anyhow.
That said, some of the more intense testing things, like dependency injection, definitely change the shape of the code into abnormal forms. So I can agree there that it's adding complexity to the code that it shouldn't, just for testing. Some projects may warrant the increased complexity to trade for stability. I don't think most do.
The most common case of this I've run into is whether I should use dependency injection just for testing. My function creates a Frob and interacts with it. By default, Frobs persist changes to a DB and sync over the network. I don't want to do that in a test. Now I need to pass a frobFactory of (FrobConfig) -> Frob to my function.
Sure in theory I could use my new configurability for all sorts of things, but in practice I only end up using the alternative path in tests.
When do you not end up needing something like this?
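To make the shape concrete (Frob, FrobConfig, and frobFactory are the made-up names from above, not a real API), here's a sketch of what I mean: the production caller passes the real constructor, and the test passes a harmless fake.

```csharp
using System;

public record FrobConfig(string Name);

public class Frob
{
    public FrobConfig Config { get; }
    public Frob(FrobConfig config) => Config = config;

    // Stand-in for the real work: persisting to the DB and syncing over the network.
    public virtual void Save() => Console.WriteLine($"writing {Config.Name} to DB and network");
}

// Test double: same shape, no side effects.
class InMemoryFrob : Frob
{
    public InMemoryFrob(FrobConfig config) : base(config) { }
    public override void Save() { }
}

public static class Frobber
{
    // The factory parameter exists almost entirely so tests can avoid the real
    // Frob; production code always passes `cfg => new Frob(cfg)`.
    public static Frob CreateAndSave(FrobConfig cfg, Func<FrobConfig, Frob> frobFactory)
    {
        var frob = frobFactory(cfg);
        frob.Save();
        return frob;
    }
}

// Production: Frobber.CreateAndSave(new FrobConfig("a"), c => new Frob(c));
// In a test:  Frobber.CreateAndSave(new FrobConfig("a"), c => new InMemoryFrob(c));
```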
I kept telling people for years that I'm awful at writing tests. You're coming to me for advice because I'm less awful than you are.
Nobody really seemed to get what I was trying to tell them. It parallels my experiences with Scrum "Well if it's not working you're doing it wrong". I love tests, and I hate tests. It's a very codependent relationship.
I did run into a string of people who had either very good value statements or very good questions, which helped.
But I still find tests I've written that violate the guidelines I try to set for the team. There are probably few moments in the week when I have, to be completely blunt, my head so far up my own ass as when I'm writing tests. The only time I am juggling more info at once is when I'm debugging, and in debugging I don't have to worry about the future reader. I doubt I am unique in either regard.
Unless I go out of my way to make the tests as agonizingly boring as I possibly can.
Pain is information, and an encouragement to change what you’re doing or how you’re doing it.
A little reorganization of your code can make an order of magnitude difference in how fiddly the scaffolding is. Does that mean your code structure is informed by tests? Yes, yes it does. But classic car designs are informed by access for repairs, and some vehicles are cherished for their ease, others become a topic of complaint and camaraderie. Really any industrial design has an element of this.
Oh absolutely, keeping things simple should never be restricted to just production code. Testing code should be simple too.
The idea that tests inhibit refactoring I find problematic, though. Good tests enable refactoring. If the tests are too complex for anyone to change the code, then the code is too complex. Any complex system should be composed of simple components. This applies to monoliths as well as microservices.
I agree it is easy to fall into the trap of thinking that tests are boring though as devs become more experienced there should be less of that thinking, hopefully.
>> If the tests are too complex for anyone to change the code, then the code is too complex
In enterprise software, the pattern i see is specifically coupling rather than garden variety complexity. There’s a really pervasive idea that unit testing means 1:1 mapping of test class to production code class. To put it another way, there’s this really common but terrible idea that tests should map to your current implementation rather than the behaviour you want.
Worse, some check in test cases that should only exist locally during development, they do things like directly test private implementations. A related issue is no one ever deletes tests.
Testing, done well, is utterly transformational. I took a piece of enterprise software; it wasn’t badly written per se, but it was hard to release. 80%+ of releases were rolled back due to issues found in production. For each release, one of 7 “experts” was required to review your change to this component (to make it worse, it changed fairly frequently). The problem was only 2 of those 7 had a track record of only approving good changes. I’m ashamed to say I was one of the other 5 - I reviewed and approved a couple of changes that had to be rolled back from prod.
So, I thought “sod this” and grasped the stinging nettle. The code was not written to be testable, but fast forward to 16,000 test cases (I had a small team) which execute in around 16 minutes, and over the past years that software continues to be released as frequently but has NEVER had a prod rollback since. The “7 experts” idea is disbanded; it’s just a normal pull request these days. Lead time on changes is about 1/3 what it was - all the faffing about exploring model output in Excel for hours, if not days, is gone; it’s all just captured as tests.
It wasn’t cheap to do this, but by back-of-the-envelope calculations it paid for itself within 3 months just through the lack of prod outages.
I think this is a huge problem. I’m not sure why it exists.
“If we have more tests it will be harder to change the code.” is an argument people make sincerely. Then... just delete the tests. If your argument against having these tests is you can imagine a point where they become a burden, just keep in mind that it’s really, really fast to delete code. “Untested” is a state we can get back to at any moment.
Maybe the tests will provide a lot of value until then. Maybe if we do a big refactor we’ll delete the old tests that are too low level and make new ones. That’s fine.
I joke that we should make the team watch one of those Hoarder TV shows, but it's a Tears of a Clown sort of situation.
Try telling people that you're deleting a flaky test and they'll stop working on their current story to try to stop you. It's as predictable as the phases of the moon.
> To put it another way, there’s this really common but terrible idea that tests should map to your current implementation rather than the behaviour you want.
I think if you were prioritizing, then the most bang for the buck comes from making your white-box tests as stupid-simple as you possibly can, since those demand a particular implementation.
> If the tests are too complex for anyone to change the code, then the code is too complex.
No, more than that, it's a sign that the tests are not just testing for how code behaves, but also its implementation details.
With well-written tests, you can completely re-structure some code and, if it still does the same thing, the tests for that code will still pass.
If the code being changed has several components with their own tests, then either those components have no interface change, in which case the previous paragraph applies to them, or they do, in which case their tests should be easy to adapt.
This belief got shattered when I learned Rust for fun. Not only was the resulting code faster, it was also incredibly resistant to stupid changes. If you go in 6 months later and make a change where you forget a tiny little implementation detail, the thing fails, reminds you of it, and once you've fixed it, just works.
Coming from a dynamic programming language I thought Rust was going to be my "if it must be fast, and is allowed to take longer"-language, but instead it is my "if my life would depend on it I would choose this"-language.
What does StackOverflow really have left to deliver, though? They made a fairly snappy and robust web application which has very popular functionality, that serves giant boatloads of traffic, with a relatively small team of developers and a relatively small cluster of servers. They didn't have twitter-fail-whale like incidents as far as I recall.
It seems they proved that you don't need lots of good unit tests, you can out-perform the whole industry without that. (I say this as someone who doesn't care for Windows or C# - I still recognize superb execution.) At this point they just feel sheepish and apologetic about it.
Also true in interviews. When asked about their weaknesses, some candidates who have thought this through might say something like "I spend too much time trying to help struggling colleagues" or similar.
Somewhere down the line, developers will start to realize that a single application running in the same server's memory and CPU cache will run much faster than microservice applications that make API calls over the network.
Depending on your organization's scale, of course, but for many smaller shops there is a significant performance overhead to going over the network.
I.e. a single binary running Rust/Go/Java/Node.js vs. distributed microservice applications.
I don’t think anyone is claiming that microservices are inherently faster. Instead it is a strategy to keep software loosely coupled and enable very large systems to be built by parallel teams while maintaining productivity and quality both technically and organisationally. That said, I’m a strong proponent of monolithic architectures until scale becomes a problem.
It’s also a strategy to isolate failures and to use resources efficiently. I’ve had a very positive experience working on a team that deployed a collection of microservices. That said, it was large scale, and most of the “micro” services were only conceptually micro; they were large in terms of resource usage. We also had excellent tooling.
You don’t have to have microservices in the end result, but you could develop and test them as microservices, then have them automatically rewritten into a monolith, perhaps with a subset of their functionality, if that would fit better.
Similarly, you could do TDD on a high percentage of the code by using injection, then automatically rewrite the code into static methods, as Atwood was saying they needed for performance, if that’s what would help.
Code generation, etc. got a bad name over the years as people tended to use it to balloon the eventual source, like that of many autogenerated config files today, but it can be used to do almost anything.
If you have a monolith, you have less control over the service’s shape—the CPU, RAM, IO footprints. If you break it up into smaller microservices, the footprints are smaller and you have more options for how you schedule them.
There is a tradeoff here—services that are too small have high overhead. Services that are too large are inefficiently scheduled. Services that are way too large force you to spend money buying the massive machines needed to run them.
This assertion is making a lot of unwarranted assumptions about the software architecture of a monolith. You can have a monolith and total control over the resources at the scale of an entire machine. This can be extraordinarily resource efficient in a way microservices never are, and is often the default case when designing systems like databases or data infrastructure.
A monolith easily achieves 10x throughput of what microservices will do on the same hardware. Which is why people build monoliths, they can be extremely efficient. Microservices are usually about trading resource efficiency for organizational convenience.
It’s “a possible strategy”, and whether the strategy is appropriate depends on the particulars of your service.
> A monolith easily achieves 10x throughput of what microservices will do on the same hardware.
If we’re gonna talk about unwarranted assumptions, this one takes the cake.
I’ve seen services spun up where the resources to run them simply didn’t exist in our data centers. As monoliths, their throughput was effectively zero. The cost tradeoff was “buy bigger servers and wait until they arrive” versus “spend engineering resources and stop buying bigger servers.” Again—this is going to depend on the particulars. The better answer is not necessarily obvious.
It's because Erlang abstracts the difference between distributed and centralized services. In Erlang everything is message passing, and Erlang doesn't much care if the components passing messages are running on the same server or multiple ones in a DC, it will route the messages either way. In many ways Erlang is the ultimate micro-service, to the point where the entire language is built around it.
I think, theoretically, that's close to what you're "supposed to do". The problem is genuinely identifying and limiting those bounded contexts in such a way that both business and technology stick to them consistently.
Architectures and scalability, although related, are two different conversations. You could build a monolith with an internal messaging queue, this obviously limits scaling possibilities horizontally but may be able to scale vertically. If you externalize the queue and use something like Rabbit, Redis, or NATS then you have more options but you also have more/new problems. Not all things need to scale, and not all things need to scale horizontally. There are ways of shifting out of monoliths too, but this gets trickier with some older, ill-defined codebases in my experience.
Microservices can be advantageous, but I don't think I've ever cited speed, unless I was talking about deployments and build times (which are definitely part of the conversation, since architecture affects the entire SDLC). Usually my strongest points about microservices are about maintainability from the team perspective, reliability (if done right), and earlier parts of the SDLC like testing and build times. The cons are that you usually have to maintain (and test against) very strong external contracts; intra-service knowledge is strong thanks to a smaller codebase, but developers often have very little inter-service knowledge (they don't understand the big picture, which is important); and many times code sharing is tricky or just not possible.
My hope is that the same realization will eventually come to web developers as well. A well written desktop application is much faster, and typically more stable, than a web app that's hosted ... well who can tell where. This isn't to say that web applications don't offer tremendous value, but it is definitely a trade-off worth considering sometimes.
>My hope is that the same realization will eventually come to web developers as well.
What makes you think that they don't already realise this? What suggests to you that web developers work on webapps for performance reasons, rather than familiarity with the technology or ease of development (or cross-platform support, or lack of user barriers, etc)?
The insane proliferation of SPAs, Electron apps and the general attitude of “dev time is more important than app performance” in the front-end community suggests performance isn’t a real concern.
What do you mean "somewhere down the line"? This is already well-known, but what is also known is that vertical scaling, e.g. how much one machine can do, is limited.
When you get to a global, millions-of-requests-per-second system like e.g. Twitter, you've gone well beyond vertical scaling.
Know your problem, then pick a solution. I'll admit that the microservices architecture is overused and I'm sure that's what you're aiming at, and I fully agree with that. But I also feel like you cannot understand the scale of true 'web scale' companies and their application. I know I can't.
I like RDS because I don't have to think about backups, replication etc but if I colocate the app server (web server, application code) and the database server on the same machine I see 1-2 orders of magnitude performance improvement in database-intensive apps.
Replication is the same as usual; have another machine set up to replicate onto. The restriction is you can't fan out hundreds of application servers and half a dozen database servers if you need that - but I don't need to.
I haven't done this with clients but I'm doing it with my own (cost-sensitive) side projects and appreciate removing the internal network latency.
I like RDS for the same reason, but I wonder if there's a way you could run Postgres locally, as you suggest, while still using RDS for all the replication/backup side of things? That way the local Postgres management could be pretty minimal and outsource the "annoying bits" to RDS as before. I need to look into the best way to set up something like that.. perhaps pgPool-II.
Well, yes, it can, and people exploit this in large datacenters all the time. That's one of the reasons people get huge, expensive machines full of disks that do nothing else.
But none of this applies to your run of the mill REST web service.
General statements like this help no one. The best architecture depends on the application. For example, running a CMS as a monolith makes sense but a video transcoding service doesn't.
Good developers choose the architecture that works best for what they're building.
I'd like to point out that the article does mention that their system is not single process. They have a database, caching layer and web servers at the very least which could be run on a single box. The number of API calls is a matter of degree when compared with a microservices architecture.
A microservices approach does not mean the services have to run on different hosts; an early design decision should be that the entire architecture can run on a single host, with scale-out as a feature of the system.
It's worrying that this assumption is often made because it does lead to the maintenance and velocity issues that teams encounter when naively adopting microservices.
Within the same data center, a network call doesn't need to have significant performance overhead; it can be on the order of a read from main memory. Still overhead, but for many SLOs it's irrelevant.
The inherent cost of microservices is conceptual and organizational complexity, not runtime performance (though you can certainly build slow microservices, you can also build slow monoliths).
When I was an SRE-SE I despised the term "best practices". Teams would come in telling me all about the new best practice with serverless, Kubernetes, secrets storage, or whatever. It was almost always like someone took a magnificent full Lego set and snapped off a wall and was like, "Hey, try this, it was useful to us." If it's not a protocol or standard recognized by the IEEE, IETF, or ISO then it's a tutorial with a strong opinion. Nothing wrong with that, but just know you're often reading a strong opinion, not a standard.
"Best practices" almost always come out of some business or project explaining what tested best with their circumstances. That alone tells you everything you need to know. Invest in a testing process which exposes the most critical parts of your application/infrastructure, attempt to overwhelm or exploit them, and you'll find out what your best practices need to be real quick. Those best practices may work at other companies but not nearly as well as if they invested in their own testing processes that are as or more rigorous than yours.
I agree. Lately, I try to refrain from using the term 'best practices' in favor of 'good practices', 'common practices' or even 'my practices' for several reasons:
- True best practices mainly refer to standards, and that list is updated very slowly, unlike the ever-growing number of false ones;
- It's often used for marketing purposes, which works well, but doesn't always imply quality;
- Since these spread fast, we often end up with things in places where they don't fit, squeezed in because "it's the best, so we must have it";
- It forms the wrong mindset: relying on someone else's experience instead of learning and gaining your own understanding.
In the end, we have tons of inapplicable or controversial information to filter, which doesn't make things better, in my opinion. Sharing is great but we should do it responsibly.
It's also a bit funny to watch how the best practices suddenly become the worst.
Standards are for interoperability. If you try to get best practices out of them, you are set for a world of pain.
But I do agree, the hype of the day isn't a best practice either. And just the term "best practice" without any qualifier or context is already a sign of incompetence.
It works very well. I've seen all sorts of monstrous code bases developed under the guise of "best practices" where the original developer or team has long since departed and the business doesn't know what to do with the result. They often can't abandon it because it's critical to their work, but they also can't go forward as-is because of all the issues they're encountering. Application maintenance is where empirical data tests designs, best practices, and the new shiny. Unfortunately, those who create these messes never see anything but greenfields while they perpetuate poor cargo-cult mentalities.
The tech and terminology often dazzles less savvy businesses and they're happy until problems start to arise.
> it’s no silver bullet: your software is not going to crash and burn if you don’t write your tests first, and the presence of tests alone does not mean you won’t have maintainability issues
This is not what tests are for. Tests rarely catch bugs for the first version of their code. They often catch bugs in modifications to that code, and any bugs that aren't caught can be added as a test. Tests are like checkpoints in video games. They guarantee that you don't regress past a point. Over time you build up so many checkpoints that your code is hard to break without a test failing. It's an evolutionary process.
Agree. Also, in the long run, writing unit/automated tests is the only way to maintain productivity in a complex project, as every required change to an existing component poses a great risk, and without automation you will need countless hours of manual testing.
Shameless plug: a while ago I wrote on the topic of how projects should be structured with the goal of automated testing in mind [1]
The article makes the excuse that static (singleton) interfaces are simpler and enough less resource-intensive that it's worth skipping regression testing (because testing that style of code is hard.)
Reading between the lines, it sounds like maybe they know they've dug themselves into a very expensive hole to dig out of. I've jumped head first into codebases that made the same excuse, and to me it seems severely limiting in terms of long term productivity. There are definitely benefits to simple, "static" singleton interfaces, and such interfaces aren't inherently incompatible with mocking and dependency injection. It does mean you likely need a layer of indirection somewhere, but if you're sweating the overhead of a single vtable lookup or explicit function pointer call on code that's internally accessing a non-trivially sized data structure, how are you even able to measure it?
The problem is that a handful of foundational static interfaces never got unit tested, and all the other code calls directly into them, so nobody anywhere wrote any unit tests because they couldn't (without touching foundational code that's unsafe to modify for lack of unit tests.) The first step in the right direction is to fix the foundation. This is terrifying for the folk that have been around a while and cemented in their assumptions about risk because they've tiptoed around modifying portions of the codebase for years. Luckily, a big benefit of static singleton interfaces is that it's very easy to modify them with O(n) developer time (where n is the number of references in the codebase). So, you just have to buckle down and get your hands dirty. You're pretty much guaranteed to find at least one latent bug in any old piece of code that you unit test, and so folk start to see the merits of the test coverage and it becomes easier to prioritize refactoring more and more ancient and scary code. After you've done it to half a dozen or so disjoint pieces of code, something magical happens. Suddenly, the vast majority of the codebase becomes easily unit testable.
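To make the "layer of indirection" concrete, here is a minimal sketch in C# - with hypothetical names; Cache and RedisCache are stand-ins, not anyone's actual code - of keeping a static facade while letting tests swap the implementation behind it:

    public interface ICache
    {
        string Get(string key);
    }

    public sealed class RedisCache : ICache
    {
        // Stand-in for the real lookup against the cache server.
        public string Get(string key) => "value-from-redis";
    }

    // The rest of the codebase keeps calling Cache.Get(...) exactly as before,
    // so callers don't even need the O(n) touch-every-reference pass.
    public static class Cache
    {
        // Defaults to the real implementation; a test can swap in a fake.
        internal static ICache Instance = new RedisCache();

        public static string Get(string key) => Instance.Get(key);
    }

    // In a unit test:
    //     Cache.Instance = new FakeCache();   // FakeCache : ICache

The cost is one interface dispatch behind the facade; whether that actually shows up on a profile is exactly the kind of thing worth measuring before ruling it out.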
The post also presents a false choice between what’s known as “the singleton pattern” versus just allocating a single instance of the object and passing it around where needed:
> We don’t write a lot of these because our code doesn’t follow standard decoupling practices; while those principles make for easy to maintain code for a team, they add extra steps during runtime, and allocate more memory. It’s not much on any given transaction, but over thousands per second, it adds up. Things like polymorphism and dependency injection have been replaced with static fields and service locators.
“dependency injection” doesn’t require constantly allocating new instances of the things your classes need.
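For instance, with the stock Microsoft.Extensions.DependencyInjection container (a sketch with hypothetical service names), a singleton registration allocates the dependency once and hands the same instance to every consumer:

    using Microsoft.Extensions.DependencyInjection;

    var services = new ServiceCollection();
    // One instance for the lifetime of the application - no per-request allocation.
    services.AddSingleton<IFormatter, TrimFormatter>();

    var provider = services.BuildServiceProvider();
    var a = provider.GetRequiredService<IFormatter>();
    var b = provider.GetRequiredService<IFormatter>();
    System.Console.WriteLine(object.ReferenceEquals(a, b));   // True

    public interface IFormatter { string Format(string s); }

    public sealed class TrimFormatter : IFormatter
    {
        public string Format(string s) => s.Trim();
    }

Per call you pay the interface dispatch, not an allocation; it's scoped and transient lifetimes that produce the per-request allocations.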
I have to admit I would far rather work on a conceptually simple codebase, that has been kept fast and light with minimal testing, than to work on a conceptually muddled codebase with 100% testing.
Performance is a feature, and so is being able to reason through a codebase.
Being able to fit all the pieces in your head is a feature, but not so snappy.
> Being able to fit all the pieces in your head is a feature, but not so snappy.
How about "architectural sanity is a feature"?
It kind of isn't though. Clients don't care about qualities of your code that they can't observe. They notice if your code is fast, but don't care whether it's easy or hard to reason about.
If anything, I'd say it's a bit of a meta-feature, because code-quality is directly linked to many other properties that people do care about (speed, new features, quick bugfixes, etc.)
Yes, but I think we live in a world of increasing software literacy - where a novel or a manifesto that is easy to read will beat out those that are not. Similarly for software - good software will include 'makes sense in the reader's head'.
TDD does not mean "small unit tests at the method level in code".
It means you code your test cases before you implement the functionalities. Those tests can (and IMO should) be behavior tests: then you can rip out all your code and replace it with whatever you like, even off-the-shelf software and your test suite will still run.
The excuse that those kinds of tests are slow or hard to set up means only one thing: we have to work on their tooling.
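For example, a behaviour-level test might look like this (xUnit, with a hypothetical MarkdownRenderer as the thing under test; the trivial implementation is only there so the sketch compiles). Nothing in the test cares how the rendering is done, so the implementation could be rewritten or replaced with an off-the-shelf library and the test would still be meaningful:

    using Xunit;

    public class RenderingBehaviour
    {
        [Fact]
        public void Bold_markup_becomes_strong_tags()
        {
            // Only observable behaviour is pinned down, not implementation details.
            string html = MarkdownRenderer.Render("**hi**");

            Assert.Equal("<p><strong>hi</strong></p>", html);
        }
    }

    // A trivial stand-in implementation; any implementation with the same
    // behaviour would satisfy the test above.
    public static class MarkdownRenderer
    {
        public static string Render(string markdown) =>
            "<p>" + System.Text.RegularExpressions.Regex.Replace(
                markdown, @"\*\*(.+?)\*\*", "<strong>$1</strong>") + "</p>";
    }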
I share the same opinion. It's been a long time since I read Kent Beck's book on TDD, but I don't recall him saying TDD = unit tests. When I see people using these as synonyms, it feels to me they haven't quite understood what TDD means and just jumped to the conclusion of "unit tests that check every method/line of code you write".
Testing behavior from the user perspective should be the goal, these are the kind of tests that bring most value in the long run.
One thing that stood out to me was that they wouldn't necessarily need to optimize for performance if they hadn't chosen a stack whose costs increase massively as you scale.
In many applications things like having a testable, easy-to-read, well-structured codebase can trump performance, and it's often worth sacrificing a bit of performance if it means it's easier to refactor and build new features.
I don't know why they went for SQL Server, .NET has had good support for other databases even back to 2009. I imagine that was a huge amount of their licensing cost.
With .NET Core running so well on Linux now they would have no Windows Server licenses to worry about too. The first production version of .NET core was released in 2016.
Even back in 2009 it was possible to run web apps on mono on linux too, I know a lot of shops around that era that used windows for more complex parts of the app but then used mono for 'simple' high volume parts to save $$$ on windows licensing.
Because that was what the main developer and founder Jeff Atwood was familiar with. And he really liked the stack.
> Atwood: We are not personally going to be language agnostic, because we need to actually build the site. And in terms of people actually working on it, Joel's in an advisory role, I'm gonna be writing code, and then a friend of mine, Jarrod, I'll be working very closely with. So it's sort of like 1.5 developers, so I need to actually get things done. In order to do that, I'm gonna fall back on what I know, and what I know is essentially ASP.NET. So ASP.NET is gonna be the platform. And I actually really like ASP.NET.
Episode 003 April 29, 2008, from the StackOverflow podcast.
This should be obvious. Writing well-structured, decoupled, modular code can make you lose track of performance concerns. And since no one likes to write things more than once, there's only "one shot" at writing the application, so people tend to err on the side of "well-structured", because hey, who's gonna fire you for following best practices?
On the other hand, trying to optimise too early is wasteful and leads to headaches down the line.
Writing "well-structured, decoupled, modular code" is a best design practice for a good reason. It what leads to best results overall, where "best results" means code that is easier overall to test, maintain, modify/expand, and optimise as needed.
Regarding optimisation, something related to keep in mind is that optimisation at system level (which is often the desired end result) is not the sum of independent local optimisations.
So, if this article actually showed me how writing modular testable code slowed down their application, and how they fixed it, that would be really interesting.
This is a point from the article that I think many people forget. It applies not only to programming or coding but to most things in life: just because someone says something is best doesn't mean it is best for you in your situation.
Yea, I really like that quote because it perfectly expresses how I feel about "best practices": Use them when you're unsure, but ditch them if you know something else will work better.
I think the industry should re-evaluate whether unit testing ought to still be the dominant testing paradigm. With Docker and things like WireMock I can fake/mock anything downstream of me in a fast, isolated and deterministic way. Both locally and in CI.
There is so much accidental complexity, cruft, and ceremony involved in writing unit tests and making things unit-testable. In a lot of cases, the "coverage" is an illusion and the tests add more friction than there really ought to be.
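As a sketch of what that looks like in .NET with the WireMock.Net package (hypothetical endpoint and payload, and the exact builder API may differ slightly between versions), the downstream service is faked over real HTTP, so the code under test runs through its actual HttpClient stack:

    using WireMock.Server;
    using WireMock.RequestBuilders;
    using WireMock.ResponseBuilders;

    // Start an in-process HTTP server on a free port.
    var server = WireMockServer.Start();

    server
        .Given(Request.Create().WithPath("/users/1").UsingGet())
        .RespondWith(Response.Create()
            .WithStatusCode(200)
            .WithHeader("Content-Type", "application/json")
            .WithBody("{\"id\": 1, \"name\": \"test user\"}"));

    // Point the system under test at server.Urls[0] instead of the real
    // service, run the scenario, then shut the fake down.
    System.Console.WriteLine(server.Urls[0]);
    server.Stop();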
It sounds like the team is interested in adding more automated tests, but are blocked by static singletons, which have high performance but also high coupling, resulting in poor testability.
I'm sure they've heard of test libraries like MS Fakes and Pose; I wonder if these libraries would let them maintain high performance, and only introduce the required layer of indirection during testing?
I think that this is more of a marketing post than a technical post. That said, I really wish people do not promote bad practices, they are already too well established :)
The idea that you can have either best practices or performance is a false dichotomy. Testing can be used to help track down performance issues. It can help you write code faster and prevent breakages. It should not slow you down.
Best practices inside a codebase should not sacrifice performance. If you have to do that to conform to some structural ideal, you don’t do it. I would love to see this supposedly untestable code.
Stackoverflow may contain lots of great dev knowledge, but it’s also a weird site that discourages participation and entrenches old answers. I am unsurprised that the devs themselves are a proverbial old dog that can’t learn new tricks.
> The idea that you can have either best practices or performance is a false dichotomy. Testing can be used to help track down performance issues. It can help you write code faster and prevent breakages. It should not slow you down.
It works the other way too. Benchmarking can show that your bottleneck is virtual calls/boxing where you have used an interface field instead of a static reference to a concrete dependency.
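A rough BenchmarkDotNet sketch of exactly that measurement (toy workload, hypothetical names - and note the JIT may well devirtualize a case this trivial, so the real hot path is what you'd want to benchmark):

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public interface IAdder { int Add(int a, int b); }
    public sealed class VirtualAdder : IAdder { public int Add(int a, int b) => a + b; }

    public static class StaticAdder
    {
        public static int Add(int a, int b) => a + b;
    }

    [MemoryDiagnoser]
    public class DispatchBenchmarks
    {
        private readonly IAdder _adder = new VirtualAdder();

        // Direct call to a static method on a concrete type.
        [Benchmark(Baseline = true)]
        public int DirectStaticCall() => StaticAdder.Add(1, 2);

        // Same work, but dispatched through an interface field.
        [Benchmark]
        public int InterfaceCall() => _adder.Add(1, 2);
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<DispatchBenchmarks>();
    }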
The argument isn't "performance vs. tests" it's really "performance vs modularization". You can always move the tests upwards until you test the whole monolith. This isn't a common tradeoff but I also wouldn't dismiss it as something that never occurs and "best practices" wrt. modularization and unit testing always enables similar performance. It doesn't.
I’m just thinking about how much CO2 emission was spared by optimizing performance instead of scaling horizontally.
It would be nice to see the effect on the environment as part of the decision-making process for deciding whether to optimize versus scale.
I think developers have an innate desire to fix performance issues as optimization produces a better understanding of the code and the problem and developers would like to solve the problem in the best possible way. And it helps to know that the problem won’t bite again later.
I am not saying that you have to test every single piece of your code, but IMO sooner or later you will be forced to do so. Whether it is through the type checker (static or gradual), unit tests, or manual tests is up to you. Unit tests just make this a little easier: you can re-run them on every commit (testing for bugs) without having to repeat the work by hand.
I don’t fully understand the argument here. On the one hand they claim they built a successful solution with minor problems without test-driven development or a lot of unit tests.
On the other hand they claim to really like tests and have it as a goal to do more of them.
But this is a contradiction. If they successfully built a large site with minimal testing then why exactly do they need all these unit tests?
If they were successful without test-driven development, then why do they need TDD?
To be cool?
I'm not sure what to make of this. The argument seems inconsistent. If I had major success without best practices, I would have proclaimed loudly that those practices are BS. But these guys seem to want it both ways.
I don't mean this as insulting, but let's think this through: with the exception of the scale, SO is a forum (a Q&A where the OP's post is a question - a topic in a forum - and the following posts are answers). It does have many details that were built over time (moderators, different exchanges, job posts, etc.) that can significantly increase the complexity of it all when thought of as a single platform, but it's nothing ground-breaking per se, even in 2008 (again, not saying that the scale and volume of exchanges, and keeping all of that working together non-stop, is trivial or not worthy of note by itself).
But you can imagine that, perhaps, the slowdown in what they can do with all these platforms - most developments seem to be separate things bolted on to the main Q&A idea, and others take a long time to see the light of day - is because it has become quite difficult/slow to move without breaking half of it? That would explain why they want to move towards writing code in a more testable manner.
I don't even need to experience their scale to know the pains of trying to refactor or add functionality to an existing code base that is untested and where I was the one writing the code - ALONE. It has happened to me, with my "perfect" code-writing abilities and clear, pure thought, more often than I would like to admit.
And the same is true when we talk about mocks and the like. Yes, if you don't plan for them from the beginning, it usually becomes really problematic to implement tests and mocks after the fact. You don't need to write them up front, but your code at least needs to anticipate that they will be used and be written in a way that allows it. The problem with that approach is that you really need to know what you're doing to write testable code without actually writing the tests, so my takeaway from my own endeavours is: write the tests, because they show you exactly whether something is well written - when you find yourself digging through 3 nested layers and building complex mock responses just to simulate a signup, you know something is wrong. If you don't do it from the beginning, guess what: you would have had to study whatever you're integrating with and construct that mock (even if only mentally) to write the code ANYWAY. The difference is that now all that work went in the bin, because it exists only in the execution path; what you learned from it is captured nowhere and provides few guarantees.
If you don't write them, then once it's all working - and working correctly - you're in a world of pain the moment you need to change it, and you're probably going to be overtaken at some point by something new that sees what you're doing and does it better (because in the end it's not rocket science). The ones who can ignore this and punt all the way usually have deep pockets.
> If they where successful without test driven development then why do they need TDD?
They were also funded with $153M in total up to now; one year after leaving beta they had a $6M funding round.
I think these kinds of posts do a disservice. I know I've said to myself "you don't need that, look at X doing without it". Sometimes I look at the industry and think we pat ourselves on the back way too much when we shouldn't.
As an example that isn't web-based: I used to work with Adobe Suite, CS5/6 - as an aside, the current experience for me is an awful thing, so awful that I went to GIMP instead - but setting that aside, they had and still have a batch utility (Image Processor) for running bulk transform pipelines on collections of images through scripts that record your actions. It had, and still has, many small stupid issues, but it mostly worked well. At some point they decided to build a better one and touted it as the replacement, so they did. The result: it's way slower to operate and, guess what, it can only work on one image at a time. What? It does better compression of GIFs, for instance, but it can't do batch processing. There have been tickets open for years asking for batch processing. Not done. Probably never will be. The previous processor also never got the benefit of the new compression algorithm. What? How can it be that switching the algorithm in the old one, or adding the ability to process multiple images, is more than one man-month of work?
They noted in the post that they had a fixed number of servers and those servers had fixed performance characteristics. A lot of what they say applies specifically to that situation.
The problem is that they make broader claims about the validity of best practices (design, OOP, unit testing). Those best practices apply just fine to the status quo, where you can scale out easily and maintainability is more important than performance.
I don't understand the claim that replacing "polymorphism and dependency injection" with "static fields and service locators" reduces allocations.
I would expect that getting rid of dependency injection would allow you to convert a virtual method call to a direct method call, and replace a constructor argument with a static field reference, but I don't see the connection to allocation.
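For what it's worth, here is how I read the two styles being compared (hypothetical names). The allocation difference is at most one small object per constructed object graph; the per-call difference is interface dispatch versus a direct call on a concrete type:

    public interface IRenderer { string Render(string s); }

    public sealed class Renderer : IRenderer
    {
        public string Render(string s) => s;
    }

    // Constructor injection: one Renderer reference stored per service instance,
    // and Render(...) is an interface (virtual) call.
    public sealed class PostServiceInjected
    {
        private readonly IRenderer _renderer;
        public PostServiceInjected(IRenderer renderer) => _renderer = renderer;
        public string Show(string s) => _renderer.Render(s);
    }

    // Static field / service locator: nothing stored per instance, and the call
    // is on a concrete sealed type, so it is a direct (inlinable) call.
    public static class RendererLocator
    {
        public static readonly Renderer Current = new Renderer();
    }

    public sealed class PostServiceStatic
    {
        public string Show(string s) => RendererLocator.Current.Render(s);
    }

So, as far as I can tell, the allocation claim only bites if object graphs are rebuilt per request; the repeatable per-call win is devirtualization.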
I think the problem with this article and their approach will become more visible when the superstar developers start to leave the company. If they are all the same people from the very beginning, then yes, throw lots of practices out the window, because people know the codebase very well. But once they are gone, there will be a mountain of tech debt. Even they are now trying to make it testable.
Clearly, adding further constraints can only worsen performance. I would code according to best practices except in cases where it's possible to infer they imply tight bottlenecks. Then profile the code, and if circumventing best practices results in a meaningful performance increase, do that.
This is almost obvious for a company made up of senior, capable developers. Hopefully no one else takes this as advice because things are already bad enough without giving people license to forgo testing or design for loose coupling.
The argument against Tests sounds like a tooling problem to me. Why shouldn't a compiler be able to optimize the modular approach to the same code that a monolithic approach yields?
Why is that relevant? The fact is that it doesn't (there are good reasons for it, but that is another matter). You can't work with the ideal tools, you can only work with the tools that actually exist. So, with the actual tools, this sacrifice must be made.
And now to answer your actual question: the reason is that the languages we use are just not expressive enough, and the compiler doesn't have enough time. If you are calling an interface which could be either the actual code or the mock code, the actual code can't be inlined, and that alone usually discards dozens of other optimizations. You can't specify in a language something like 'assume this implementation in optimized builds', so we're stuck.
There are ways around this, to be fair. In C#, you could make all of your code generic, such that there could be two versions of your class - one with real dependencies, MyClass<RealDependency1, RealDependency2, ...>, and one with mocked dependencies, MyClass<MockedDependency1, MockedDependency2, ...>. Not sure if the C# compiler or JIT actually knows how to use this during optimization, but in principle it could. With C++ templates this would definitely allow optimization. With Java's extremely advanced JIT, you wouldn't actually need this code change; the compiler will be able to do it for you after a few thousand runs through the code, once it notices a single implementation is being called.
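A sketch of that C# trick (hypothetical names); as a sibling comment notes, the JIT only produces a dedicated specialization - and can therefore inline the call - when the type argument is a struct:

    public interface IClock { System.DateTime Now(); }

    public struct SystemClock : IClock
    {
        public System.DateTime Now() => System.DateTime.UtcNow;
    }

    public struct FrozenClock : IClock
    {
        public System.DateTime Now() => new System.DateTime(2020, 1, 1);
    }

    // MyService<SystemClock> in production, MyService<FrozenClock> in tests.
    // For struct type arguments each gets its own compiled body, so Now() is a
    // direct, inlinable call rather than virtual dispatch.
    public sealed class MyService<TClock> where TClock : struct, IClock
    {
        private TClock _clock;

        public MyService(TClock clock) => _clock = clock;

        public bool IsWeekend()
        {
            var day = _clock.Now().DayOfWeek;
            return day == System.DayOfWeek.Saturday || day == System.DayOfWeek.Sunday;
        }
    }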
I have seen a lot of projects that use interfaces in C++ or Java just to be able to mock in tests. This puzzled me.
If it is known that there will be exactly 2 implementations of the interface, one real and one for mocking, then do not use interfaces. Instead, use a global flag and, at the relevant places, instead of calling the interface's methods just test the flag and branch into test/non-test code. If the flag is a compile-time constant, then compilers will be able to eliminate the checks completely. If it is not a constant, then thanks to branch prediction over a simple global, the runtime impact will be minuscule - much smaller than with interfaces, even without advanced link-time optimizations.
The big plus of this approach is that test-only coupling is explicit and easily greppable. Another is that if test mocking requires complex changes to the control flow, providing that level of control via interfaces often leads to complex and awkward interfaces, whereas with the flag it is very straightforward to get the necessary behavior.
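A tiny C# sketch of the flag idea (hypothetical names; the original point is about C++/Java, where a compile-time constant lets the compiler drop the unused branch entirely):

    public static class TestMode
    {
        // A real project might make this a const (or an #if symbol) so the
        // unused branch disappears at compile time; as a plain readonly bool
        // the branch remains but is trivially predicted.
        public static readonly bool Enabled =
            System.Environment.GetEnvironmentVariable("TEST_MODE") == "1";
    }

    public static class Mailer
    {
        public static readonly System.Collections.Generic.List<(string To, string Body)> SentInTests = new();

        public static void Send(string to, string body)
        {
            if (TestMode.Enabled)
            {
                SentInTests.Add((to, body));    // test path: just record the message
                return;
            }

            // non-test path: stand-in for the real SMTP call
            System.Console.WriteLine($"SMTP send to {to}");
        }
    }

And the test-only coupling really is greppable: searching for TestMode.Enabled finds every seam.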
It massively hurts readability and can quickly become messy in my experience. If you're running on multiple targets and testing on host, then you have an #ifdef spaghetti in no time.
In all the codebases I have worked on, they started like this because it is easy, but then moved on to interfaces/templates because maintainability is an issue with this method.
I worked on a Qt application where we observed the opposite. As is typical with Qt, the code was a collection of components that were glued together in the main application. For testing, it turned out that adding a few flags to the application was enough to implement a unit and integration test driver.
Sure, there were a few ifs, but the code was very readable, and such a design allowed a very straightforward implementation of integration tests.
It is relevant because they chose to abandon best practices because of a lack of tooling. A badly chosen language constraining your development methods is like a tail wagging the dog. There are languages and, more importantly, implementations out there that allow this kind of design. Just think of SML/NJ as an example for the principle.
If performance is important and C# doesn't allow unit testing performant code then switch to C++ or Rust.
It is staggering how many people seem to see the choice of language as an absolutely unchangeable decision. Rewriting even millions of lines of js or python code is definitely doable, even incrementally, and if your organization does not deem it worth the effort then maybe performance is not your most important concern.
With the current C# compiler and JIT, using generics to avoid a virtual function call works if the dependencies are structs, but not if they are classes.
That said, in my experience it's very hard to get performance gains through inlining in C# because the compiler is very conservative about what it will inline, even if you instruct it to aggressively inline a method (there is no way to force inlining in C#). IIRC this was a conscious decision by the C# team after they did some experiments and found that the benefit of fewer function calls from inlining was outweighed by the negative impact of increased code size on the instruction cache hit rate in all but the most trivial functions.
That decision makes it very hard to get some of the huge inlining wins that you see sometimes in C++ where essentially the compiler can recognize that once you recursively inline a few layers of abstraction you're in a special case that is known at compile time and a ton of code can be eliminated.
With that kind of situation off the table, a virtual method call that always goes to the same implementation will not cost you very much compared to directly calling the implementation method. It amounts to the difference between a branch and an indirect branch in machine code, and modern hardware has indirect branch predictors that will do a very good job when the indirect branch always goes to the same implementation.
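For reference, the "instruct it" part above is the MethodImplOptions.AggressiveInlining hint, and it is only a hint (trivial, hypothetical helper shown for illustration):

    using System.Runtime.CompilerServices;

    public static class StringHelpers
    {
        // Asks the JIT to inline calls to this method; the JIT can still refuse,
        // e.g. for methods it considers too large. There is no way to force it.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static bool IsBlank(string s) => string.IsNullOrWhiteSpace(s);
    }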
One of the other drawbacks besides performance that I have seen in terms of "making code testable" is that it often hurts traceability. If I am jumping into an unfamiliar codebase, the best case is that I can right click on any method invocation, "go to declaration" and see what code is being executed. Often times in codebases which make heavy use of dependency injection, "go to declaration" takes me instead to some kind of interface declaration, and then I have to figure out which code is actually being executed at runtime.
I feel like this is a very real cost, and is often not accounted for in terms of the tradeoffs discussed in TDD.
I don't think this is a major problem with interfaces that have a real implementation and a mock implementation. You can just as well "Go to implementation", and will see two options - XImpl and MockXImpl.
This kind of thing is one of the reasons I'm really interested in languages that support multi-stage programming. I've played around with Terra and lately I've read many good things about Zig, because they both let you run code at compile-time. Things like dependency-injection could be zero-cost if done at an early compilation stage, so the compiler has a chance to optimise the code with the correct dependency already in place.
Using generics in this way in C# is asking for a headache. And C# being JIT compiled, the compiler can, theoretically, optimize by inlining. I do not know if it does.
In any case, for the vast majority of code bases, maintainability is more important than 10% better performance.
As long as dynamically loading code at runtime is allowed, proving that there will only ever be one implementation of an interface isn't going to be possible. That makes fully optimizing away the cost of the indirection difficult.
I've seen this pattern more often than not, but as the person joining the company later and having to fix other people's mess who either left or moved into management.
If someone says to me "something is a best practice" my question is: according to whom?
Because 95% of "Best Practices" around are just somebody's opinion disguised as a rule.
"Oh TDD is a best practice now" only if it's to sell more books. Yes I love unit testing, but to do it how it's the "best practice" please
Do you have an objective metric that's improved by your "best practice"?
SO itself is a great example. They're still on IIS+C# and it works for them. Imagine if they hadn't stopped chasing every new fad or "best practice", like Kubernetes.
"With our code base, you won’t be able to easily do test driven development or similar practices that the industry seems to love."
Why does the industry seem to love testing? Does the industry love testing? Does this mean you do or don't love testing?
It also feels like they wrote this blog post to show off: we write our c# code as if it was c code because we are better than all of you.
The fact is that every application is different, they chose to use a language that back in 2000's wasn't designed for performance and so they paid the price of using it for performance and didn't want to pay the scale out price.
Is their codebase testable, yes completely. Will adding unit tests to their code destroy their performance? no, there are always ways to make it testable and fast. Maybe in a few years we will see a "We thought testing was impossible but we made it work and also keep our codebase fast".