+1 on "well defined spec" -- a lot of Healthcare integrations are specified as "here's the requests, ensure your system responds like this" and being able to put those in a test suite and know where you're at is invaluable!
But TDD is fantastic for growing software as well! I managed to save an otherwise doomed project by rigorously sticking to TDD (and its close cousin Behavior Driven Development.)
It sounds like you're expecting that the entire test suite ought to be written up front? The way I've had success is to write a single test, watch it fail, fix the failure as quickly as possible, repeat, and then once the test passes fix up whatever junk I wrote so I don't hate it in a month. Red, Green, Refactor.
If you combine that with frequent stakeholder review, you're golden. This way you're never sitting on a huge pile of unimplemented tests; nor are you writing tests for parts of the software you don't need. For example from that project: week one was the core business logic setup. Normally I'd have dived into users/permissions, soft deletes, auditing, all that as part of basic setup. But this way, I started with basic tests: "If I go to this page I should see these details;" "If I click this button the status should update to Complete." Nowhere do those tests ask about users, so we don't have them. Focus remains on what we told people we'd have done.
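To make that concrete, here's roughly what one of those red/green steps can look like at the domain level (a minimal Python sketch; the Order class and status strings are hypothetical stand-ins, not the actual project code):

    # Red: write one failing test that states the behaviour we promised.
    def test_clicking_complete_updates_status():
        order = Order(status="Pending")
        order.complete()                      # what the button handler would call
        assert order.status == "Complete"

    # Green: the simplest implementation that makes the test pass; refactor after.
    class Order:
        def __init__(self, status):
            self.status = status

        def complete(self):
            self.status = "Complete"

Nothing in there knows about users, permissions or auditing, because no test asked for them yet.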
I know not everyone works that way, but damn if the results didn't make me a firm believer.
The problem I’ve run into is that when you’re iterating fast, writing code takes double the time because you also have to write the tests.
Unit tests are still easy to write, but most complex software has many parts that combine combinatorially, and writing integration tests requires lots of mocking. This investment pays off when the design is stable, but when business requirements are not that stable it becomes very expensive.
Some tests are actually very hard to write — I once led a project where the code had both cloud and on-prem API calls (and called Twilio). Some of those environments were outside our control, but we still had to make sure we handled their failure modes. The testing code was very difficult to write, and I wished we’d waited until we had stabilized the code before attempting to test. There were too many rabbit holes that we naturally got rid of as we iterated, and testing was like a ball and chain that made everything super laborious.
TDD also represents a kind of first order thinking that assumes that if the individual parts are correct, the whole will likely be correct. It’s not wrong but it’s also very expensive to achieve. Software does have higher order effects.
It’s like the old car analogy. American car makers used to believe that if you QC every part and make unit tolerances tight, you’ll get a good car on final assembly (unit tests). This is true if you can get it right all the time but it made US car manufacturing very expensive because it required perfection at every step.
Ironically Japanese carmakers eschewed this and allowed loose unit tolerances, but made sure the final build tolerance worked even when the individual unit tolerances had variation. They found this made manufacturing less expensive and still produced very high quality (arguably higher quality since the assembly was rigid where it had to be, and flexible where it had to be). This is craftsman thinking vs strict precision thinking.
This method is called “functional build” and Ford was the first US carmaker to adopt it. It eventually came to be adopted by all car makers.
> Some tests are actually very hard to write — I once led a project where the code had both cloud and on-prem API calls
I believe that this is a fundamental problem of testing in all distributed systems: you are trying to test and validate for emergent behaviour. The other term we have for such systems is: chaotic. Good luck with that.
In fact, I have begun to suspect that the way we even think about software testing is backwards. Instead of test scenarios we should be thinking in failure scenarios - and try to subject our software to as many of those as possible. Define the bounding box of the failure universe, and allow the computer to generate the testing scenarios within. EXPECT that all software within will eventually fail, but as long as it survives beyond set thresholds, it gets a green light.
In a way... we'd need something like a bastard hybrid of fuzzing, chaos testing, soak testing, SRE principles and probabilistic outcomes.
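A toy sketch of that idea in plain Python (the flaky downstream call, the 40% failure bound and the 95% survival threshold are all invented for illustration):

    import random

    # Hypothetical system under test: a client that retries a flaky downstream call.
    def call_with_retries(failure_rate, attempts=3):
        for _ in range(attempts):
            if random.random() > failure_rate:   # this attempt "succeeded"
                return True
        return False                             # all retries exhausted

    # Bounding box of the failure universe: each scenario is a failure rate up to 40%.
    # Generate scenarios inside it and demand survival beyond a threshold, not perfection.
    def test_survives_generated_failure_scenarios():
        random.seed(42)                          # reproducible scenario generation
        scenarios = [random.uniform(0.0, 0.4) for _ in range(1000)]
        survived = sum(call_with_retries(rate) for rate in scenarios)
        assert survived / len(scenarios) > 0.95  # green light: it fails, but rarely enough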
>I believe that this is a fundamental problem of testing in all distributed systems: you are trying to test and validate for emergent behaviour. The other term we have for such systems is: chaotic. Good luck with that
Emergent behaviour is complex, not chaotic. Chaos comes from sensitive dependence on initial conditions. Complexity is associated with non-ergodic statistics (i.e. sampling across time gives different results to sampling across space).
I work on the Erlang virtual machine (Elixir) and I regularly write tests against common distributed-systems failures. You don't need property tests (or Jepsen/Maelstrom-style fuzzing) for your 95% scenarios. Distributed systems are not magically failure prone.
> TDD also represents a kind of first order thinking that assumes that if the individual parts are correct, the whole will likely be correct. It’s not wrong
In fact it is not just wrong, but very wrong, as your auto example shows. Unfortunately engineers are not trained/socialised to think as holistically as perhaps they should be.
The non-strawman interpretation of TDD is the converse: if the individual parts are not right, then the whole will probably be garbage.
It's worth it to apply TDD to the pieces to which TDD is applicable. If not strict TDD, then at least "test first" weak TDD.
The best candidates for TDD are libraries that implement pure data transformations with minimal integration with anything else.
(I suspect that the rabid TDD advocates mostly work in areas where the majority of the code is like that. CRUD work with predictable control and data flows.)
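For what it's worth, that sweet spot tends to look something like this (a sketch with a hypothetical parse_duration helper): a pure input-to-output transformation where the test needs no mocks and barely any setup.

    import re

    # Hypothetical pure transformation: "1h30m" -> seconds. No I/O, nothing to mock.
    def parse_duration(text):
        match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text.strip())
        if not match or not any(match.groups()):
            raise ValueError(f"unrecognised duration: {text!r}")
        hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
        return hours * 3600 + minutes * 60 + seconds

    def test_parse_duration():
        assert parse_duration("1h30m") == 5400
        assert parse_duration("45s") == 45
        assert parse_duration(" 2h ") == 7200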
Yes. Agree about TDD being more suited to low dependency software like CRUD apps or self contained libraries.
Also sometimes even if the individual parts aren’t right, the whole can still work.
Consider a function that handles all cases except for one that is rare, and testing for that case is expensive.
The overall system, however, can be written to provide mitigations when composing — e.g. each individual function does a sanity check on its inputs. The individual function itself might be wrong (incomplete), but in the larger system it is inconsequential.
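A contrived sketch of that kind of composition (all names hypothetical): the helper is incomplete, but the caller's sanity check means the untested case never reaches it.

    # Hypothetical helper: fine for the common case, silently wrong for negative input.
    def digits_to_int(text):
        value = 0
        for ch in text:
            value = value * 10 + (ord(ch) - ord("0"))   # no '-' handling at all
        return value

    # The composing layer sanity-checks its input, so the helper's missing
    # negative-number case is inconsequential to the larger system.
    def read_port(config_value):
        config_value = config_value.strip()
        if not config_value.isdigit():
            raise ValueError(f"port must be a non-negative integer, got {config_value!r}")
        port = digits_to_int(config_value)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        return port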
Test effort is not 1:1. Sometimes the test can be many times as complicated to write and maintain as the function being tested, because it has to generate all the corner cases (and has to regenerate them if anything changes upstream). If you’re testing a function in the middle of a very complex data pipeline, you have to regenerate all the artifacts upstream.
Whereas sometimes an untested function can be written in such a way that it is inherently correct from first principles. An extreme analogy would be the Collatz conjecture. If you start by first writing the tests, you’d be writing an almost infinite corpus of tests — on the flip side, writing the Collatz function is extremely simple and correct up to a very large finite number.
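For reference, the Collatz side of that analogy in code: the function is a handful of lines, while an example-based test corpus pinning down its output for every input would be effectively unbounded.

    def collatz_steps(n):
        """Number of Collatz steps for a positive integer n to reach 1."""
        steps = 0
        while n != 1:
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        return steps

    # A single spot check; the conjecture itself has only been verified by brute
    # force up to enormous finite bounds, never proven in general.
    assert collatz_steps(27) == 111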
Computer code is an inherently brittle thing, and the smallest errors tend to cascade into system crashes. Showstopper bugs are generated from off-by-one errors, incorrect operation around minimum and maximum values, a missing semicolon or comma, etc.
And doing sanity check on function inputs addresses only a small proportion of bugs.
I don't know what kind of programming you do, but the idea that a wrong function becomes inconsequential in a larger system... I feel like that just never happens unless the function was redundant and unnecessary in the first place. A wrong function brings down the larger system feels like the only kind of programming I've ever seen.
Physical unit tolerances don't seem like a useful analogy in programming at all. At best, maybe in sysops regarding provisioning, caches, API limits, etc. But not for code.
> I don't know what kind of programming you do, but the idea that a wrong function becomes inconsequential in a larger system... I feel like that just never happens unless the function was redundant and unnecessary in the first place. A wrong function brings down the larger system feels like the only kind of programming I've ever seen.
I think we’re talking extremes here. An egregiously wrong function can bring down a system if it’s wrong in just the right ways and it’s a critical dependency.
But if you look at most code bases, many have untested corner cases (which they’re likely not handling) but the code base keeps chugging along.
Many codebases are probably doing something wrong today (hence GitHub issues). But to catastrophize that seems hyperbolic to me. Most software with mistakes still work. Many GitHub issues aren’t resolved but the program still runs. Good designs have redundancy and resilience.
A counter to that could be all the little issues found by fuzz testing and static analysis of legacy systems, often in widely used software where those issues had not in fact manifested. Unit tests also don't prove correctness; they're only as good as the test writer's ability to predict failures.
I can tell you that most (customer) issues in the software I work on are systemic issues: the database (a widely used OSS one) fails or corrupts under certain scenarios. They can be races, behaviour under failure modes, lack of correctness at some higher order (e.g. half-failed operations), the system not implementing the intent of the user. I would say those are very rarely issues that would have been caught by unit testing. Now, integration testing and stress testing will uncover a lot of those. This is a large-scale distributed system.
Now, sometimes after the fact a unit test can somehow be created to reproduce the specific failure, possibly at great effort. That's not really that useful at this point. You wouldn't write such tests in advance for every possible failure scenario (there are infinitely many).
All that said, sometimes there are attacks on systems that relate to corner-case errors, which is a problem. Static analysis and fuzzers are IMO more useful tools in this realm as well. Also, I think I'm hearing "dynamic/interpreted" language there (missing semicolons???). Those might need more unit testing to make up for the lack of compiler checks/warnings/type safety, for sure.
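As an illustration of a property test earning its keep (a sketch assuming the Python hypothesis library; the truncate helper and its bug are invented): the hand-written example test passes happily, while the generated inputs immediately hit the corner case.

    from hypothesis import given, strategies as st

    def truncate(text, limit):
        """Shorten text to at most `limit` characters, adding an ellipsis if cut."""
        if len(text) <= limit:
            return text
        return text[:limit] + "..."            # bug: result is now limit + 3 chars long

    def test_truncate_example():               # the case the author thought of: passes
        assert truncate("hello", 10) == "hello"

    @given(st.text(), st.integers(min_value=0, max_value=50))
    def test_truncate_never_exceeds_limit(text, limit):
        assert len(truncate(text, limit)) <= limit   # generated inputs expose the bug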
The other point that's often missed is the drag that "bad" tests add to a project. Since it's so hard to write good tests, when you mandate testing you end up with a pile of garbage that makes it harder to make progress. Another factor is the additional hit you take maintaining your tests.
Basically choosing the right kind of tests, at the right level, is judgement. You use the right tool for the right job. I rarely use TDD but I have used it in cases where the problem can relatively easily be stated in terms of tests and it helps me get quick feedback on my code.
EDIT: Also, as another extreme thought ;) some software out there could be working because some function isn't behaving as expected. There's lots of C code out there that uses things that are technically UB but do actually have some guarantee under some precise circumstances (bad idea, but what can you do). In this case the unit test would pass despite the code being incorrect.
I work in software testing, and I've seen this many times actually. Small bugs that I notice because I'm actually reading the code, which became inconsequential because that code path is never used anymore or the result is now discarded, or any of a number of things that change the execution environment of that piece of code.
If anything, I'm wondering the same question about you. If you find it so inconceivable that a bug is hiding in working code, held up only by the calling environment around it, then you must not have worked with big or even moderately sized codebases at all.
> sometimes even if the individual parts aren’t right, the whole can still work.
And in fact, designing for fault tolerance under the assumption that all of the system's parts are unreliable and will quickly fail makes for more fault-tolerant systems.
The _processes and attitude_ that cause many individual parts to be incorrect will also cause the overall system to be crap. There's a definite correlation, but that correlation isn't about any specific part.
Yes. Though my point is not that we should aim for a shaky foundation, but that if one is a craftsman one ought to know where to make trade offs to allow some parts of the code to be shaky with no consequences. This ability to understand how to trade off perfection for time — when appropriate — is what distinguishes senior from junior developers. The idea of ~100% correct code base is an ideal — it’s achieved only rarely on very mature code bases (eg TeX, SQLite).
Code is ultimately organic, and experienced developers know where the code needs to be 100% and where the code can flex if needed. People have this idea that code is like mathematics, where if one part fails, every part fails. To me, if that is so, the design is too tight and brittle and will not ship on time. But well-designed code is more like an organism that has resilience to variation.
If individual parts being correct meant the whole thing will be correct, that means if you have a good sturdy propeller and you put it on top of your working car, then you have a working helicopter.
> writing code takes double the time when you also have to write the tests
this time is more than made up for by the debugging, refactoring and maintenance time you usually save later on, in my experience, at least for anything actively being used and updated
Yes, if you were right about the requirements, even if they weren't well specified. But if it turns out you implemented the wrong thing (either because the requirements simply changed for external reasons, or because you missed some fundamental aspect), then you wouldn't have had to debug, refactor or maintain that initial code, and the initial tests will probably be completely useless even if you end up salvaging some of the initial implementation.
form a belief about a requirement
write a test
test fails
write code
test fails
add debug info to code
test fails no debug showing
call code directly and see debug code
change assert
test fails
rewrite test
test succeed
output test class data.. false positive checking null equals null
rewrite test
test passes
forget original purpose and stare at green passing tests with pride.
On a more serious note: just learn to use a debugger, and add asserts if need be. To me, TDD only helps by giving you something that will run your code - but that's pretty much it. If you have other test harness options, I fail to see the benefits outside conference talks and book authoring.
Yes, so much this. I don’t really understand how people could object to TDD. It’s just about putting together what one manually does otherwise. As a bonus, it’s not subject to biases because of after-the-fact testing.
>at least for anything actively being used and updated
This implies that the strength of the tests only shows when the code is modified?
Like the article says, TDD doesn't own the concept of testing. You can write good tests without submitting yourself to a dogma of red/green, minimum-passing (local-maximum-seeking) code. Debating TDD is tough because it gets bogged down with having to explain how you're not a troglodyte who writes buggy untested code.
And - on a snarkier note - this is a better argument against dynamic typing than for TDD.
I can't remember the last time the speed at which I could physically produce code was the bottleneck in a project. It's all about design and thinking through and documenting the edge cases, and coming up with new edge cases and going back to the design. By the time we know what we're going to write, writing the code isn't the bottleneck, and even if it takes twice as long, that's fine, especially since I generally end up designing a more usable interface as a result of using it (in my tests) as it's being built.
> The problem I’ve run into is that when you’re iterating fast, writing code takes double the time when you also have to write the tests.
The times I have believed this myself often turned out to be wrong once the full cost of development was taken into account. And I came back to the code later wishing I had tests around it. So you end up TDDing only the bug fix, exercising that part of the code with the failing test and then the code correction.
> The problem I’ve run into is that when you’re iterating fast, writing code takes double the time when you also have to write the tests.
That was the time it took to actually write working code for that feature.
The version of "working code" that took 50% as long was just a con to fool people into thinking you'd finished until they move onto other things and a "perfectly acceptable" regression is discovered.
The reason someone is iterating fast is usually because they are trying to discover the best solution to a problem by building things. Once they have found this then they can write "working code". But they don't want to have to write tests for all the approaches that didn't work and will be thrown away after the prototyping phase.
There are two problems I've seen with this approach. One is that sometimes the feature you implemented and tested turns out to be wrong.
Say, initially you were told "if I click this button the status should update to complete", you write the test, you implement the code, rinse and repeat until a demo. During the demo, you discover that actually they'd rather the button become a slider, and it shouldn't say Complete when it's pressed, it should show a percent as you pull it more and more. Now, all the extra care you did to make sure the initial implementation was correct turns out to be useless. It would have been better to have spent half the time on a buggy version of the initial feature, and found out sooner that you need to fundamentally change the code by showing your clients what it looks like.
Of course, if the feature doesn't turn out to be wrong, then TDD was great - not only is your code working, you probably even finished faster than if you had started with a first pass + bug fixing later.
But I agree with the GP: unclear and changing requirements + TDD is a recipe for wasted time polishing throw-away code.
Edit: the second problem is well addressed by a sibling comment, related to complex interactions.
> Say, initially you were told "if I click this button the status should
> update to complete", you write the test, you implement the code, rinse and
> repeat until a demo. During the demo, you discover that actually they'd
> rather the button become a slider, and it shouldn't say Complete when it's
> pressed, it should show a percent as you pull it more and more. Now, all the
> extra care you did to make sure the initial implementation was correct turns
> out to be useless.
Sure, this happens. You work on a thing, put it in front of the folks who asked for it, and they realize they wanted something slightly different. Or they just plain don't want the thing at all.
This is an issue that's solved by something like Agile (frequent and regular stakeholder review, short cycle time) and has little to do with whether or not you've written tests first and let them guide your implementation; wrote the tests after the implementation was finished; or just simply chucked automated testing in the trash.
Either way, you've gotta make some unexpected changes. For me, I've really liked having the tests guide my implementation. Using your example, I may need to have a "percent complete" concept, which I'll only implement when a test fails because I don't have it, and I'll implement it by doing the simplest thing to get it to pass. If I approach it directly and hack something together I run the risk of overcomplicating the implementation based on what I imagine I'll need.
I don't have an opinion on how anyone else approaches writing complex systems, but I know what's worked for me and what hasn't.
Respectfully, I think the distinction they're making is that "writing ONE failing test then the code to pass it" is very different from "write a whole test suite, and then write the code to pass it".
The former is more likely to adapt to the learning inherent in the writing of code, which someone above mentioned was easy to lose in TDD :)
One of the above comments mentions BDD as a close cousin of TDD, but that's not quite right: TDD is actually BDD, since you should only be testing behaviours, which is what allows you to "fearlessly refactor".
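Roughly, the distinction looks like this (a sketch with a hypothetical Cart class): the behaviour-level test survives any internal refactor, while the implementation-coupled one breaks the moment the storage changes.

    class Cart:
        def __init__(self):
            self._items = []                   # internal detail, free to change

        def add(self, price):
            self._items.append(price)

        def total(self, discount=0.0):
            return round(sum(self._items) * (1 - discount), 2)

    def test_behaviour_total_reflects_discount():
        cart = Cart()
        cart.add(10.0)
        cart.add(5.0)
        assert cart.total(discount=0.1) == 13.5   # behaviour: refactor fearlessly

    def test_implementation_detail():
        cart = Cart()
        cart.add(10.0)
        assert cart._items == [10.0]              # coupled to internals: brittle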
I don't think TDD gets to own the concept of having a test for what you're refactoring. That's just good practice & doesn't require that you make it fail first.
This falls under the category of problems where verifying (hell, describing) the result is harder than writing the code that produces it.
Here’s how I would do it. The challenge is that the result can’t be precisely defined, because it’s essentially art. But with TDD the assertions don’t actually have to live in code. All we have to do is make incremental, verifiable progress that lets us fearlessly make changes.
So I would set up my viewport as a grid where in each square there will eventually live a rendered image or animation. The first one blank, the second one a dot, the third a square, the fourth with color, the fifth a rhombus, the sixth with two disjoint rhombuses …
When you’re satisfied with each box you copy/paste the code into the next one and work on the next test always rendering the previous frames. So you can always reference all the previous working states and just start over if needed.
So the TDD flow becomes
1. Write down what you want the result of the next box to look like.
2. Start with the previous iteration and make changes until it looks like what you wanted.
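A rough sketch of that kind of grid harness (assuming matplotlib; the stage functions and shapes are placeholders): each stage starts from the previous one, adds exactly one change, and every earlier known-good state stays on screen for comparison.

    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    def stage_blank(ax):
        pass

    def stage_dot(ax):
        stage_blank(ax)                        # start from the previous working state
        ax.plot(0.5, 0.5, "ko")

    def stage_square(ax):
        stage_dot(ax)
        ax.add_patch(Rectangle((0.3, 0.3), 0.4, 0.4, fill=False))

    stages = [stage_blank, stage_dot, stage_square]   # the next box is the next "test"

    fig, axes = plt.subplots(1, len(stages), figsize=(3 * len(stages), 3))
    for ax, stage in zip(axes, stages):
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_title(stage.__name__)
        stage(ax)
    plt.show()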
Using wetware test oracles is underappreciated. You can't do it in a dumb way, of course, but with a basic grasp of statistics and hypothesis testing you can get very far with sprinkles of manual verification of test results.
(Note: manual verification is not the same as manual execution!)
And that’s happening. The next test is “the 8th box should contain a rhombus slowly rotating clockwise” and it’s failing because the box is currently empty. So now you write code.