+1 on "well defined spec" -- a lot of Healthcare integrations are specified as "here's the requests, ensure your system responds like this" and being able to put those in a test suite and know where you're at is invaluable!
But TDD is fantastic for growing software as well! I managed to save an otherwise doomed project by rigorously sticking to TDD (and its close cousin Behavior Driven Development.)
It sounds like you're expecting that the entire test suite ought to be written up front? The way I've had success is to write a single test, watch it fail, fix the failure as quickly as possible, repeat, and then once the test passes fix up whatever junk I wrote so I don't hate it in a month. Red, Green, Refactor.
If you combine that with frequent stakeholder review, you're golden. This way you're never sitting on a huge pile of unimplemented tests; nor are you writing tests for parts of the software you don't need. For example from that project: week one was the core business logic setup. Normally I'd have dived into users/permissions, soft deletes, auditing, all that as part of basic setup. But this way, I started with basic tests: "If I go to this page I should see these details;" "If I click this button the status should update to Complete." Nowhere do those tests ask about users, so we don't have them. Focus remains on what we told people we'd have done.
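To make that concrete, here's roughly what one of those red/green steps can look like at the domain level (a minimal Python sketch; the Order class and status strings are hypothetical stand-ins, not the actual project code):

    # Red: write one failing test that states the behaviour we promised.
    def test_clicking_complete_updates_status():
        order = Order(status="Pending")
        order.complete()                      # what the button handler would call
        assert order.status == "Complete"

    # Green: the simplest implementation that makes the test pass; refactor after.
    class Order:
        def __init__(self, status):
            self.status = status

        def complete(self):
            self.status = "Complete"

Nothing in there knows about users, permissions or auditing, because no test asked for them yet.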
I know not everyone works that way, but damn if the results didn't make me a firm believer.
The problem I’ve run into is that when you’re iterating fast, writing code takes double the time because you also have to write the tests.
Unit tests are still easy to write, but most complex software has many parts that combine combinatorially, and writing integration tests requires lots of mocking. This investment pays off when the design is stable, but when business requirements are not that stable it becomes very expensive.
Some tests are actually very hard to write — I once led a project where the code had both cloud and on-prem API calls (and called Twilio). Some of those environments were outside our control, but we still had to make sure we handled their failure modes. The testing code was very difficult to write, and I wished we’d waited until we had stabilized the code before attempting to test. There were too many rabbit holes that we naturally got rid of as we iterated, and testing was like a ball and chain that made everything super laborious.
TDD also represents a kind of first order thinking that assumes that if the individual parts are correct, the whole will likely be correct. It’s not wrong but it’s also very expensive to achieve. Software does have higher order effects.
It’s like the old car analogy. American car makers used to believe that if you QC every part and make unit tolerances tight, you’ll get a good car on final assembly (unit tests). This is true if you can get it right all the time but it made US car manufacturing very expensive because it required perfection at every step.
Ironically Japanese carmakers eschewed this and allowed loose unit tolerances, but made sure the final build tolerance worked even when the individual unit tolerances had variation. They found this made manufacturing less expensive and still produced very high quality (arguably higher quality since the assembly was rigid where it had to be, and flexible where it had to be). This is craftsman thinking vs strict precision thinking.
This method is called “functional build” and Ford was the first US carmaker to adopt it. It eventually came to be adopted by all car makers.
> Some tests are actually very hard to write — I once led a project where the code had both cloud and on-prem API calls
I believe that this is a fundamental problem of testing in all distributed systems: you are trying to test and validate for emergent behaviour. The other term we have for such systems is: chaotic. Good luck with that.
In fact, I have begun to suspect that the way we even think about software testing is backwards. Instead of test scenarios we should be thinking in failure scenarios - and try to subject our software to as many of those as possible. Define the bounding box of the failure universe, and allow the computer to generate the testing scenarios within. EXPECT that all software within will eventually fail, but as long as it survives beyond set thresholds, it gets a green light.
In a way... we'd need something like a bastard hybrid of fuzzing, chaos testing, soak testing, SRE principles and probabilistic outcomes.
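A toy sketch of that idea in plain Python (the flaky downstream call, the 40% failure bound and the 95% survival threshold are all invented for illustration):

    import random

    # Hypothetical system under test: a client that retries a flaky downstream call.
    def call_with_retries(failure_rate, attempts=3):
        for _ in range(attempts):
            if random.random() > failure_rate:   # this attempt "succeeded"
                return True
        return False                             # all retries exhausted

    # Bounding box of the failure universe: each scenario is a failure rate up to 40%.
    # Generate scenarios inside it and demand survival beyond a threshold, not perfection.
    def test_survives_generated_failure_scenarios():
        random.seed(42)                          # reproducible scenario generation
        scenarios = [random.uniform(0.0, 0.4) for _ in range(1000)]
        survived = sum(call_with_retries(rate) for rate in scenarios)
        assert survived / len(scenarios) > 0.95  # green light: it fails, but rarely enough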
>I believe that this is a fundamental problem of testing in all distributed systems: you are trying to test and validate for emergent behaviour. The other term we have for such systems is: chaotic. Good luck with that
Emergent behaviour is complex, not chaotic. Chaos comes from sensitive dependence on initial conditions. Complexity is associated with non-ergodic statistics (i.e. sampling across time gives different results to sampling across space).
I work on the Erlang virtual machine (Elixir) and I regularly write tests against common distributed-systems failures. You don't need property tests (or Jepsen/Maelstrom-style fuzzing) for your 95% scenarios. Distributed systems are not magically failure prone.
> TDD also represents a kind of first order thinking that assumes that if the individual parts are correct, the whole will likely be correct. It’s not wrong
In fact it is not just wrong, but very wrong, as your auto example shows. Unfortunately engineers are not trained/socialised to think as holistically as perhaps they should be.
The non-strawman interpretation of TDD is the converse: if the individual parts are not right, then the whole will probably be garbage.
It's worth it to apply TDD to the pieces to which TDD is applicable. If not strict TDD, then at least "test first" weak TDD.
The best candidates for TDD are libraries that implement pure data transformations with minimal integration with anything else.
(I suspect that the rabid TDD advocates mostly work in areas where the majority of the code is like that. CRUD work with predictable control and data flows.)
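For what it's worth, that sweet spot tends to look something like this (a sketch with a hypothetical parse_duration helper): a pure input-to-output transformation where the test needs no mocks and barely any setup.

    import re

    # Hypothetical pure transformation: "1h30m" -> seconds. No I/O, nothing to mock.
    def parse_duration(text):
        match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text.strip())
        if not match or not any(match.groups()):
            raise ValueError(f"unrecognised duration: {text!r}")
        hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
        return hours * 3600 + minutes * 60 + seconds

    def test_parse_duration():
        assert parse_duration("1h30m") == 5400
        assert parse_duration("45s") == 45
        assert parse_duration(" 2h ") == 7200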
Yes. Agree about TDD being more suited to low dependency software like CRUD apps or self contained libraries.
Also sometimes even if the individual parts aren’t right, the whole can still work.
Consider a function that handles all cases except for one that is rare, and testing for that case is expensive.
The overall system, however, can be written to provide mitigations when composing — e.g. each individual function does a sanity check on its inputs. The individual function itself might be wrong (incomplete), but in the larger system it is inconsequential.
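A contrived sketch of that kind of composition (all names hypothetical): the helper is incomplete, but the caller's sanity check means the untested case never reaches it.

    # Hypothetical helper: fine for the common case, silently wrong for negative input.
    def digits_to_int(text):
        value = 0
        for ch in text:
            value = value * 10 + (ord(ch) - ord("0"))   # no '-' handling at all
        return value

    # The composing layer sanity-checks its input, so the helper's missing
    # negative-number case is inconsequential to the larger system.
    def read_port(config_value):
        config_value = config_value.strip()
        if not config_value.isdigit():
            raise ValueError(f"port must be a non-negative integer, got {config_value!r}")
        port = digits_to_int(config_value)
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
        return port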
Test effort is not 1:1. Sometimes the test can be many times as complicated to write and maintain as the function being tested, because it has to generate all the corner cases (and has to regenerate them if anything changes upstream). If you’re testing a function in the middle of a very complex data pipeline, you have to regenerate all the artifacts upstream.
Whereas sometimes an untested function can be written in such a way that it is inherently correct from first principles. An extreme analogy would be the Collatz conjecture. If you start by first writing the tests, you’d be writing an almost infinite corpus of tests — on the flip side, writing the Collatz function is extremely simple and correct up to a very large finite number.
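For reference, the Collatz side of that analogy in code: the function is a handful of lines, while an example-based test corpus pinning down its output for every input would be effectively unbounded.

    def collatz_steps(n):
        """Number of Collatz steps for a positive integer n to reach 1."""
        steps = 0
        while n != 1:
            n = 3 * n + 1 if n % 2 else n // 2
            steps += 1
        return steps

    # A single spot check; the conjecture itself has only been verified by brute
    # force up to enormous finite bounds, never proven in general.
    assert collatz_steps(27) == 111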
Computer code is an inherently brittle thing, and the smallest errors tend to cascade into system crashes. Showstopper bugs are generated from off-by-one errors, incorrect operation around minimum and maximum values, a missing semicolon or comma, etc.
And doing sanity check on function inputs addresses only a small proportion of bugs.
I don't know what kind of programming you do, but the idea that a wrong function becomes inconsequential in a larger system... I feel like that just never happens unless the function was redundant and unnecessary in the first place. A wrong function brings down the larger system feels like the only kind of programming I've ever seen.
Physical unit tolerances don't seem like a useful analogy in programming at all. At best, maybe in sysops regarding provisioning, caches, API limits, etc. But not for code.
> I don't know what kind of programming you do, but the idea that a wrong function becomes inconsequential in a larger system... I feel like that just never happens unless the function was redundant and unnecessary in the first place. A wrong function brings down the larger system feels like the only kind of programming I've ever seen.
I think we’re talking extremes here. An egregiously wrong function can bring down a system if it’s wrong in just the right ways and it’s a critical dependency.
But if you look at most code bases, many have untested corner cases (which they’re likely not handling) but the code base keeps chugging along.
Many codebases are probably doing something wrong today (hence GitHub issues). But to catastrophize that seems hyperbolic to me. Most software with mistakes still work. Many GitHub issues aren’t resolved but the program still runs. Good designs have redundancy and resilience.
A counter to that could be all the little issues found by fuzz testing and static analysis of legacy systems, often in widely used software where those issues had not in fact manifested. Unit tests also don't prove correctness; they're only as good as the test writer's ability to predict failures.
I can tell you that most (customer) issues in the software I work on are systemic issues: the database (a widely used OSS one) fails or corrupts under certain scenarios. They can be races, behaviour under failure modes, lack of correctness at some higher order (e.g. half-failed operations), the system not implementing the intent of the user. I would say those are very rarely issues that would have been caught by unit testing. Now, integration testing and stress testing will uncover a lot of those. This is a large-scale distributed system.
Now, sometimes after the fact a unit test can somehow be created to reproduce the specific failure, possibly at great effort. That's not really that useful at this point. You wouldn't write such tests in advance for every possible failure scenario (there are infinitely many).
All that said, sometimes there are attacks on systems that relate to corner-case errors, which is a problem. Static analysis and fuzzers are IMO more useful tools in this realm as well. Also, I think I'm hearing "dynamic/interpreted" language there (missing semicolons???). Those might need more unit testing to make up for the lack of compiler checks/warnings/type safety, for sure.
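As an illustration of a property test earning its keep (a sketch assuming the Python hypothesis library; the truncate helper and its bug are invented): the hand-written example test passes happily, while the generated inputs immediately hit the corner case.

    from hypothesis import given, strategies as st

    def truncate(text, limit):
        """Shorten text to at most `limit` characters, adding an ellipsis if cut."""
        if len(text) <= limit:
            return text
        return text[:limit] + "..."            # bug: result is now limit + 3 chars long

    def test_truncate_example():               # the case the author thought of: passes
        assert truncate("hello", 10) == "hello"

    @given(st.text(), st.integers(min_value=0, max_value=50))
    def test_truncate_never_exceeds_limit(text, limit):
        assert len(truncate(text, limit)) <= limit   # generated inputs expose the bug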
The other point that's often missed is the drag that "bad" tests add to a project. Since it's so hard to write good tests, when you mandate testing you end up with a pile of garbage that makes it harder to make progress. Another factor is the additional hit you take maintaining your tests.
Basically choosing the right kind of tests, at the right level, is judgement. You use the right tool for the right job. I rarely use TDD but I have used it in cases where the problem can relatively easily be stated in terms of tests and it helps me get quick feedback on my code.
EDIT: Also, as another extreme thought ;) some software out there could be working because some function isn't behaving as expected. There's lots of C code out there that uses things that are technically UB but do actually have some guarantee under some precise circumstances (bad idea, but what can you do). In this case the unit test would pass despite the code being incorrect.
I work in software testing, and I've seen this many times actually. Small bugs that I notice because I'm actually reading the code, which became inconsequential because that code path is never used anymore or the result is now discarded, or any of a number of things that change the execution environment of that piece of code.
If anything, I'm wondering the same question about you. If you find it so inconceivable that a bug is hiding in working code, held up only by the calling environment around it, then you must not have worked with big or even moderately sized codebases at all.
> sometimes even if the individual parts aren’t right, the whole can still work.
And in fact, designing for fault tolerance under the assumption that all of the system's parts are unreliable and will quickly fail makes for more fault-tolerant systems.
The _processes and attitude_ that cause many individual parts to be incorrect will also cause the overall system to be crap. There's a definite correlation, but that correlation isn't about any specific part.
Yes. Though my point is not that we should aim for a shaky foundation, but that if one is a craftsman one ought to know where to make trade offs to allow some parts of the code to be shaky with no consequences. This ability to understand how to trade off perfection for time — when appropriate — is what distinguishes senior from junior developers. The idea of ~100% correct code base is an ideal — it’s achieved only rarely on very mature code bases (eg TeX, SQLite).
Code is ultimately organic, and experienced developers know where the code needs to be 100% and where the code can flex if needed. People have this idea that code is like mathematics, where if one part fails, every part fails. To me, if that is so, the design is too tight and brittle and will not ship on time. But well-designed code is more like an organism that has resilience to variation.
If individual parts being correct meant the whole thing will be correct, that means if you have a good sturdy propeller and you put it on top of your working car, then you have a working helicopter.
> writing code takes double the time when you also have to write the tests
this time is more than made up for by the debugging, refactoring and maintenance time you usually save later on, in my experience, at least for anything actively being used and updated
Yes, if you were right about the requirements, even if they weren't well specified. But if it turns out you implemented the wrong thing (either because the requirements simply changed for external reasons, or because you missed some fundamental aspect), then you wouldn't have had to debug, refactor or maintain that initial code, and the initial tests will probably be completely useless even if you end up salvaging some of the initial implementation.
form a belief about a requirement
write a test
test fails
write code
test fails
add debug info to code
test fails no debug showing
call code directly and see debug code
change assert
test fails
rewrite test
test succeed
output test class data.. false positive checking null equals null
rewrite test
test passes
forget original purpose and stare at green passing tests with pride.
On a more serious note: just learn to use a debugger, and add asserts if need be. To me, TDD only helps by giving you something that will run your code - but that's pretty much it. If you have other test harness options, I fail to see the benefits outside conference talks and book authoring.
Yes, so much this. I don’t really understand how people could object to TDD. It’s just about putting together what one manually does otherwise. As a bonus, it’s not subject to biases because of after-the-fact testing.
>at least for anything actively being used and updated
This implies that the strength of the tests only shows when the code is modified?
Like the article says, TDD doesn't own the concept of testing. You can write good tests without submitting yourself to a dogma of red/green, minimum-passing (local-maximum-seeking) code. Debating TDD is tough because it gets bogged down with having to explain how you're not a troglodyte who writes buggy untested code.
And - on a snarkier note - this is a better argument against dynamic typing than for TDD.
I can't remember the last time the speed at which I could physically produce code was the bottleneck in a project. It's all about design and thinking through and documenting the edge cases, and coming up with new edge cases and going back to the design. By the time we know what we're going to write, writing the code isn't the bottleneck, and even if it takes twice as long, that's fine, especially since I generally end up designing a more usable interface as a result of using it (in my tests) as it's being built.
> The problem I’ve run into is that when you’re iterating fast, writing code takes double the time when you also have to write the tests.
The times I have believed this myself often turned out to be wrong once the full cost of development was taken into account. And I came back to the code later wishing I had tests around it. So you end up TDDing only the bug fix, exercising that part of the code with the failing test and then the code correction.
> The problem I’ve run into is that when you’re iterating fast, writing code takes double the time when you also have to write the tests.
That was the time it took to actually write working code for that feature.
The version of "working code" that took 50% as long was just a con to fool people into thinking you'd finished until they move onto other things and a "perfectly acceptable" regression is discovered.
The reason someone is iterating fast is usually because they are trying to discover the best solution to a problem by building things. Once they have found this then they can write "working code". But they don't want to have to write tests for all the approaches that didn't work and will be thrown away after the prototyping phase.
There are two problems I've seen with this approach. One is that sometimes the feature you implemented and tested turns out to be wrong.
Say, initially you were told "if I click this button the status should update to complete", you write the test, you implement the code, rinse and repeat until a demo. During the demo, you discover that actually they'd rather the button become a slider, and it shouldn't say Complete when it's pressed, it should show a percent as you pull it more and more. Now, all the extra care you did to make sure the initial implementation was correct turns out to be useless. It would have been better to have spent half the time on a buggy version of the initial feature, and found out sooner that you need to fundamentally change the code by showing your clients what it looks like.
Of course, if the feature doesn't turn out to be wrong, then TDD was great - not only is your code working, you probably even finished faster than if you had started with a first pass + bug fixing later.
But I agree with the GP: unclear and changing requirements + TDD is a recipe for wasted time polishing throw-away code.
Edit: the second problem is well addressed by a sibling comment, related to complex interactions.
> Say, initially you were told "if I click this button the status should
> update to complete", you write the test, you implement the code, rinse and
> repeat until a demo. During the demo, you discover that actually they'd
> rather the button become a slider, and it shouldn't say Complete when it's
> pressed, it should show a percent as you pull it more and more. Now, all the
> extra care you did to make sure the initial implementation was correct turns
> out to be useless.
Sure, this happens. You work on a thing, put it in front of the folks who asked for it, and they realize they wanted something slightly different. Or they just plain don't want the thing at all.
This is an issue that's solved by something like Agile (frequent and regular stakeholder review, short cycle time) and has little to do with whether or not you've written tests first and let them guide your implementation; wrote the tests after the implementation was finished; or just simply chucked automated testing in the trash.
Either way, you've gotta make some unexpected changes. For me, I've really liked having the tests guide my implementation. Using your example, I may need to have a "percent complete" concept, which I'll only implement when a test fails because I don't have it, and I'll implement it by doing the simplest thing to get it to pass. If I approach it directly and hack something together I run the risk of overcomplicating the implementation based on what I imagine I'll need.
I don't have an opinion on how anyone else approaches writing complex systems, but I know what's worked for me and what hasn't.
Respectfully, I think the distinction they're making is that "writing ONE failing test then the code to pass it" is very different from "write a whole test suite, and then write the code to pass it".
The former is more likely to adapt to the learning inherent in the writing of code, which someone above mentioned was easy to lose in TDD :)
One of the above comments mentions BDD as a close cousin of TDD, but that's not quite right: TDD is actually BDD, since you should only be testing behaviours, which is what allows you to "fearlessly refactor".
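Roughly, the distinction looks like this (a sketch with a hypothetical Cart class): the behaviour-level test survives any internal refactor, while the implementation-coupled one breaks the moment the storage changes.

    class Cart:
        def __init__(self):
            self._items = []                   # internal detail, free to change

        def add(self, price):
            self._items.append(price)

        def total(self, discount=0.0):
            return round(sum(self._items) * (1 - discount), 2)

    def test_behaviour_total_reflects_discount():
        cart = Cart()
        cart.add(10.0)
        cart.add(5.0)
        assert cart.total(discount=0.1) == 13.5   # behaviour: refactor fearlessly

    def test_implementation_detail():
        cart = Cart()
        cart.add(10.0)
        assert cart._items == [10.0]              # coupled to internals: brittle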
I don't think TDD gets to own the concept of having a test for what you're refactoring. That's just good practice & doesn't require that you make it fail first.
This falls under the category of problems where verifying (hell, describing) the result is harder than writing the code that produces it.
Here’s how I would do it. The challenge is that the result can’t be precisely defined, because it’s essentially art. But with TDD the assertions don’t actually have to live in code. All we have to do is make incremental, verifiable progress that lets us fearlessly make changes.
So I would set up my viewport as a grid where in each square there will eventually live a rendered image or animation. The first one blank, the second one a dot, the third a square, the fourth with color, the fifth a rhombus, the sixth with two disjoint rhombuses …
When you’re satisfied with each box you copy/paste the code into the next one and work on the next test always rendering the previous frames. So you can always reference all the previous working states and just start over if needed.
So the TDD flow becomes
1. Write down what you want the result of the next box to look like.
2. Start with the previous iteration and make changes until it looks like what you wanted.
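A rough sketch of that kind of grid harness (assuming matplotlib; the stage functions and shapes are placeholders): each stage starts from the previous one, adds exactly one change, and every earlier known-good state stays on screen for comparison.

    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    def stage_blank(ax):
        pass

    def stage_dot(ax):
        stage_blank(ax)                        # start from the previous working state
        ax.plot(0.5, 0.5, "ko")

    def stage_square(ax):
        stage_dot(ax)
        ax.add_patch(Rectangle((0.3, 0.3), 0.4, 0.4, fill=False))

    stages = [stage_blank, stage_dot, stage_square]   # the next box is the next "test"

    fig, axes = plt.subplots(1, len(stages), figsize=(3 * len(stages), 3))
    for ax, stage in zip(axes, stages):
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_title(stage.__name__)
        stage(ax)
    plt.show()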
Using wetware test oracles is underappreciated. You can't do it in a dumb way, of course, but with a basic grasp of statistics and hypothesis testing you can get very far with sprinkles of manual verification of test results.
(Note: manual verification is not the same as manual execution!)
And that’s happening. The next test is “the 8th box should contain a rhombus slowly rotating clockwise” and it’s failing because the box is currently empty. So now you write code.