This is why I left Microsoft. Automated testing was a separate discipline - meaning there were "dev devs" and "test devs". Automated tests were written based on what the "test devs" had time for, not on the need or usefulness of such tests for the actual code. I was hired as a "test dev" - I had no industry experience at the time and figured I would give it an unprejudiced try to see if I liked it.
I quickly realized that my job was futile - many of the "good" tests had already been written, while in other places, "bad" tests were entrenched and the "test dev" had the job of manning the scanner to check for nail clippers, or upgrading the scanner to find as many nail clippers as possible.
Here's a useful rule on the subject that I picked up from an Artificial Intelligence course back in the day: The value of a piece of information is proportional to the chance that you will act on it times the benefit of acting on it. We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.
If you're doing the right amount of testing, there should be a good chance, whenever you kick off a test run, that your actions for the next hour will change depending on the results of the run. If you typically don't change your actions based on the information from the tests, then the effort spent to write tests gathering that information was wasted.
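To make the rule concrete, here is a rough back-of-the-envelope sketch in Ruby - all of the numbers are invented, the point is only the shape of the calculation:

    # Value of a test run is roughly: chance you act on the result, times the
    # benefit of acting, minus what the run itself costs you.
    chance_of_acting  = 0.10   # fraction of runs that change what you do next
    benefit_of_acting = 120    # minutes saved by catching the problem now
    cost_of_run       = 5      # minutes to run the suite and read the results

    expected_value = chance_of_acting * benefit_of_acting - cost_of_run
    puts expected_value        # => 7.0 -- positive, so kicking off the run pays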
I disagree that there is no benefit in passing tests that don't change your behavior. Those tests are markers to prevent you from unknowingly doing something that should have changed your behavior. That is where the nuance enters: is this a marker I want to lay down or not? Some markers should be there; others absolutely should not and just introduce noise.
I don't understand what you mean by "prevent you from unknowingly doing something that should have changed your behavior". If you do something without knowing it, how could it change your behavior? If it's a case where you should have changed your behavior, why would you prevent it?
I believe the above comment refers to regression testing. For instance, if I write a test for an invariant that is fairly unlikely to change, then the chance that my behavior will change in the next hour based on the test run is small. However, if and when the invariant is mistakenly changed, even though negative side effects might not be immediately visible, it could be immensely valuable to me to see the flaw and restore that invariant.
Yes - but the test would fail when the invariant is mistakenly changed. On the test run after the invariant was changed, you would get new information (the test does not always pass) and change your behavior (revert the commit which altered the invariant).
That is the point of the "changing behavior" rule - you do not gather the benefit of running a test until it has failed at least once, and the benefit gathered is proportionate to the benefit of the action you take upon seeing the failure. The tricky part of the rule is that you must predict your actions in the future, since a test that might have a very important failure later could pass all the time right now. Knowing your own weaknesses and strengths is important, as is knowing the risks of your project.
There are possible design benefits to writing tests, since you must write code that is testable, and testable code tends to also be modular. However, once you have written testable code, you still gain those design benefits even if you never run your test suite, or even delete your tests entirely!
Your comment reads like you can know when a test will fail in the future (how else can you know the difference between a test that "always passes" and a test that will fail in the future to identify a regression?). You may have a test that passes for ten years. When do you know it's OK to nuke the test?
Based on your follow-up, it is clear that my reading was not what you intended.
You can't know, but you can guess, based on past experience or logic. The simplest way to estimate the future is to guess that it will be similar to the past.
For example, if you personally tend to write off-by-one errors a lot, it's a good idea to write tests which check for that. On the other hand, if you almost never write off-by-one errors, you can skip those tests. If a test is cheap to write, easy to investigate, and covers a piece of code that would cause catastrophic problems if it failed, it's worthwhile to write the test even if you can barely imagine a possible situation where it would fail - the degree of the cost matters as much as the degree of the benefit.
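As a concrete illustration (the method and values here are invented, not from the comment above), the kind of cheap boundary test this argues for might be nothing more than:

    require "minitest/autorun"

    # A helper that's easy to get wrong by one: items[0, per_page - 1] slips in
    # suspiciously often if off-by-one errors are a weakness of yours.
    def first_page(items, per_page)
      items[0, per_page]
    end

    class FirstPageTest < Minitest::Test
      def test_returns_exactly_per_page_items
        assert_equal [1, 2, 3], first_page([1, 2, 3, 4, 5], 3)
      end

      def test_handles_fewer_items_than_a_full_page
        assert_equal [1, 2], first_page([1, 2], 3)
      end
    end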
You don't "know" when it's OK to nuke a test just as you don't really "know" when it's safe to launch a product - you decide what you're doing based on experience, knowledge, and logic. The important step many don't take is developing the ability to distinguish between good tests and bad tests, rather than simply having an opinion on testing in general.
Re: "The simplest way to estimate the future is to guess that it will be similar to the past."
When we say that the future will be similar to the past, for code, we really mean that the probability of certain events occurring in the future will be similar to their prior probability of occurring in the past.
In my hypothetical example of testing an invariant that is unlikely to fail but damaging if it does, it might be valuable to keep that test around for five years even if it never fails. Imagine that the expected frequency of failure was initially <once per ten years>, and that the test hasn't failed after five years. If the expected frequency of failure, cost of failure, and gain from fixing a failure remain the same, we should keep the test even if it's never failed: the expected benefit is constant.
Not to say that we should test for every possible bug, but if something was important enough to test for in the first place, and that calculation (expected benefit minus expected maintenance cost) doesn't change, we should keep the test whether or not it changes our behavior.
Thus, if we could estimate probabilities correctly, we really would know when it's OK to nuke a test.
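Putting rough, invented numbers on that keep-or-nuke decision (this is only a sketch of the expected-value argument, not anyone's real data):

    failure_rate_per_year = 0.1    # "less than once per ten years"
    cost_if_uncaught      = 40.0   # hours to hunt down the broken invariant later
    cost_if_caught        = 1.0    # hours to revert the offending commit
    upkeep_per_year       = 0.5    # hours spent maintaining the test each year

    expected_benefit = failure_rate_per_year * (cost_if_uncaught - cost_if_caught)
    puts expected_benefit > upkeep_per_year   # => true -- keep the test,
                                              # even though it has never failed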
> We all realize there is no benefit in testing if you ignore failures rather than acting to fix the bugs, but in much the same way that doing nothing when tests fail has no benefit, doing nothing when tests pass also has no benefit - so tests which always pass are just as useless as failing tests you ignore, as are tests which only turn up corner-case bugs that you would have been comfortable with shipping.
On the contrary, my opinion is that we don't see this often enough. Our bug database shows the current list of defects, but there is very, very, very little data on what does work. What are the tests covering, and how many defects are they finding within that coverage?
If your bug trend is heading downward, is it because the test org is distracted by something, or because there are fewer bugs that they are encountering?
This is the danger of having a 'test org' separate from the 'dev org'. When writing tests is tied to writing production code, your progress in one is tied to progress in the other. If it's easy to write tests for a particular feature, then the developer stops writing tests and writes the feature once they're done with the easy tests. It's much easier to understand your coverage when you're actually working with and writing the code being covered, rather than working in some separate "test org" - you don't need to run a coverage tool, you just see what breaks if you comment out a chunk of code. If the answer is "nothing" then it wasn't covered!
At the end of the day, an automated test suite is a product for developers on your project in the same way that a car is a product for drivers. You will have a hard time making a car if nobody on your car-making team can drive, and you will have a hard time writing a test suite as a tool to develop Project Foo if nobody on your test-writing team develops Project Foo.
I now write a project where I handle both the code and the tests. In the same way that my production code is improved by writing tests, my tests are improved by writing production code. I know what is included in the test coverage in the same way that I know what I had for lunch today - because I was there and I remember what happened. Tools for checking coverage are an aid to my understanding; in a company with a separate test org, you don't know anything about coverage until you've run the tool.
> This is the danger of having a 'test org' separate from the 'dev org'.
I completely agree with you, in light of how MSFT categorizes their ICs these days. There used to be three test/dev disciplines when I started (in 2001): "Software Test Engineers", "Software Development Engineers in Test", and "Software Development Engineers" (you likely know this already, but it might be useful history for others to know). Around 8-9 years ago the STE discipline disappeared; the company hired for SDETs (like yourself) from that point on.
The big loss here was that all STEs were now expected to be SDETs - writing code on a regular basis to test the application. There were (notice the past tense) many STEs who knew our application very, very well, and knew how to break it, hard. STEs provided quality feedback at a higher level than what a unit test provides - does the application feel right? Are customers getting their problems solved? If you have your eyeballs stuck inside a unit test, it's difficult to step 30 feet back and ask yourself whether you're even doing the right thing.
Nowadays I feel like the pendulum is slowly swinging back the other way, and there is less drive to have testers write automation (maybe because of where we are in the product cycle). I understand that high-level STEs "back in the day" were probably concerned about their potential career progression, which might be why the STE discipline was eliminated (I have no idea of the real reasons - they've been lost to Outlook's mailbox size limits), but all in all I think MSFT is poorer because of it.
I worked on enterprise projects "at a similar company" with a few hundred "top notch" engineers. We had no more than 30% coverage, most of it from SDETs. For months, no tests were being written by the devs before checking in; the test devs were understaffed, unqualified, and thus behind. At one point someone made a check-in that broke login entirely. Nobody noticed for a full work week, until that version made it into preproduction and someone finally attempted to log in. Apparently hundreds of millions spent on a project can't buy you a team able to write above high-school-level code.
I can see the usefulness of SDETs in doing system / end-to-end testing or testing really obscure scenarios, but most of the test writing should belong to devs. I love the Rails approach to the unit, functional, and integration test split. The first time you try BDD, especially if you're coming from an after-the-fact testing culture like the one above, you almost want to cry from joy. I agree that Cucumber might be a bit of overkill, but perhaps I don't get it. For a non-prototypey project you should absolutely add other kinds of horizontal testing, like performance and security.
While I agree with this description of the value of information, I disagree with your interpretation of it in this context. Consider the following, rather extreme example: nuclear power stations are equipped with countless diagnostic systems with several levels of fallback. In a well-built and well-operated nuclear power station these systems will never signal failure during normal operation. That clearly doesn't mean their output carries little value. Surely, a test that always passes doesn't necessarily have no benefit; you also have to consider what it would mean if it suddenly stopped passing.
A thought that folks reading this post might have an opinion on:
"Libraries should be mostly unit tested. Applications should be mostly (and lightly) integration tested. Naturally, some parts of a complex app will behave like a library..."
Strongly agree, but also I think that the interesting thing is why this might be so and what it tells us about application architecture.
The basic thing about tests is that once you have them passing, they represent statements about constraints on the program. In other words, they express your opinion of things that should not change. Unit tests are a bet that certain aspects of the implementation will not change. Integration tests are a bet that certain aspects of the externally visible behaviour will not change.
Libraries tend to be smaller and with well-defined responsibility. Applications tend to be bigger and have many responsibilities. In general, I think it’s true that the requirements for libraries change less often than the requirements for applications. I think this leads us to expect that applications may need to be rewired “under the hood” and have their implementations changed as responsibilities are added, removed, or changed.
This, I believe, leads us to want to unit test applications less, because a unit test expresses implementation semantics, and we expect application implementations to change. Now what about integration tests? Well, if we’re unit testing less in the application, we need to make up for it by integration testing more, otherwise where do we get our confidence?
Now if we throw the words “library” and “application” away, this suggests to me that those parts of the code that are small and tight and with a single, clear responsibility should be unit tested, while those parts that involve a lot of what the AOP people call ‘scattering and tangling,’ should be integration tested.
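A small Ruby sketch of the two bets (the classes and numbers are invented): the first test is pinned to an implementation choice - that Invoice delegates to a TaxTable collaborator - while the second is pinned only to externally visible behaviour.

    require "minitest/autorun"

    class TaxTable
      def rate_for(region)
        region == :eu ? 0.2 : 0.0
      end
    end

    class Invoice
      def initialize(net, region, table = TaxTable.new)
        @net, @region, @table = net, region, table
      end

      def total
        @net * (1 + @table.rate_for(@region))
      end
    end

    class InvoiceTest < Minitest::Test
      # Implementation bet: breaks if we stop delegating to a TaxTable,
      # even if every total stays exactly the same.
      def test_asks_a_tax_table_for_the_rate
        table = Object.new
        def table.rate_for(_region); 0.2; end
        assert_in_delta 120.0, Invoice.new(100, :eu, table).total
      end

      # Behaviour bet: an EU invoice for 100 comes to 120, however that happens internally.
      def test_eu_invoice_includes_vat
        assert_in_delta 120.0, Invoice.new(100, :eu).total
      end
    end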
Unit testing didn't fully make sense to me until I played around with quickcheck (and eventually theorem proving in Coq). Unit tests vanish nicely to theorems and (empirical) proofs if your code expresses a succinct API. This is one end of the testing continuum.
I use this sort of stuff extensively when doing mathematical computing and statistics because there's usually a clear mathematical boundary. Once you're inside it, it's relatively easy to write down global properties (theorems) of your code's API.
The moment you cross that boundary your testing apparatuses have to get more complex and your tested properties less well-defined. Unit tests are hazier than quickcheck properties and integration tests hazier still.
This continuum seems to be precisely the same as the code reuse continuum. Highly abstracted, testable code with a shapely API is a highly reusable library whether you like it or not. Maybe it's being called by other code, maybe it's being called by your UI, maybe it's being called by the user themselves.
I view unit tests as a kind of proof by counter-example. You have a logical structure your program embodies. This structure is very hard to specify mathematically and prove deductively so you come up with key statements that must at least evaluate to true for this structure/theory. The tests are a bunch of counter-examples that should be false (test passed).
If a random testing framework is available in your language they really should be integrated as they are able to come up with some pathological examples.
For Haskell you can do even better than random testing: There's Lazy SmallCheck for exhaustive testing. (For some values of `better' and `exhaustive'.)
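For readers without QuickCheck or SmallCheck handy, the idea can be hand-rolled in a few lines of Ruby (no property-testing library assumed - just random inputs thrown at a stated property):

    require "minitest/autorun"

    class ReversePropertiesTest < Minitest::Test
      def random_array
        Array.new(rand(0..20)) { rand(-1000..1000) }
      end

      def test_reverse_is_its_own_inverse
        100.times do
          xs = random_array
          assert_equal xs, xs.reverse.reverse
        end
      end

      def test_reverse_preserves_length
        100.times do
          xs = random_array
          assert_equal xs.length, xs.reverse.length
        end
      end
    end

A real property-based tool typically adds input shrinking and smarter generation, but the shape of the test is the same: state a property of the API, then let the machine hunt for counter-examples.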
After playing with a language with a great type system (Haskell, not Java/C++), I've become more wary and borderline uncertain about my Ruby code.
My integration tests serve two purposes:
1) Run the code in a repeatable and isolated environment to verify I didn't do anything stupid like misname a variable or treat something nil as an object.
2) Validate my unique application logic.
I don't think #2 goes away with other languages, but #1 changes dramatically. I've written Ruby for years, so this isn't an "outsider looking in" opinion.
Whether conscious or not, I believe most Rubyists gravitate to testing because of both #1 and #2.
#2 can be satisfied with unit tests, but in my experience, the suite becomes a lot more flexible when validating it in terms of integration tests.
Note that, at least in python (which I assume is pretty similar to Ruby in ecosystem), you can get a lot of mileage for #1 by doing static analysis, for example Pylint.
On this, I tend to think that many (most?) libraries, especially where they don't have extreme performance requirements, should have run-time testing. I.e., pre- and post-condition checks, or contracts.
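A minimal sketch of what such run-time checks can look like in Ruby (the function and its invariant are invented for illustration):

    def transfer(from_balance, to_balance, amount)
      # Pre-conditions: fail loudly at the boundary instead of corrupting state.
      raise ArgumentError, "amount must be positive" unless amount > 0
      raise ArgumentError, "insufficient funds"      unless from_balance >= amount

      new_from = from_balance - amount
      new_to   = to_balance + amount

      # Post-condition: the transfer conserves the total amount of money.
      raise "balance leak detected" unless new_from + new_to == from_balance + to_balance

      [new_from, new_to]
    end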
Agree, but I'm concerned all these rules of thumb are doing nothing to help the authors of bad test suites understand why they're bad and how to fix them.
In my opinion the point of building out a test suite isn't to reach 100% code coverage, it's to allow developers working on the application the ability to refactor quickly and aggressively during the development process while remaining confident that it will still perform correctly when they're ready to release. Poorly written tests (or over-testing) will slow down refactoring and hinder development. But, in my experience, not having the right tests can mean firing up your application and manually testing after each small code change to prevent regression. Understanding the trade-off is key to helping developers answer the ubiquitous question: "do I need to test this"?
This role for unit tests is described effectively in Chapter 1 of Martin Fowler's awesome Refactoring book (or Refactoring, Ruby Edition, if that suits you. That's the version that's on my desk right now.) This use case is also highlighted in Eric Evans' book Domain Driven Design, where he describes one of my favourite application development techniques: "refactoring to deeper insight." Both books are among my favourites.
Adding features to an application isn't just about appending files and lines of code, it's also about changing existing code to allow the new features to fit comfortably into the application's domain model. If you can accomplish that goal with "light integration tests" go for it. That'll mean much less overhead during refactoring and will do a good job of enforcing correctness. If your application contains a robust domain model (which it very well might) you may find unit tests useful for maintaining model integrity during development. This is probably what's meant by "Naturally, some parts of a complex app will behave like a library". When you change the domain model you'll have to change tests for all the affected classes and that's a good thing. It explicitly highlights how your change affected the model, which you should certainly understand before committing your changes.
I will admit this distinction is hard to communicate to first-time TDDers. However, I've found that after having to eat their own dog food for a few iterations (adding features to their own code), smart developers quickly find the sweet spot. The less code you write, the less you have to maintain. But too few tests, or poorly written tests, can slow you down just as much.
That sounds good, with the caveat that most applications should have their business logic roughly organized into libraries. After those libraries get their unit tests, integration testing the application<->library barrier makes a lot of sense.
100% agreed. Rails people have started to complain about difficult/slow tests because Rails (a framework, not a library) makes your life hard. From my own experience, you don't need all the Rails magic to get started on a project, and once you do, it just happens to be a completely different magic that you need.
Hence I try to divide my projects into lots of application agnostic code (the libraries that need to be unit tested), and little application specific code (the glue code that needs to be integration tested).
We recently finished approximately one metric fuckton of unit tests for the next release of some of our libraries, and I tend to agree very strongly that a library must be well-tested.
A problem that we ran into, though, is exactly what to test. The biggest basic distinction we ran into was testing to verify interface integrity (i.e., does this do what it says in the header documentation, pursuant to declared constraints and so on) as opposed to implementation integrity (does my ringbuffer properly move around elements, does my array resize move my elements, etc.).
The former is very useful to ensure that the library is good for users, the latter of course is much better for testing that changes to the containers don't do something stupid.
Disagree. Your domain model is within the scope of your application. The code representing your domain model should be unit tested. If your application is doing anything interesting then the code for your domain model is the most important thing to test in your application.
Agree, though following this would lead me to extract all kinds of implicit library code from my applications. Likely not worth the effort without a genuine reuse story.
Thank God someone with a bullhorn finally said this. I was beginning to think I was alone in my hatred of Cucumber. (And my love of Test::Unit/Minitest.)
When I saw the whole "looks like English!" thing I skipped it without a second thought. That's just a bad idea because you're going to piss people off when it turns out that it's really nothing at all like English, and you're also going to irritate the people who wonder why Ruby constructs don't work either.
I really wish the Rails community would get over its crush on both RSpec and Cucumber. The whole point of Rails in the first place was to cast off all the ceremony and drudgery of web development in Java yet we've replaced it with these unnecessarily cryptic and complex tools just for the sake of some syntactic cuteness.
I'm so tired of reading Rails job postings hammering down RSpec and usually also Cucumber experience as a prerequisite.
I'm in the same boat as you, but I don't tend to voice my opinions on Test::Unit because there is such a strong opinion for RSpec and Cucumber. IMHO, I tend to like my tools to be tried and true, and not do any fancy magic.
Learned my lesson the hard way(s) with Rspec and Cucumber a while ago - total waste of my time.
There's a group of devs that have fooled themselves into thinking anyone outside their group understands how they are testing. I've seen this firsthand. DHH has always been right, imo, in this regard; test what you think is important, use simple tools.
Test::Unit still serves me well and perfectly fits my need to do more with less.
I used Test::Unit for a long time, but I found it to be a poor tool when doing integration tests. RSpec lets me do both within the same DSL quite easily. That's probably why a lot of people flock to RSpec. I still have old Test::Unit code sitting in my RSpec suite that I haven't moved over to RSpec's DSL.
That being said, to a developer writing tests and doing TDD, Cucumber is a speed bump that doesn't need to be there. RSpec is great without lumping more crap on top of it.
I largely agree - there is a certain dogma that goes into testing that this article dispels nicely. Of course, it comes with its own dogma, though I guess that's a bit tongue-in-cheek considering the author says: "let me firebomb the debate with the following list of nuance-less opinions".
So let me add some nuances:
1) DO aim high though, just recognize that the work in getting there is probably better spent elsewhere in your app.
3) BUT ignore this advice if you don't write tests yet. When you learn to test, or start working on a new feature that you may not know how to test, it will take you as long to test it as to code it. From there on, though, the cost of testing is pretty cheap, so the 1/2 or 1/3 ratios start to make sense.
4) Do test that you are correctly using features and libraries (yes, standard activerecord stuff is probably going overboard).
5) But don't forget that many bugs occur at the boundaries of functional units.
6) Do what works for you, and what makes sense for your code base and business priorities. I don't love Cucumber myself, but when others swear by it I can see why they like it.
Kent Beck's quote at the end is lovely. The first and only book on TDD I read was Beck's, and it's good to know that he's not actually as dogmatic as the book makes you think.
Brilliant article. Testing for testing's sake is wrong. Testing for 100% coverage's sake is wrong. Write just enough tests at the level where they catch most of your regressions. Drill down into unit tests for complex logic, because you can test that more extensively and much faster than with an integration test. Then leave a case or two for an integration test to make sure things are hooked up right.
Don't be afraid to unit test little complex things here and there. Are you writing a function to parse a string in a certain way? Pick that function, elevate its visibility if need be, and write a simple unit test to make sure you didn't make a stupid off-by-one mistake. Does the rest of the class otherwise not lend itself to unit testing? That's OK, move on.
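For instance, a little parser test of that sort might be nothing more than this (the function and format are invented; the point is how little ceremony it takes):

    require "minitest/autorun"

    # "1:05" -> 65 seconds; exactly the kind of fiddly string handling
    # where a stupid off-by-one or to_i surprise likes to hide.
    def parse_duration(str)
      minutes, seconds = str.split(":", 2)
      minutes.to_i * 60 + seconds.to_i
    end

    class ParseDurationTest < Minitest::Test
      def test_minutes_and_seconds
        assert_equal 65, parse_duration("1:05")
      end

      def test_zero_padded_seconds
        assert_equal 9, parse_duration("0:09")
      end
    end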
We've learned that each line of code is a liability, even if it's a configuration file, which is why we have come to appreciate things like DRY, convention over configuration, less verbose languages, less verbose APIs. Likewise, each line of test code is a liability, so each line better justify itself.
Write just enough tests at the level where it catches most of your regressions...make sure you didn't make a stupid off-by-one mistake.
One thing that I've seen in inexperienced coders (including myself in the past) is that they tend to think of every bug as a fluke one-off mistake in an otherwise mostly flawless and awesome record. New coders tend to want to just fix a bug, then pretend it didn't happen.
This is exactly the wrong attitude to take. As a discipline, we programmers should be studying our mistakes and taking steps to prevent them in the future. As a craftsperson striving to improve, each of us should be studying our own mistakes and taking steps to prevent them in the future.
I think you might be misreading what I wrote. While your points are correct, I was specifically referring to "off-by-one mistake", which is a common "silly" error (since many indices are zero-based, it's often easy to request one too many elements, or chop off the first item).
Also the way you quoted me above, "make sure you didn't make a stupid off-by-one mistake" looks like it's talking about writing just enough tests. However, in context, I'm actually referring to writing unit tests for small items where you might make a stupid off-by-one mistake.
So I was never referring to "one-off mistakes", as in mistakes that are flukes.
Your points are all good otherwise! Never rest on your laurels and always think about what you can do to catch your mistakes.
It seems so obvious to me now, but I just realised this is something I do: I focus on understanding what is causing a bug and how to fix it, but not on why the bug occurred in the first place and how to prevent myself from making similar mistakes in the future!
I'm no Rails dev, so I'm curious about this one point from DHH:
> 6. Don't use Cucumber unless you live in the magic kingdom of non-programmers-writing-tests (and send me a bottle of fairy dust if you're there!)
I mostly do C#, and teams I've recently been on have found SpecFlow tests to be an excellent time saver in communicating requirements and acceptance test criteria with customers. Has Cucumber not been designed for the same purpose?
I might guess that David included the point because a product business such as 37signals has no non-programming stakeholders to communicate about requirements and acceptance criteria with.
Using BDD for having non-programmers write tests sounds far-fetched to me indeed. It's excellent to have them able to read and understand the tests, though. Any opinions? Is BDD a dead horse, or is DHH a little narrow-minded here?
My own observation is that some Ruby/Rails developers get so enthralled with testing that it becomes an obsession that eclipses the product they are building. The result (strong opinion coming...) is libraries like RSpec and Cucumber. They're complex and burdensome, and tend to attempt to mimic English, but often do so poorly and are totally unsuitable for non-coders. You spend a lot of time learning "the way" to do things and wind up with code that is unintelligible to less experienced developers.
I use test::unit and minitest and it gets the job done without having to keep up with the latest trends. It's simple and can be written in a way that correlates well to actual English requirements. It takes all of an hour to digest minitest from zero.
That said, Cucumber and rspec are very popular, so I may be the weird one.
For a long time, I had a deep hatred for RSpec because we used it on a project before the API stabilized (and before it was easy to maintain an environment where all developers had the same gems).
We got stuck on some particular revision in the RSpec Subversion repository. The choice was re-write all the specs, or stick with that ancient version. We re-wrote all the specs -- to test/unit.
Several years later, and I have never picked up RSpec for my own use. However, I am working on another project that chose RSpec and it is working out pretty well. I have turned on render_views so I don't have to test those separately and am only using mocking for external services.
Cucumber, on the other hand, I do not understand at all. Why write tests in English when you have Ruby?
There's a few solid reasons I've heard for using Cucumber, although in my life I've not found a need for it yet.
1) Although it's a Ruby tool, it works with a ton of languages. Someone can write code based off Cuke tests in Ruby or .NET or whatever without much trouble.
2) it makes web workflow testing cake
3) it keeps people strongly out of the "implementation" zone when they are thinking about how a program should be properly executed
4) it works with many spoken languages so if you're collaborating with an international team it could be useful there.
5) it has a whole bunch of report formats built in
If you don't find any of those features incredibly useful, I'm not sure you're going to ever see a need for it. I've played with it but for me it seems more hassle than anything. I do LIKE it but that's not enough to justify the time spent messing with it.
Personally, I don't like Cucumber because it attempts to bridge the worlds of development and business but ends up serving neither very well. Its syntax gives it the verbosity of English while removing none of the brittleness of the underlying code. A non-programmer editing tests would need to learn Ruby to deal with the leaks in the abstraction, while a coder is better off describing behavior directly in rspec or Capybara rather than wrapping it in a "story" that should have been a two-line comment.
As for non-programmers reading tests, they could read the comments or README.md instead. Why force the documentation to test, and the test to document, if it decreases the efficiency of both?
Genuine interest: how does one use "comments and README.md" for agreeing a set of requirements with stakeholders who are not programmers, yet domain experts?
Note, I'm not talking about your average "look, a fancy dropdown control on Github" type of project. I rather mean the "look, it's the entire software stack of a TV" or the "hey, this software computes taxes for all households in a country" type of thing.
The stakeholders in such domains typically know quite well what they want, in terms of their domain. The software people know how to turn that into working, maintainable and usable software. Getting the automated tests to also be the acceptance test spec saves a lot of double work, and, most importantly, a lot of human error (acceptance spec not matching acceptance test done, etc).
Using BDD for having non-programmers write tests sounds far-fetched to me indeed. It's excellent to have them able to read and understand the tests, though. Any opinions? Is BDD a dead horse, or is DHH a little narrow-minded here?
I thought what you thought - that this is DHH's working context speaking. For me Cuke-ish tests are about alignment and communication - not about having non-developers write tests.
(of course Cucumber isn't the only route for doing that)
You’re probably doing it wrong if testing is taking more than 1/3 of your time. You’re definitely doing it wrong if it’s taking up more than half.
"No generalization is wholly true - not even this one" said Oliver Wendell Holmes.
What proportion of your time do you spend writing tests for a one-off bash script to fix some filenames? What proportion do you spend if you're writing the fly-by-wire code in a 777? You should spend "enough" time writing tests - where "enough" is extremely context dependent.
I also worry about this sort of advice because the reason many people spend too long testing is that they're really bad at testing. I fear this group will use the quote as an excuse to do less testing, rather than to get better at testing.
Also, to be honest, I'd be hard put to tell anybody how much time I spend writing tests vs. writing code. I've been practicing TDD for about ten years now - and I just don't think about "testing" vs "coding". I'm developing - which involves writing tests and writing code. Trying to separate them out and time them makes about as much sense to me as worrying about whether I type more with my left or right hand.
Also, to be honest, I'd be hard put to tell anybody how much time I spend writing tests vs. writing code.
That's a good point. Test writing versus code writing is not black and white. Spending time writing tests is also time spent thinking about what's going to be coded. Writing tests should make the coding process go quicker such that there exists a lot of overlap in test writing and coding.
I completely agree with David's assertion that there is too little focus paid to how to test properly or what over-testing looks like.
Here's a question to the HN community... For a library you write yourself to, say, access a Web service, how do you go about testing it? (For example, if you want to write tests against an Akismet gem)
I tend to write both unit tests and integration tests for it. My unit tests mock out the HTTP calls made by the library and only test that the library is able to handle both good and bad inputs.
My integration tests allow the library to speak directly to the web service in question, to test that the correct connection is being made and the service is providing the correct data back to the library.
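For what it's worth, one common way to draw that line in Ruby is to stub HTTP at the boundary for the unit tests and let the integration suite hit the real service. A hedged sketch, assuming the WebMock gem and an invented client class:

    require "minitest/autorun"
    require "webmock/minitest"
    require "net/http"
    require "uri"

    class SpamChecker
      ENDPOINT = "https://antispam.example.com/check"

      def spam?(text)
        uri = URI("#{ENDPOINT}?text=#{URI.encode_www_form_component(text)}")
        Net::HTTP.get_response(uri).body == "true"
      end
    end

    class SpamCheckerTest < Minitest::Test
      def test_flags_a_comment_the_service_rejects
        stub_request(:get, /antispam\.example\.com/).to_return(status: 200, body: "true")
        assert SpamChecker.new.spam?("buy pills")
      end

      def test_treats_anything_else_as_ham
        stub_request(:get, /antispam\.example\.com/).to_return(status: 200, body: "false")
        refute SpamChecker.new.spam?("hello")
      end
    end

The integration version of the same check would drop the stubs (WebMock.allow_net_connect!) and talk to a sandbox account on the real service, asserting that the connection and the returned data are what the library expects.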
I don't get the "don't aim for 100%" point. What are you aiming for then? 50%? What happens when you reach 50% and remove some code - do you remove enough tests to match the ratio?
We may just phrase the same idea in 2 different ways, but I'd go with "don't force yourself to do 100% coverage if it's not that relevant" / "don't add a test for a simple getter if you have more important things to do". If you can do 100% and have time for it - you should definitely do that. For example in case someone rewrites your getter, but it's not that simple anymore.
For me the point of that one is that test coverage isn't the goal - good code is.
I've seen folk disappearing down a rabbit hole focusing on getting that last 2% of branch coverage using some baroque mock object monkey patched into the system. Their focus was on test coverage. What people who focus on test coverage get is an evil complex test suite that's very brittle in the face of change.
Other, smarter, folk go "damn - I can't test that easily - this code sucks", factor out unrelated functionality into appropriate classes, add some code that makes some duplication between branches obvious, factor out the duplication and end up with something that's better code, with simpler tests - and better test coverage too. Their focus is on the code - not the test coverage.
Aiming for any specific ratio is missing the point. I think the only rule worth following once you're testing at all is never make it worse unless you understand why you are doing so.
As a counter-argument to the "don't aim for 100%" thing, I would say that code coverage is more important in interpreted (and loosely typed) languages. Particularly with preventing regressions. For example, if you change the arity of a function, it's nice to be able to run the tests and see all the calls to it that break.
This reminds me of the old joke about the man who went to his doctor and said: "Sir, those supposi-whatever pills you gave me were no good. I might just as well have stuck them up my ass".
Using a tool (or a medicine) to do something else than what it was designed for rarely has a positive effect. Cucumber was not designed to be a testing tool. Most people don't realise this, and end up with a verbose, unmaintainable mess. They blame the mess on the tool without realising that they used the tool wrong.
I don't use Cucumber to test my code. I use it to discover what code I need to write. This is a subtle, but important difference. This approach can work whether there are non-technical people involved or not.
I typically start out with a single Cucumber scenario that describes from 10,000 feet how I want a certain feature to work - without getting bogged down in details.
This allows me to reason and think about the domain in a way that helps me write the simplest possible code. What usually happens (to me at least) is that I end up with a simple design that reflects the domain.
Last week I wrote a small application like this. It is 1000 lines of Java code. I have 6 Cucumber scenarios - a total of 100 lines of Cucumber "code" (Gherkin) and about 200 lines of Step Definitions code (the glue code that sits between the Cucumber scenarios and the app).
I could have used JUnit instead - or I could have just written the code without any tests at all. For me, this would have made it harder to start coding. I would have started off with a much more muddy picture about how the app needs to behave and how it should be designed internally. I would have spent more time experimenting.
If you use Cucumber as a starting point to discover the code you need to write - and keep your scenarios few and high level - then you're more likely to reap benefits instead of pain.
And never fall for the temptation to use Cucumber to test details. That's what unit testing tools are for.
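To make the "few and high level" point concrete, here is an invented, cucumber-ruby-flavoured example of the scale involved (the commenter's own project was Java, so this is not that code): one short scenario plus the Ruby glue that drives an application object, with a stand-in Account class so the sketch is self-contained.

    # features/overdraft.feature
    Feature: Overdraft protection
      Scenario: A withdrawal beyond the balance is refused
        Given an account with a balance of 50
        When I withdraw 80
        Then the withdrawal is refused
        And the balance is still 50

    # features/step_definitions/overdraft_steps.rb
    class Account                        # stand-in for the real application object
      attr_reader :balance
      def initialize(balance); @balance = balance; end
      def withdraw(amount)
        return false if amount > @balance
        @balance -= amount
        true
      end
    end

    Given /^an account with a balance of (\d+)$/ do |balance|
      @account = Account.new(balance.to_i)
    end

    When /^I withdraw (\d+)$/ do |amount|
      @result = @account.withdraw(amount.to_i)
    end

    Then /^the withdrawal is refused$/ do
      raise "expected the withdrawal to be refused" if @result
    end

    Then /^the balance is still (\d+)$/ do |balance|
      raise "balance changed" unless @account.balance == balance.to_i
    end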
Hey, while we're slaughtering sacred cows, lets kill mocking, endotesting, expectation based testing, and the whole nine yards. It's a horrible practice that causes you to write too many tests, too many assertions, and results in tests that stay green even if you delete entire files from your codebase.
I have found that heavy use of mocking tools couples tests tightly to the design, resulting in great difficulty when I need to refactor. To me, the ideal is that I can change my design as I see fit without difficulty. My tests specify what my application does - how it behaves, not how it is structured. In my experience, this entails minimizing the number of points where tests touch production code, i.e. keeping the test suite DRY.
I do use test doubles, including mocks, where things become difficult or slow to test (generally application boundaries). But keep in mind that every line of mocking code costs more than an equivalent line of the alternative.
Yeah I'm not referring to mocking external services, which is both smart and useful. I'm referring to the practice of endotesting whereby you mock your own objects and interfaces as a means to test interactions between them.
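A hedged illustration of the failure mode being described, using RSpec doubles (the classes are invented): because the spec only pins the interaction, it stays green even if the real PriceCalculator is broken - or deleted outright.

    require "rspec/autorun"

    class Checkout
      def initialize(calculator)
        @calculator = calculator
      end

      def charge(items)
        @calculator.total(items)
      end
    end

    RSpec.describe Checkout do
      it "asks the calculator for a total" do
        calculator = double("PriceCalculator")
        expect(calculator).to receive(:total).with([:book]).and_return(10)

        expect(Checkout.new(calculator).charge([:book])).to eq(10)
      end
    end

Nothing in this spec ever loads the real calculator class, which is exactly the "stays green even if you delete entire files" problem described above.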
@dhh was talking about this on twitter before he made this post. And I think @dchelimsky has a very fair point that @dhh ignores: testing is all about effort to eliminate some amount of risk.
Test x but don't test y can't be universal. Risks are (often radically) different in every app/team.
@dhh and 37Signals have a different perspective on acceptable risk for their own product (which they maintain every day) vs developers writing software for someone else/handing it off to someone else. 1 dev to 1 project vs 1 dev to x projects. As a consultant, my acceptable risk level is very different than an entrepreneur trying to push out a MVP and I think testing will reflect that.
I think dhh tried to address that with his comment: "yes, yes, if you were working on an airport control system for launching rockets to Mars and the rockets would hit the White House if they weren’t scheduled with a name, you can test it—but you aren’t, so forget it"
That's a hyperbolic comparison. Yes, focusing on outliers makes all the other data points look the same. My point is that in the context of each team/app, those differences in acceptable risk matter.
Code-to-test ratios above 1:2 is a smell, above 1:3 is a stink
I think this depends on the domain. Some code ends up with a metric shed load of important edge cases because it's modelling something with a metric shed load of important edge cases - not just because your code sucks.
For example one project I worked on involved a lot of code to manage building estimates. It involved a stack of special cases related to building codes, different environments, the heuristics that the human estimators used, etc.
There wasn't any sane way to remove the special cases - the domain caused them. There wasn't a way to sensibly avoid writing tests - since we didn't really want estimates for the amount of drywall in a multi-million pound skyscraper to be wrong :-)
Testing isn't just about preventing bugs, it also encourages better design, and good design makes extending and enhancing your application much easier and faster.
...in general, of course, there's always exceptions, don't be dogmatic.
I think this aspect of unit testing is largely overrated. True, it does stop people from writing something ridiculous, like a 1000-line method that handles a dozen different things. But it also encourages developers to go overboard in the other direction: creating endless, needless layers of abstraction, chopping logic into bits that are too small to represent anything in reality, and doing dependency injection when it's not needed. It's important to keep in mind that ease of unit testing doesn't automatically ensure ease of use of your library in real code.
In short, it's better to think of unit testing as a bug-prevention mechanism. Thinking of it as the ultimate design mechanism doesn't end well.
Discussions like this arise mostly because developers today think less about whether a feature should be built and more about how it should be architected and tested.
For decades, we've been conditioned to think that by the time a feature is written down on paper, someone else has crossed all the Ts and dotted all the Is to determine that the feature has value and should be built.
It's far easier to focus on low value testing issues than attempt the harder work of convincing the customer/product owner that certain features should not be built.
If you're looking to manage code smell, check out Code Climate - http://codeclimate.com. We use it as an awesome gut check for your code coverage as you write more code. It helps you see at a macro level: "Are you writing better or worse code over time?". Pretty cool stuff.
Code-to-test ratios above 1:2 is a smell, above 1:3 is a stink.
I think this is stated backwards. A code-to-test ratio of 1:3 is lower than 1:2, and based on the wording (smell, stink), it sounds like David is saying it's higher.
I am glad DHH has put up this piece. This is one of my favorite topics to be contrarian about. I am a strong believer in the importance of developer testing to ensure code quality, but some people take it way too far. I gave a talk about this at RubyFringe: http://railspikes.com/2008/7/11/testing-is-overrated
I also am utterly opposed to the Cucumber-style programming-in-English-via-regexp testing approach. Unless someone who doesn't know how to code is writing those step files, why subject yourself to that?!
My first unit testing experience (Django), it wasn't 1:2 or 1:3, it was more like 2:1. As far as time spent, I don't even know if I could separate the two, I spent so much time in tests and production at the same time. I actually want to scrap that code and restart it, it's not worth changing all those damn tests any time something else changes. I tend to go to extremes, I'm actually thinking of switching to Yesod next to see if I can mitigate a lot of the reasons for testing.
First, I've found tests to be an excellent way to debug, much better than trying to manually get the app into the state I want and then stepping into the debugger.
Second, having good coverage lets you refactor with a lot more confidence. I often open some file to fix a bug or add a feature and notice some piece of code that is horrible. I can comfortably fix that code too, and if the tests pass (including the one that failed before fixing the initial bug/adding the feature) I'm done.
Great points. What not to test is just as important as what to test. If you're interested in learning how to start TDD, Typemock is hosting a free webinar next week introducing TDD: http://j.mp/IVmQNi
Yes, he said "fuck". Let's get past that, because it's a super boring discussion to have on HN.
Also: pointing out specific 37signals or Rails bugs as evidence that this view is flawed? Also a super super boring way to argue. Argue with the idea, not the proponent of the idea.
What a bunch of hogwash; 1000 lines of code to test validates_presence_of? What testing framework is he using? x86 assembler?
The only part I agree about is the view testing and partially cucumber.
Then again, I don't think 37s has a large amount of business logic to test (compared to some of the other commenters in here) and their testing is mostly front end, and testing front-end ajax-y stuff with Cucumber does indeed suck.
It is not fun to pull out the power tools on a huge piece of software with 50 thousand unit tests and see that 600 of them fail because you changed an important part of the code.
It means the unit tests will have to be deleted and/or updated. If each takes 10 minutes, that is 6,000 minutes.
On the other hand, seeing one unit test fail in another module you didn't expect to fail can save you a week of work, so it's all in how brittle and organized the tests are.
In my experience, it means that you have to treat your tests with nearly as much software engineering respect and prowess as your real code. You have to factor things out and try to stay DRY (particularly tough sometimes in testing) as much as possible.
If there is one "concept" that you're asserting in your test suite, you want that concept to only be repeated once or as close to once as you can get. Often times, people copy paste sets of assertions that have mixed concepts. Then, when a requirement changes, hundreds of tests end up having to be updated.
This is why many large companies, that contract out test automation development or train manual QA people with enough programming skills to write tests, end up with brittle test suites. One look at those test suites by an experienced developer and it's no wonder: there's conceptual/semantic duplication everywhere.
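A small, invented sketch of what "one concept asserted in one place" can look like in practice - when the definition of a valid signup changes, only the shared assertion changes, not hundreds of pasted copies:

    require "minitest/autorun"
    require "securerandom"

    User = Struct.new(:status, :confirmation_token)

    # Stand-in for the real signup paths (web form, API, import, ...).
    def sign_up(_email)
      User.new("pending", SecureRandom.hex(8))
    end

    module SignupAssertions
      # The single place that says what a freshly signed-up user looks like.
      def assert_valid_signup(user)
        assert_equal "pending", user.status
        refute_nil user.confirmation_token
      end
    end

    class SignupTest < Minitest::Test
      include SignupAssertions

      def test_signup_via_web_form
        assert_valid_signup sign_up("form@example.com")
      end

      def test_signup_via_api
        assert_valid_signup sign_up("api@example.com")
      end
    end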
The phrase “Don’t test standard Active Record association” jumped out at me, since I’ve discovered a major bug in the most basic function of Active Record associations in 3.0.x.
One line in this article may have given me an epiphany:
>" If I don’t typically make a kind of mistake (like setting the wrong variables in a constructor), I don’t test for it."
My primary objection to TDD is that it doesn't seem to work: when I've tried it, the tests caught no bugs at all. I believe that tests can be important for APIs to prevent regressions when you have to work with external components, but it's frustrating to put time into tests and then never have the tests fail.
It's not that I'm a perfect programmer, it's the kind of bugs I make. The kind of bugs I make are caught by the compiler. This may be because, over the years of my career (many of which occurred long before the idea of "test first" was widely heard), I've trained myself in a style of programming where I can trust the compiler to catch my mistakes (most of which are typos, frankly).
I don't know if others can do this, but for me, it came about by doing things like:
Old way:
if (variable == 1) then whatever
New way:
if (1 == variable) then whatever
Every time I mistype that as "(1 = variable)", the compiler catches it because I can't redefine 1.