An apology to readers of Test-Driven iOS Development

jamieb · on Sept 4, 2012

"The second problem, and the one that in my opinion I’ve let down my readers the most by not catching, is that the table is completely false."

There is a big difference between "unsupported by hard data" and "completely false", especially when the author later admits to not being willing to pay to see the first paper cited in support.

It also seems, (from this post: http://lesswrong.com/lw/9sv/diseased_disciplines_the_strange...) that in fact the first line is supported by data, if only in aggregate. The author of that blog posts makes much of the phrase "this study didn't accurately record" while glossing over the next part of the sentence that states "so we will use average rates summarized from several other studies." It is common practice to aggregate data, and provided it is done properly, the aggregation can reduce error, not increase it.

The claim that "a bug introduced in Requirements, costs more to fix the later it is discovered" is claimed to be supported by data. Well done for demanding to see that data. But epic fail for creating controversy by claiming that the opposite is true.

Morendil · on Sept 4, 2012

Amusing exchange over at Reddit:

Redditor 1: "I bet not a single f$%k was given about that table among the readers of the book :)"

Redditor 2: "That table, or something like it, features strongly in every software engineering textbook that I've ever seen. The numbers differ, but the increasing order of magnitude differences are roughly the same. It's the essential justification for nearly everything in software engineering. "A single f$%k" doesn't begin to describe the importance of that table."

And yes, "the essential justification for nearly everything in software engineering" is broadly correct. Anytime you see an argument that "it's important to get the requirements right" for instance, it's based on studies that purport to show these numbers.

ajross · on Sept 4, 2012

Indeed, with the semantic cavet that "software engineering" is used in the sense of "software development management process nattering" and not (always) the actual engineering of working software.

The basic falseness of that chart is pretty well known among the hacker set. Or maybe it's a failure of interpretation among the natterers: the "high cost" of mistakes in the past is made mostly of work already spent. The immediate cost of correcting a past mistake is often (of course not always!) vastly lower than people think. And the farther you are from the actual code involved (i.e. if your role is "architect"), the higher your overestimate of that value is going to be.

So this is a feedback loop: the "high level" people at the start of the process have a built-in bias against iterative solutions (which they perceive as "expensive") and thus a built-in predilection to overdesign as they try to get everything right the first time.

mcguire · on Sept 5, 2012

I'm glad someone else sees the difference between the engineering of working software and the "process nattering" that seems to make up official "software engineering" as far as I can tell.

You do miss a couple of steps in the feedback loop:

1. "The 'high level' people at the start of the process have a built-in bias against iterative solutions."

2. "A built-in predilection to overdesign as they try to get everything right the first time."

3. The wild-ass guesses that make up the overdesign prove to be completely wrong.

4. The project fails miserably or succeeds, miserably.

5. The "high level" people perceive software as "very expensive", reinforcing their biases.

ahoge · on Sept 5, 2012

>"it's important to get the requirements right" for instance

Well, we've all been there. If you don't get them right, it will really hurt. A lot.

Also, moving things in a wireframe or diagram around is basically free whereas moving things around in a "finished" product can be fairly time consuming.

So, there is at least some truth to it.

grayprog · on Sept 4, 2012

I know Graham. And he's both a great guy and a serious developer. I've also read his book and though I've not yet finished it, I've seen this table, as it's in the beginning.

Having read this blog post, I now respect him even more. Both for the intellectual honesty and for his efforts to try and reconcile the data in the table, with all its consequences.

I wish more people (including me) were this dedicated and willing to admit mistakes.

Maybe that's what makes him a specialist in software security, of all things.

Shish2k · on Sept 4, 2012

> I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering

Surely this makes it more representative, in a literal sense -- if he'd checked his data at the start, he wouldn't have to go back to check it + issue an apology + reprint the books now :P

marshray · on Sept 5, 2012

The general idea ("bugs found later tend to cost more to correct") is uncontroversial.

It's also accepted that this is a very difficult thing to study. There are so many subjective factors involved that's it's really hard to quantify the data or the results. But it seems to me that if Software Engineering as a wants to make progress methodically, it needs to throw out the old unsubstantiated assumptions in order to make room for new conclusions with a solid basis.

robomartin · on Sept 4, 2012

The time and cost to fix bugs can have a huge variance which is sometimes dependent on the nature and field of the software being written. There are bugs that take seconds to fix and others that take months.

I have personally experienced hunting down a bug for six months nearly full time (10 to 12 hours per day) until it was finally found. This was a real-time hardware system and the code was that of an FPGA. The culprit was a coefficient in one of many polyphase finite impulse response filters. The calculations used to generate this coefficient were done in a huge Excel spreadsheet. At one point, in the hundreds of equations in the spreadsheet, the author had used the ROUND() function when the ROUNDUP() function should have been used. This was enough cause a buffer issue in the FIFO feeding the polyphase filter. These are tricky problems to track down when you are literally looking for events that take somewhere in the order of single-digit nanoseconds to occur.

On the other hand, there are those bugs where you know exactly what is going on and where in the code it is happening the instant you see the behavior. We've all experienced that during the course of software development.

Fixing bugs for a dating website has vastly different requirements than, say, fixing bugs in a flight control system.

One argument is for more up-front planning in order to avoid some bugs. At one point this can quickly become counterproductive. Sometimes it's better to just start writing code and fix issues as they come up.

Now, if we are talking about fixing bugs after the fact, that's a different matter. One example here might be if you inherit the code to a complex website or an engine control system and, without any familiarity with the code, are required to fix bugs. This can easily take cubic time as it requires to learn the code-base (and sometimes the subject matter) while also trying to hunt down bugs.

This is why I tend to take such tables or studies with great skepticism. I haven't really paid much attention to these studies, but I remember looking at one or two of them and thinking that they tended to focus on narrow cases such as fixing bugs in-house, with a stable programmer team and great familiarity with the code base.

lttlrck · on Sept 4, 2012

Skepticism is deserved. It's a very fine balance, and bearing in mind not much software exist that is entirely bug-free.

What is the cost of a bug that is never discovered? Is it negative? What is the cost of fixing a bug that would never have caused a problem (at each stage of development)?

It's way more complex than bugs == bad

thenonsequitur · on Sept 12, 2012

Agreed. Indeed, your argument naturally follows from a "bug fixes are features" type mindset.

thebooktocome · on Sept 4, 2012

Previously on LessWrong: http://lesswrong.com/lw/9sv/diseased_disciplines_the_strange...

cousin_it · on Sept 4, 2012

Note that the author of that post is Laurent Bossavit, the author of the "Leprechauns" book that was referenced in the OP.

diego_moita · on Sept 4, 2012

Call me cynical but I am getting very skeptical against a lot of well established "truths" in Software Engineering: the cone of uncertainty, the orders of magnitude difference in programmers productivity, the efficiency of TDD, ...

Most of these well established claims simply don't have enough empirical data to sustain them. They're all bloggers' hand waving and empty claims.

Morendil · on Sept 4, 2012

"One of these isn't like the others..."

Things like the Cone, the rising-cost-of-defects or the 10x claim have been kicking around for decades.

The evidence for or against TDD is, admittedly, inconclusive, but it's more recent and of a better academic caliber. There have been a lot of studies. Most of these studies aren't any good - but at least someone is trying.

There's a deeper question, which is "granted that all the empirical evidence we have so far for claims in software engineering isn't all that good, how can we get good empirical evidence?"

I suspect that the answer is going to involve changing the very questions we ask. "Does TDD work" is too fuzzy and ill-defined, and there's no way you can test that in a blinded random experiment. People's biases about TDD (subjects' or experimenters') are going to contaminate the evidence.

Instead, we need to ask questions that aren't susceptible to this kind of bias and contamination. For instance, we might want to unobtrusively study actual programmers working on actual projects, and record what causes them to write defects.

diego_moita · on Sept 4, 2012

I fear the problems on this empirical approach are deeper.

The main problem is that software metrics are imprecise and non-objective. Lines of code, functional points, code coverage, counting code paths, ... we can't have a metric that can be accepted by everyone, all of them have big flaws. And the metrics are the basis for any reliable analysis, if we can't trust them we can't trust anything.

The second main problem is that it is very hard to isolate things under examination. How can we analyse TDD without taking into account the developer's grasp of good design (coupling and cohesion), Dependency Injection and Inversion of Control, refactoring techniques and tools, etc?

Software Engineering is a lot harder because it is much more akin to the fuzzy social studies (e.g.: economics, sociology, management) than to hard sciences (e.g.: computing science).

davidcuddeback · on Sept 4, 2012

To be fair, the metrics you mentioned are objective. It's just debatable how relevant they are. Other metrics that are used are measures of coupling, cohesion, lines per method, methods per class, etc. There are probably more measures that are used. None of them are perfect, but they all help paint a picture that adds to our understanding.

To address your second point, it's actually not too difficult to isolate factors like TDD. The standard way is to have a control group and a test group. With a large enough of a sample size, you can determine statistical significance with standard tests.

Unfortunately, the test subjects are often university students, who are less experienced than professionals. The fact that data collected on students might not generalize to professionals is a threat to external validity, but should be made explicit in most papers. Most of the time, I think companies aren't very happy about having researchers use their engineers for experiments on the company's dime, but it does happen. So there are some papers out there reporting results with professionals.

EdiX · on Sept 4, 2012

> The evidence for or against TDD is, admittedly, inconclusive, but it's more recent and of a better academic caliber. There have been a lot of studies. Most of these studies aren't any good - but at least someone is trying.

Can you point me to some of those studies? Every time I look I only find the same 2 studies everyone quotes from (and aren't very good).

Morendil · on Sept 4, 2012

Sure, you can grab my Agile bib file here: https://github.com/Morendil/referentiel.institut-agile.fr/bl...

There are 48 papers tagged with "tdd".

itmag · on Sept 4, 2012

the orders of magnitude difference in programmers productivity

I am skeptical of this too. It makes more sense to have huge swings in ability, not productivity. Ie a poor programmer won't take 10x as long to code a given feature, he will just hit a ceiling of ability and not be able to do it at all.

JabavuAdams · on Sept 4, 2012

Consider the difference between knowing exactly which library to use to solve a particular problem vs. believing that you need to write new code to do it. That can easily account for a 10x productivity difference.

RHSeeger · on Sept 4, 2012

I'm inclined to believe it's less about the time required to code specific feature... and more about the time required to

1. Figure out what features should be implemented (ie, will implementing this feature shoot us in the foot later)

2. Figure out the correct implementation

3. Be able to handle future feature requests

Sure, 1 & 2 will vary by skill/experience. However, the skill/experience with which 1 & 2 are handled can severely impact 3, causing it to easily take 10x longer if it can be done at all. As you move onto 4 and down the line, this becomes more and more pronounced.

robomartin · on Sept 4, 2012

>Ie a poor programmer won't take 10x as long to code a given feature, he will just a ceiling of ability and not be able to do it at all.

10x longer is not a poor programmer, that's an incompetent programmer. If someone needs ten days to code something that can be done in one day something is very seriously wrong.

sirclueless · on Sept 4, 2012

Are you surprised by that? The 10x longer statement doesn't surprise me at all. If you ask someone, "Please implement a Java class that has the following methods and behaves like this" you might expect any competent programmer to finish within ~3x of each other.

But given some more nebulous task, where architectural decisions must be made and serious research and testing needs to be done, it's not surprising at all. For example, if you asked someone, "Please write me a library so that I can send and receive XMPP messages," I would expect a large number of otherwise competent programmers to make a significant number of false starts and poor decisions, and generally take much more time than the guy who has experience writing libraries and interpreting text protocols. For example, consider the case of Ron Jeffries and Peter Norvig writing a sudoku solver[1] (this example is a perennial favorite of mine in all sorts of discussions).

And I don't think anything is "very seriously wrong" with this situation. Different skillsets and competence levels result in drastically different results. I think this is true of any profession that is largely about creative problem-solving: some can do it efficiently, some cannot. Programming is just a unique case because there are so many people trying it and not being deterred due to poor performance because there is such a demand.

[1] http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...

shawabawa3 · on Sept 4, 2012

> 10x longer is not a poor programmer, that's an incompetent programmer. If someone needs ten days to code something that can be done in one day something is very seriously wrong.

That's just not true, it's all relative. Linus Torvalds supposedly coded git to the point where it was self-hosted in 1 day. Even a very good programmer could take more than 10 days to do that, an average (but not incompetent) programmer could take months.

jkubicek · on Sept 4, 2012

Linus has guessed it took him about two weeks to get git to the point it was self-hosted.

http://www.spinics.net/lists/git/msg24132.html

It's still impressive, though.

Edit: Should have read further. It looks like Linus got git self-hosting in two days.

https://lkml.org/lkml/2005/4/8/9

vidarh · on Sept 4, 2012

I think it's a bit of both, and varies significantly with the domain.

Quick, how do you write an SMTP server, and what are the challenges of making it scale?

Most developers won't know how, to start with. That's fine - that's besides the issue. So they need to look it up.

Here starts the performance gap, even if you deal with people with the same lack of knowledge of the relevant RFC's.

In my experience, there's a vast difference in developers ability to read even relatively simple specs and ensure they develop something that follows it. I mentioned SMTP because it genuinely is a simple standard compared to many of the alternatives. But it has enough edge cases that you'll have a big gap out of the gate instantly between the people who have a problem mentally picturing what the spec is describing and those that can easily and systematically map it out.

Secondly in this case you'd start to see experience gaps. Even assuming most people won't have written an SMTP server, you will start seeing a gap between a group of developers that at least have in-depth knowledge of part of the domain or type of service. That will account for a very substantial difference.

In this case, understanding on how to write efficient, scalable network services makes the difference between the guy that will do horribly inefficient stuff like read()'ing a byte at a time to get a line from the client (I mention this because the MySQL C client libraries did that for years instead of the vastly more efficient solution of non-blocking larger reads into a temporary buffer to avoid the context-switches, so it's not like this is something that only rent-a-coder's with no experience will do).

Just the gap between those that understand the tradeoffs of context switches and threads vs. processes vs. multiplexing connections will account for a fairly substantial factor in many types of problems like this.

Then comes the thorny issue of queues. Most otherwise relatively competent developers will struggle to get this right in a way that is not either slow or full of race conditions. Most competent developers never have to deal with really optimizing disk-IO. Witness the wildly different approaches and performance in established mail servers to see that doing queueing well is hard, and those are the good ones.

That does not mean they won't be able to figure out how to do it well enough for typical use cases.

(I used this example, because I've written several SMTP servers, and managed teams working on mail software, so it's an area where I know the tradeoffs and problem points particularly well)

Then again, when writing your typical cookie-cutter web app, the difference probably won't be 10x because so much more of the time will be spent mediating stakeholder requests vs. solving hard problems.

olavk · on Sept 4, 2012

The origin of this 10x meme was (if I remember correctly) measuring productivity on a single task. It is not inconcievable that the difference between the worst and the best is 10x on a single task, especially for students. This doesn't tell us what the difference between the average and the worst/best, which would be more interesting. Also, it doesn't tell us if the best developer is consistenty 10x faster, or the worst performing developer just made a mistake with this particular task.

If you compare a developer solving his first task in an unfamiliar language/platform/framework, you will easily see this magnitude of difference compared to a developer with deep experience. But this difference will not stay consistent.

randomdata · on Sept 4, 2012

I don't know, there may be some truth to it. Based on the information I can gather, it seems Bellard's LTE implementation (linked yesterday) was completed over the course of about a year in his spare time. I don't know many programmers that can keep that kind of pace.

If it encompasses more than just the time of writing code, it becomes even more believable. e.g. A great programmer will take a day to implement the given feature and it will be relatively bug free. An average programmer will take a day to implement the given feature and then it will take another nine to work out the bugs introduced.

consultutah · on Sept 4, 2012

Hopefully this will end in a discussion in which the actual data underlying the table can be had and matched up.

Anecdotally, it seems to make sense that it is more expensive to find and fix a defect later in the process. If a dev finds a defect while implementing a feature, only his costs are involved. However, if a defect is found later in the our process, at least 2 qa analysts are involved: one to find the defect and another to confirm it. After that a project manager schedules out time and assigns the defect out to a developer, possibly not the one that introduced the defect gets the assignment to fix it. The developer fixes the defect, a build is made by the build person. The original tester retests the defect and marks the fix as being verified.

That seems complex, but there are possibly even more steps than that. Unit tests may need to be written, customer test cases may need to be updated.

I don't know if the costs end up being exponentially greater, but they would seem to be greater.

At any rate, it would be good to have someone independently validate that data.

philwelch · on Sept 4, 2012

It's not written in stone that you need 2 QA analysts, a project manager, a developer, and a build master to fix a bug. That only proves that bureaucracy is expensive, not that fixing bugs "late" in the process is expensive!

I'm not sure what exactly the second QA person adds to the process. As a developer, I need to be able to reproduce a bug myself in order to have any chance of fixing it, so there's your confirmation step right there.

Project manager? No, just have your QA person file the ticket as a bug (with the correct priority) and have your developers pull tickets off your bug tracker themselves. At worst, a slight email/verbal nudge from the usual boss should do.

Build person? No, have a continuous, automated build system. The job of the build person should be to maintain that system, not to run individual builds.

Now you're down to one QA person to file the ticket, one developer to fix the bug, and the same QA person again to close the ticket. Add in a few minutes of another developer's time to code review the first developer (you don't even do that despite having all that process?) and it's still cheaper.

Unit tests? Developers write their own unit tests. Have a failing unit test that reproduces the root cause of the bug before you fix the bug--that's a best practice anyway. For testing above and beyond that, it's not unreasonable to have SDETs maintain that stuff, though you already have the root cause of the bug captured in a unit test so your main concern should be whether any existing tests actually rely upon the broken behavior.

marshray · on Sept 5, 2012

It's not written in stone that you need 2 QA analysts, a project manager, a developer, and a build master to fix a bug.

Well - that depends. For plenty of large software engineering projects (think aerospace) it's a given that you'll have that organization many times over.

On the other hand, a web startup may produce a high-quality shipping product with just two or three devops. The number of "lines of code" may even end up being similar between the two deliverables.

The code that gets burned into an ASIC requires a completely different set of development considerations than a non-critical "make deploy_production" web service.

But how do you quantify all these confounding factors?

philwelch · on Sept 5, 2012

The hope is that adding all those people adds value somewhere, it doesn't just add cost.

marshray · on Sept 5, 2012

Right - the aerospace people aren't stupid - far from it. They simply have vastly different goals and constraints, and thus a different organization and a different development process.

neves · on Sept 5, 2012

And don't forget that you'd have other costs involved:

  o The cost to fix the corrupted data

  o Opportunity costs of a broken software (it is Black Friday and your site is down)

  o Image costs of the ability of your company/product

  o The costumer cost (specially if she is also from your company)

These costs are very difficult to measure. The chart is popular because it matches our expectation as developers.

sunraa · on Sept 4, 2012

Lots of props to the blog author for going through the hoops that I suppose we should all be going through. I've always placed the CC books up there in the pantheon of great software engineering books. This is a chink in the armor and I hope Mr McConnell takes the time to provide a response although I'm not holding my breath. Another object lesson in not accepting things at face value.

jakejake · on Sept 4, 2012

Based on my own subjective experience I've found the numbers regarding testing to be suspicious simply because "bugs" are so varied that any specific number would be subjective.

I'd always assumed that these numbers referred to an architectural type of bug where, once made, more code is built upon the bug and so a cascading effect occurs. The more code that relies on the bug, the worse it is to fix because you have to fix all of it's dependent code. In some cases you may need to repair data, or entirely refactor sections of an application.

But there's other bugs that are more "typo" level bugs where obviously the time to fix is exactly the same no matter what phase of development.

I had just always assumed those numbers were a worse-case average to encourage developers to better plan their architecture. In part because the context of code complete is more about planning and estimation.

Though there may not be support for the exact figures, I think the point of them is to not let things pile up and to try to put thought into your design especially at the architectural levels.

MattRogish · on Sept 4, 2012

This seems to have been my experience and interpretation as well. A bug is not a mistake in coding, but a mistake in selecting core architecture, 3rd party libraries, etc.

Making a mistake picking the wrong storage engine can cause major problems if you find it can't scale to your load after 12 months of development. That's a ton of code to change if you're switching from MySQL to Redis (for a crazy example).

Make a mistake in copy or layout, and that's usually much easier to change.

unreal37 · on Sept 4, 2012

It seems he wasn't able to see some of the sources. One cost money to see, one is out of print and not available...

It may be true and noble that since he isn't able to verify the data himself, he shouldn't have used it. But if you can't see the sources, you also can't state that the data is incorrect. One of those books he couldn't find might contain the data that directly corroborates this.

I suspect the cost of fixing defects has more to do with your internal process, and less to do with how much work it takes to find and fix the actual defect. I have a client I work for now for which I need about 10 days lead time to get code from development to production. So it might take 1 hour to fix a bug that is discovered, and 8 hours to do the paperwork and go through the formal process of moving code through testing, staging and finally to production.

aptwebapps · on Sept 4, 2012

I wonder if he tried asking McConnell just how he compiled the table. It does seem a bit odd to construct such a simple table from eight different sources.

mcguire · on Sept 4, 2012

...none of which seem to have the actual data to support the table.

smoyer · on Sept 4, 2012

I seem to remember first seeing a table like the one the author describes in "The Mythical Man Month" but my copy is currently at home. The data underlying that book was gathered from projects at IBM in the '60s and '70s and I don't really doubt that the underlying data was fairly represented.

The bigger question is whether improvements in processes and tools have obsoleted this data. TDD and automated regression testing would be one place where inter-phase defects could become less costly. On the other hand, projects that have uncorrected architectural errors can be completely functional and yet their maintenance costs never decrease.

If someone has a copy handy, please validate my memory otherwise I'll check when I get home this evening.

alttag · on Sept 4, 2012

I've just gone through it page-by-page, and didn't see a table like that. Using the 20th Anniversary Edition, I also looked through the summarized list of claims in each chapter, and the 20-year retrospective, and did not see the table or a section that might have been a textual version of the same data.

Perhaps the closest bit I found was in the chapter "Plan One to Throw Away": "The total cost of maintaining a widely used program is typically 40 percent or more of the cost of developing it" (p 121). Similarly, another section quotes Capers Jones, "Focus on quality, and productivity will follow" (p 217, emphasis original).

That, I think, is about as close as MMM gets to these claims.

smoyer · on Sept 5, 2012

I couldn't find it either but in addition to "Code Complete" this data is also shown in a chart in "The Software Project Survival Guide", also by Steve McConnell (p29).

The attribution is "Researchers have found" but Boehm and Papaccio are listed in the references.

zwdr · on Sept 4, 2012

I wish more authors would check their references like that. Not that I am interested in iOS-development, but still.

habitue · on Sept 4, 2012

It's great to see such intellectual honesty. It looks like the real perpetrator, however, is McConnell.

Maxious · on Sept 4, 2012

"The fact that no studies have produced findings that contradict the 10x claim provides even more confidence in the 10x claim. When I consider the number of studies that have been done, in aggregate I find the research to be not only suggestive, but conclusive—which is rare in software engineering research." Steve McConnell http://forums.construx.com/blogs/stevemcc/archive/2011/01/09... http://news.ycombinator.com/item?id=4117417

nateberkopec · on Sept 4, 2012

Yikes, that doesn't strike me as very intellectually sound.

"The fact that no studies have produced findings that contradict the existence of a teapot orbiting Jupiter..."

gruseom · on Sept 4, 2012

I think if you read the article you'll find that that's unfair. His argument is that there are many studies that support it and none that contradict it, and that such consistency is unusual in the literature. That may be wrong but it's not trivially stupid as you make it sound.

Morendil · on Sept 4, 2012

The linked post isn't about the 10x claim, it's about the unrelated "defect cost increase" claim.

gruseom · on Sept 4, 2012

Someone needs to defend McConnell here. I'll do it. He's one of the most meticulous people to work with this material. That comes through pretty clearly in the post you linked to. The problem is not McConnell, it's – and this is the dirty secret of software engineering – that the research literature itself is so weak and spotty. Most studies consist of arbitrary metrics on small data sets with no replication, rarely (ever?) making the actual data available. Massive assumptions and confounding factors are typically obvious and ignored (things like: how familiar were the programmers with the tools they were using). The overwhelming factor is the interpretive preferences of the authors, who knit their conclusions out of this dodgy material. As science, it not only does not hold up, it's a joke.

On the other hand, what they are trying to study is incredibly complex - not just people, not just software, but people working on software, and even social systems of people working on software. How do you even begin to reliably measure that? Psychology looks like physics by comparison. Yet the resources available to do these studies are a drop in the bucket. This is not something society values – it's not even something our own industry values. So it's unsurprising that what we have so far is at best a smattering. You can argue that these toy studies ought not to be dressed up in formal scientific wear, and I don't disagree – it gives an altogether misleading impression of how solid their conclusions are. But no doubt that's true of a lot of papers.

(Oh and on top of all that, much of the literature is hard to track down. Didn't anyone notice that the OP wasn't able to even find half of the citations he was trying to check? Based on that alone he shouldn't be claiming that "the table is completely false", only that he was unable to rebuild it from data.)

The real question is whether we should throw it all out as junk or try to make something of it. I think there's a good argument for junking. But it's also reasonable to say: hold on, for all their weaknesses these studies are all we have and they're not nothing. At least someone is trying to work with actual data. So let's consider them, but carefully. Like walking on a swamp.

McConnell is one of the few people who have made serious efforts to walk the swamp and filter it somewhat and convey its findings to professional programmers. Of all these people (that I've read), he actually stands out as the most meticulous. To fire cowboy accusations of dishonesty at him personally is likely unfair and misses the point. I do think McConnell overestimates how solid the swamp is, but he's not ignorant about it (read that whole post) and the claim that he outright falsifies citations, or anything like that, demands a high burden of proof. I haven't read Bossavit's critique (should I?) but anyone taking that position had better first make sure he's not standing on a swamp too. That's harder than it sounds.

Morendil · on Sept 4, 2012

> I haven't read Bossavit's critique (should I?) but anyone taking that position had better first make sure he's not standing on a swamp too.

Author of "Leprechauns" here.

I'm not sure I follow you in the above sentence. What happened time and again in my investigations was that I would find a citation (in McConnell or in Boehm or elsewhere) that was metaphorically accompanied with the statement "this here is a solid piece of land in the swamp".

When I got there, however, what I found was, in fact, just more swamp.

It takes a higher standard of proof to demonstrate that something is solid ground than to demonstrate there's a flaw in it. That may be unfair, but it's why scientists need a lot of training.

To take just one example, Graham mentions the Hughes Aircraft study, cited (by McConnell and others) in support of the usual "exponential" curve for defect cost (as "Willis, Ron R., et al, 1998").

When you actually read that paper - which is both one of the easiest to obtain and one that has the more detailed data about what was studied and how - and look for the raw data, you find numbers that do not obey an exponential law OR a monotonic increase. For one column for instance the numbers in fact vary within a narrow range .36 to 2.00 with two maxima at the "Coding" and "Functional test" phases, dropping off before, between and after. In a different column the costs vary only by a factor of two between Unit Test and System Test, by a factor of less than four between Code and System Test. The exponential rise is generally not true for the pre-1991 period; some post-1991 measurements get closer, but many do not.

It's hard for me to see this evidence as doing anything else but undermining the original claim of the exponential rise as a generally reliable regularity in software development.

Given this, I don't think that citing this paper in support of the claim is appropriate.

gruseom · on Sept 4, 2012

"I'm not sure I follow you in the above sentence."

I'm saying that if one is going to attack someone personally for dishonesty, as opposed to critiquing the state of the field in general, the bar for that is high and one had better be more than careful with one's own particulars. Reasonable people can interpret this stuff differently. The post linked to upthread [1] didn't strike me as dishonest (though of course it's only one side).

I'm glad you've been looking for solid ground in the swamp and finding it swampy. I think that's valuable. What bothers me are the hints in the OP, comments here, and elsewhere I've seen (including people talking about your work) that this is about one guy making shit up when in reality the problem is endemic to the entire field. Unless he's been egregiously dishonest, which I doubt, it's a distraction.

On another note, you seem like a good person to ask: is there any finding in the software engineering literature that you think holds up? i.e. have you found any solid ground in the swamp? I'm not sure I have (but I haven't looked nearly as hard as you). If there really isn't anything, that alone is kind of shocking.

[1] http://forums.construx.com/blogs/stevemcc/archive/2011/01/09...

Morendil · on Sept 4, 2012

> attack someone personally for dishonesty

Hackers hate ad hominem, and with good reason. I too subscribe to the school of "harsh on the problem, soft on the person". On the other hand, it makes no sense not to call out things that keep us in the swamp, or to tiptoe around important epistemic issues just to spare hurt feelings.

One problem we have is that few people are willing to go to great lengths to check out the available evidence; instead the pattern is to repeat claims (and associated citations) made by people who sound authoritative, accepting them essentially on faith. This has the unfortunate side-effect of magnifying the mistakes of people who have become authorities.

In "Leprechauns" and elsewhere, my focus isn't on what any particular person says, but on specific claims. "The cost of fixing defects rises exponentially as a function of SDLC phase" isn't tied to a particular person - it originated with Boehm but many others have propagated it. My method has been to look up the evidence and to see if it held up. Also to think hard about why studies may have failed to show conclusively what they set out to prove, and how to overcome these challenges.

I'm doing the same kind of thing in my own area, i.e. I'm collecting all available evidence, pro or con, about whether various Agile practices "work".

> is there any finding in the software engineering literature that you think holds up

There are many good ideas and rules of thumb, but when it comes to very general "laws", solidly established - that's harder. I've been asking that question over and over again, hoping to get a convincing answer. Still haven't got one.

I've read a bunch of supposedly solid references, e.g. Pressman, or the "Handbook of Software and Systems Engineering" and have been underwhelmed. (The first "law" proposed in the Handbook: "Requirement deficiences are the prime source of project failures," based on evidence like the Chaos Reports. My rebuttal: http://lesswrong.com/lw/amt/causal_diagrams_and_software_eng... )

gruseom · on Sept 5, 2012

The thing about "many good ideas and rules of thumb" is, I've got a few dozen of those of my own! Most of us do. It would be interesting if there were decisive evidence against any of them, but even when I read studies whose conclusions contradict my beliefs, the studies are so flimsy that I find it easy to keep my beliefs.

There does seem to be a recent wave of software engineering literature, exemplified by http://www.amazon.com/Making-Software-Really-Works-Believe/d... (which I haven't read). Are you familiar with this more recent stuff? Does it represent new research or merely new reporting on old research? If the former, are the standards higher?

mcguire · on Sept 5, 2012

I haven't read all of Making Software yet myself, but you would be interested in one of the first few chapters. I don't remember who wrote it offhand, but as I recall, in discussing the standards of evidence needed for software engineering the author concluded, and I am paraphrasing here, that hard numbers were difficult to get and came with many, many conditions; as a result anecdotes were likely the best you could do and were perfectly acceptable. (Was that enough disclaimers?)

You might be able to tell why I lost my enthusiasm for the book.

gruseom · on Sept 6, 2012

Thanks. My enthusiasm mostly consists of trying to get other people to read this stuff and tell me what it says :)

I think that's the argument for junking the SE literature. If it can't do any better than anecdote, well, to quote Monty Python, we've already got some they're very nice.

ludflu · on Sept 4, 2012

Much respect for the frankness and forthrightness with which he addressed this. I've always taken the exponential increase in the cost of bug fixes with time for granted. I won't do so in the future. We need more empirical studies of software development!

Tashtego · on Sept 4, 2012

Anecdotally, I have certainly found that although the cost in time may not vary as much as this table would indicate, the cost in stress ramps up even faster. Fixing a bug in production is usually a highly stressful endeavor for all involved. I would love to see a similar table phrased in terms of stress comparing different development methodologies currently in vogue (test and throw it over the wall, CI, automated pushes vs. manual pushes, etc.)

emmapersky · on Sept 4, 2012

A google scholar search for the title of the paper behind the paywall reveals numerous copies freely available through university websites.

f4stjack · on Sept 4, 2012

So yeah, long story short from what I gather is: I couldn't find the referenced sources to the table I am referencing so it must be false!

What a leap of logic, seriously...

pilif · on Sept 4, 2012

That's not at all what he way saying. In the conclusion at the end of the article he said that "I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering [...]".

So while the data might still be true, the numbers are not necessarily accurate and thus you shouldn't base conclusions on that table. He doesn't say that the values are wrong - just that they might be incorrect.

On a related note: I'd love to get real numbers for this. Fixing bugs happening in production certainly feels much more expensive and cumbersome, but is there real data aside of this table which now apparently is inaccurate.

f4stjack · on Sept 4, 2012

Well, that was what I gathered from the part I'll about to quote. But considering English is not my native language I might have misunderstood his meaning:

"As explained in the book, this table is reproduced from an earlier publication:

    Table 1.1, reproduced from Code Complete, 2nd Edition, by Steve McConnell (Microsoft Press, 2004), shows the results of a survey that evaluated the cost of fixing a bug as a function of the time it lay “dormant” in the product. The table shows that fixing bugs at the end of a project is the most expensive way to work, which makes sense…

The first mistake I made was simply that I seem to have made up the bit about it being the result of a survey. I don’t know where I got that from. In McConnell, it’s titled “Average Cost of Fixing Defects Based on When They’re Introduced and Detected” (Table 3-1, at the top of page 30). It’s introduced in a paragraph marked “HARD DATA”, and is accompanied by an impressive list of references in the table footer. McConnell:

    The data in Table 3-1 shows that, for example, an architecture defect that costs $1000 to fix when the architecture is being created can cost $15,000 to fix during system test.

As already covered, the first problem is that I misattributed the data in the table. The second problem, and the one that in my opinion I’ve let down my readers the most by not catching, is that the table is completely false."

ludflu · on Sept 4, 2012

He didn't say it was false. He did say it was unsupported.

gruseom · on Sept 4, 2012

"The second problem, and the one that in my opinion I’ve let down my readers the most by not catching, is that the table is completely false."

ludflu · on Sept 4, 2012

"it" meaning the conclusion about the exponential increase in the cost of fixing bugs over time.

MaxGabriel · on Sept 4, 2012

Would people who've read the book recommend it? It sounds interesting to read because I'm not experienced with TDD, and iOS tends to shy away from it.