Two sides of software testing: Checking and exploring (2011)

mrkeen · on Jan 26, 2024

> "If they can't tell me what the software is supposed to do, I can't test it," Francine, the test manager, scowled. "So, I tell them that I won't start testing until they produce a detailed requirements document."

I think this attitude could only work if you expected end-users to do the same: no user interactions except things explicitly written in the manual.

satisfice · on Jan 26, 2024

According to the guys who introduced the terminology of "checking" as a term of art in 2010 (Michael Bolton and me), checking is not a "form" of testing. It's not testing in and of itself. But it is a part of testing.

We introduced the terminology because we think it's important for the craft of testing that we reserve the word "testing" for an essentially human process. Testing is what people do.

Testing is evaluating a product by learning about it via exploring, experiencing, and experimenting.

Checking is a mechanical process of operating a product and evaluating its output algorithmically.

A tester frequently performs a check. And all automation that is called a "test" is actually a check by this definition.

All testing is exploratory to some degree. Much of testing involves performing a check at some point.

This way of thinking leads to a richer and deeper craft of testing than the alternative popular notion.

See: https://www.satisfice.com/blog/archives/1509 https://www.satisfice.com/blog/archives/856

CipherPilot · on Jan 26, 2024

While I understand the motivation for disambiguating "checking" from "testing" conceptually, I have some concerns about how this framework may play out in practice. What constitutes a "mechanical" check vs a human "exploration", for example, could likely be a source of endless debate. And is it truly useful to separate the two when most testing undoubtedly involves elements of both? A more pragmatic definition may be preferable to one that sets up a dubious distinction prone to subjective interpretation. Nonetheless, I appreciate the attempt to enrich discussion around strategic testing methodologies at a time when clear terminology is needed to match the complexity of modern software systems.

satisfice · on Jan 26, 2024

I can explain the differences. I've been using these distinctions in practice for many years.

"Mechanical" means algorithmic. It's what Turing meant when he wrote "The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer. The human computer is supposed to be following fixed rules; he has no authority to deviate from them in any detail. We may suppose that these rules are supplied in a book, which is altered whenever he is put on to a new job." See: https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf

Checking is an evaluation activity that can be COMPLETELY automated.

Testing is a social activity done by humans. It cannot be automated because it incorporates social judgement. Social judgement requires collective tacit social knowledge. See: The Shape of Actions, by Harry Collins for a very deep dive into that.

A human exploration is distinguished by the presence of choosing. Humans make choices. You could claim that human choices are an epiphenomenon of bio-mechanics, if you want to. I actually believe that, myself. However, we are speaking of the social/practical realm. In this realm we perceive each other and treat each other as if we are making free choices (although perhaps influenced by known sources of bias). Thus the difference between exploring and checking is that, in every moment, a check is controlled by an explicit script that is knowable in principle; whereas in exploration, a person is forming new thoughts and theories, through a process of real-time sensemaking (otherwise known as learning).

Checking is a process of operating a product and verifying a particular fact. A check does not change the checker.

Exploration is a process of interacting, experiencing, and being transformed by this process. Exploration grows the explorer.

satisfice · on Jan 26, 2024

When I say that testing cannot be automated, this is how I work that out in practice:

Let's say some developer runs his "unit tests" as part of a build process. Let's say that they are automatically kicked off, as part of the pipeline. This developer may say "I've done testing."

I would say, well, we don't know yet. It depends on the answer to these questions:

1. Do you fully understand the unit-level checking code that is being run? Did you write it yourself, for instance, and do you remember what it does and does not do? (IF YES, IT MIGHT BE TESTING, BECAUSE IT'S POSSIBLE THAT YOU COULD MAKE A REASONABLE JUDGMENT ABOUT THE COVERAGE AND VALUE OF THIS PROCESS)

2. Do you fully understand the intent and logic behind the code? In other words, do you understand why the code was written as it was instead of doing something completely different? (IF YES, IT MIGHT BE TESTING, BECAUSE IT'S POSSIBLE THAT YOU COULD MAKE A REASONABLE JUDGMENT ABOUT THE COVERAGE AND VALUE OF THIS PROCESS)

3. Were the automated checks performed correctly? Can you be reasonably sure of this? Did you supervise them in progress or otherwise receive log output that serves as compelling evidence? (IF YES, IT MIGHT BE TESTING, BECAUSE IT'S POSSIBLE THAT YOU COULD MAKE A REASONABLE JUDGMENT ABOUT THE COVERAGE AND VALUE OF THIS PROCESS)

4. Have you reviewed the test results? Did your code collect any metadata or raw data of any kind that may indicate problems beyond those that were the immediate subject of verification? (IF YES, IT MIGHT BE TESTING, BECAUSE IT'S POSSIBLE THAT YOU COULD MAKE A REASONABLE JUDGMENT ABOUT THE COVERAGE AND VALUE OF THIS PROCESS)

So, a checking process becomes testing when a tester applies his judgement to the situation. Otherwise, it's just checking in a vacuum.

Oh, and I'm only talking about the situation where everything is "green." If it's not green, then for testing to be happening, someone must investigate.

An example of checking that is not testing is when my car's "check engine" light comes on. I am not qualified to interpret this event, other than to take me care to a repair shop. When the repair guy looks at it, that check becomes a test, and then he acts on the result of that test.

drewcoo · on Jan 27, 2024

Much as I'm a fan (and still consider myself context-driven), the quibbling over hazy definitions is where I balk a bit. Not the ALLCAPS so much as the sentiment.

What is "testing?" Well James (or Cem or . . .) will proclaim the definition in an enthusiastic and self-assured way and then tell you how you're not doing their just-made-up definition. It feels a bit like a self-help seminar. And I'm not just saying that because you also have books available in the lobby.

And the specific claims are over the top, too. "If you didn't watch your 'test' run, it may not have been 'testing'" isn't very helpful. Especially when you're talking about dev unit tests.

tveyben · on Jan 26, 2024

I’m glad to see you here James, I was just about to make a comment linking to you’re site ;-)

/A happy exploratory tester

rqtwteye · on Jan 25, 2024

Exploratory testing by experienced testers is extremely valuable. I am a dev but have done some testing. I could break the systems in various ways because I had a pretty good feel for what can go wrong.

Beginner users are also very good because they do stuff you never expected.

eru · on Jan 25, 2024

> Beginner users are also very good because they do stuff you never expected.

People using software for a while undergo operant conditioning: consciously or subconsciously they quickly learn to internalise what makes the software 'angry', and learn to avoid that. That makes them more productive users, but also less useful as explorers.

See https://blog.regehr.org/archives/861

> Have you ever used a new program or system and found it to be obnoxiously buggy, but then after a while you didn’t notice the bugs anymore? If so, then congratulations: you have been trained by the computer to avoid some of its problems. For example, I used to have a laptop that would lock up to the point where the battery needed to be removed when I scrolled down a web page for too long (I’m guessing the video driver’s logic for handling a full command queue was defective). Messing with the driver version did not solve the problem and I soon learned to take little breaks when scrolling down a long web page. To this day I occasionally feel a twinge of guilt or fear when rapidly scrolling a web page.

> The extent to which we technical people have become conditioned by computers became apparent to me when one of my kids, probably three years old at the time, sat down at a Windows machine and within minutes rendered the GUI unresponsive. Even after watching which keys he pressed, I was unable to reproduce this behavior, at least partially because decades of training in how to use a computer have made it very hard for me to use one in such an inappropriate fashion. By now, this child (at 8 years old) has been brought into the fold: like millions of other people he can use a Windows machine for hours at a time without killing it.

julian_t · on Jan 26, 2024

I think this is true whenever we learn to interact with a system. When I was learning the saxophone I had real trouble getting one note to sound properly, and it seemed to me that there was a leak on that pad. My teacher couldn't get my instrument to misbehave at all. He said there probably was something slightly wrong with that pad, but over the years he'd learned to play 'with' an instrument, adjusting for its particular behavior. The same goes for things like driving a temperamental car.

switchbak · on Jan 25, 2024

I’ve met Elisabeth, and i think she’s a fantastic thought leader for exploratory testing.

Sadly my efforts to promote this practice were met with a dismissive response: “QA leads to lower quality, according to this study”. I just didn’t have the energy to try to change this person’s opinion.

I have seen exploratory testing in concert with good developer testing work extremely well. Old school SQA isn’t really my thing, but modern exploratory testers are tremendously valuable.

jdlshore · on Jan 26, 2024

Elisabeth is the author of the article they may have been unknowingly quoting: “Better Testing, Worse Quality.”

switchbak · on Jan 30, 2024

I don't think so, I don't know the exact article they were referencing but it was definitely post-2000. I think something more along the lines of this: https://www.forrester.com/blogs/11-02-17-want_better_quality...

It was a very strawman argument, as if we could only have a 90's style split where QA only did testing, or you could have modern developer testing and no QA.

I think modern dev testing + exploratory testing is the best I've experienced, but it has been challenging to explain that. Software organizations are imperfect places, and sometimes it's hard to push against popular trends.

shiroiuma · on Jan 26, 2024

>“QA leads to lower quality, according to this study”.

Was this produced by the same people who published a study claiming that flossing your teeth is bad for you, or the study that claimed that wearing a bicycle helmet is dangerous?

YZF · on Jan 26, 2024

"QA" isn't as precise as wear a helmet while riding your bicycle or floss your teeth every night before going to sleep. It refers to vague practices that vary greatly from organization to organization and team to team.

There are several anti-patterns that sometimes fit under this vagueness:

- Throw garbage over the fence. I.e. low quality products thrown over the fence to a QA team.

- Over-reliance on manual testing.

- Quality is somebody else's problem (sort of related to the first anti-pattern but not exactly the same).

Even the term "Quality" is usually too vague to make sense of. What is quality exactly? How do you tell if your product is higher or lower quality than the competing product? Organizations tend to declare they're the best, regardless of practices and with no real benchmark.

shiroiuma · on Jan 26, 2024

I don't see how having a QA team can actually result in worse quality though. Even for your "throw garbage over the fence" (something I've seen first-hand actually), the QA team is improving quality by finding the bugs in all the garbage thrown over the fence. Sure, getting the dev teams to stop throwing garbage would be better, but if management doesn't want to do that for some weird reason, and wants QA to serve as the bug-fixing org in place of having devs actually test their code before merging to master, not having the QA team at all would be measurably worse.

It's a bit like the bike helmet controversy. The anti-helmet people from Netherlands all claim that helmets somehow force cyclists to ride faster or more dangerously, or force car drivers to take greater risks around them. (Incidentally, this is very similar to the anti-seatbelt arguments I heard decades ago.) The helmet isn't the problem here though, even if there's any truth to these claims: it's the humans. The helmet will absolutely protect your skull better when it impacts concrete than no helmet.

YZF · on Jan 26, 2024

The problem is that these are systems. I.e. you're right that in a given system if all else is constant the presence of some QA/testing step should improve quality. But everything else is not constant.

But yeah, point taken on the bike helmet arguments, though I think that's a little more clear cut.

I think QA (manual software testing by professional testers) has an important role in producing quality software but there are organizational corner cases where the idea of "QA" in software promotes the wrong thinking/approach to quality that results in an overall poor outcome. Quality in software starts before a single line of code is written and has to be in mind throughout the development process so you shouldn't fall into a trap that somehow slapping "Quality Assurance" on top of bad engineering can create quality after the fact.

shiroiuma · on Jan 26, 2024

Just because some people get the wrong idea when something exists, doesn't mean the existence of that thing is an actual problem or detriment. It's true, slapping "QA" on top of bad engineering isn't going to create quality. However, "having quality in mind throughout the development process" isn't going to guarantee it either, especially in a large and complex codebase with modules from many different teams. A QA org is useful for doing things like full-system testing, which can catch bugs that those individual teams might miss because they aren't testing the entire system. It's also useful to have dedicated testers because they'll test things out in ways the devs never thought of. Again, this isn't a replacement for quality engineering at the outset, but that's a management problem.

NomDePlum · on Jan 26, 2024

In most cases I'd agree. QA teams/testers can be very professional.

However, having a team that is responsible for Quality can absolutely lower the quality of a product. For starters that team is now, at least in terms of perception and more importantly organisational politics, responsible for quality, so by definition others are not held as responsible. That can have all sorts of perverse side-effects.

Secondly, if the QA team is not very good, then it's just a time sink. Misreported defects, queries about functionality, increased communication, supposed environmental issues all generated spuriously, then take away time from development and other teams. Lost time is not replaceable, and limited time drives down quality.

Lastly, having a separate team lengthens feedback and issues are spotted later in the cycle, this drives quality down too.

My preference is always to keep testing within the team, automate it and make the developers think about and appreciate good testing. Good testers, can absolutely contribute, and exploratory testing can be highly valuable, particularly for systems users directly interact with, but they are, in my experience, pretty rare.

switchbak · on Jan 30, 2024

You make some very good points. I've definitely seen organizational dysfunction related to the Quality / QA split of responsibility. Typically this manifests as:

- an over-reliance on low quality and fragile tests at the highest and least flexible level of the stack. Usually a GUI or browser based tests.

- under investment in domain synthesis and learning, and an over-investment in either automated testing or robotically following test plans.

- because we've shifted the responsibility and accountability to a different team, the developers may assume that they have a reduced burden to test their work. And you get the typical "throw it over the fence" pattern, with its associated long latencies.

- an us vs them mentality, separate management, and isolated silos of communication.

The pattern I've advocated for is more like: having an experienced and talented exploratory tester as a core part of a given team. They take full responsibility on deeply understanding the domain, and often become a real expert that can identify domain level misunderstandings and poor mappings on to the solution domain. These issues are amongst the most expensive to fix, and the payback for this level of investment is ridiculously huge.

So I wish we had a more distinct term that would help distinguish this older approach of QA (or SQA) with the dysfunctional patterns we've identified from the newer exploratory testing role. Something like "subject matter validator" (naming is hard).

drewcoo · on Jan 27, 2024

As already mentioned by GP, reliance on someone else to catch bugs can mean devs don't pay as much attention. The old saying is that "you can't test quality in," devs have to code it in. Someone else doing your testing also encourages a false confidence in the tests - danger zone!

QA tends to make iterations slower. At the very least, there is some added step with added handoff. That's assuming there's not a lot of back-and-forth between QA and dev. Worse, they often add special test environments (with potential problems) and test passes that take a long time, so even slower still.

There's also slowdown due to the Brooksian "Million Man Month" problem. If you have a team that's having problems hitting a schedule, adding team members isn't going to fix it.

And there's the fact that the only bugs that matter are the ones customers see. More so, the bugs that impact customer use. Customers always seem to find those bugs no matter how much QA you do. And QA gumming up triage processes and filling backlogs with real "bugs" that no one cares about doesn't help anything.

I could go on but this is long-winded enough.

IMO, the way QA has been done in the past completely justified the move to eliminate QA a decade or so ago. That doesn't mean there can't be value in it. It does mean that it can't gatekeep or even slow down progress. QA needs to be committed to shipping product and trying to prevent customer-impacting bugs.

pfdietz · on Jan 26, 2024

The article presents two forms of testing, but it implies checking is only possible with a complete specification of the program.

But that's not true: one can check incomplete specifications, both positive and negative.

To the extent one has any specification, one can automate testing, and in doing so execute far more test cases than any manual exploration ever could. To a significant extent, testing is about sheer volume of tests, not cleverness in constructing the tests.

drewcoo · on Jan 27, 2024

> it implies checking is only possible with a complete specification of the program

No, it doesn't.

If exploratory testing has any value at all, that implies that there cannot be complete test coverage with checks, thus there cannot be checks matching a complete specification.

> incomplete specifications, both positive and negative

There's more to testing than positive and negative test cases.

> execute far more test cases than any manual exploration ever could

Exploratory testing does not "execute test cases," so you have a tautology here.

> testing is about sheer volume of tests

The purpose of testing is open to wildly contradictory opinions. I usually claim that it's to find user-impacting bugs before users do. Maybe that can happen with lots of checks and maybe those can be automated. Maybe you're dealing with a system with lots of hidden interdependencies and some "I'm not gonna touch that" dev code and even if you want to check everything you just can't. Maybe the user experience matters more and that goes far beyond positive/negative test cases. There's no simple one-size-fits-all solution.

tveyben · on Jan 26, 2024

> To a significant extent, testing is about sheer volume of tests

Well, quantity != quality

Just because you have 1E10 test cases does not mean you have a large coverage…

One cleverly constructed testcase might return more insights regarding the product than 100 testcases made just to increase the number of test cases.

Goodhart's law: "When a measure becomes a target, it ceases to be a good measure" comes to mind…

pfdietz · on Jan 26, 2024

Quantity has a quality all its own. Time and again, testing complex programs, I've seen this approach find bugs one could not have conceived when creating manual tests.

You mention coverage. Interestingly, if you read the Csmith paper on random testing of C compilers, they found that while it found plenty of bugs, it didn't increase coverage (over previous tests) more than a tiny amount. Coverage is a great example of Goodhart's law.

omgwtfbyobbq · on Jan 26, 2024

In my experience, manual and automated testing are necessary together, and neither is sufficient alone.

Manual testing alone usually can't cover enough ground, and automated testing alone usually can't generate inputs with sufficient cardinality to cover enough of what the software does.

Jtsummers · on Jan 26, 2024

Property-based and fuzz testing can generate a lot of inputs. You may have to run them overnight or longer sometimes though.

And if you pair property-based testing with model-based testing you can detect a lot of errors that manual testing will likely never discover, just due to time.

https://hypothesis.readthedocs.io/en/latest/stateful.html - one example

drewcoo · on Jan 27, 2024

There are ways to avoid running all night, which I assume is a side-effect of combinatoric explosion of the test space.

Check out pairwise (a.k.a. "all pairs") testing:

https://en.wikipedia.org/wiki/All-pairs_testing

https://www.pairwise.org

And there are also ways to (at runtime) prune the set of cases being run based on system behavior. As in "aha - these similar things failed to meet expectations in the same way, let's flag it as the same bug and stop running anything with the failure characteristic," if that makes sense. Say, for example, you have a zip code field that takes a string and after several instances of tests with a zip code your test determines that you're getting a NaN error (for a string field?) . . . stop running any tests with zip code strings and treat all of those as the same failure. This can greatly increase test speed and make triage of failures easier.

omgwtfbyobbq · on Jan 26, 2024

Nice! That would perfect for fully automated testing if a user didn't need to specify any primitive actions. I'm sure it still saves a ton of time/catches many bugs.

zubairq · on Jan 26, 2024

Exploratory testing has been a huge way for me to build user friendly products. But I guess when you work in a large project that you are not paid to do exploratory testing, therefore product usability can end up terrible with end users hating ever using the thing!

CoastalCoder · on Jan 25, 2024

This is not the topic I expected from the title :)

frogulis · on Jan 26, 2024

Really? It was exactly what I expected. What did you expect?

CoastalCoder · on Jan 26, 2024

> Really? It was exactly what I expected.

Shoot, somehow I actually commented on the wrong article. That's a first.

TL;DR: There was an article about "architecture", but it was about building architecture, not computer architecture. That's what I was trying to comment about originally.