> is why would we suspect the lack of a test indicates a bug?
I can only speak from my experience, but the code is not better because it is mutation tested. It is better because we have thought about all of the edge cases that could happen when inputting data into the system.
Mutation testing, as a tool, helps you find statements that are not being exercised when parsing certain data. For example, if I write an HTML parser and I only ever provide test data that looks like `<a href=....` as an input string, and a mutation testing tool replaces:
if (attrs.has("href"))
return LINK;
with:
if (true)
return LINK;
It is clear to a human reader that this conditional is important, but the test system doesn't have visibility into this. This means in the following situations you can be screwed:
1. Someone (on the team, off the team) makes a code change and doesn't fully understand the implications of their change. They see that the tests pass if they always `return LINK;`.
2. If you are writing a state machine (parser, etc) it helps you think of cases which are not being tested (no assertion that you can arrive at a state).
3. It helps you find out if your tests are Volkswagening. For example if you replace:
for (int i = 0; i < LENGTH; i++)
with:
for (int i = 0; i < LENGTH; i += 10)
Then it is clear that the behavior of this for loop is either not important or not being tested. This could mean that the tests that you do have are not useful and can be deleted.
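To make the first failure mode concrete, here is a minimal sketch in Python (the snippets above are Java-like pseudocode; `classify`, `weak_test`, and the attribute dict are all hypothetical names) of a mutant surviving a weak test suite:

```python
def classify(tag, attrs):
    # Original: a tag is only a LINK if it actually has an href attribute.
    if "href" in attrs:
        return "LINK"
    return "OTHER"

def classify_mutant(tag, attrs):
    # Mutant: the tool replaced the conditional with `if (true)`.
    if True:
        return "LINK"
    return "OTHER"

def weak_test(fn):
    # The only test data ever provided looks like `<a href=...>`,
    # so the false branch of the conditional is never exercised.
    return fn("a", {"href": "https://example.com"}) == "LINK"

print(weak_test(classify))         # True
print(weak_test(classify_mutant))  # True: the mutant survives, which flags
                                   # that the `href` check is untested
```

A single extra test that feeds in a tag without an href, and asserts it is not classified as a LINK, would kill this mutant.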
> For most non-trivial software the possible state-space is enormous and we generally don't/can't test all of it. So "not testing the (full) behaviour of your application is the default for any test strategy", if we could we wouldn't have bugs... Last I checked most software (including Google's) has plenty of bugs.
I have also set up, and fixed findings from, https://google.github.io/clusterfuzz/ which uses coverage + properties to find bugs in the way C++ code handles pointers and other things.
> The next question would be let's say I spend my time writing the tests to resolve this (could be a lot of work) is that time better spent vs. other things I could be doing? (i.e. what's the ROI)
That is something that will depend largely on the team and the code you are working on. If it is experimental code that isn't in production, is there value in this? Likely not. If you are writing code where failing to parse some data correctly means a huge headache later? Likely yes.
The SRE workbook goes over making these calculations.
> Even ignoring that is there data to support that the quality of software where mutation testing was added improved measurably (e.g. less bugs files against the deployed product, better uptime, etc?)
I know that there are studies that show that tests reduce bugs but I do not know of studies that say that higher test coverage reduces bugs.
The goal of mutation testing isn't to drive up coverage, though. It is to find out what cases are not being exercised and evaluate whether they will cause a problem. For example, mutation testing tools have picked up cases like this:
if (debug) print("Got here!");
Alerting on this if statement is basically useless and it can be ignored.
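A sketch of that situation (Python, hypothetical names; the mutation is the standard replace-condition-with-true operator) shows why such a finding can be triaged as harmless:

```python
def compute(x, debug=False):
    if debug:
        print("Got here!")  # original: debug-only logging, no effect on the result
    return x * 2

def compute_mutant(x, debug=False):
    if True:                # mutant: `debug` replaced with `true`
        print("Got here!")
    return x * 2

# The observable result is identical either way; this surviving mutant
# says nothing about correctness and can be marked as ignorable.
print(compute(21) == compute_mutant(21))  # True
```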
> Is this method better than just looking at code coverage? Possibly none of the tests enter the if statement at all?
Coverage does not tell you the same thing that mutation tests do. Coverage tells you whether a line was hit. Mutation tests tell you whether the conditions that got you there were appropriately exercised.
For example:
if (a.length > 10 && b.length < 2)
If your tests enter this if statement and also pass when it is replaced with:
if (a.length > 10 && true)
Or:
if (true || b.length < 2)
You would still have the same line coverage. You would still have the same branch coverage. But, if these tests pass, it is clear that you are not exercising cases where a.length <= 10 or b.length >= 2.
> where I'm coming from is that it's not a given this is an improvement to the software development process
In my experience, if I didn't write a test covering something, it was likely because I didn't think of that edge case while writing the code. And if I didn't think of that edge case while writing the code, then I am leaning heavily on defensive programming practices I have developed but which are not bulletproof. Instead of hoping that I am a good programmer 100% of the time and never make mistakes, I can write tests to validate assumptions.
> Seeing data to that effect would be cool (i.e. after the fact) and if it's real that'd be pretty incredible and we should all be doing that.
Getting this kind of data out of various companies might be challenging.