One release of the GHC Haskell compiler, long ago, had a bug where it put all your source files in the list of build artifacts it had created. When GHC ran into a type error, it deleted all its build artifacts from the current run, source files included.
There are stories that a few people argued to keep it around as a feature, for training in mental type checking.
Do people not typically modify the tests in the change where they introduce a feature?
I have never worked anywhere where I would check in a change without the tests passing. If you are making a huge refactoring, you do it a step at a time, with each incremental change still having passing tests.
When I was at Google I believe that changes were automatically rolled back if they broke enough tests globally. I don't recall anyone ever complaining except that the rollback was slower than they would have preferred. You simply can't check in a library change that breaks half the company, after all.
If people are writing adequate tests and consider commits to be "change lists" instead of just a random thing they want to back up from their local machine... this is a no-brainer. But it does require the culture of having your code reviewer check that they think there's enough code coverage. Otherwise "passing tests" doesn't mean anything.
(Google's tools happily showed lines of code that were covered by tests in the review UI, so you have objective and subjective data as a code reviewer to look at before approving.)
I'm sure your (manual) commits keep together the changes to the code and the matching changes to the test that keep the test green—but your Ctrl+S'es probably don't. In this strange workflow, the tests are required to be green every time you press Ctrl+S; and any time you press Ctrl+S and the tests aren't green, everything you did since the last time the tests were green disappears from the worktree.
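That save hook can be sketched in a few lines of shell; this is a minimal sketch only, assuming the test command is whatever you pass in, and the commit message is a placeholder:

```shell
# tcr: run the given test command; commit everything on green,
# throw away the worktree changes on red.
tcr() {
    if "$@"; then
        git commit -qam "tcr: green"   # tests pass: this save becomes a commit
    else
        git checkout -- .              # tests fail: back to the last green state
    fi
}
```

Usage would be e.g. `tcr make test` on every save. Note that `git checkout -- .` only resets tracked files; brand-new files would need `git clean` as well.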
I just can't imagine how that would scale in any significant way.
One of the many points of automated testing is that it's automated. I don't have to mentally run all of my tests in the back of my head to tell whether or not my code is correct. With this setup, I do. I become scared to even hit Ctrl+S in case I made a typo somewhere. Or perhaps even worse, what if I made a typo in my test and just committed a half-finished implementation? I've taken automated testing, one of the most brilliant time-saving inventions in all of programming, and turned it into a chore.
And as a programmer, that's supposed to encourage me to do more TDD?
I understand that the article is specifically about addressing those kinds of doubts, but... I dunno. I can't imagine ever trying this for any program bigger than a couple thousand lines of code.
If the goal is to have tiny changes, why not just count up the changelog and revert it if anyone ever tries to commit more than 25 lines of code? It seems like that would be just as effective, and potentially less annoying.
Yeah, this is how I’ve worked for 16 years. The unit tests always pass. Maybe once or twice a year you can go way out on a limb and do something exploratory where you don’t care about the tests passing for every commit, but... part of TDD and CI is that the tests always pass.
In some ways, I think the feature branch->PR model of today is a step backwards from traditional central server single-trunk development of years gone by (CVS/Perforce/SVN). Branches and PRs work better for F/OSS development, and for monster refactoring, but I kinda long for the days of a normal team just committing to master/trunk several times a day.
I think PRs are logically the same as Perforce changelists. The fact that you can pick apart a PR into multiple commits does pollute the history when you merge, though, unless your tools treat the PR as a single atomic commit (for bisection, etc.)
I have never treated the individual commits inside a PR as anything other than "I'd like to backup my workstation to the cloud now", but if people are treating them as something meaningful, I think that's a mistake.
I don't know if I'm reading your comment correctly, sorry if I'm not... but individual commits within a PR are absolutely not a backup. Every commit in the PR has to be a logical unit, meaningful, atomic and stand on its own. The PR, as a high level feature, is just a grouping. But single commits have to be useful and be a history/build up of how high level features are being developed. This granularity is important to logically isolate the changes and make them bisectable among other things.
The popularity of rebase merging in Git I think is partly to blame here. Rebase merging tries to make everything seem like real history, but no one ever seems to have a solid idea of what that actually is supposed to mean (and CI systems don't make it easy to configure "test all commits" or declare what you want your commit history to mean).
There's no such thing as "real history" - even without rebasing, your commits do not reflect the development of the code; they reflect practically arbitrary moments in time at which you chose to commit.
The idea behind rebasing is not to make it seem like "real history", but to have all changes in a commit to be related. They're not a reflection of how you built the code, but how you can understand it. They're documentation, and as such, you spend effort on them.
Yup. I always picked "merge" instead of "rebase" in Gerrit because it's not like I tested the result of applying each patch to the master branch. All I did was decide that mine worked and that I wanted to merge my branch with the upstream.
People like using clever features to make the history inaccurate, which seems like a bad idea to me. (It's "cleaner", but wrong.)
Once your test suite is big enough that it won't run within a minute or two, it's kind of hard to keep a "green on each commit" workflow, though. Especially with UI and visual diff tests in the mix, I certainly don't want to tie up my local computer for a full test run when I'm just fixing a small bug; pushing the commits in a branch and letting the full suite run against that while I move on to another task is a much better use of resources, and that's where the PR model seems most useful to me.
Are there workflows that avoid this problem but still enforce every commit being green?
Sure, I separate heavyweight tests out into integration/acceptance tests (the side effects make them slow; the side effects mean they're not really unit tests). So, I don't commit until the unit tests pass, and then a PR triggers heavyweight tests on my build server, to be sure you're clean before the merge to master.
But if there's too much divergence between unit test and integration test pass/fail rates, then unit test coverage needs to be improved; obviously, easier said than done when you have old pre-TDD code that's tightly coupled, or code that interacts with the system and incurs side effects that are hard/expensive to test.
In Phabricator you make your changes on a local branch, running "arc diff" to post or update a changeset. CI runs on each update to the changeset. You're expected to have CI passing before you "arc land" (squash merge to master and push), and usually before you solicit code review. Commit whatever garbage you want locally and even publish it to the Diff, but your code doesn't touch master in the central repository until reviewed and passing.
> But it does require the culture of having your code reviewer check that they think there's enough code coverage. Otherwise "passing tests" doesn't mean anything.
Replying just to highlight that it's possible, even common, to have a scenario in which "passing tests" doesn't mean anything even with coverage requirements in place:
Back in my last job, management tried to impose automated standards for code coverage. This led to people writing unit tests that looked like:
// implementation
var CONSTANTS = {
    RED: "red"
};

// test
test(function () {
    assert.strictEqual(CONSTANTS.RED, "red");
});
I spent a lot of time arguing (unsuccessfully) that these kinds of tests accomplished literally nothing. I heard everything from, "well, this lets someone know that the constant shouldn't change" to "well, writing it twice will encourage programmers to be more careful."
The worst was that people started doing it with translation strings. So we'd send a config file to the doc team, they'd rephrase something to be clearer, or remove a misspelling, and suddenly tests are failing and the translation string can't be committed until the next day. We had a release build delayed over this once.
Meanwhile, because the coverage report is already showing the code path that displays the string as green, nobody has written tests to figure out if the dialog containing the string actually shows up on screen.
Sometimes I feel like the entire concept of code coverage was just a mistake ;)
This only comes up if you have data that isn't bundled with logic: as long as there is some logic to test, even enums end up with full coverage.
The outstanding issue with Java code coverage tools is switch statements, because there are unhandled paths generated by the compiler, but there is an easy solution: don't use switch statements. Enums are basically always a better option anyway.
Code coverage is a very poor measure of quality of your tests. You can have lower coverage and great tests, and high coverage with completely useless tests.
I use test coverage as part of my workflow. Run my tests with coverage on and make sure my new code is fully covered. Kinda like a sanity check to make sure I'm not missing anything. Or if I'm stepping into old code and I want to see how well it's tested, I'll run the full suite with coverage on and see how bad it is.
The second management starts asking for x% code coverage is when it starts to break down, imo.
I do this fairly often. I find it can aid communication and recall.
Few examples:
Say we have a parser and we discover that it chokes on certain input. I'll start a new branch whose first commit is "Failing test case for XXX", to demonstrate the bug. Then I go about fixing it (or send it off to a colleague who is better suited to do the work). Once it's fixed, it goes into a PR and gets merged into master only if CI likes it. No one is inconvenienced by the fact that that particular test fails in one of the commits...
In this case, there are couple of benefits:
1. I get to see the tests go red. It's happened to me a few times that I write the test and the fix in the same breath because the fix seems simple, only to find the fix did nothing because the test wasn't triggering the bug. I find I'm less likely to make a commit that says "add failing test" than I am to dive right into fixing the code before running the tests.
2. It's easy for anyone to go back in time and verify that the test was indeed red, i.e. that the provided input really demonstrates the bug. If the fix and the failing test were in the same commit, one would have to carefully check out just some files and not others.
3. If the bug is being handed off to someone, it's obvious where to start: there is a single failing test when they begin looking into it.
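That flow can be replayed concretely in a throwaway repo; the branch name, file names, and commit messages below are all invented:

```shell
# Demonstrates committing the failing test on its own, then the fix,
# so both steps stay visible in history. Runs in a sandbox repo.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

echo 'parser v1' > parser.txt
git add parser.txt && git commit -qm "initial"

git checkout -q -b fix/parser-empty-input

# 1. First commit: only the test that reproduces the bug. The suite is
#    red here, and that's fine: CI gates the merge to master, not every commit.
echo 'expect: parse("") must not crash' > parser.test
git add parser.test
git commit -qm "Failing test case for parser crash on empty input"

# 2. Second commit: the fix that turns the suite green again.
echo 'handle empty input' >> parser.txt
git commit -qam "Handle empty input"
```

From here the branch goes into a PR as usual; CI must be green before the merge, so the intermediate red commit never lands on master alone.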
In a similar vein, another use case for a commit that breaks the build is as a means of communication when "architecting" new features and delegating work. We are all developers, so code sometimes comes more easily than prose, and a new failing test case is more expressive and less ambiguous than paragraphs of text in the ticket.
Say we get a ticket that says "We are having problems with trolls", and we want to make it a bit harder for them to register (contrived example and silly solution). I can create a new branch, make the following commit and assign to a junior developer:
+ assert_false(
+ UserRepository->create( email => 'blah@mailinator.com' ),
+ 'Do not allow throwaway email addresses to register'
+ );
This way, whoever gets to work on the feature has a great starting point - ticket explains the business problem, and the failing test puts them right where they need to start looking. It can be a time saver for everyone involved vs me updating the ticket and saying:
"Hi DevJunior, I think the way to solve this is to hook into the registration process. The create method in UserRepository is probably the best place to implement this - it should return failure if someone tries to register with @mailinator.com"
Lastly, it's great to have a failing test to come back to when you need to leave a branch for a while, be it at the end of the day or when switching focus to more urgent matters. I find I remember where I left off faster when the test suite tells me than when everything is green and I have to hunt for whatever was needed next to complete the feature... Basically, using the VCS and the test suite as a distributed TODO list :)
On some projects (e.g. Mesa), the test projects (piglit, etc) are decoupled from the project. It's not unusual that tests for a feature land after the feature is merged.
This is the exact method I use to teach TDD and it works. It works really well.
The article mentions the sunk costs fallacy. This is true. In the workshop I give you tougher and tougher parameters, shorter and shorter time frames. If your tests aren’t passing you reset to the last commit. You can’t commit without passing tests. People learn, a) write smaller simpler tests, b) code is cheap. The third time you write something you’ll write it faster and better. Better to do that in a tight loop than over a prolonged dev-test-refactor cycle that could last days to years.
You also learn how to refactor in the green. Boy is that a good skill to have.
I don't like lumping "add test & pass" together. It's very easy to accidentally write a test that doesn't test what you think it tests, and actually passes even without the "fix" applied. I do think it's very important to verify that the test fails without the associated change applied.
This breaks `git bisect`, so maybe they shouldn't be separate commits in the public commit log, but they should be separate commits while you're working on it.
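Keeping public commits green is what enables the automation here: `git bisect run` checks out each candidate commit and runs your suite for you. A toy demo in a throwaway repo (all contents invented):

```shell
# Toy demo: plant a bug in commit 4 of 5, then let `git bisect run`
# find it automatically. Everything happens in a sandbox repo.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

for i in 1 2 3 4 5; do
    if [ "$i" -ge 4 ]; then
        echo "BROKEN $i" > code.txt    # the bug ships in commit 4
    else
        echo "ok $i" > code.txt
    fi
    git add code.txt && git commit -qm "commit $i"
done

# the stand-in "test suite": fails whenever the bug is present
cat > run-tests.sh <<'EOF'
#!/bin/sh
! grep -q BROKEN code.txt
EOF
chmod +x run-tests.sh

git bisect start HEAD HEAD~4 >/dev/null   # bad = tip, good = commit 1
out=$(git bisect run ./run-tests.sh)
git bisect reset >/dev/null
echo "$out" | grep "first bad commit"     # bisect names commit 4
```

If a "failing test" commit sits in the middle of that history, the script exits non-zero on a commit that isn't actually buggy and bisect points at the wrong place, which is the argument for squashing the pair before it goes public.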
That's why the red-green-refactor style of TDD is so useful. You start by writing a failing test, then make it pass with the simplest possible change, and then refactor the code while keeping everything passing. You'd be hard pressed to write a test which fails and then passes by accident. It leads to extremely clean code IME.
In my experience this approach offers nothing, because the whole challenge is in writing the right tests, and you almost never know what the test is supposed to be like until after writing code for a while. There’s a symbiosis between knowing what test needs to be written and knowing what code needs to be written.
No part of specs, TDD, BDD, etc., actually ever captures the reality of this, despite every Name Brand Workflow always claiming to.
> There’s a symbiosis between knowing what test needs to be written and knowing what code needs to be written.
This should only be the case while unfamiliar with the language or project at hand. In my experience, once you're familiar with the project and the language, mulling over a requirement for a short time should be enough to split it into smaller features which are simple to test in isolation.
It has pretty much nothing to do with your experience level, your familiarity with the project or the language or tooling. When those things matter, they amplify this phenomenon, but the baseline effect is already high.
It’s a function of requirements being ambiguous by design. Non-engineering stakeholders have a severe need for requirements and specs to be malleable and ambiguous, such that whether the state of the project satisfies the requirements is fundamentally not codified in any objective document, and no test can reflect a useful objective measure of it. Tests can only loosely approximate it, and the ways to do so are always shifting and changing, up to the subjective interpretation of non-technical people.
Perhaps the only objective measure that non-tech people will agree to be specific about is financial cost, and even then usually only after it becomes a problem.
That’s every project in a product-oriented tech company period, and anyone claiming otherwise is full of it.
There can be isolated, tiny pockets of maintenance work or last-stage dev work, once you pass a point of no return where business people can no longer sustainably lobby for requirements to remain ambiguous and malleable, and you can finally start proving things with accurately scoped tests, but it’s so late in the game that the idea you could do something like TDD at that point is comical.
Instead, becoming a good product-oriented developer means in part learning how to write well factored tests on the fly at the same time you are writing the code to be tested, and keeping testing code clean and set up with fixtures and set up to run in highly automated ways, so that when you’re inevitably ripping stuff out of the implementation for the 18th time due to yet another priority pivot from sales, you can nimbly also update the tests as you go.
I've been using TDD for years (since 2000!) and my experience doesn't match yours. I've used it on countless projects, big and small.
I think the disconnect is that you're assuming the tests are about requirements, when in fact the tests people write with TDD are much more tightly focused.
> “in fact the tests people write with TDD are much more tightly focused.”
This usually doesn’t help and sometimes hurts: if the tests are tightly scoped yet the requirements are ambiguous, it usually means the tests are closely tied to the current implementation details, which are going to change rapidly due to scope pivots.
TDD simply doesn’t work when you don’t know what tests to write or how to structure tests to accommodate scope and priority changes, which is all the time. Starting out by writing a test and then coding to pass it bakes in an assumption about that test which is almost surely already wrong, and just creates rework to undo a tightly focused test.
When TDDing the tests are going to change a lot, just like the code you are saying you write anyway. I recommend the book Refactoring for the tools so you can make the changes without breaking the test, then clean up the now-unimportant tests, and then clean up the unused code. That way the tests stay green even when making major, breaking changes.
Being able to slice projects that way, in terms of functional behavior, is absolutely a skill that requires practice. You are already thinking about what your code needs to do; the practice is to get good at expressing that in terms the computer understands. Once you are in the habit, the tests make it easier to evolve the code rather than harder.
It is possible to be a good developer without that skill, but learning and practicing the skill will make you better.
I'm sorry you're going to feel I'm full of it, but I've seen TDD work in three different companies. Of course, TDD is not going to fix broken project management such as blindly trying to second guess vague requirements. Other types of QA are also necessary, including (but not limited to) client UAT and sign-off on features, to make sure you're continuously working towards something the client would want to use.
I appreciate the comment but for me seeing is believing. I’ve heard many people make claims like yours, but across 5 “enterprise” software jobs at large companies (many thousands of engineers) and 2 jobs at medium startups ~10 yrs old and 300+ engineers, I’ve never ever seen any type of formal methodology make any impact or difference of any kind.
I’ve only ever seen non-engineers arguing about process crap and making claims like this, while engineers are forced to learn a bizarre skill set of how to actually solve problems despite this quarter’s latest and greatest new process thingy that’s going to fix it all.
Indeed. I always have to change a condition in the test to make sure that it's actually running, rather than that I just happened to get it right the first time ;)
This sounds highly useful for changes you can do in a few hours. Or for toy projects where I can keep the whole application in my head.
I have no idea how this would work on a big refactoring where you touch every single one of your, say, 200 files and add something to every single method. (Don't ask; assume a change that moves every primary key to a composite key, so every getX(id) becomes getX(id1, id2).)
I mean, there were multiple people on this - we split up the work because a lot of it wasn't even very error prone - but it took days (OK, weeks). Are you supposed to just not commit, or not even show the others what you did? (Yes, there were many, many reviews and pair programming sessions in between.)
Of course this was on a feature branch.
Yes, it lived too long, and yes, there were still bugs at the end when we fixed the tests. I don't think it would have gone better or quicker if we had meticulously broken down the exact same change (add an id to every getter and setter at the controller and DB-layer level). Actually, we tried, and as expected it took a deep dive through all the layers.
I don't think the contrast to TDD (made in the image more than the text) is right: why not split 'add test & pass' into 'add test' & 'pass', and then you still have TDD.
Sure, you have one commit where the test was broken. But the product was broken anyway, and now you have a commit proving the test works.
The workflow is using Git as a realtime state-synchronization primitive (like Operational Transformations in Google Docs), not an SCM. The point is to push the commits over to the other person as soon as they're created, and for the other person to pull them and rebase against them as soon as they're created.
Unlike character-by-character OT sync, though, each synced Git commit does represent a working codebase. You could theoretically retroactively grab huge numbers of these commits and squash them down and describe the result—though multiple independent things might have happened in them that would be very difficult to untangle, given that, in this workflow, it's entirely possible that two separate people are working collaboratively on two separate features, in the same file, at the same time.
I assume that commits are being made to some sort of topical branch (Edit: I guess not? That seems like a poor decision). Personally, I encourage the developers across our organization to do an interactive rebase prior to pushing to a remote, to "curate" their commits (squash/fixup).
This greatly improves the utility of the commit history, at the expense of losing the actual commit history as work progressed:
- FeatureA should ensure valid user session
- fix test
- missing semicolon
- null check
- fix incompatibility with FeatureB
- shit broke the build
- address PR comments
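One low-friction way to get that curation is to mark cleanup commits as `fixup!` while working and let an autosquash rebase fold them away. A toy demo in a throwaway repo, run non-interactively here (messages reuse the list above):

```shell
# Toy demo: messy in-progress commits collapse into one curated commit
# via `git rebase -i --autosquash`.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev

echo base > app.txt && git add app.txt && git commit -qm "base"
base=$(git rev-parse HEAD)

echo feature > app.txt
git commit -qam "FeatureA should ensure valid user session"
echo "null check" >> app.txt
git commit -qam "fixup! FeatureA should ensure valid user session"
echo "fix test" >> app.txt
git commit -qam "fixup! FeatureA should ensure valid user session"

# ':' as the sequence editor accepts the generated autosquash plan as-is,
# so the fixup commits are folded into their target without prompting
GIT_SEQUENCE_EDITOR=: git rebase -i --autosquash "$base" >/dev/null
git log --oneline   # one clean commit on top of base
```

In day-to-day use you would rebase onto your upstream branch instead of a captured SHA; `git commit --fixup <sha>` writes the `fixup!` subject for you.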
Probably useless in terms of existing, standard development processes, but it seems to me like this Limbo effort is to come up with a new one (in which we can assume, by the principle of charity, that this is not entirely useless).
We did this in CI at an earlier job, and what we found is that it works if and only if you have a good test suite. In our case we had too many false positives because of badly written legacy tests, so it was not a usable approach.
The problem with this is making changes TOO small: then you can find you are testing the wrong behaviour just to make your code committable. I think there is a balance in finding the right size of increment while still keeping direction.
This is how I’ve developed code for almost a decade. For a given test we have the following behavior triple: desired result, expected result, and observed result. A test fails if its expected result differs from the observed result. The set of development work is every place where the desired and expected results differ. CI is passing when no test changes behavior. There’s a revert "window" where a change-in-state can be blessed.
From the perspective of a test that only passes or fails, this doesn't gain you anything.
I suppose we could imagine a test system which includes more metadata, and allows for placeholder values to let tests pass while producing an annotation that the product is not production ready.
What am I missing? If commit is a possible action, then we have a unit of code not yet in the repo. So if that is the case, won't the revert be reverting a previous, unrelated commit? If we can't commit unless the tests pass, then why would we ever revert?
I believe what I'm missing is taking the word `revert` too literally. I suspect it's actually `test && commit || checkout .` essentially.
The idea of this workflow is that saving runs the tests, and then either commits or deletes all changes. I think in this workflow I would write the code, get the tests to pass (thereby creating a commit), and then create a new commit with good comments.
This sounds like a sure way to enforce a local optimum. If you can't make mistakes and break enough things at a time to change the architecture, in order to fix the architecture itself, it means your architecture is going to be extremely sticky, and there will be no chance that design mistakes will be corrected.
Owning production systems means you can't make mistakes and break things. People are already doing what you're saying can't be done, everyday, when they change production systems.
He's saying you should figure out the path from A to B and get there in small steps. And make sure your tests pass at each step. Don't make big changes that are hard to propagate and merge with others' changes.
Basically, avoid the classic "nobody change anything, I'm doing a merge today!" situation.
> People are already doing what you're saying can't be done, everyday, when they change production systems.
Only seemingly so. People do break tests when they are writing (or removing) code, and even intermediate commits often don't pass tests (so they have to be squashed before the merge; they only exist to make reviewers' jobs easier, and keeping them risks a faulty bisect, for example). Development and deployment are separate things.
I'm thinking they mean the Bourne(-style) shell, where this syntax means that test is run, either commit or revert is run. That is, it is a shorthand for:
if test
then
    commit
else
    revert
fi
Presumably the point is to encourage you to keep your code changes and tests so small that you are OK with losing work.
`test && commit` is evaluated first, and the resulting value is the left side of
expr || revert
Not sure if that was intentional, but it's a very nice one liner assuming either your commit never fails, or your commit runs extra tests, and you want to revert if those fail.
Assuming it's shell, `&&` and `||` have equal precedence and are processed left to right, so `A && B || C` groups as `(A && B) || C` and `A || B && C` as `(A || B) && C`; there's no special order of operations. In this case it's closer to `test ? commit : revert`, though, since the commit can't fail (but that's not a shell operation).
The latter. Including reverting if commit fails. I'm not sure if they meant that literally, but it could actually be useful to revert if commit fails, since commit might have hooks that run extra tests.
Anyway, I suspect he's proposing it not as a serious methodology, but as a thought experiment to spur you into how you can make your changes smaller, simpler, and more atomic.
You could couple this with code-coverage requirements, like coverage must always increase or stay level with every commit. That can still be gamed but at least it's harder and more obvious to other devs
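A sketch of such a ratchet as a shell function; the baseline file name is invented, and the current percentage is assumed to come from whatever coverage tool your CI already runs:

```shell
# ratchet: fail if coverage fell below the stored baseline; otherwise
# record the new (equal or higher) figure as the baseline.
ratchet() {
    current=$1                                        # e.g. "83.4" from your coverage tool
    baseline=$(cat coverage-baseline.txt 2>/dev/null || echo 0)
    if awk -v c="$current" -v b="$baseline" 'BEGIN { exit !(c + 0 < b + 0) }'; then
        echo "coverage dropped: ${current}% < baseline ${baseline}%" >&2
        return 1
    fi
    echo "$current" > coverage-baseline.txt           # the ratchet only moves up
}
```

The baseline file would be committed alongside the code, so a commit that lowers coverage fails CI and gaming it requires visibly editing the baseline in the diff.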
Speaking of tests that never fail... I've seen contractors hand over code claiming they'd met the 75% test coverage required by Salesforce's platform without ever actually asserting anything.
The times I found XP unworkable in the real world all fit the scenario of "We're doing XP, but not pair programming because that's silly, we don't do x and y either because those are impractical."
The few projects that fully embraced it worked surprisingly well.