This experience is emphasized even more if you work in a language with a REPL, like Clojure. The only value of tests, at that point, is to protect against regressions. Testing at a REPL happens faster and more organically than writing unit tests, but it's less structured and harder to formalize. It's like comparing a structured debate to a conversation.
What someone needs to do (and I'll eventually do) is implement a REPL enhancement that allows "test capture". Namely, if you have recently evaluated one or more expressions, execute a "capture" command that extracts the working environment (locals, globals), the previous statements you've run, and their evaluated results, and outputs one or more unit tests. For example, a session with Clojure might look like this:
> (def x 1)
> (def y 2)
> (+ x y)
3
> capture!
-- Saved capture001.clj
(1 test passed, 0 tests failed.)
Where capture001.clj contains a test similar to:
(def x 1)
(def y 2)
(def expected-result 3)
(assert-equal (+ x y) expected-result)
Fuzzy, and I apologize, as I don't know the Clojure unit testing syntax, but you get the idea. Of course, this is a baseline case; you'd need more functionality, such as letting the user specify which parts of the environment to capture and which results to assert, but these should fall out naturally as the tool is dogfooded.
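For reference, a rough sketch of what the generated file could look like using clojure.test (the namespace name, test name, and overall structure are just my guesses, since the capture tool doesn't exist):
(ns capture001
  (:require [clojure.test :refer [deftest is run-tests]]))

;; Bindings captured from the REPL session, wrapped as locals so the
;; test is self-contained.
(deftest captured-session-001
  (let [x 1
        y 2
        expected-result 3]
    (is (= expected-result (+ x y)))))

;; (run-tests 'capture001) would then report 1 test, 1 assertion, 0 failures.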
In fact, this is the key reason that you need automated tests. I don't know anyone who commits a new feature without trying it -- be it (conveniently) at a REPL or via some other UI. But the manual "trying it out" method, REPL or otherwise, has always suffered from the fact that it cannot be easily repeated, so old features eventually get broken and nobody notices. This is why I automate my tests: their value over time exceeds the extra cost of writing them vs. just trying it out.
I think this is an aspect that the original article fails to take into account. Eliding tests can feel very liberating, and it allows you to plow ahead adding new features faster. Particularly in small, or at least new, projects. But over time reality catches up, and the lack of tests becomes a burden. You start avoiding adding new features, and particularly improving existing code, out of fear of breaking something. And so you end up more constrained than if you had added the right balance of tests along the way.
Writing software that is maintainable, with staying power of years or decades, requires the sacrifice of some up-front productivity.
Exactly -- and it applies to regressions caused by changes in the system (i.e. a change of platform, of language version, security patches, etc.), not only to changes to your code.
Just copy-paste the REPL interaction into a docstring, and you're done. Apparently, Python's doctest recognizes ">>>" as the REPL prompt and the following line as the expected output. The equivalent for Clojure would be a neat addition.
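A rough sketch of what such a Clojure equivalent could look like (run-doc-tests and the use of "user=>" as the prompt marker are my own assumptions, not an existing library):
(require '[clojure.string :as str])

;; Hypothetical doctest-style runner: scan a var's docstring for lines
;; starting with the REPL prompt "user=>", evaluate the expression after
;; the prompt, and compare it to the value printed on the following line.
(defn run-doc-tests [v]
  (doseq [[prompt expected] (partition 2 1 (str/split-lines (or (:doc (meta v)) "")))
          :when (str/starts-with? (str/trim prompt) "user=>")]
    (let [expr (read-string (subs (str/trim prompt) (count "user=>")))]
      (assert (= (eval expr) (read-string expected))
              (str "doctest failed for " (pr-str expr))))))

;; Usage: (run-doc-tests #'my-fn), where my-fn's docstring contains e.g.
;;   user=> (my-fn 1 2)
;;   3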
It's a neat hack -- I think there's definitely a gap though, in that there could be a tool that does some smarter introspection of your history and the state of the REPL to generate a unit test. v1 would be quite rudimentary, but after many iterations this could be an almost magical tool.
The "code-as-data" homomorphic semantics of lisp would make building something like this quite interesting, as it would probably need to transform some of the code you've REPLled from statements into assertions, etc.
I think you would end up restricting what you type in the REPL because you know it'll become a test.
Then you wouldn't be using the REPL for what a REPL can offer; you'd just be writing test code.
Unless it was really magical and would cover 100% of anything that someone could type in the REPL. If you were only using a subset of the REPL's/language's features because you know your to-test converter doesn't like some stuff, then you're writing the tests anyway.
I'm confused. This would be a REPL enhancement, meaning it would be something you use as needed.
Currently, using the Clojure REPL to test things comes with a twinge of guilt, as it is not being captured for regression tests (and I am too lazy to write unit tests separately).
This would let REPL use cycle between two "styles": ad-hoc experimentation, and then, when you've found some repeatable behavior you want codified in a test, capture mode. These can in some cases be distinct processes (physically and mentally) and in other cases overlap so much as to look and feel like the same thing.
Yes, yes I am. :) The reason being that, first, it's annoying to have to set up the boilerplate for the file. Second, it's annoying to have to convert what I just exercised in the REPL into a test (setup, teardown, asserts).
The reason TDD works and is fun is that you are using tests to learn and explore. It just so happens the artifact of that learning ends up living forever as a test. In a REPL, I'm doing that same learning and exploration already. The act of writing a test becomes as exciting as filing a TPS report.
Here's an example. I've just implemented a new function and REPL'ed it until it was solid. There were about 4-5 ad-hoc calls I made to the function to prove that it worked. I just finally got to the point where I can call it with my 4-5 different arguments, and it always outputs the right thing. Using readline, my arrow keys, and my enter key, I'm repeating the same series of steps over and over until the function works. We all do this. Win.
Now, I'm at a crossroads. Do I just start working on the next piece of the project? I know this piece works, I'm happy with it.
But wait! What if something changes? I need to write a test, don't I? Sadness consumes me, since testing is slowing me down. I've already exercised the code, I already know it works, and I've already written the tests, albeit sloppily, in the REPL. Why do I have to switch gears now and start writing a file, running a test runner, and so on?
The truth is: I won't. I'll move on to the next thing, keeping my flow and skipping something boring whose outcome I already know, in favor of something fun: the next feature.
Maybe those who do switch off and go through the motions to write a test, repeating themselves, are more noble and careful in their programming. But I humbly suspect most of us are more lazy than noble :)
Just want to say that I've been thinking about REPL vis-à-vis unit testing for a few years now and my experience and conclusions match yours very closely. You've done a nice job of articulating them (here and in the root comment).
I don't know, I'm still not convinced. Guess it'll have to be one of those things where I might change my opinion after trying it (if someone ever comes up with an implementation).
You're too lazy to open a file, type the unit tests, and hit CTRL+S, but you're not too lazy to open the REPL, type the unit tests, and hit capture?
If you can't automate 100% of the REPL-into-test feature, if you need two mindsets/styles/etc, if you still need to "find the behaviour codified in a test", then you're just duplicating in the REPL the same workflow and results of writing the tests in a file. They need to overlap 100%.
How much context do you need? An entire memory dump? The complete REPL history? If IO is involved, do you need to somehow guarantee the same files are available at test time with the same contents?
It seems to me the trick is to set sensible limits on the context of the current REPL state preserved at test time, in a way that works for most kinds of common unit tests. I believe this is the "magic" of which you speak.
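One hedged guess at such a limit: let the user (or the tool) name the vars worth snapshotting and capture only those, as local bindings for the generated test (capture-bindings is an illustrative name, not an existing function):
;; Hypothetical: snapshot an explicit whitelist of current REPL vars as a
;; let-style binding vector, instead of dumping the whole environment.
(defn capture-bindings [syms]
  (vec (mapcat (fn [s] [s (deref (resolve s))]) syms)))

;; After (def x 1) and (def y 2) at the REPL:
;; (capture-bindings '[x y]) => [x 1 y 2]
;; which a generated test could wrap around the captured expression:
;; (let [x 1 y 2] (is (= 3 (+ x y))))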
I dunno. I think this idea occurs to everyone who understands unit testing and then encounters REPLs; it certainly occurred to me under those circumstances and I got excited about it for a while too. Over time, though, it has struck me as less and less obviously good. Though you're right about where the two approaches to programming overlap, and I agree with you that REPL > tests in those areas, there's also considerable territory where they don't overlap. I suspect that xor represents an impedance mismatch that makes "test capture" not as feasible as it seems at first.
I don't mean to pour cold water on the idea, though; if someone figures out a way of doing it that's useful I'd happily change my mind.
That would be truer if the shell generated the doctests for you. And if it did, how great would that be? I suspect this would be easier in a functional (or maybe prototype-based inheritance) language than in a traditional inheritance one, because you work directly with the objects that should carry your tests. In Python you would have to decide whether the tests go on the object itself (unlikely but possible) or somewhere up its inheritance chain.
Ah ha. I knew I couldn't have been the first person to think of this, as it's a natural enhancement to the REPL to support TDD. I'm not a Pythonista, hence my lack of exposure to it. Thanks!
A good REPL is one thing I find it really hard to program without. In many situations, I find ad-hoc testing by typing expressions into the REPL to see that they return what I expect (or sometimes, more experimentally, to see what they return so as to better understand an API) preferable to formal unit testing.
It's the main thing I miss when programming in Haskell. GHCi doesn't quite measure up to Lisp (or Ruby, or Python) REPLs.