This technique is also known as characterisation testing, golden-master testing, snapshot testing, and probably other names too. I recommend looking into https://approvaltests.com/ (already mentioned by another commenter).
I use characterisation testing all the time, in a perhaps unusual application: Checking the behaviour of "Page Objects" (classes used to model the application-under-test in GUI-level automated testing) when the application-under-test has changed. That's right: Automated [characterisation] tests for my automated [GUI] tests! It makes GUI testing oh so much easier to maintain. I wrote it up here: https://david.rothlis.net/pageobjects/characterisation-tests
Another thought related to characterisation testing, this time from Jeremias Roessler's talk at QAFest Ukraine 2017 (https://www.youtube.com/watch?v=f4PT_u8hjhU): Traditional automated tests (with asserts) are "blacklisting" the changes that aren't allowed, whereas a characterisation test will catch any change in behaviour, and instead you have to "whitelist" the changes that are allowed (by providing regexes or some kind of pattern to match the dynamic output that is allowed to change).
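To make the "whitelist" idea concrete, here is a minimal Python sketch. It is not taken from the talk or from any particular library: the patterns, the `scrub` and `matches_snapshot` helpers, and the sample output lines are all made up for illustration. The idea is to replace every span that is allowed to change with a fixed placeholder before comparing against the approved output, so that anything else that differs fails the test.

```python
import re

# Hypothetical whitelist: patterns for output that is allowed to change
# between runs, each paired with the placeholder that replaces it.
ALLOWED_TO_CHANGE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"), "<UUID>"),
]


def scrub(text):
    """Replace every whitelisted dynamic span with its placeholder."""
    for pattern, placeholder in ALLOWED_TO_CHANGE:
        text = pattern.sub(placeholder, text)
    return text


def matches_snapshot(actual, approved):
    """True if the two outputs differ only in whitelisted spans."""
    return scrub(actual) == scrub(approved)


# Made-up sample output: the id and the timestamp vary, the row count must not.
approved = "job 1b4e28ba-2fa1-11d2-883f-0016d3cca427 finished at 2017-09-23T10:15:00 with 3 rows"
same_behaviour = "job 9f1c2d3e-0000-4abc-8def-123456789abc finished at 2024-01-02T03:04:05 with 3 rows"
changed_behaviour = "job 9f1c2d3e-0000-4abc-8def-123456789abc finished at 2024-01-02T03:04:05 with 4 rows"

print(matches_snapshot(same_behaviour, approved))     # only whitelisted parts differ
print(matches_snapshot(changed_behaviour, approved))  # the row count changed too
```

Anything not covered by a pattern (here, the row count) still trips the comparison, which is the point of whitelisting rather than asserting.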
I started doing this a few years ago while working on a Sphinx extension that defines new directives but also customizes a few output targets (including two ~plaintext output targets).
It's not only useful to avoid accidental output changes, but it's essential for iterating with confidence that tiny changes at the code level affected the output as-expected over a mostly-representative fraction of the documentation set.
I didn't have a term for it at the time. I guess about a year later I bumped into snapshot testing, but that's always felt like more of a metaphor.
I recently found it useful for an approximation algorithm. Since it was an approximation, it didn't really make any hard guarantees on the output for a given input, but snapshot testing at least made it very clear when a change affected the output and to what.
What I like about this idea of Approval Testing is that I don't have to write too many tests if I know my system is working, as long as I am notified when the behaviour of my system has changed (usually in some unexpected way).
This would require auto-mocking of all subsystems to prevent side-effecting functions from sending email & SMSes.
I believe it is a direct descendant of yours, which I’ve used from time to time over the years. I just didn’t feel the need to support *, % was enough for me.
Thanks for all the stuff you’ve written about Make: I’ve enjoyed reading quite a lot of it!
I appreciate these posts for "I just need to do X..." google-search-for-my-fix problems. But I've always found it frustrating that this is what the whole www is now: a whole lot of spread out individual docs for individual problems, rather than a concise review of useful information (a "just the useful bits" database if you will).
Try to learn anything in depth these days and either you're deciphering a dense yet incomplete manual, or googling until eternity. Make is a great example, as even though they have a useful manual, good luck figuring out how to apply it to your problem. Wikis don't quite cut it either. We need a different data model for knowledge sharing.
OK, but I wrote those things as "advanced" uses of make because the actual GNU Make Manual is very, very good (https://www.gnu.org/software/make/manual/) and people should read it.
I keep wondering about doing a video series explaining GNU Make from the ground up.
I went to an awful lot of trouble to apply roughly the same technique to test some code generation code [0]. Required pulling in a random dependency and hacking around for a while to make the output look right. Your solution is much slicker!
What you ended up with has two major advantages: ① it integrates with the Cargo test harness, so that Rust developers’ expectations will be met; and ② it does it in one compiler invocation. The test harness I describe would be quite slow for your sort of situation.
I’m not completely sold on using a separate crate and build.rs how you have, but it looks like it’ll yield a good usable result. A couple of related things you might be interested in: compiletest_rs, trybuild.
Heads up: what’s shown and run at https://tty-player.chrismorgan.info is currently out of date (though https://github.com/chris-morgan/tty-player is up to date). I wrote it years back against Web Components v0 for a project that has been shelved for a few years, and more recently updated it to Web Components v1 and Shadow DOM, because I wanted it for this article. Still need to update it to xterm.js.
The most popular thing in this space nowadays is asciinema, mostly used by embedding it from asciinema.org, but I prefer my thing.
I use the Poly variant by default, only disabling it in places where the monospacedness matters for layout purposes, e.g. the terminal recording in this article (because of Vim), and Rust compiler output in some of my other articles. The Poly variant does things like make i, l and space a little narrower, and m a little wider, which makes casual reading more comfortable than a strict monospaced font.
I do this because I believe that monospacedness is substantially overrated in most places, and that most things actually look better not strictly monospaced. I contemplated not even using a monospace-style font at all but decided that was probably going too far for most people. (And as a Vim user I necessarily work in a monospaced text editor; but if it weren’t for that, I’d probably go full proportional.)
So I’m curious if you have further feedback on this matter and why you find fault with what I’m doing. It may influence what I do.
I think one of the best reasons to use a monospaced typeface is that it is a fairly strong and accurate signifier of code. Of course, in this case you have special highlighting for it that makes it less useful, but in general I think that it really helps. (Plus there are a couple of other, minor benefits probably not worth listing here.)
As mentioned elsewhere I've been using proportional for code and it's very nice but for a small drawback. I'd say give it a try for a couple of days and see.
To you and anyone else with this opinion: try disabling the `code { font-feature-settings: "ss01" 1, "ss02"; }` rule in the CSS, which will disable the Poly variant, and let me know how it feels.
It seems possible to me that it’s actually the use of a true serif monospace font (>95% of monospace fonts used these days are sans-serif, and >95% of the remainder are slab serif) that’s throwing you off, more than the strict monospacedness of it, and I’d like to try that hypothesis out.
(In early development of the visual style, I used only the font with no spacing or colour hints, but I found the monospace Triplicate too similar to the serif Equity, so that it was sometimes not quite clear enough that it was code; that was the reason why I put the background colour on inline code rather than only on code blocks, even though that wouldn’t be done in a printed manuscript, which is a style I am loosely imitating in part.)
Disabling `code { font-feature-settings: "ss01" 1, "ss02"; }` made a marginal improvement for me but it was only marginal. The bigger issue I had was the type-face was too large.
Ultimately you're never going to win a discussion about type-faces because they're entirely personal preference. For example I find most proportional fonts to be too narrow and harder to read, so I much prefer the typically wider glyphs of monospaced type-faces. To the extent that the font I used on one of my blogs had rounder letters. I then had complaints that others found it "unreadable" and preferred something narrower.
I'm sure there will always be a sweet spot where a more than average number of readers will be content; however, the web would be a little duller if everyone converged on that same type-face. So I'm willing to take a marginal hit on readability (and let's be honest, the difference is almost always only marginal) for the sake of websites having their own personalities. The alternative is that people can just toggle Reader View in Firefox (or whatever the equivalent is in other browsers).
I object to this whole thread as bikeshedding; however, I happen to use proportional fonts for code (Lucida Sans Unicode on Windows) but just yesterday reverted back to a monospaced font (Lucida Console).
While I much, much prefer proportional, it's simply that indented stuff after text didn't line up properly in it, e.g.
stuff = 23
more stuff = 99
x = 41
(edit: sorry HN is messing up the indenting even further, but you know what I'm getting at)
Also my magit popup buffer is all ziggy-zaggy instead of properly column'ed. I can live with that. Edit - and git log which relies on fixed-width to show properly gets all bollixed.
In your case I can't see your code suffering at all from these problems, so I'm fine with it.
The niceness of proportionals may be enough for me to go back to it. I don't know yet.
Interesting. I’ll give the thought time to settle and probably disable Poly for all code blocks tomorrow, leaving it on only for inline code. (That’s what I did initially in the design, but then I decided to make normal code blocks Poly as well because I preferred it so, and why not? —But it seems to be disconcerting people.)
Yet one of the things I really like about Poly is how it decreases width. Disabling Poly would slightly harm layout on https://chrismorgan.info/blog/rust-fizzbuzz/ where I have code side-by-side, increasing the width required for the full layout without wrapping from ~1500px to 1600px. Ah well. It’s not critical, just makes me a little bit sad. (Admittedly I could get much of that back with `tab-size: 3;` instead of `tab-size: 4`, but that would doubtless make people baulk too. And I’m not going `tab-size: 2` except on small displays.)
I'd gently suggest that, while the side-by-side code is strong with the early for loops, it's not adding as much value for the longer snippet with the Display impl.
I use Triplicate, Concourse and Equity extensively. Equity, there are surprisingly few fonts like it in its organic feel and faithfulness to the old art of printing. (This doesn’t convey as much as I’d like, but I lack the terms of art to describe what I mean properly.) Triplicate, well, it’s the only good true serif monospace that I know of, and I like that. Each font fills a niche that I very much appreciate. I reckon they were worth buying on their own merits, but it was also then a way of supporting Matthew Butterick’s project https://practicaltypography.com/ which I strongly approve of as a project.
I've used a similar technique to test: generating an expected output and an actual output, then diffing them.
One trick I found helpful was using JSON to serialize test results instead of unstructured plain text.
Test results stored as JSON are much easier to parse and therefore process. You can quickly whip up programs that verify the tests satisfy invariants, diff the tests and filter out expected test changes from unexpected test changes.
This aspect of expected and unexpected test changes is even more important than the diff part in my opinion. It allows you to add failing tests immediately once you get the bug report, and you notice if you fix something accidentally.
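As a rough illustration of that filtering step (a sketch only: the JSON layout, the test names and the `classify` helper are invented for this example, not taken from any real tool), both result sets are stored as a JSON object mapping test name to status, and every difference goes into an "expected" or "unexpected" bucket:

```python
import json


def classify(expected_json, actual_json, expected_changes=()):
    """Split differences between two JSON result sets into expected and unexpected."""
    expected = json.loads(expected_json)
    actual = json.loads(actual_json)
    report = {"expected": [], "unexpected": []}
    for name in sorted(set(expected) | set(actual)):
        before, after = expected.get(name), actual.get(name)
        if before == after:
            continue  # No change, nothing to report.
        bucket = "expected" if name in expected_changes else "unexpected"
        report[bucket].append((name, before, after))
    return report


# Hypothetical stored results: known bugs are recorded as "fail" as soon as
# they are reported, so the suite documents them before they are fixed.
expected_json = json.dumps({
    "parse_empty": "pass",
    "parse_unicode": "fail",
    "render_table": "fail",
})
# Hypothetical results after a change that was meant to fix parse_unicode only.
actual_json = json.dumps({
    "parse_empty": "pass",
    "parse_unicode": "pass",
    "render_table": "pass",
})

report = classify(expected_json, actual_json, expected_changes={"parse_unicode"})
print(json.dumps(report, indent=2))
```

Here `render_table` lands in the unexpected bucket even though it went from failing to passing, which is how an accidental fix gets noticed.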
The Readme of that project could do with a few examples of what tests and successful/unsuccessful output look like. I found the examples folder and still can't visualize what it might be like.
I've been using cram (also written in Python) for a private project and been mostly happy with it: https://bitheap.org/cram/
cram is a very good tool for testing in this manner: a test file is basically a copy/paste of a terminal window, deviations from expected behavior are represented using diffs, and `cram -i` will prompt you to update the test file with actual output. and it supports globbing and regular expressions for fuzzy matching.
i've been using cram for everything i write for what feels like a decade (it'll be 10 years old in september), and though it has its limits, i bitch and moan about it very little given how much i rely on it. if you knew me you'd recognize that this is a huge endorsement, i'm quite vocal about my disdain for most software in existence. :)
A major drawback mentioned here is that make breaks when used with filenames containing whitespace, which is a big blocker for some uses. Anyone know of a similar alternative which handles this?
I’m a big fan of using make in my projects. It’s nice to be able to sit down another dev or new user and just tell them to `make build` or `make test`. It also makes finding bugs easier as you can bisect with it.