I looked into TAP a while ago, but had a couple of problems with it:
1) The whole point is to allow language/harness agnostic tooling, but the only tool of note seems to be smoulder. Everything else seems to be trivial things like showing pass/fail with emoji tick/cross symbols, etc.
2) The TAP standard allows tests to be numbered, and a total count to be given up-front. In practice, all tools seem to require this, and barf when given unnumbered tests. This makes TAP far less useful, since it requires some global coordinating process to know how many tests we're going to run.
I was interested in TAP since I could write test scripts in any language which spit out a simple format to stdout, and they could be nested; e.g. we can have an overall "test everything" script which runs the test script for each of our projects/repos; projects might choose to have different scripts for unit tests, functional tests, integration tests, etc. and those tests might be spread across many files. Ideally we shouldn't have to care about any of that structure. With TAP we need some way to count the tests before running them (or else we have to wait for all of the tests to finish first, which is naff); this requires some sort of declarative structure, at which point our scripts can no longer be opaque black-boxes, so we might as well use a "proper" test harness like xUnit.
Was going to reply with this too. Part of the idea of TAP was to allow test results to be streamed: a producer emits result lines as events, and a consumer parses/interprets them as they arrive. Once it sees a plan line like 1..2, it knows the test set is done.
It gets more complicated with subtests, but that was part of TAP 13 I think, which wasn't formally released.
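To make that concrete, here's a small hand-written sample of a TAP stream (not from any particular producer); the plan line can come first or, per the spec, trail after the results, which is what makes streaming with a count that isn't known up front possible:

    TAP version 13
    ok 1 - parse config
    not ok 2 - connect to database
      ---
      message: connection refused
      ...
    ok 3 - cleanup temp files # SKIP nothing to clean
    1..3

The `---`/`...` block is a TAP 13 YAML diagnostic attached to the failing test, and `# SKIP` is a directive.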
> The whole point is to allow language/harness agnostic tooling, but the only tool of note seems to be smoulder. Everything else seems to be trivial things like showing pass/fail with emoji tick/cross symbols, etc.
Interesting. In the Node.js ecosystem there are loads of different reporters (https://github.com/substack/tape/#pretty-reporters), but my favorite is node-tap (which also comes with coverage reporting and such).
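For anyone who hasn't seen it: a tape test file is just a program that prints TAP to stdout, which is why all those reporters can be piped onto it. A minimal sketch (the test names and assertions here are made up):

    // add.test.ts -- a tiny tape producer
    import test from 'tape';

    test('addition works', (t) => {
      t.plan(2);                                        // declare how many assertions this test makes
      t.equal(1 + 1, 2, 'one plus one is two');
      t.ok([1, 2, 3].includes(2), 'array contains 2');
    });

Running it (e.g. with ts-node) prints plain TAP, and piping that into one of the reporters from that list, such as tap-spec, turns it into a pretty report.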
TAP is something I invested a lot of time in years ago, but nowadays I rarely have time to maintain or contribute to the project (which is active on GitHub now at https://github.com/testanything; before, we had only a Wiki that would go offline and be restored from some backup a few times :)
This is what I ended up being involved with:
- Joined the mailing lists, and updated the Wiki. Eventually joined the new GitHub org
- Created tap4j, using an existing implementation as reference (contacted the author, who helped with some questions too) https://github.com/tupilabs/tap4j
That statement says a lot, coming from someone who's put real effort into developing the ecosystem around TAP. It confirms the general feeling of the thread: some aspects are missing or could be improved for it to be a full-fledged, scalable testing protocol.
When I discovered TAP, its simplicity was so refreshing. In the particular project where I learned about it, the implementation was a single file, a snippet really, included in the tests folder. No dependencies, with such a small interface. I was able to whip up my own tiny tester, and have used it (or similar ones in other languages) over the years.
In a way, I'm glad the protocol stayed small. There are many feature ideas in this thread, and some do look useful (or essential even) - but I find great value in the stability of the spec, it's kind of timeless.
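For illustration, the kind of tiny tester I mean is roughly this (a sketch from memory, not the original snippet; TypeScript here, but the same fits in a few lines of most languages):

    // Minimal TAP producer: register tests, run them, print the plan and results to stdout.
    type TestFn = () => void | Promise<void>;
    const tests: Array<[string, TestFn]> = [];

    export function test(name: string, fn: TestFn): void {
      tests.push([name, fn]);
    }

    export async function run(): Promise<void> {
      console.log(`1..${tests.length}`);        // plan: known once all tests are registered
      let n = 0;
      for (const [name, fn] of tests) {
        n++;
        try {
          await fn();
          console.log(`ok ${n} - ${name}`);
        } catch (err) {
          console.log(`not ok ${n} - ${name}`);
          console.log(`# ${String(err)}`);      // diagnostic line for the failure
          process.exitCode = 1;
        }
      }
    }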
The sharness[1] project originated from the Git project and is a really great and easy way to write tests using shell. Its output is TAP-compatible, so you can use `prove` to verify sharness tests.
TAP is fantastic for testing your Postgres database (indexes, views, functions, triggers, etc.) without depending on any specific client language: https://pgtap.org/
It's unit tests for PostgreSQL, so pretty straightforward use cases. I've definitely caught a lot of edges, type confusion, and refactoring opportunities with it.
I’ll always support more testing. Software Quality is something I think needs all the help it can get.
I’m not sure why I would use something like this, even if it were available for my language (Swift), when I have a robust testing framework (XCTest).
In the past, my team used to use Google Test for testing C++. It was not as powerful as platform-native frameworks, but did allow us to have cross-platform unit tests.
I looked at TAP when I had to implement a testing tool for my own language. In the end, I didn't really see the point of using TAP. Is anyone using it outside Perl? What are its specific advantages compared to using common testing libraries?
I used it with Java via tap4j (https://tupilabs.com/tap4j/). For my project I needed a multi-language format to use in Jenkins, so I ended up quickly writing tap4j and the Jenkins TAP Plugin.
Both were used by multiple users back at the time, and we managed to integrate the test results of the projects (PHP, Java, and some JavaScript) through TAP in Jenkins, reporting the test coverage too (in YAMLish, which is not exactly part of the TAP spec, unless the latest release included it).
I first saw it used with JavaScript, so I assume yes; but the site itself answers this, too. (specifically the producers and consumers pages.)
I think it is of limited use if you can use any of the more sophisticated de facto standards like JUnit-compatible output. Still, you can imagine scenarios where a stupidly simple test protocol would be nice. One fairly esoteric use case might be test ROMs for CPU emulators; today many of them output results to serial in some format, but AFAIK there is no unification of the format at all across different test ROMs.
What's the advantage? While a common protocol sounds nice at first glance, I'd imagine the devil (and the validity of any test) lies in the actual implementation. So TAP, as far as I can tell, is just a BDD spec... which basically amounts to some lines of text for someone to write BDD against.
Am I missing something? I was actually looking into BDD-like tools recently, so I feel like I'm a potential user for TAP, but I just can't grok the value-add offhand.
edit: oh, maybe TAP is the inverse of what I was thinking. It's about output of test suites, for common post-test tooling? Aka to give statuses about the state of tests?
I actually randomly wrote my own TAP-producing test runner in TypeScript recently, for no reason other than I wanted something to do while I was stuck at home during the pandemic. The protocol is pretty simple to follow, although I do agree that you have to have something to orchestrate the whole thing up front.

I wrote it as an example script for an asynchronous utilities library that had the equivalent of low-level primitives, like an async version of locks and barriers. The barriers were useful for orchestrating when the different tests ran, so that I could make sure that set-up and tear-down happened at the right time before and after the tests. The locks ensured that each test's logs didn't print over each other, since I proxied the console, ran the tests all at the same time, and then did the logging sequentially.

I got most of it working, but I haven't touched it in about a month or two. I feel like it might not be a good thing to have those lower-level paradigms in JavaScript, just because it seems a bit like a foot gun. I'm also pretty certain it wouldn't work in parallel-programming situations, only in async situations. I based my TAP-producing test runner on previous work I did modifying a testing framework that Ryan Florence made in the size of a tweet; I modified it to support async functions [1]. I had to use the async lock in that one too.
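In case it helps picture it, the lock part is essentially a promise-chain mutex; this is just a rough sketch of that idea (not my actual code):

    // Serializes console output from tests that run concurrently.
    class AsyncLock {
      private tail: Promise<void> = Promise.resolve();

      // Runs fn after every previously acquired holder has finished.
      acquire<T>(fn: () => T | Promise<T>): Promise<T> {
        const result = this.tail.then(fn);
        // Keep the chain alive even if fn throws, so later acquirers still run.
        this.tail = result.then(() => undefined, () => undefined);
        return result;
      }
    }

    const logLock = new AsyncLock();

    async function flushLogs(testName: string, lines: string[]): Promise<void> {
      await logLock.acquire(() => {
        console.log(`# ${testName}`);
        for (const line of lines) console.log(line);
      });
    }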
I like it a lot, and am surprised I didn't know about it, considering I've been looking for a dead-simple git + Makefile + shell script + (chroot or Docker-like) container + simplistic topsort-prioritized shell runner pool + minimal Web UI solution for CI for some time now; this could totally be an adequate part of that, for comprehending and rendering test reports. As a protocol, it might benefit from some enhancements, such as capturing and linking to detailed test result output files for failed tests.
TAP is a neat protocol, and I use it in a few of my projects. I was drawn to it because of Automake's built-in TAP support. It took a lot of work out of including a test-suite that "just works" with the "make check" and "make distcheck" targets.
So TAP is great if you're making an Autotoolized project. Although given that Autoconf's release team has abandoned Autoconf, maybe good Autotools support isn't as important as it used to be.
I guess the projects you mentioned are complementary, in that TAP as I understood it merely specs the output stream/protocol that a test run is expected to produce and isn't limited to testing shell scripts, whereas cram/mdx appears more like a user acceptance test convention specifically for shell script code as component-under-test, much in the spirit of "behavioural" testing a la jBehave.
just want to clarify that the program-under-test in cram can be written in anything, as long as it's executable and produces some outputs (stdout, stderr (or any other file descriptor, really), files, exit code). right now i'm using cram to test a program written in F#.
it's true that cram shines in tests for programs written in sh/bash/zsh, where you can output the command lines your PUT consists of (while, or instead of, running them) and then have cram verify that your program composes the expected commands.
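for reference, a cram test file is just an indented shell-session transcript: inside the .t file, lines starting with "$ " (indented two spaces) are commands, and the indented lines after them are the expected output. a made-up example (hypothetical tool and output):

      $ echo "hello from the PUT"
      hello from the PUT
      $ widget-deploy --dry-run
      would run: git push origin main
      would run: ssh host systemctl restart widget

cram runs the commands and diffs the actual output against the expected lines, showing a diff when they differ.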
git-pimp.zsh is an example of an early approach: it runs its commands through a helper which uses $GIT_PIMP_CHATTY and $GIT_PIMP_DRYRUN to decide whether to output the given command line and whether to actually perform it, respectively. the helper is a function called "o" ([o-impl]), which makes the code look a little like a bullet list ([o use]).
[0120-output.t] sets git-pimp to skip execution of `git mailz` and `review-files` invocations, and echo any `git format-patch`, `git mantle`, ... invocations (regardless of whether they're dryrun or actually executed).
`o` was a stepping stone and inspiration for [fake]. this is an external command (as opposed to a shell function), so it cannot stub out shell functions, but it provides greater control over the behavior of mocked command lines. e.g. one can mock out `rm -rf --no-preserve-root /` but leave all other uses of `rm` to the actual command, define any combination of exit codes and outputs for particular invocations or prefixes, or provide custom implementations for the same. if you ever scripted any destructive, hardware- or network-dependent code (ip(8), fdisk(8), ssh(1)), [fake] can help you write sort-of unit tests that do not require expensive setup. as always, mocks and stubs are dangerous, and i'm not claiming this is bulletproof, but it's still very useful.
It happens that I was in need of a TAP v13 parser for Go the other day, and I didn't find one that met our needs. So I wrote one; maybe it's useful to someone else, too:
TL;DR: it takes TAP output (in the form of a []string, one string per line) and produces a Results struct containing an array of Test structs. The goal was to parse the TAP output just well enough that it could be transformed for storage in other systems.
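The package itself isn't shown here, but the line-oriented idea is simple enough to sketch. This is not the Go API, just an illustration in TypeScript for consistency with the other examples in this thread (the names and regexes are made up): match plan lines and test lines, and collect them into a results structure.

    interface TestResult { ok: boolean; num: number; description: string; }
    interface Results { planned: number; tests: TestResult[]; }

    function parseTap(lines: string[]): Results {
      const results: Results = { planned: 0, tests: [] };
      const planRe = /^1\.\.(\d+)/;                        // e.g. "1..5"
      const testRe = /^(not )?ok\b\s*(\d+)?\s*-?\s*(.*)$/;  // e.g. "not ok 2 - connect"
      for (const line of lines) {
        const plan = planRe.exec(line);
        if (plan) { results.planned = parseInt(plan[1], 10); continue; }
        const test = testRe.exec(line);
        if (test) {
          results.tests.push({
            ok: test[1] === undefined,
            num: test[2] ? parseInt(test[2], 10) : results.tests.length + 1,
            description: test[3].trim(),
          });
        }
        // Comments, YAML diagnostic blocks, and directives are ignored in this sketch.
      }
      return results;
    }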
I can't find anything about why I would want to use it over any of the hundreds of other testing frameworks out there. How does it compare to xUnit? Is it even the same type of thing? Do any of the links lead to non-trivial examples?
TAP isn't a framework, it's a specification/protocol. Sort of like how HTTP isn't a webserver, it's a protocol that a webserver can speak.
TAP is more about the question of "how does a test suite report success/failure to whatever launched it", and all it describes is what the test-suite should report on stdout. It doesn't offer any opinions about language or implementation.