I did not come in here expecting to read such effusive praise for testcontainers. If you’re coming from a place where docker wasn’t really a thing, I can see how it looks beautiful. And in a fair number of use cases it can be really nice. But if you want it to play well with any other containerized workflow, good freaking luck.
Testcontainers is the library that convinced me that shelling out to docker as an abstraction via bash calls embedded in a library is a bad idea. Not because containerization as an abstraction is a bad idea. Rather, it’s that having a library that makes custom shell calls to the docker CLI as part of its core functionality creates problems and complexity as soon as one introduces other containerized workflows. The library has the nasty habit of assuming it’s running on a host machine where nothing else Docker-related is running, and footguns itself with limitations accordingly. This makes it not much better than some non-dockerized library in most cases, and oftentimes much, much worse.
Testcontainers is not shelling out to the docker CLI; at least on Java, it is using a Java client for the Docker Engine API, and I believe that’s also the case for the other platforms.
Not sure this matters for the core argument you are making, just thought I’d point it out.
The comment you are replying to makes so many mistakes about how Testcontainers works on Java that I'm not sure what source code the commenter is looking at.
Which is a perfectly fine counter-argument if you are, in fact, not doing it right.
If you say you're doing continuous deployment because you deploy every Tuesday evening, it's perfectly fair to point out that that's not continuous deployment.
Of course, you should follow it up by explaining why as well, but many companies don't actually follow the agile principles.
This is a digression from the original topic, but my criticism of agile is in fact that nobody is doing it right. If the majority of companies that attempt it end up just wasting extra time, the process itself is broken.
I don't think you can claim that. It's the fate of most popular systems: they end up being poorly explained in elevators or adopted based on a blog article, rather than someone investing the few hours or days or weeks to understand what made the process originally successful. It's so common we have a term for it: cargo culting. I don't think you can fault agile for the tribally-spread BS most places do today where points == hours. If anything, you can maybe fault it for seeming a bit too familiar and simple when there are a few important nuances.
Using quotation marks as a way of summarizing what someone might say but didn't literally say is a fairly common practice. I think the quotation is a fairly accurate depiction of the sentiment in the comment it responds to, so I don't see any issue with it.
It may be common, but that doesn't mean it's warranted. Logical fallacies, for example, are also common (so are spelling and grammar mistakes), and yet personally I prefer to commit them less often rather than more often. You think the quotation is a fairly accurate depiction of the sentiment expressed by another person. I don't think that. In fact, I think the opposite. One way to settle the matter would be to ask the person.
What do you say, doctorpangloss? Do you think this is a fairly accurate depiction of the sentiment you were expressing?
"this is wrong, but I'm not gonna explain why"
I doubt this person will respond, of course (they're not obliged to).
> It may be common, but that doesn't mean it's warranted. Logical fallacies, for example, are also common (so are spelling and grammar mistakes), and yet personally I prefer to commit them less often rather than more often. You think the quotation is a fairly accurate depiction of the sentiment expressed by another person.
This works both ways; your question's phrasing assumed that the person you responded to felt the same way about how quotation marks should be used. It seems likely that you knew the answer was that they weren't intending to literally quote anyone, but you didn't ask them about that first, which is why it comes across as passive-aggressive.
For clarity, this is the comment that the quote was referring to:
> The comment you are replying to makes so many mistakes about how Testcontainers works on Java that I'm not sure what source code the commenter is looking at.
The comment quite literally calls something wrong ("The comment you are replying to makes so many mistakes"), and it doesn't give any evidence to back up the claim that there are "so many mistakes". When someone says one thing and then claims that it shouldn't be taken literally because they intended something entirely different, that's called gaslighting.
It doesn't matter if it interfaces via CLI or not. Testcontainers tends to make things work but also introduces difficulty in resolution. You seem to have missed the point.
I'm interested to hear what you would do instead! I'm using Testcontainers in a very basic scenario: a web app with a PostgreSQL database. There are different database backends available (like SQLite), but I use PostgreSQL-specific features.
Currently, in my integration testing project, I use Testcontainers to spin up a PostgreSQL database in Docker and then use that for testing. I can control the database lifecycle from my test code. It works perfectly, both on my local PC and in the CI pipeline. To date it also did not interfere or conflict with other Docker containers I have running locally (like the development database).
From what I gather, that is exactly the use case for Testcontainers. How would you solve this instead? I'm on Windows by the way, the CI pipeline is on Linux.
There is no alternative if you want Postgres "embedded" within your test. I researched that for a long time, as a full PGSQL Docker image sounded like overkill, but nothing else exists.
I think grandparent’s concerns were not with using Docker in general but instead with how the Testcontainers library orchestrates it. I assume there’s an alternate approach behind the critique, and that’s what I’m interested in.
I've been using Dagger to manage container service lifecycles, as the API is pretty powerful. But here are some other alternatives, depending on your use case:
- Some CI/CD systems support container orchestration, e.g. GitHub service containers. This is my usual recommendation for CI setups. You're unlikely to get local runs this way, though. It also has a lot of the same limitations as testcontainers.
- If you're building for Kube, managing the container lifecycle and services with Kube is probably going to be more straightforward than the Docker networking stack. You can run minikube and spawn your containers in it, for example. Then your client library that was previously running testcontainers is simply making Kube calls. This gives you much more control over the container lifecycle.
- If you're not spinning up and tearing down containers repeatedly (I've seen people just use the same container in unit tests as a service and just wipe it. Often for performance reasons to avoid the overhead of spin up and tear down), just bootstrap a compose file before running your tests.
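For the "reuse one container and just wipe it" approach, the wipe step on PostgreSQL can be a single `TRUNCATE` over every table between tests. A minimal sketch, assuming the table names come from a trusted listing such as `pg_tables`; the helper names are made up for illustration, and excluding your migration tool's bookkeeping table (e.g. Flyway's `flyway_schema_history`) is an assumption about your setup.

```python
# Query you might run first to list tables in the default schema (PostgreSQL).
LIST_TABLES_SQL = "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def wipe_sql(tables, exclude=()):
    """Build one TRUNCATE statement that empties every table between tests.

    RESTART IDENTITY resets sequences so generated IDs are predictable;
    CASCADE also truncates tables that reference these via foreign keys.
    Returns None when there is nothing to truncate.
    """
    keep = set(exclude)
    targets = sorted(t for t in tables if t not in keep)  # sorted for stable SQL
    if not targets:
        return None
    names = ", ".join(quote_ident(t) for t in targets)
    return f"TRUNCATE TABLE {names} RESTART IDENTITY CASCADE;"
```

Running one statement per test is usually much cheaper than a fresh container per test, which is the trade-off the comment above describes.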
Sometimes testcontainers is really going to be the best solution for your problem though. One common use case is if you need to spawn a LOT of the same container over and over for your unit test suite, and it has to be a fresh container. My advice is to try to isolate this part of the pipeline from the rest as much as possible if you have to do this.
Never had any issues. We have 100+ build jobs running on Jenkins and most of them have some Testcontainers tests. These never collide if implemented correctly (e.g. randomised ports with a check), even when run in parallel.
On my machine running several docker dev environments it was also never an issue.
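"Randomised ports with a check" can be as small as asking the OS for a free ephemeral port and verifying it is bindable. A minimal sketch with hypothetical helper names; note that the port is released before the container claims it, so there is a small race. Letting Docker pick the host port and reading it back afterwards avoids that, which is roughly what Testcontainers does with its mapped ports.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Check whether host:port can be bound right now (no SO_REUSEADDR)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def random_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```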
Can you specify what issues you had? Also I am pretty sure the library does not work as you describe. Isn't it using the Docker Engine API? I could be mistaken, never checked the source code.
Edit: Just checked the documentation. According to the docs it is using the Docker API.
> We have 100+ build jobs running on Jenkins and most of them have some Testcontainer tests.
That's probably why you can and do use it: Jenkins. Jenkins lets you install whatever you want on the hosts, whereas the default context of more modern systems is a Docker container, or they at least speak it natively.
> Can you specify what issues you had?
Some of my devs have coded tests containers into their ITs. These are the only pipelines we can't containerize, because test containers don't seem to work inside docker, and won't work in k8s either.
Ah! That is not an issue of testcontainers. In our Jenkins every pipeline is dockerized and uses agents, so it is not running on the host directly. However, it is also not running DinD (Docker in Docker). Instead, it is important to use DooD (Docker out of Docker), which is best practice anyway.
Your CI needs to be configured once, and all your problems should go away IF tests have randomised ports, or even better, also run in Docker so that those ports do not need to be exposed at the host level. Most of my colleagues, however, prefer not to use dev containers, so randomised ports were the solution for us for now.
Just as an example, GitLab is (was?) kind of infamous for this: for a long time everything you did in GitLab already ran in Docker, so by the very nature of running in GitLab you were forced to do everything in DinD mode. They might have changed this recently.
I think as long as you can expose the Docker socket (which should also be possible in GitLab), you can use DooD. DinD has never worked out for me. Sooner or later there will be issues.
That's not a solution. This requires you to expose the host's Docker socket by mounting it into a guest container, which breaks some security mechanisms by preventing isolation from the host system.
Which also does not work on k8s. It does work in Docker Compose, but it also isn't portable to places where the Docker socket isn't available, e.g. hosted pipelines.
Test containers seem to work well if you're not already properly using containers in your pipelines, and are not interested in deployment to k8s.
Well, for the environment you describe, there is no solution to this problem, right? If you want to spawn test containers, you need a privileged container. At least I cannot think of any way to achieve this without exposing the Docker socket.
You have to choose the right tooling for the right job.
I have had a similar intuition from when trying out testcontainers some years ago.
I do not know how the project has developed, but at the time I tried it, it felt very orthogonal to, or even incompatible with, more complex (as in multi-language monorepo) projects, CDE and containerized CI approaches.
I do not know how this has developed since; the emergence of CDE standards like devcontainer and devfile might have improved the situation. Yet all projects I have started in the past 5 years were plain multilingual CDE projects based (mostly) on a compose.yml file and not much more, so I have no idea how widespread their usage really is.
I would guess that this speaks to an unaddressed (developer) user story related to other workflows, or perhaps the container-adjacent ecosystem overall. Testing with any workflow is always tricky to get just right, and tools that make it easy (like, "install a package and go" easy) are underrated.
Came here with exactly this on my mind. Thanks for confirming my suspicion.
That being said, having specific requirements for the environment of your integration tests is not necessarily bad IMO. It's just a question of checking these requirements and reporting any mismatches.
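Checking those requirements up front and reporting mismatches can be a small preflight step that runs before any container-based test. A minimal sketch, assuming the usual convention that `DOCKER_HOST` overrides the default `unix:///var/run/docker.sock`; the function names and messages are hypothetical, and Windows named pipes are deliberately not handled.

```python
import os
import stat
from urllib.parse import urlparse

DEFAULT_DOCKER_HOST = "unix:///var/run/docker.sock"


def docker_endpoint(env=None) -> str:
    """DOCKER_HOST wins; otherwise fall back to the default Unix socket."""
    env = os.environ if env is None else env
    return env.get("DOCKER_HOST") or DEFAULT_DOCKER_HOST


def preflight(env=None) -> list:
    """Return human-readable problems with the Docker setup; empty means OK."""
    endpoint = docker_endpoint(env)
    parsed = urlparse(endpoint)
    problems = []
    if parsed.scheme == "unix":
        path = parsed.path
        if not os.path.exists(path):
            problems.append(f"{path} not found: mount the host socket (DooD) or set DOCKER_HOST")
        elif not stat.S_ISSOCK(os.stat(path).st_mode):
            problems.append(f"{path} exists but is not a Unix socket")
        elif not os.access(path, os.R_OK | os.W_OK):
            problems.append(f"no read/write permission on {path}")
    elif parsed.scheme == "tcp":
        if not parsed.hostname or parsed.port is None:
            problems.append(f"DOCKER_HOST {endpoint!r} needs both a host and a port")
    else:
        problems.append(f"DOCKER_HOST scheme {parsed.scheme!r} is not verified by this sketch")
    return problems
```

Failing fast with one of these messages is usually easier to act on than an opaque connection error halfway through a test run.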
> the spirit of their argument is clearly not wrong.
Uh, the jury’s out on that, seeing as how the parent didn’t give any specifics other than some hand-waving about “issues with other containerized workflows.” OK, elaborate on that please; otherwise the parent isn’t really making a valid point, or may be misinformed about the current state of Testcontainers w.r.t. these vague “other containerized workflows.”
Counterpoint, I’m not the only person showing up in this thread saying “what exactly do you mean?” So I don’t think it’s clear what the OP is actually referring to. If you know, please enlighten us.
Seems like the „community-maintained” ones they endorse, like the Rust implementation, do shell out to the CLI.
I did not realize Rust wasn’t officially supported until I went to their GitHub and saw in the README that it’s a community project, and not their „official” one.
Could you elaborate on what limitations it has?
How does this not play nice with remote Docker/other Docker containers?
I don't know this library, but it looks like something that I started writing myself for exactly the same reasons, so it would be great to know what's wrong with this implementation, or why I shouldn't migrate to it. Thanks!
A few examples of the difficulties I've had with testcontainers:
- Testcontainers running in a DinD configuration is complex and harder to get right
- Testcontainers needing to network or otherwise talk with other containers not orchestrated by testcontainers
- general flakiness of tests which are harder to debug because of the library abstraction around Docker
In general if anything else in your workflow other than testcontainers also spawns and manages container lifecycle, getting it to work together with testcontainers is basically trying to reconcile two different configuration sets of containers being spawned within Docker. I think the crux of the issue is that testcontainers inverts the control of tooling. Typically containers encapsulate applications, and in this case it's the other way around. Which is not necessarily a bad thing (indeed, I am a huge proponent of using code to control containers like this), but when you introduce a level of "container-ception" by having two different methodologies like this it creates a lot of complexity and subsequent pain.
Compose is much more straightforward in terms of playing well with other stuff and being simple, but it obviously isn't great for the kind of unit-test setup that testcontainers excels at.
Test containers is such a game changer for integration testing, they have language specific docker apis that make it trivial to bring up containers and verify that they are fully initialized and ready to accept connections.
Pretty much every project I create now has testcontainers for integration testing :)
I set up CI so it lints, builds, runs unit tests, then runs integration tests (using testcontainers).
If you are testing a microservices "ball of mud", you can (and probably should) set up a testing environment and do your integration tests right there, against real dependencies. The tool seems nice for simple dependencies and local testing, but I fail to see it as a game changer.
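The "verify that they are fully initialized and ready to accept connections" part mentioned above boils down to polling until the service responds. A minimal sketch of the simplest variant, a TCP port check; the function name is hypothetical. An open port is a weak readiness signal (with Docker's userland proxy, a published port can accept a connection before the process inside the container is listening), so a protocol-level check such as running `SELECT 1`, or waiting for a log line, is more reliable. Testcontainers ships log-message and HTTP wait strategies for exactly this reason.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.1) -> float:
    """Poll until host:port accepts a TCP connection; return seconds waited.

    Raises TimeoutError if the port is still closed when the timeout expires.
    """
    start = time.monotonic()
    deadline = start + timeout
    last_error = None
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError as exc:  # refused, unreachable, or connect timeout
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{host}:{port} not reachable after {timeout}s ({last_error})")
        time.sleep(interval)
```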
You mention this as an afterthought but that's the critical feature. Giving developers the ability to run integration tests locally is a massive win in a "ball of mud" environment. There are other ways to accomplish this locally, but the test-infrastructure-as-test-code approach is a powerful and conceptually elegant abstraction, especially when used as a tool to design testcontainers for your own services that can be imported as packages into dependent services.
For example, we have pure unit tests, but also some tests that boot up Postgres, test the DB migration, and give you a DB to play with for your specific “unit” test case.
No need for a complete environment with Kafka etc. It provides a cost-effective stepping stone to what you describe.
What would be nice is if Testcontainers could create a complete environment on the test machine and delete it again.
Still, a deploy with some smoke tests on a real env is nice.
It's not really a middle ground if you're not testing your service in the same conditions as in production environment.
If you're not testing integration with Kafka, and the producer, your service is still lacking integration tests.
Testing classes in isolation with testcontainers is fine. But I observed that with a microservice architecture, the line between E2E tests and integration tests is blurred.
Microservices can and should be tested from the client perspective.
In my last project we used https://java.testcontainers.org/modules/kafka/ to start a small Kafka container. It's not exactly like a production installation, but it goes a long way.
I agree with this. At work we use both approaches but at different levels of the test pyramid.
To test integration with 1 dependency at class level we can use test containers.
But to test the integration of the whole microservice with other microservices + dependencies we use a test environment and some test code.
It's a bit like an E2E test for an API.
I would argue that the test environment is more useful, if I had to choose between the two, as it can test the service contract fully, unlike lower-level testing, which requires a lot of mocking.
Yeah, I prefer setting up docker-compose.yml myself, so I can startup services once, and also do manual testing.
The only thing I would maybe use testcontainers for is to deploy my own service into docker as part of integration test so I can test a more realistic deployment scenario instead of running it locally outside docker.
I very strongly disagree. Having "integration" tests is super powerful: you should be able to test against your interfaces/contracts and not have to spin up an entire environment with all of your hundreds of microservices, building "integrated" tests that are then dependent on the current correctness of the other microservices.
I advocate for not having any integrated environment for automated testing at all. The aim should be to be able to run all tests locally and get a quicker feedback loop.
Can you explain in more detail why this is a game changer if I already have an in-house framework that similarly uses Docker for integration tests? Does it start Docker containers faster than you could normally? Is it just the out-of-the-box APIs it provides?
I don't know why integration testing like this is considered a game changer. The testing pyramid is a testing pyramid for a reason, and integration tests have always been considered important. Sometimes starting with integration tests in your project is right because you don't waste time doing manual point-and-clicks. Instead, you design your system around being able to integration test, and this includes when you choose dependencies. You think to yourself, "how easily will that be able to be stood up on its own from a command?" If the answer is "not very easily", then you move on.
If you have an existing in-house framework for anything, maybe it's not worth switching over. It does help though when a best practice bubbles to the top and makes this in reach for those who don't have an existing in-house framework and who wouldn't know how to get started on one. It also helps for more people to have a shared understanding about a subject like this thanks to a popular implementation.
Meanwhile, Testcontainers is done quite well. It's not perfect, but it's sure better than the in-house stuff I built in the past (for the same basic concept).
No, it does not start faster than other Docker containers.
I do challenge the testing pyramid, though. At the risk of repeating my other comment on a different branch of the discussion: the value of integration tests is high, and as their cost has decreased, it makes sense to do more integration testing at the expense of unit testing. The cost has decreased exactly due to Docker and mature application frameworks (like, in Java, Spring). (See: Testing Trophy.)
One thing I have seen with testcontainers (I've been a user for a few years) is the ergonomic SDKs they have.
Especially in languages like golang, it makes spinning containers up/down and accessing the ports (e.g. a MongoDB container for some E2E test flow) super trivial. It's like a nicety layer on top of vanilla Docker (at the cost of including their SDK in your test build process).
Yes, this 100% can be done using Docker directly or the Docker REST API (and it definitely doesn't make sense to migrate if you have already made an investment in an in-house framework that doesn't require much upkeep).
Thanks for the responses, I just wanted to cut through the marketing. Taking on standardised tools is a win for me; I just wanted to know about real-world experience and use cases. Indeed, taking on deps is not something I do lightly.
> value of test pyramid
I mean more from the perspective of covering your bases: you never just want one kind of testing pattern in your project. Each codebase is different, and I agree that taking on high-value test styles/cases is a project-by-project challenge that should be tailored by many variables. The shape of your testing pyramid may be different from others'. If you're inheriting a legacy system, maybe it's top-heavy because the effort/reward ratio just isn't there. In these circumstances I usually take the approach of "add more layers when bugs are found" to home in on places that could use more or less test coverage.
Our in-house framework is really just a wrapper around certain tools that fill different gaps (think Docker/Selenium etc.) so that different projects can build suites that are compatible with our CI/CD pipelines, which do things like generate environments on demand to run test suites against. So dropping in testcontainers to replace the home-grown Docker code will be trivial. Keeping test frameworks fresh and compatible with cloud vendors that aggressively upgrade is a challenge, just like keeping up with the API bleed of other programming deps is. Our test suites essentially have a consistent domain language. We can upgrade Selenium or swap functions for different operations without having to change any tests. The same goes for unit or integration tests: they are exactly the same in terms of assertions, syntax, etc.; they may just have different environment setup logic. CI/CD can inject and override logic as it needs. In some cases it's suitable to mock certain external hard deps in integration tests, for instance, so having all the unit testing tools available is a plus. In other cases, we may take a unit test written against mocks and inject real deps into it for certain CI/CD scenarios.
FYI Docker already has a RESTful API, and programming container start/stop is trivial to do in any language. I haven't used Testcontainers before, and can kinda see the utility, but IMO it really isn't worth it in the long term to take on a new external dependency for a bit of code that (1) is a critical part of the team's development and release process and (2) can be written in-house in maybe an hour.
> it really isn't worth it in the long term to take on a new external dependency for a bit of code that (1) is a critical part of the team's development and release process and (2) can be written in-house in maybe an hour.
This seems to be quite a contradiction. If it's so easy to just write from scratch, then why would it be scary to depend on? Of course, it's not that easy to write from scratch. You could make a proof-of-concept in maybe an hour... Maybe. But they already took the proof of concept to a complete phase. Made it work with Podman. Added tons of integration code to make it easy to use with many common services. Ported it to several different languages. And, built a community around it.
If you do this from scratch, you have to go through most of the effort and problems they already did, except if you write your own solution, you have to maintain it right from the git-go, whereas if you choose Testcontainers, you'll only wind up having to maintain it if the project is left for dead and starts to bitrot. The Docker API is pretty stable though, so honestly, this doesn't seem likely to be a huge issue.
Testcontainers is exactly the sort of thing open source is great for; it's something where everyone gets to benefit from the wisdom and battle-testing of everyone else. For most of the problems you might run into, there is a pretty decent chance someone already did, so there's a pretty decent chance it's already been fixed.
Most people have GitHub and Dockerhub dependencies in their critical dependency path for builds and deployment. Services go down, change their policies, deprecate APIs, and go under, but code continues to work if you replicate the environment it originally worked in. The biggest risk with code dependencies (for non-production code like test code) is usually that it blocks you from updating some other software. The biggest risk with services is that they completely disappear and you are completely blocked until you fully remove the dependency.
I think people depending on Testcontainers are fine and doing very well with their risk analysis.
> This seems to be quite a contradiction. If it's so easy to just write from scratch, then why would it be scary to depend on?
It's easy to correctly write an implementation specific to your existing project's development environment and workflow.
It's hard to write a generic version that works in any environment. It's hard to write a version that lets you build a company and make money.
It's scary to depend on this generic version because it's too generic, and it's built by a for-profit company now who wants to upsell you to some "testcontainer cloud" crap, which doesn't exactly incentivize them to make the OSS version perfect.
For example, we were already using bazel, so writing a version using bazel to create a correct bare rootfs for a dependency + using runc to execute it in tests resulted in something with a roughly 3ms warm startup time that fulfilled all our needs, and cached correctly (since bazel has good caching).
A junior engineer, starry-eyed at test-containers, switched some code over to it, and the warm startup time went up by 1000x from 3ms to 3s, as did the flake-rate, and docker's caching is far worse than bazel's so the several minute cold-starts also happened even more often.
> Testcontainers is exactly the sort of thing open source is great for; it's something where everyone gets to benefit from the wisdom and battle-testing of everyone else.
You get to have a mismatched mess of solutions to everyone's problems, including solutions to problems you don't have. You get code of the quality of the average OSS programmer, which, while higher than the average programmer, is still pretty crap.
Free Software is great when it's run by a small group of smart opinionated people. As soon as you try to build a company around that OSS and, god forbid, hire some enterprise sales people and product managers, it quickly becomes worse at the actual small developer problem than what even a mediocre developer could hack out in an hour.
Depending on testcontainers is fine, but if you know what you're doing, writing something purpose-built is fine too, and probably gives you something much nicer in the end.
To be honest, it doesn't sound like you really had a good use case for Testcontainers anyways. Where it excels the most is in just pulling in some external containers, especially databases, e.g. PostgreSQL and Redis, directly in your test harness. In those cases, it works well.
Our use-case was redis servers, one of your supposedly good use-cases for testcontainers.
I didn't mention, but the testcontainer code also preferred to add that network io to the actual test runtime, which made measuring the test's performance harder, and meant we couldn't as safely cache the test's results. The bazel version made it easy to build the test's dependency as part of the test compilation process (as it should be) so the test runtime didn't have to do external network IO.
"Excels" is also a stretch; we had a few hundred tests launching dedicated redis servers, and with hand-rolled code, that worked fine with zero flakes. With testcontainers, it regularly flaked with some opaque docker network error or sometimes just plain a timeout because apparently launching 100s of containers in parallel is a hard problem for docker or something.
I'm sure it works well for some people, but if those people wanted to build out their own version without docker and testcontainers, specific to their development tooling and environment, it would probably work better in most cases I think.
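If a suite launches hundreds of containers at once, capping how many start concurrently and retrying transient start failures is one possible mitigation for the kind of flakiness described above. A minimal sketch, assuming each `start` callable launches one container and returns a handle; the names, limits and retried exception types are hypothetical, and this won't fix every Docker-side problem.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def start_with_retry(start, attempts=3, base_delay=0.05, retry_on=(OSError, RuntimeError)):
    """Call start(); on a transient error, back off exponentially and retry."""
    for attempt in range(attempts):
        try:
            return start()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)


def start_all(starters, max_concurrent=8):
    """Run every starter, never more than max_concurrent at once; keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(start_with_retry, starters))
```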
Yeah, I couldn't tell you: I didn't have serious issues with performance or reliability when I used this. It didn't really seem like it was doing anything special, it seemed like it is just starting a Docker container like you'd expect, at least on Linux. I sincerely doubt I would've had any better experience handrolling it since my solution to handroll it would also be to use the Docker API; the reason for this is for portability.
This thought process taken to the extreme is what results in half the internet depending on leftPad and isOdd to function. Open source is great, but there is a difference between providing a utility vs adding needless layers of abstraction. It is the responsibility of the developer to figure out which is which rather than reach for the package manager for every task in front of them.
In this case, like I said earlier, Docker already has a RESTful API. You don't need to replicate the entire feature set of Testcontainers; just make a couple of HTTP calls in your language of choice to achieve the exact same outcome.
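For reference, the "couple of HTTP calls" look roughly like this against the Docker Engine API on its default Unix socket: create a container, start it, inspect it to read the host port Docker picked, and force-remove it. This is a sketch rather than production code: it assumes the image has already been pulled (create fails with 404 otherwise), does no API version pinning, and the helper names are made up for illustration.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that speaks HTTP over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 30.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_call(method, path, body=None, socket_path=DOCKER_SOCKET):
    """Send one Engine API request; return decoded JSON, or None for empty bodies."""
    conn = UnixHTTPConnection(socket_path)
    try:
        payload = None if body is None else json.dumps(body)
        headers = {} if payload is None else {"Content-Type": "application/json"}
        conn.request(method, path, body=payload, headers=headers)
        resp = conn.getresponse()
        data = resp.read()
        if resp.status >= 400:
            raise RuntimeError(f"{method} {path} -> {resp.status}: {data.decode(errors='replace')}")
        return json.loads(data) if data else None
    finally:
        conn.close()


def create_body(image, container_port, env=None):
    """Body for POST /containers/create; HostPort "" lets Docker pick a free host port."""
    key = f"{container_port}/tcp"
    return {
        "Image": image,
        "Env": [f"{k}={v}" for k, v in (env or {}).items()],
        "ExposedPorts": {key: {}},
        "HostConfig": {"PortBindings": {key: [{"HostPort": ""}]}},
    }


def mapped_port(inspect, container_port):
    """Read the published host port from GET /containers/{id}/json output."""
    return int(inspect["NetworkSettings"]["Ports"][f"{container_port}/tcp"][0]["HostPort"])


def run(image, container_port, env=None):
    """Create and start a container; return (container id, host port)."""
    cid = docker_call("POST", "/containers/create", create_body(image, container_port, env))["Id"]
    docker_call("POST", f"/containers/{cid}/start")
    return cid, mapped_port(docker_call("GET", f"/containers/{cid}/json"), container_port)


def remove(cid):
    """Stop and delete the container in one call."""
    docker_call("DELETE", f"/containers/{cid}?force=true")
```

Usage would be something like `cid, port = run("postgres:16", 5432, {"POSTGRES_PASSWORD": "test"})`, then `remove(cid)` in your teardown. Whether this stays "a couple of calls" once you add image pulls, readiness checks and cleanup on crashes is essentially the disagreement in this thread.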
That's just a slippery slope argument. I am not arguing for people to depend on leftPad, I'm arguing that depending on a library like this which is probably tens of thousands of lines of tested code that does roughly what you want already is a damn good idea. Again, I don't want to handroll a solution and have to test it out across Windows, Mac, Linux and then check again to make sure it works with both Docker and Podman, and then again to check AArch64 Windows, AArch64 Mac and AArch64 Linux, and so on. I'd prefer to use a library. And hell, even if any of those things don't work, I can report it to the maintainers and possibly contribute a fix, and then everyone else gets to benefit too.
When writing something like this for yourself you would only support one database, one programming language, one unit testing framework, much less documentation and none of the edge cases that someone else requires.
The effort would not be equal to replicating the whole project. It wouldn’t be the same quality of course. Question is do you need all of it or not.
Yeah, but that's kind of the thing, you get less battle testing most likely for also less surface area. If you pull in a project like this, it makes it vastly easier when you need more, because if it happens to already have built-in support, you don't need to spend additional engineer time on it.
To me it seems straightforwardly a win to start with Testcontainers and move to something else when it proves insufficient.
I think they make the biggest difference when testing data pipelines (which have historically been difficult to test). You can now easily test out compatibility between different versions of databases, verify data types, embed as part of your build, etc.
I believe the next step, once using test containers, would be automating data generation and validation. Then you will have an automated pipeline of integration tests that are independent, fast and reliable.
You can automate data validation with snapshot tests. I do it this way with a data pipeline: I have a function that queries the destination DBs and puts the results into JSON, which is then validated against a snapshot.
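The snapshot pattern described above can be sketched with the standard library alone. Here `sqlite3` stands in for the destination database purely so the example runs anywhere; against Postgres you would pass your driver's connection instead. The function names are hypothetical, table names are assumed to come from trusted test code, and rows are sorted so the snapshot does not depend on query order.

```python
import json
import sqlite3
from pathlib import Path


def dump_tables(conn: sqlite3.Connection, tables):
    """Read every row of each table into {table: [row dicts]}, in a stable order."""
    snapshot = {}
    for table in tables:
        cur = conn.execute(f'SELECT * FROM "{table}"')  # table names from trusted test code
        columns = [d[0] for d in cur.description]
        rows = [dict(zip(columns, row)) for row in cur.fetchall()]
        rows.sort(key=lambda r: json.dumps(r, sort_keys=True, default=str))
        snapshot[table] = rows
    return snapshot


def assert_matches_snapshot(data, path, update=False):
    """Write the snapshot on first run (or when update=True); otherwise compare."""
    text = json.dumps(data, indent=2, sort_keys=True, default=str) + "\n"
    snap = Path(path)
    if update or not snap.exists():
        snap.write_text(text)
        return
    if snap.read_text() != text:
        raise AssertionError(f"{snap} differs from current data; rerun with update=True if intended")
```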
Not sure how I hadn't encountered this before, I LOVE this pattern.
I find integration tests that exercise actual databases/Elasticsearch/Redis/Varnish etc to be massively more valuable than traditional unit tests. In the past I've gone to pretty deep lengths to do things like spin up a new Elasticsearch index for the duration of a test suite and spin it down again at the end.
It looks like Testcontainers does all of that work for me.
My testing strategy is to have as much of my application's functionality covered by proper end-to-end integration-style tests as possible - think tests that simulate an incoming HTTP request and then run assertions against the response (and increasingly Playwright-powered browser automation tests for anything with heavy JavaScript).
I'll use unit tests sparingly, just for the bits of my code that have very clear input/output pairs that afford unit testing.
I only use mocks for things that I don't have any chance of controlling - calls to external APIs for example, where I can't control if the API provider will be flaky or not.
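A rough Go sketch of the "simulate an incoming HTTP request and assert on the response" style of test, using the standard library's `net/http/httptest`. The `/greet` handler and the `newApp`/`request` helpers are hypothetical stand-ins for a real application; in an actual suite the handler would be wired to real dependencies (for example a database started with Testcontainers) instead of being self-contained.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newApp builds the application's router. Hypothetical: a real app would
// inject its database, caches, etc. here.
func newApp() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/greet", func(w http.ResponseWriter, r *http.Request) {
		name := r.URL.Query().Get("name")
		if name == "" {
			http.Error(w, "missing name", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprintf(w, "hello, %s", name)
	})
	return mux
}

// request drives the handler as a black box: only an HTTP request goes in,
// and only the status code and body are inspected on the way out.
func request(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	body, _ := io.ReadAll(rec.Result().Body)
	return rec.Code, string(body)
}

func main() {
	code, body := request(newApp(), "/greet?name=ada")
	fmt.Println(code, body)
}
```

Because the test only touches the HTTP surface, the internals behind `/greet` can be refactored freely without touching it.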
I love integration tests. You know why? Because I can safely refactor all I want!
Unit tests are great, but if you significantly refactor how several classes talk to each other, and each of those classes had their own, isolated unit tests that mocked out all of the others, you're suddenly refactoring with no tests. But a black box integration tests? Refactor all your code, replace your databases, do whatever you want, integration test still passes.
Unit test speed is a huge win, and they're incredibly useful for quickly testing weird little edge cases that are annoying to write integration tests for, but if I can write an integration test for it, I prefer the integration test.
Legit. Probably an unpopular opinion, but if I had to choose only one type of test (cue a long discussion with no resolution over defining exact taxonomic boundaries), I'd go with integration over unit. Especially if you're a new contributor to a project. I think it comes down to exercising the flow between... well, integrations across components.
Even better? Take your integration test, put it on a cronjob in your VPN/vpc, use real endpoints and make bespoke auth credentials + namespace, and now you have canaries. Canaries are IMHO God tier for whole system observability.
Then take your canary, clean it up, and now you have examples for documentation.
Unit tests are for me mostly about testing the domain+codomain of functions and adherence to business logic, but a good type system, along with the discipline to actually make schemas/POJOs etc. instead of just tossing around maps, strings and ints everywhere, already accomplishes a lot of that (still absolutely needed though!).
Right. Unit tests are typically a waste of time unless you have some complicated business logic (say, some insurance rates calculation, etc.).
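For the kind of logic that does warrant unit tests, a table-driven test keeps every code path visible. A Go sketch with an entirely made-up `premium` rule (the age bands and amounts are illustrative only, not a real rating scheme):

```go
package main

import "fmt"

// premium returns a monthly premium in cents. Hypothetical rule for illustration.
func premium(age int, smoker bool) int {
	base := 10000
	switch {
	case age < 25:
		base += 5000
	case age >= 60:
		base += 8000
	}
	if smoker {
		base *= 2
	}
	return base
}

type premiumCase struct {
	name   string
	age    int
	smoker bool
	want   int
}

// One row per code path, plus the boundaries between age bands.
var premiumCases = []premiumCase{
	{"young non-smoker", 20, false, 15000},
	{"adult non-smoker", 40, false, 10000},
	{"senior non-smoker", 65, false, 18000},
	{"adult smoker", 40, true, 20000},
	{"boundary at 25", 25, false, 10000},
	{"boundary at 60", 60, false, 18000},
}

func main() {
	for _, c := range premiumCases {
		if got := premium(c.age, c.smoker); got != c.want {
			fmt.Printf("FAIL %s: got %d, want %d\n", c.name, got, c.want)
		} else {
			fmt.Printf("ok   %s\n", c.name)
		}
	}
}
```

In a real project the loop would live in a `_test.go` file and use `t.Run(c.name, ...)` so each case reports separately.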
This was advocated long time ago in the (great) book "Next Generation Java Testing: TestNG and Advanced Concepts" by Cédric Beust and Hani Suleiman (old people will remember his (in)famous The Bile Blog...).
Thanks for saying this out loud. I’m a solo dev and in my project I’m doing exactly this: 90% black box integration tests and 10% unit tests for edge cases I cannot trigger otherwise. It buys me precious time to not adjust tests after refactoring. Yet it made me feel like a heretic: everyone knows the testing pyramid and it comes from Google so I must be very wrong.
This advice is so misguided that I'm concerned for our industry it's getting so much traction.
> You really want to avoid testing implementation details because it doesn't give you very much confidence that your application is working and it slows you down when refactoring. You should very rarely have to change tests when you refactor code.
Unit tests don't need to test implementation details. You could just as well make that mistake with integration or E2E tests. Black box testing is a good practice at all layers.
What unit tests do is confirm that the smallest pieces of the system work as expected in isolation. Yes, you should also test them in combination with each other, but it serves you no good if you get a green integration test, when it's likely only testing a small fraction of the functionality of the units themselves.
This whole "unit tests slow you down" mentality is incredibly toxic. You know what genuinely slows me down? A suite with hundreds of integration tests, each taking several seconds to run and depending on external systems. But hey, testcontainers to the rescue, right?
Tests shouldn't be a chore, but an integral part of software development. These days I suppose we can offload some of that work to AI, but even that should be done very carefully to ensure that the code is high quality and actually tests what we need.
Test code is as important as application code. It's lazy to think otherwise.
> If by "smallest pieces of the system" you mean something like individual classes then you are definitely testing implementation details.
No, there's nothing definite about that.
The "unit" itself is a matter of perspective. Tests should be written from the perspective of the API user in case of the smallest units like classes and some integration tests, and from the perspective of the end user in case of E2E tests. "Implementation details" refers to any functionality that's not visible to the user, which exists at all levels of testing. Not writing tests that rely on those details means that the test is less brittle, since all it cares about is the external interface. _This_ gives you the freedom to refactor how the unit itself works however you want.
But, if you change the _external_ interface, then, yes, you will have to update your tests. If that involves a method signature change, then hopefully you have IDE tools to help you update all calling sites, which includes application code as well. Nowadays with AI assistants, this type of mechanical change is easy to automate.
If you avoid testing classes, that means that you're choosing to ignore your API users, which very likely is yourself. That seems like a poor decision to make.
Congrats, you understand what "unit test" was originally supposed to refer to. That is not what it has commonly meant to most people for years. The common meaning is "test every individual function in isolation".
I think this came about because of people copying the surface appearance of examples (syntactic units, functions) and not understanding what the example was trying to show (semantic units), then this simplification got repeated over and over until the original meaning was lost.
> If by "smallest pieces of the system" you mean something like individual classes then you are definitely testing implementation details.
If your classes properly specify access modifiers, then no, you're not testing implementation details. You're testing the public interface. If you think you're testing implementation details, you probably have your access modifiers wrong in the class.
If I change something at the lowest level in my well abstracted system, only the unit tests for that component will fail, as the tests that ‘use’ that component mock the dependency. As long as the interface between components doesn’t change, you can refactor as much as you want.
Sure, that's a tradeoff that you make. Personally I update my implementations more often than I update the interfaces, so I'm happy to take that hit when modifying the interface in trade for knowing exactly where my implementations break.
In a perfect world each unit would do the obvious thing without many different paths through it. The only paths would be the paths that are actually relevant for the function. In such a perfect world, the integration test could trigger most (all?) paths through the unit, and separate unit tests would not add value.
In this scenario unit tests would not add value over integration tests when looking for the existence of errors.
But: in a bigger project you don't only want to know "if" there is a problem, but also "where". And this is where the value of unit tests comes in. Also, you can map requirements to unit tests, which also has some value (in some projects at least).
edit: now that I think about it, you can also map requirements to e2e tests. That would probably even work much better than mapping them to unit-tests would.
> in a perfect world each unit would do the obvious thing without many different paths through it.
I don't think that's realistic, even in an imaginary perfect world.
Even a single pure function can have complex logic inside it, which changes the output in subtle ways. You need to test all of its code paths to ensure that it works as expected.
> In such a perfect world, the integration test could trigger most (all?) paths through the unit and separate unit-tests would not add value.
This is also highly unlikely, if not impossible. There is often no way for a high-level integration test to trigger all code paths of _all_ underlying units. This behavior would only be exposed at the lower unit level. These are entirely different public interfaces.
Even if such integration tests were possible, there would have to be so many of them that maintaining and running the entire test suite would become practically unbearable. The reason we're able to, and should, test all code paths is precisely because unit tests are much quicker to write and run. They're short, don't require complex setup, and can run independently from every other unit.
> But: In a bigger project you don't only want to know "if" there is a problem, but also "where". And this is where the value of unit tests comes in.
Not just in a "bigger" project; you want to know that in _any_ project, preferably as soon as possible, without any troubleshooting steps. Elsewhere in the thread people were suggesting bisecting or using a debugger for this. This seems ludicrous to me when unit tests should answer that question immediately.
> Also you can map requirements to unit tests, which also has some value (in some projects at least)
Of course. Requirements from the perspective of the API user.
> now that I think about it, you can also map requirements to e2e tests.
Yes, you can, and should. But these are requirements of the _end_ user, not the API user.
> That would probably even work much better than mapping them to unit-tests would.
No, this is where the disconnect lies for me. One type of testing is not inherently "better" than other types. They all complement each other, and they ensure that the code works for every type of user (programmer, end user, etc.). Choosing to write fewer unit tests because you find them tedious to maintain is just being lazy, and finding excuses like integration tests bringing more "bang for your buck" or unit tests "slowing you down" is harmful to your own and your colleagues' experience as maintainers, and ultimately to your end user when they run into some obscure bug your high-level tests didn't manage to catch.
> Even if such integration tests were possible, there would have to be so many of them that maintaining and running the entire test suite would become practically unbearable. The reason we're able to, and should, test all code paths is precisely because unit tests are much quicker to write and run. They're short, don't require complex setup, and can run independently from every other unit.
I think having a good architecture plays a big role here.
I've heard this aversion to unit tests a few times in my career, and I'm unable to make sense of it.
Sure, integration tests "save" you from writing pesky unit tests, and changing them frequently after every refactor.
But how do you quickly locate the reason that integration test failed? There could be hundreds of moving parts involved, and any one of them malfunctioning, or any unexpected interaction between them, could cause it to fail. The error itself would likely not be clear enough, if it's covered by layers of indirection.
Unit tests give you that ability. If written correctly, they should be the first to fail (which is a good thing!), and if an integration test fails, it should ideally also be accompanied by at least one unit test failure. This way it immediately pinpoints the root cause.
The higher up the stack you test, the harder it is to debug. With E2E tests you're essentially debugging the entire system, which is why we don't exclusively write E2E tests, even though they're very useful.
To me the traditional test pyramid is still the best way to think about tests. Tests shouldn't be an afterthought or a chore. Maintaining a comprehensive and effective test suite takes as much hard work as, if not more than, maintaining the application itself, and it should test all layers of the system. But if you do have that, it gives you superpowers to safely and reliably work on any part of the system.
> But how do you quickly locate the reason that integration test failed? There could be hundreds of moving parts involved, and any one of them malfunctioning, or any unexpected interaction between them, could cause it to fail. The error itself would likely not be clear enough, if it's covered by layers of indirection.
As long as the error is reproducible, never in my career have I had a hard time locating the source of the error. Bisection does wonders (as a general concept, not specifically referring to git bisect).
That said, I have encountered plenty of non-reproducible test failures. Moral of the story: make things reproducible, especially tests.
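Bisection in that general sense is a binary search for the first "bad" item in an ordered sequence (commits, migration steps, input records...). A Go sketch using `sort.Search`, assuming failures are monotonic (once a step breaks the test, every later step stays broken); `firstBad` and the `isBad` predicate, which would rerun the reproducible failing test, are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// firstBad returns the index of the first step for which isBad reports true,
// or len(steps) if none do. It assumes isBad is monotonic (false...false, true...true),
// so it needs only a logarithmic number of isBad calls.
func firstBad(steps []string, isBad func(step string) bool) int {
	return sort.Search(len(steps), func(i int) bool { return isBad(steps[i]) })
}

func main() {
	commits := []string{"a1", "b2", "c3", "d4", "e5", "f6"}
	broken := map[string]bool{"d4": true, "e5": true, "f6": true}
	calls := 0
	i := firstBad(commits, func(c string) bool {
		calls++
		return broken[c]
	})
	fmt.Printf("first bad: %s after %d checks\n", commits[i], calls)
}
```

This is also why reproducibility matters so much here: a flaky `isBad` breaks the monotonicity assumption and sends the search to the wrong place.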
> I've heard this aversion to unit tests a few times in my career, and I'm unable to make sense of it.
It's very simple: most of the time people are told by management that they MUST achieve 80-90-95% code coverage (with unit tests), which leads to a lot of absolutely worthless tests - tests for the sake of it. The irony is that the pieces that really count don't get tested properly, because you unit-test the happy path and maybe 1 or 2 negative scenarios, and that's it, missing a bunch of potential regressions.
EDIT: This is just to say that I don't believe the author of the comment said "don't write unit tests" (I hope not, at least!) but, if I can rephrase it, "well, the integration tests give you a better dopamine effect because they actually help you catch bugs". Which would also be partially true of properly written unit tests (and they would do so in a fraction of the time you need with integration tests).
> most of the time people are told by management that they MUST achieve 80-90-95% code coverage (with unit tests), which leads to a lot of absolutely worthless tests - tests for the sake of it
So strict rules from management in a company that likely doesn't understand software development, and lazy developers who decide to ignore this by intentionally writing useless tests, lead to thinking that unit tests and coverage are useless? That doesn't track at all.
I'd say that the answer is somewhere in the middle. If the company doesn't understand software development, it's the engineer's job to educate them, or find a better place to work at. It's also the engineer's job to educate lazy developers to care about testing and metrics like code coverage.
> if I can rephrase it, "well, the integration tests give you a better dopamine effect because they actually help you catch bugs"
And unit tests don't? I would argue that unit tests give you much more of that dopamine, since you see the failures and passes much more quickly, and there should be much more of them overall. Not that we should structure our work towards chasing dopamine hits...
I'd say that most of the people who advocate for this position haven't worked with a well tested codebase. Sadly, not all of us have the privilege of working with codebases like SQLite's, which go much beyond 100% line/statement coverage[1]. Is all that work in vain? Are they some crazy dogmatic programmers that like wasting their time? I would say: no. They just put a lot of effort and care in their product, which speaks for itself, and I would think makes working on it much safer, more efficient and pleasant.
I would also argue that the current state of our industry, and in turn of everything that depends on software, where buggy software is the norm, would be much better overall if that kind of effort and care were put into all software projects.
> Sadly, not all of us have the privilege of working with codebases like SQLite's, which go much beyond 100% line/statement coverage[1]...
Linked SQLite page mentions this:
> 100% branch test coverage in an as-deployed configuration
Branch test coverage is different from line coverage and in my opinion it should be the only metric used in this context for coverage.
90-95% line coverage is exactly why many unit tests are garbage and why many come up with the argument "I prefer integration tests, unit tests are not that useful".
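A tiny Go illustration of the difference. A single test with a negative input executes every statement of the hypothetical `clampToZero` (so `go test -cover`, which reports statement coverage, shows 100%), yet the path where the `if` is not taken is never exercised; branch coverage would flag that missing case.

```go
package main

import "fmt"

// clampToZero replaces negative values with zero.
func clampToZero(x int) int {
	if x < 0 {
		x = 0
	}
	return x
}

func main() {
	// Covers every statement, but only the "condition true" branch.
	fmt.Println(clampToZero(-3))
	// Needed for branch coverage: the "condition false" path.
	fmt.Println(clampToZero(7))
}
```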
> Branch test coverage is different from line coverage and in my opinion it should be the only metric used in this context for coverage.
It's not different, just more thorough. Line and statement coverage are still useful metrics to track. They might not tell you whether you're testing all code paths, but they still tell you that you're at least testing some of them.
Very few projects take testing seriously to also track branch coverage, and even fewer go the extra mile of reaching 100% in that metric. SQLite is the only such project that does, AFAIK.
> 90-95% line coverage is exactly why many unit tests are garbage
Hard disagree. Line coverage is still a useful metric, and the only "garbage" unit tests are those that don't test the right thing. After all, you can technically cover a block of code, with the test not making the correct assertions. Or the test could make the right assertions, but it doesn't actually reproduce a scenario correctly. Etc. Coverage only tracks whether the SUT was executed, not if the test is correct or useful. That's the job of reviewers to point out.
> and why many come up with the argument "I prefer integration tests, unit tests are not that useful".
No. Programmers who say that either haven't worked on teams with a strong testing mindset, haven't worked on codebases with high quality unit tests, or are just being lazy. In any case, taking advice from such programmers about testing practices would not be wise.
If the test fails consistently (as it should) it is usually just a question of using a debugger and stepping through some suspect sections of the code to find the issue.
Compared to the amount of time saved by not rewriting unit tests every time you refactor stuff, it's a great trade-off.
> Sure, integration tests "save" you from writing pesky unit tests, and changing them frequently after every refactor.
The DB migration has "ADD COLUMN rateLimit INT"
The application class member is annotated with "@Column(name="ratelimit", nullable=true)"
The failure is at the interface between the app and the DB. What testcontainers does is allow you to write a quasi-unit test (not a truly full-blown integration test, but testing a small piece of functionality) across the boundary of two components. I am not aware of a way to reasonably unit test for this error. That might just be me -- seriously if there is a tried strategy for unit testing things like this I'd love to know it.
That's an _integration_ failure. We're talking about testing two entirely different things here. Of course you shouldn't expect to test integration scenarios in unit tests.
But you also shouldn't expect to test all unit scenarios in integration tests. This is what the "trophy testing" model advocates for. That somehow unit tests are not needed if you have thorough integration tests, which is entirely false.
They test the application at different layers, because they're meant to ensure behavior to different types of users. They're both useful for catching bugs and unexpected behavior, and they're meant to complement each other.
How do you handle resetting a SQL database after every integration test? Testcontainers may help here by spinning up a new instance for every test, but that seems very slow.
I do this a lot for Postgres testing. In my setup, I create a single database for the entire test run. Each test creates its own schema in that database and applies the latest table definitions.
With this setup, I only eat the container creation once, while allowing every test to operate in isolation from one another, be parallelized, and test against a real database.
I do a similar trick for S3 containers by applying a unique guid prefix to the buckets in each test.
It depends on what you’re testing. Applying the schema is pretty fast (30-40ms) compared to the container creation (1-2 seconds). If you need a lot of test data it would take time, but most of the time I'm only inserting enough rows to hit my test conditions. For CRUD apps I usually orchestrate the test setup using the public APIs of my application against the fresh instance.
I once failed a take home assignment because of this. It was writing a couple of api endpoints and for testing, I focused on integration over unit. I even explained my reasoning in the writeup. There was no indication that the company preferred unit tests, but the feedback was "didn't have enough unit tests". What a dumb company.
Another technique I've found very useful is generative integration tests (kind of like fuzzing), especially for idempotent API endpoints (GETs).
For example, assuming you have a test database with realistic data (or scrubbed production data), write tests that are based on generalizable business rules, e.g.: the total line of an 'invoice' GET response should be the sum of all the 'sections' endpoint responses tied to that invoice ID. Then just have a process that runs before the tests and creates a bunch of test cases (invoice IDs to try), randomly selected from all the IDs in the database. Limit the number of cases to something reasonable for total test duration.
As one would expect, overly tight assertions can often lead to many false positives, but really tough edge cases hidden in diverse/unexpected data (null refs) can be found that usually escape the artificial or 'happy path' pre-selected cases.
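A rough sketch of that approach in Go, with the HTTP calls abstracted away: `sampleIDs` picks a bounded random sample of real invoice IDs (seeded, so a failing selection can be replayed), and `checkInvoiceTotal` encodes the business rule. Both names, and the amounts-in-cents representation, are assumptions for illustration; in a real test the totals would come from the 'invoice' and 'sections' endpoint responses.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleIDs returns up to n distinct IDs chosen at random from all.
// An explicit seed makes a failing selection reproducible.
func sampleIDs(all []int64, n int, seed int64) []int64 {
	rng := rand.New(rand.NewSource(seed))
	ids := append([]int64(nil), all...) // copy so the caller's slice is not shuffled
	rng.Shuffle(len(ids), func(i, j int) { ids[i], ids[j] = ids[j], ids[i] })
	if n < len(ids) {
		ids = ids[:n]
	}
	return ids
}

// checkInvoiceTotal enforces the rule "invoice total == sum of its sections".
func checkInvoiceTotal(id, total int64, sections []int64) error {
	var sum int64
	for _, s := range sections {
		sum += s
	}
	if sum != total {
		return fmt.Errorf("invoice %d: total %d != sum of sections %d", id, total, sum)
	}
	return nil
}

func main() {
	// Hypothetical stand-ins for data fetched from the API under test (cents).
	// Invoice 3 deliberately violates the rule.
	totals := map[int64]int64{1: 1500, 2: 900, 3: 1200}
	sections := map[int64][]int64{1: {1000, 500}, 2: {900}, 3: {700, 400}}

	all := []int64{1, 2, 3}
	for _, id := range sampleIDs(all, 2, 42) {
		if err := checkInvoiceTotal(id, totals[id], sections[id]); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:", id)
		}
	}
}
```

Logging the seed on failure lets you rerun exactly the same sample while debugging.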
Running unit tests as integration tests will explode in your face. In any decent complex code base testing time will go through the roof and you will have a hard time getting the genie back in the bottle.
Testing that you actually run "sum()" is a unit test.
This is exactly the strategy I have discovered to bring the most value as well. And honestly, something that simplifies the setup of those containers is pretty great.
Yes, you just focus on a few high level behaviors that you want to validate, instead of the units. It’s more difficult to pull these tests off, as there are more chances for them to become flaky tests, but if they work they provide much more value.
I’d prefer a dozen well written integration tests over a hundred unit tests.
Having said that, both solve different problems, ideally you have both. But when time-constrained, I always focus on integration tests with actual services underneath.
Yeah - I find that sticking to tests like this means I don't have hundreds of tiny unit tests that rely on mocks, and it's still very supportive of refactoring - I can make some pretty big changes and be confident that I've not broken anything because a given request continues to return the expected response.
I didn't quite understand why this was made. We create our local test environments using docker-compose, and so I read:
> Creating reliable and fully-initialized service dependencies using raw Docker commands or using Docker Compose requires good knowledge of Docker internals and how to best run specific technologies in a container
This sounds like a <your programming language> abstraction over docker-compose, which lets you define your docker environment without learning the syntax of docker-compose itself. But then
> port conflicts, containers not being fully initialized or ready for interactions when the tests start, etc.
means you'd still need a good understanding of docker networking, dependencies, healthchecks to know if your test environment is ready to be used.
Am I missing something? Is this basically just changing what starts your Docker test containers?
It shows how you can embed the declaration of a test database in a unit test:
> pgContainer, err := postgres.RunContainer(ctx,
> testcontainers.WithImage("postgres:15.3-alpine"),
> postgres.WithInitScripts(filepath.Join("..", "testdata", "init-db.sql")),
> postgres.WithDatabase("test-db"),
> postgres.WithUsername("postgres"),
> postgres.WithPassword("postgres"),
> testcontainers.WithWaitStrategy(
> wait.ForLog("database system is ready to accept connections").
This does look quite neat for setting up test specific database instances instead of spawning one outside of the test context with docker(compose). It should also make it possible to run tests that require their own instance in parallel.
And, to quote non-code text, you have to do it manually; there is no formatting operator and the code-indent method won’t work (unreadable at many browser widths). I tend to do it like so:
> *Paragraph one.*
> *Paragraph two. Etc.*
Which produces the desired effect:
> Paragraph ‘one’.
> Paragraph two.
(To use a * in a paragraph that’s italic-wrapped, backslash it.)
This seems great but is actually quite slow. This will create a new container, with a new postgres server, and a new database in that server, for each test. You'll then need to run migrations in that database. This ends up being a huge pain in the ass.
A better approach is to create a single postgres server one-time before running all of your tests. Then, create a template database on that server, and run your migrations on that template. Now, for each unit test, you can connect to the same server and create a new database from that template. This is not a pain in the ass and it is very fast: you run your migrations one time, and pay a ~20ms cost for each test to get its own database.
I've implemented this for golang here — considering also implementing this for Django and for Typescript if there is enough interest. https://github.com/peterldowns/pgtestdb
As a user of testcontainers I can tell you they are very powerful yet simple.
Indeed all they do is provide an abstraction for your language, but this is soo useful for unit/integration tests.
At my work we have many microservices in both Java and Python, all of which use testcontainers to set up the local env or integration tests. The integration with LocalStack and the ability to set it up programmatically, without fighting with compose files, is something I find very useful.
Testcontainers is great. It's got seamless JUnit integration and really Just Works. I've never once had to even think about any of the docker aspects of it. There's really not much to it.
It’s not coming across in your comment, but Testcontainers can work with unit tests to start a container, run the unit tests and shut it down. For example, to verify database operations against the actual database, the unit test can start an instance of Postgres, run tests, and then shut it down. If running tests in parallel, each test can start its own container and shut it down at the end.
Wouldn't that just massively, _massively_ slow down your tests, if each test was spinning up its own Postgres container?
I ask because I really like this and would love to use it, but I'm concerned that that would add just an insane amount of overhead to the point where the convenience isn't worth the immense amount of extra time it would take.
A better approach is to spin up one container and a _template_ database before the tests. Apply migrations to that database. Then, each test creates its own database from the template, runs, and drops the database.
Tests can be run in parallel, and they are fast because the database is prepared just once, tests simply make a copy.
We're doing this in my company, I'm happy how it works.
Testcontainers are for testing individual components, apart from the application.
I built a new service registry recently; its unit tests spin up a ZooKeeper instance for the duration of the test and then kill it.
Also very nice with databases. Spin up a clean db, run migrations, then test db code with zero worries about accidentally leaving stuff in a table that poisons other tests.
> Also very nice with databases. Spin up a clean db, run migrations, then test db code with zero worries about accidentally leaving stuff in a table that poisons other tests.
Are you spinning up a new instance between every test case? Because that sounds painfully slow.
I would just define a function which DELETEs all the data and call it between every test.
It supports both patterns (and variations in between). So you get to pick between isolation at a test level or if you want less overhead, rolling back the commit or other ways to cleanup.
Can only speak for the Golang version of the lib, but spinning up new instances was surprisingly quick.
I usually do one per suite with a reset method run before each test.
It's a decent compromise between performance and isolation, since weird interactions can only originate from the same suite, rather than anywhere in any test. Also permits parallel execution of db test suites.
This looks to be like just language specific bindings over the docker compose syntax. You're right that docker compose handles all of the situations they describe.
The major issue I had with docker compose in my CI environment is flaky tests when a port is already used by another job I don't control. With testcontainers, I haven't seen any false positive as I can use whatever port is available and not a hardcoded one hoping it won't conflict with what other people are doing.
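If you need a non-hardcoded host port outside of Testcontainers, one standard-library approach is to ask the OS for an ephemeral port by listening on port 0. A Go sketch (`freePort` is a hypothetical helper); note the small race window between closing the probe listener and the container binding the port, which is why letting Docker choose the host port itself, as Testcontainers does with its random port mappings, is more robust.

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the kernel for an unused TCP port on the loopback interface.
// The port is released before returning, so another process could grab it
// before you use it; treat it as "very likely free", not guaranteed.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := freePort()
	if err != nil {
		panic(err)
	}
	// e.g. publish the container's 5432 on this host port.
	fmt.Println("use host port", port)
}
```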
Unless I'm mistaken, this is only a problem if you're forwarding ports from the Docker containers to the host machine, which isn't necessary if the test itself is running from inside a Docker container on the same bridge network as your dependencies. (Which compose will set up for you by default.)
I looked at testcontainers and ended up rolling my own version. One issue I had is that Docker is a very leaky abstraction. I needed to write one test and have it run in all these scenarios:
- on a Mac
- on a Linux VM
- in a Docker container on a Linux VM, with a Docker socket mounted
The networking for each of these is completely different. I had to make some opinionated choices to get code that could run in all cases. And running inside Docker prevented the test from being able to mount arbitrary files into the test containers, which turns out to be a requirement often. I ended up writing code to build a new image for each container, using ADD to inject files.
I also wanted all the tests to run in parallel and spit out readable logs from every container (properly associated with the correct test).
Not sure if any of these things have changed in testcontainers since I last looked, but these are the things I ran into. It took maybe a month of off and on tweaking, contrary to some people here claiming it can be done in an hour. As always, the devil is in the details.
edit: I did end up stealing ryuk. That thing can’t really be improved upon.
Many of the Mac networking specifics have become less of a problem. I use Rancher Desktop, which uses the correct virtualization framework based on OSX versions and allows you to customize the lima and virtual machine provisioning scripts so that you don't have cross-platform headaches like this (for the most part). It also runs Kubernetes locally out of the box, so you can test app deployments without waiting around for resources to free up on your shared dev cluster. The newest versions have almost everything Docker Desktop has. Highly recommend if you are on a Mac.
> No more need for mocks or complicated environment configurations. Define your test dependencies as code, then simply run your tests and containers will be created and then deleted.
Wait what? They think you don't need unit tests because you can run integration tests with containers?
It's trivial to set up a docker container with one of your dependencies, but starting containers is painful and slow.
1) At least in the Java world, the term "unit testing" is often confused with "things you do in JUnit", which covers both "pure" unit tests and project-level integration tests, i.e. spinning up an application context (like Spring) and testing against real REST endpoints etc.
2) While unit tests are cheaper and quicker than (project-level) integration tests, in many cases they don't provide as good a result or level of confidence, because a lot of run-time aspects (serialization, HTTP responses, database responses, etc.) are not as straightforward to mock. There's been some noise about the Testing Trophy, as opposed to the Testing Pyramid, where, in short, there are still unit tests where they make sense, but a lot of testing has moved to (project-level) integration tests. These are slower, but only by so much that the trade-off is often worth it. Whether it's worth it depends heavily on what you're testing. If it's a CRUD API: I use integration tests. If it's something algorithmic, or string manipulation, etc.: I use unit tests.
When I saw the Testing Trophy presented, it came with the asterisk that (project-level) integration testing has gotten easier and cheaper over time, thus allowing a shift in trade-off. Testcontainers is one of the primary reasons why this shift has happened. (And... I respect that it's not for everyone.)
Yeah, certainly I see the value of those kinds of tests. And clearly as you say the simpler tests don't provide as realistic a simulation as the more expensive tests.
But on the test philosophy angle, my take on what's happening is just that developers traditionally look for any reason to skip tests. I've seen this in a few different forms.
- right now containers make it trivial to run all of your dependencies. That's much easier than creating a mock or a fake, so we do that and don't bother creating a mock/fake.
- compiler folks have created great static analysis tools. That's easier than writing a bunch of tests, so we'll just assume static analysis will catch our bugs for us.
- <my language>'s type system does a bunch of work type checking, so I don't need tests. Or maybe I just need randomly generated property tests.
- no tests can sufficiently emulate our production environment, so tests are noise and we'll work out issues in dev and prod.
What I've noticed, though, looking across a wide range of software projects, is that there's a clear difference in quality between projects that have a strong testing discipline and those that convince themselves they don't need tests because of <containers, or types, or whatever else>.
Sure it's possible that tests don't cause the quality difference (maybe there's a third factor for example that causes both). And of course if you have limited resources you have to make a decision about which quality assurance steps to cut.
But personally I respect a project more if they just say they don't have the bandwidth to test properly, so they're skipping straight to the integration stage (or whatever), rather than convincing themselves that those tests weren't important anyway. I've seen so many projects that only had integration tests and would have been much better with even a small number of unit tests.
Sounds like some kind of protestant work ethic mentality: testing should be hard work, the harder writing your tests was the better your soul and the better your system.
I've seen plenty of projects that made oodles of mocks and fakes and unit tests and just sucked, outright didn't work at all in a way that would've been obvious if they'd done testcontainers-based integration tests or even just manual testing in prod. I would absolutely trust a project that was written in Haskell and had no tests, or only integration tests, ahead of one that had lots of unit tests. Indeed if anything I'd say the number of mocks/fakes is negatively correlated with the actual project quality.
Just to add, there's also a (Chicago) school of thought that pushes back against mocks and fakes, so even if you're religiously (to stick with the metaphor) writing unit tests, you might still not invest in mocks and fakes.
They might mean that rather than using a mock, use a real typed object/instance of a real thing and inject it into the unit that you’re testing. Admittedly, that might meet the definition of a fake/mock once you get down to the level of testing something that needs db access. Another way of interpreting that is that you can use in memory versions of your deps to mirror the interface of your dependency without needing to repeatedly, and possibly haphazardly mock certain functions of your dependency.
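A sketch of the "in-memory version of your dependency" idea in Python: the code under test depends on an interface (a `typing.Protocol`), and the test injects a dict-backed fake that mirrors it, instead of mocking individual method calls. `UserStore`, `InMemoryUserStore`, and `register` are hypothetical names.

```python
from typing import Optional, Protocol


class UserStore(Protocol):
    """The interface the code under test depends on (hypothetical)."""

    def save(self, user_id: str, email: str) -> None: ...

    def find_email(self, user_id: str) -> Optional[str]: ...


class InMemoryUserStore:
    """A fake that mirrors UserStore with a dict instead of a database."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def save(self, user_id: str, email: str) -> None:
        self._rows[user_id] = email

    def find_email(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


def register(store: UserStore, user_id: str, email: str) -> bool:
    """Code under test: normalizes the email and refuses to overwrite a user."""
    if store.find_email(user_id) is not None:
        return False
    store.save(user_id, email.lower())
    return True
```

The test then asserts on the fake's resulting state rather than on which methods were called, so it survives refactors of `register` that keep its behavior.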
I think testing is very important. But it's very hard to test everything. (It is not hard to get 100% test coverage by some metric, but that does not mean that all scenarios or even the most useful ones are covered.) So it's an economics game: how can you get the most value for the least amount of money? Or, to rephrase that more positively: how can you get the most value out of the time that you have available? And I contend that a shift "up" in the pyramid (at which point it loses that shape, hence the "testing trophy") is where the current sweet spot lies. You have to use the tools that you have.
I have been doing E2E testing exclusively for close to a decade on several apps and it works great.
Note: not integration, E2E. I can go from a bare VM to a fully tested system in under 15 minutes. I can re-run that test in 1-5 minutes (depending on the project)...
I'm creating hundreds of records in that time, and fuzzing a lot of data entry. I could get it to go "even faster" if I went in and removed some of the stepwise testing... A->B->C->D could be broken out into A->B, A->C, A->D.
Because my tests are external, they would be durable across a system rewrite (if I need to change language, platform, etc.). They can also be reused/tweaked to test system performance under load (something unit tests could never do).
No mocks doesn't mean no tests. It means running tests against the full code path which includes requests to running instances of the services you might otherwise mock. For many apps and use cases, the overhead in managing container state is worth it.
> It means running tests against the full code path which includes requests to running instances of the services you might otherwise mock.
Yeah, those are called end to end tests and you run them after integration tests which you run after unit tests. It sounds to me like they're saying just skip to the end to end tests.
> For many apps and use cases, the overhead in managing container state is worth it.
Yeah, and typically you'd run them after you run unit and integration tests. If I have 10 libraries to test that have database access, I have to run 10 database containers simultaneously every few minutes as part of the development process? That's overkill.
> Yeah, and typically you'd run them after you run unit and integration tests. If I have 10 libraries to test that have database access, I have to run 10 database containers simultaneously every few minutes as part of the development process? That's overkill.
If it's actually causing you problems, then by all means replace some of them with more lightweight tests, at the cost of some test environment faithfulness. But don't optimise prematurely.
Testcontainers is awesome and all the hate it gets here is undeserved.
Custom shell scripts definitely can't compete.
For example one feature those don't have is "Ryuk": A container that testcontainers starts which monitors the lifetime of the parent application and stops all containers when the parent process exits.
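The idea behind Ryuk can be illustrated without Docker: the test process holds a connection open to a watcher, and when that connection drops (because the process exited or crashed), the watcher cleans up everything that was registered over it. This is a simplified in-process sketch using `socket.socketpair()`. The real Ryuk runs as its own container and removes Docker resources by label, which is not reproduced here; `watch_parent` and the `cleanup` callback are hypothetical.

```python
import socket
import threading
from typing import Callable


def watch_parent(conn: socket.socket, cleanup: Callable[[list[str]], None]) -> threading.Thread:
    """Start a watcher that records resource IDs sent over `conn`.

    Each newline-terminated line registers one resource. When the peer
    closes the connection (recv returns b""), all registered resources
    are handed to `cleanup`.
    """

    def run() -> None:
        resources: list[str] = []
        buffer = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer went away: the "parent process exited" signal
                break
            buffer += chunk
            # Keep any trailing partial line in the buffer for the next chunk.
            *lines, buffer = buffer.split(b"\n")
            resources.extend(line.decode() for line in lines if line)
        conn.close()
        cleanup(resources)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

The key design choice is that the parent never has to say "I'm done": losing the connection is the signal, so cleanup still happens if the test process is killed.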
It allows the application itself to define dependencies for development, testing, and CI without needing to manually run a command to bring up docker compose beforehand.
One cool use case for us is having an ephemeral database container, started in a Gradle build, to generate jOOQ code from tables defined in a Liquibase schema.
I don't understand how this is better than a docker-compose.yml with your dependencies, which plays nicer with all other tooling.
Especially if there are complex dependencies between the required containers, it seems pretty weak in comparison. But I also only used it about 5 years ago, so maybe things are significantly better now.
One specific case that I encountered recently was implementing "integration" tests, where I needed to test some behavior that relies on the global state of a database. All other tests before were easily parallelized, and this meant our whole service could be fully tested within 10-30 seconds (dev machine vs. pipeline).
However, the new tests could not be run in parallel with the existing ones, as the changes in global state in the database caused flaky failures. I know there will be other tests like them in the future, so I want a robust way of writing these kinds of "global" tests without too much manual labor.
Spinning up a new postgres instance for each of these specific tests would be one solution.
I would rather run the tests inside transactions, but that comes with its own set of issues.
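A sketch of the run-each-test-inside-a-transaction approach, using the standard library's `sqlite3` as a stand-in for Postgres so it runs anywhere. With Postgres the pattern is the same (begin, run the test, always roll back), and so are the caveats: code under test that commits on its own, opens separate connections, or depends on transaction-level behavior won't be isolated this way. `rolled_back` is a hypothetical helper; the connection is opened with `isolation_level=None` so the explicit `BEGIN` isn't mixed with Python's implicit transaction handling.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def rolled_back(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Run a block inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        # Discard everything the test wrote, whether it passed or failed.
        conn.rollback()


def make_db() -> sqlite3.Connection:
    """Create a schema once; individual tests then run inside rolled_back()."""
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    return db
```

Because nothing is ever committed, tests that touch the same tables can share one server; tests that mutate truly global state (settings, sequences, other sessions) still need separate handling.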
Because you may want to spin up a new postgres database to test a specific scenario in an automated way. Testcontainers allows you to do that from code, for example you could write a pytest fixture to provide a fresh database for each test.
I don't know about Postgres, but a MySQL container can take at least a few seconds to start up and report back as healthy on a MacBook Pro. You can just purge the tables on the existing database between tests; there's no need to start up a whole new database server each time.
Testcontainers integrates with docker-compose [1]. Now you can run your tests through a single build-tool task without having to run something else alongside your build. You can even run separate compose files for just the parts your test touches.
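A sketch of purging tables between tests instead of restarting the server, again with `sqlite3` as a runnable stand-in. On MySQL or Postgres the catalog query and the statement differ (for example, `information_schema` and `TRUNCATE TABLE`), and foreign keys may force an ordering or a temporary relaxation of constraint checks; SQLite leaves foreign-key enforcement off by default, so order doesn't matter here. `purge_all_tables` is a hypothetical helper.

```python
import sqlite3


def purge_all_tables(conn: sqlite3.Connection) -> list[str]:
    """Delete every row from every user table, keeping the schema.

    Returns the names of the tables that were purged.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # Names come from the catalog, not user input; quote them anyway.
        conn.execute(f'DELETE FROM "{table}"')
    conn.commit()
    return tables
```

Calling this in a per-test setup hook keeps one long-lived database container while giving each test empty tables.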
Agreed and in my experience libraries like this perpetuate that anti-pattern. Inexperienced developers think because there's a library that enables it, it must be OK, right?
Low-bid contractors will probably use this library to pump their code coverage numbers. Some of the shit shops I have worked at that hired lowest-bid contractors have done some shady shit to meet “management expectations”.
Honest question: how are you writing integration tests? We write them as a separate test suite, often in the same test style. In that scenario, Testcontainers is very valuable.
Really, you use testcontainers so that you can manage everything for your test with a single build command, instead of running something extra, then running your tests, then shutting down your docker containers. Plus, with it integrated into your test suites, you can run code against your docker containers on setup/teardown, before/after container start, before/after each test, etc.
Meanwhile docker compose selling point: 'because you don't have to muck around with testcontainers; I guess some people might find that more attractive'.
Oh, absolutely! And as the other guy pointed out, docker-compose can be quite reusable when developing locally if you write it right.
But at $WORKPLACE we often use pytest-xprocess to start the required app in the same container where the tests run. It's probably the easiest way mostly because a custom wrapper does all the heavy lifting (starts the app, checks that it is running and responding to requests before the tests start, correctly terminates it when tests end).
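A stdlib-only sketch of what such a wrapper does: start the app as a subprocess, poll it until it answers HTTP requests, hand control to the tests, and always terminate it afterwards. This is not pytest-xprocess's API; `running_app` and `free_port` are hypothetical, and the usage example runs Python's built-in `http.server` as a stand-in for the real app.

```python
import socket
import subprocess
import sys
import time
import urllib.request
from contextlib import contextmanager
from typing import Iterator, Sequence


def free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


@contextmanager
def running_app(cmd: Sequence[str], url: str, timeout: float = 10.0) -> Iterator[subprocess.Popen]:
    """Start `cmd`, wait until `url` answers, and always terminate it afterwards."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        deadline = time.monotonic() + timeout
        while True:
            if proc.poll() is not None:
                raise RuntimeError(f"app exited early with code {proc.returncode}")
            try:
                with urllib.request.urlopen(url, timeout=1):
                    break  # a non-error response means the app is serving
            except OSError:  # connection refused, reset, timeout, HTTP error
                if time.monotonic() > deadline:
                    raise TimeoutError(f"{url} not ready after {timeout}s")
                time.sleep(0.1)
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    port = free_port()
    cmd = [sys.executable, "-m", "http.server", str(port), "--bind", "127.0.0.1"]
    with running_app(cmd, f"http://127.0.0.1:{port}/"):
        print("app is up; run the tests here")
```

The `finally` block is what provides the "correctly terminates it when tests end" part: it runs whether the readiness check times out, the tests fail, or everything passes.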
Arguably you're no longer testing a unit if the unit involves an integration with an external component, making it an integration test per definition.
Integration tests are fine, but they test something else: that your component integrates as intended with <something>, whereas a unit test checks that your unit behaves in accordance with its specification.
I've rarely found this to be worth it, for the effort required for a proper mock, in a complex system. I've seen most people mock in ways that are so superficial that it's basically a no-op.
Mocks are a contentious topic, as you've probably guessed. In my opinion they're a sign of coupled code: you should be able to hit very high coverage without a single mock. But if you're a dev in an org that tracks code coverage, you'll probably end up writing a fair number of them, since the odds are high you'll be consuming coupled code.
If you have a dependency like a third-party API (or even internal code), and you write an API client and then depend on that client, would that be considered coupled code?
In such cases, if I am using dependency injection and creating a (stub?) version of that client which returns hardcoded or configured output, would that be considered a mock? Or would this be OK and not "coupled"?
Most people will say something like for unit tests you should test your functions by passing the state as parameters to test. I'm going to call this "outside in" loose coupling.
Mocking is for the inverse: when you want to test a unit of code that calls some other, outside unit of code. It's really not any different, just "inside out".
So IMO, with DI you gain loose coupling through dependency inversion. But because of dependency inversion, you need to mock instead of passing state as params.
So I think if you are injecting a mocked stub, this is still loose coupling, because you are testing against its interface.
You're still passing state through your test, but it's coming from inside instead of outside, hence the mock.
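A small sketch of the setup asked about a few comments up: a client interface, code that receives the client through constructor injection, and a stub that returns configured output. In the terms used here, the stub is how state gets passed in "from inside". `RatesClient`, `PriceConverter`, and `StubRates` are hypothetical names, not from any library.

```python
from dataclasses import dataclass
from typing import Protocol


class RatesClient(Protocol):
    """Interface of a hypothetical third-party exchange-rate API client."""

    def rate(self, base: str, quote: str) -> float: ...


@dataclass
class PriceConverter:
    """Code under test: the client is injected rather than constructed inside."""

    client: RatesClient

    def convert(self, amount: float, base: str, quote: str) -> float:
        if base == quote:
            return amount
        return round(amount * self.client.rate(base, quote), 2)


class StubRates:
    """A stub: returns configured answers, so the test controls the state."""

    def __init__(self, rates: dict[tuple[str, str], float]) -> None:
        self._rates = rates

    def rate(self, base: str, quote: str) -> float:
        return self._rates[(base, quote)]
```

`PriceConverter` only knows the `RatesClient` interface, so swapping the real HTTP client for `StubRates` in a test requires no change to the code under test.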
Another way I have thought about this is: framework (the framework calls you) vs. library (you call the library).
Frameworks naturally lend themselves to a more mock-based style of testing. Libraries lend themselves to a more traditional style of testing.
Testing something that accepts a callback is also essentially a mock.
A good rule of thumb for a unit test is that you should be able to run it a few thousand times in a relatively brief period (think: minutes or less) and it shouldn't ever fail/flake.
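The rule of thumb can be checked mechanically. A hypothetical `hammer` helper, sketched in plain Python, runs a test function many times and reports how many runs failed and how long the whole batch took; a good unit test should come back with zero failures and a small elapsed time.

```python
import time
from typing import Callable


def hammer(test: Callable[[], None], runs: int = 2000) -> tuple[int, float]:
    """Run `test` repeatedly; return (number of failing runs, seconds elapsed)."""
    failures = 0
    start = time.perf_counter()
    for _ in range(runs):
        try:
            test()
        except Exception:  # assertion failures and unexpected errors both count
            failures += 1
    return failures, time.perf_counter() - start
```

A test that depends on a container, a clock, or the network will usually show up here either as a nonzero failure count or as an elapsed time far outside the "minutes or less" budget.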
If a unit test (suite) takes more than a single digit number of seconds to run, it isn't a unit test. Integration tests are good to have, but unit tests should be really cheap and fundamentally a tool for iterative and interactive development. I should be able to run some of my unit tests on every save, and have them keep pace with my linter.
This makes no sense though. Simple example: your code needs to reach into Cosmos / DynamoDB. Why mock this service when you can get so much wrong by assuming how things work?
Mocking doesn't mean you have to reimplement the fully featured service. In the simplest form your internal library which calls out to Cosmos is mocked, the mock records the request parameters and returns ok, and the test verifies that the expected data was passed in the call.
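A minimal sketch of that simplest form, using Python's `unittest.mock`. `OrderService`, `put_item`, and its keyword arguments are hypothetical stand-ins for an internal wrapper library; they are not the Cosmos or DynamoDB SDKs.

```python
from unittest.mock import Mock


class OrderService:
    """Code under test; `store` is the internal library that wraps the database."""

    def __init__(self, store) -> None:
        self.store = store

    def place(self, order_id: str, items: list[str]) -> None:
        self.store.put_item(
            table="orders", key=order_id, body={"items": items, "count": len(items)}
        )


def test_place_passes_expected_data() -> None:
    store = Mock()
    store.put_item.return_value = "ok"  # the mock just records the call and says "ok"
    OrderService(store).place("o-1", ["apple", "pear"])
    # Verify the data that was handed to the wrapper library.
    store.put_item.assert_called_once_with(
        table="orders", key="o-1", body={"items": ["apple", "pear"], "count": 2}
    )
```

A test like this pins down the exact call the implementation makes, which is the trade-off the following replies argue about: it is fast and isolated, but a refactor that changes the call shape breaks it even when behavior is unchanged.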
Then you're testing the implementation and need to change the test and mocks every time the implementation changes.
Making stuff quicker is a good reason to mock stuff. So is not hitting real network services. But, in all cases, the best thing is to avoid mocking if possible.
Why do you care how Cosmos or DynamoDB or any other dependency is implemented? You only need to mock the interface to these services. Their internal code can change every day without affecting your tests.
And if you want to catch potential changes in Cosmos that modify the behavior of your own service, that isn't the purpose of unit tests.
I want to be able to update to the latest version of DynamoDB (or something else - not every dependency is as stable as DynamoDB) and know that all of my code that calls it still works.
It's great, but I find it harder to debug. And I have to say, I usually don't need it. Typically I just have a command that spins everything up from docker-compose files. I prefer this over putting configuration in code, as you often do with Testcontainers. You can also load from docker-compose files, but at that point the Testcontainers API isn’t really doing much.
It's pretty much required when you want to set up and tear down between tests, though. That just usually isn't the case for me.
But I need it to catch the bugs you commit to CI so you can fix them right away, instead of me catching them, reporting them, and waiting, which wrecks my productivity.
(this is of course not directed at you personally, feel free to replace you/I/me with whatever names you can imagine!)
I've been using Docker containers for integration testing for the last few years. I usually roll my own custom solution in Go using the https://github.com/ory/dockertest package, adding the functionality I need around running migrations, creating Kafka topics, and similar. I'll definitely check this out next time I'm writing a test package.
It's pretty neat - it depends on testcontainers-core, sqlalchemy and psycopg2-binary and then defines a PostgresContainer class which fires up a "postgres:latest" container and provides a helper function for getting the right connection URL.
You can do something similar with docker compose, driving the system from the outside. Create dockerized versions of dependencies like the database, build and run tests, and then run tests against the production app container.
It's particularly useful for testing a set of microservices.
We use docker environments like this for tests, but it does have its issues.
You often need to add custom behavior like waiting for the app to load and start serving, healthchecks, etc. Having it all in code is pretty useful, and it's self-contained within the code itself, rather than having to set up the environment in different places (CI, GitHub Actions, local dev, etc.).
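Having readiness checks in code usually comes down to a small polling helper. A hypothetical `wait_until` in plain Python: `check` could open a TCP connection, run `SELECT 1`, or hit a health endpoint, and any exception it raises is treated as "not ready yet" until the timeout expires.

```python
import time
from typing import Callable, Optional


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> None:
    """Poll `check` until it returns True, or raise TimeoutError.

    Exceptions raised by `check` (e.g. connection refused while a
    container is still booting) are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    last_error: Optional[Exception] = None
    while time.monotonic() < deadline:
        try:
            if check():
                return
        except Exception as exc:  # not ready yet; remember why for the error message
            last_error = exc
        time.sleep(interval)
    raise TimeoutError(f"not ready after {timeout}s (last error: {last_error!r})")
```

Keeping the last error in the timeout message makes CI failures much easier to diagnose than a bare "timed out".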
The negative is that code isn't portable to prod, it doesn't test your environment as well (important for staging), and you're missing out on sharing some environment settings.
I feel like it definitely has its place in the stack and in certain companies.
This is a very nice project, but it's awful that they're blurring the line of unit test and integration test. They are very different and both very important.
Things in the software world are very trendy. If this starts a trend of people thinking they're writing unit tests when they are writing integration tests, we are fucked.
If I need to change code that you wrote I need a lightning fast way to figure out that I haven't broken your code according to the tests that you wrote. That's unit tests.
My changes might break the whole system. That's integration tests. I just need to run that once, and then I can go back to unit tests while I fix the mess I've made.
There is a services-flake module that lets you spin up the entire nammayatri stack (including postgres, redis, etc.) using a flake app. Similarly, there's one for running a load test, which is also run in Jenkins CI.
A unit test with real dependencies is by definition an integration test isn’t it?
I guess the unit in “Unit Test” is a bit subjective, but every place I have worked, we wrote both unit and integration tests that lived side by side. Integration tests used Testcontainers and unit tests were isolated with mocks. So the only functional difference was the “unit” under test being more or less isolated.
I read through the docs and am still confused about what this actually does beyond running a single docker run command in the background and returning control to your code when the container is up.
Reading through the comments, I'm quite shocked to see how many discouraging conversations are happening without any understanding of the underlying tech stacks being tested. Testcontainers can be fantastic, especially when you are facing test environment overhead challenges, assuming you have the appropriate architectures and service boundaries to support it. I believe more of the code out there has architectures that make using Testcontainers more challenging than it is worth.
I see testcontainers being used in tests making the test code style feel more like typical unit tests with fake implementations for system components. Which is misleading as these are more on the integration testing side typically. In essence this is another DSL (per language) for managing containers locally. And this DSL comes in addition to whatever system is actually used for managing containers in production for the project.
Something that improved developer experience by far, and also sped up our builds, is starting the container dependencies via docker-compose and connecting to them for integration testing. This allows reuse of containers: you can connect to them during or after an integration test to debug, without having to keep searching for ports.
With Testcontainers, I've found that running integration tests (or a single test) repeatedly on my machine is extremely slow, as the containers are shut down when the Java process is killed. The compose approach avoids that while keeping things consistent: for example, just mount the migrations folder in the startup volume of your DB container and you have a like-for-like schema of your prod DB ready for integration tests.
I think it’s a step in the right direction. There are probably a few use cases where the dependent system is configured very differently in your “test container” vs. the production instance. But if the aim is to programmatically spin up dependencies and provide a 99% guarantee that your app/workflows will work, then it seems like it can do the job.
One small note: test run time will probably increase. If a person has an outdated computer, I suspect they will have a hard time running the IT suite. Especially if it’s a complicated system with more than one dependency.
> Each test gets a fresh, clean instance of the browser, without having to worry about variations in plugins or required updates.
Except where everyone is saying that's too slow and instead they have a long-lived instance which they manually teardown each time. That's even what the examples do (some, at least, I didn't check them all).
If you've already bought into the container world then why not embrace a few more. For everyone else, not sure there's much point in extra complexity (they call it simplicity) or bloat.
I found Testcontainers slow to start up last year. It wasn’t worth the effort, considering how long it took to run compared to traditional Spring integration tests with H2 and Hibernate.
If you build inside docker, running tests that use docker is a pain.
Go has a lot of in-memory versions of things for tests, which run so much quicker than leaning on docker. Similarly, I found C# has in-memory versions of deps you can lean on.
I really feel that Testcontainers, although it solves a problem, often introduces others for no great benefit.
That's an integration test. These are integration tests. You're literally testing multiple units (e.g., Redis, and the thing using Redis) to see if they're integrating.
Why do we even have words.
These are valuable in their own right. They're just complicated & often incredibly slow compared to a unit test. Which is why I prefer mocks, too: they're speedy. You just have to get the mock right … and that can be tricky, particularly since some APIs are just woefully underdocumented, or the documentation is just full of lies. But the mocks I've written in the past steadily improve over time. Learn to stop worrying, and love each for what they are.
(Our CI system actually used to pretty much directly support this pattern. Then we moved to GitHub Actions. GHA has "service containers", but unfortunately the feature is too basic to address real-world use cases: it assumes a container image can just … boot! … and only talk to the code via the network. Real-world use cases often require serialized steps between the test & the dependencies, e.g., to create or init database dirs, set up certs, etc.)
> GHA has "service containers", but unfortunately the feature is too basic to address real-world use cases: it assumes a container image can just … boot! … and only talk to the code via the network. Real world use cases often require serialized steps between the test & the dependencies, e.g., to create or init database dirs, set up certs, etc.)
My biased recommendation is to write a custom Dagger function, and run it in your GHA workflow. https://dagger.io
If you find me on the Dagger Discord, I will gladly write a code snippet summarizing what I have in mind, based on what you explained of your CI stack. We use GHA ourselves and use this pattern to great effect.
I tried to use Testcontainers just last week but ended up using simple docker commands instead. I didn’t find an easy way to connect to an already running set of containers started via docker compose. It was straightforward to do with a set of scripts that just call docker exec.
This looks pretty useful! Question on the nginx container. For my tests, this container is only useful when files are mounted into the container. For example, it needs an nginx.conf passed in. How do I do this with NginxContainer?
Somewhat related: is anyone here using the AWS Neptune graph database? How do you develop locally against Neptune? Apart from LocalStack, is there a way to mock Neptune for local testing and development?
My team maintains a lot of Flink connectors. We've moved external test resources to Testcontainers as much as possible; it makes things simple and saves money as well.
Integrated this in an afternoon. It was surprisingly simple and works great locally and also inside GitHub actions to do psql integration tests of our core app.
If you're running Podman, you should pull your production deployment configs down and tweak those. You will get a much more complete environment that way (routing, network, scale, load balancing).
Some tests might have side effects. Probably not a great idea to test the function “bill customer” on a prod deployment. That’s why containers for testing are great: it’s easy to spin up an environment that can be messed around with without consequences (even if things go wrong or your tests have side effects).
I'm surprised this is getting so much attention. I thought this was just standard practice at this point? If you use things like GitLab CI, then you get this via `services` in your pipeline. The CI job itself runs in a container too.
I use a very similar thing via pytest-docker: https://github.com/avast/pytest-docker The only difference seems to be you declare your containers via a docker-compose file which I prefer because it's a standard thing you can use elsewhere.
I never really liked testcontainers. Too complicated. And I don't want to have my tests make too many assumptions about what level of control there is over their environment. IMHO it's just the wrong place to be messing with docker.
I don't like layering abstractions on top of abstractions that were fine to begin with. Docker-compose is pretty much perfect for the job. An added complexity is that the before/after semantics of the test suite in things like JUnit are a bit handwavy and hard to control. Unlike TestNG, there's no @BeforeSuite (which is really what you want). The @BeforeAll that JUnit has is actually too late in the process to be messing around with docker. And more importantly, if I'm developing, I don't want my docker containers to be wasting time restarting between tests. That's 20-30 seconds I don't want to add on top of the already lengthy runtime of compiling/building, firing up Spring and letting it do its thing before my test runs in about 1-2 seconds.
All this is trivially solved by doing docker stuff at the right time: before your test process starts.
So, I do that using good old docker compose and a simple Gradle plugin that calls it before our tests run and then again to shut it down right after. If it's already running (it simply probes the port), it skips the startup and shutdown sequence and just leaves it running. It's not perfect, but it's very simple. I have docker-compose up most of my working day. Sometimes for days on end. My tests don't have to wait for it to come up because it's already up. On CI (GitHub Actions), Gradle starts docker compose, waits for it to come up, runs the tests, and then shuts it down.
This has another big advantage: the processes of running a standalone development server for manual testing, running our integration tests, and running our production server are very similar. Exactly the same, actually; the only differences are configuration and some light bootstrapping logic (schema creation). Configuration basically involves telling our server the hosts and ports of all the stuff it needs to run, which in our case is postgres, redis, and elasticsearch.
Editing the setup is easy; just edit the docker compose file and modify some properties. It works with JVM-based stuff, and it's equally easy to replicate with other stuff.
There are a few more tricks I use to keep things fast. I have ~300 integration tests that use the db, redis, and elasticsearch. They run concurrently in under 1 minute on my Mac. I cannot emphasize enough how important fast integration tests are as a key enabler for developer productivity. Enabling this sort of thing requires some planning, but it pays off hugely.
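The probe-then-skip logic described above, translated into a Python sketch (the actual setup is a Gradle plugin, which isn't shown). `port_open` and `ensure_services` are hypothetical names. The commands passed in would be whatever brings your compose stack up and down, e.g. `["docker", "compose", "up", "-d"]` and `["docker", "compose", "down"]`; waiting for the services to become ready after startup is left out.

```python
import socket
import subprocess
from typing import Callable, Sequence


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something is already accepting TCP connections there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_services(
    host: str,
    port: int,
    up_cmd: Sequence[str],
    down_cmd: Sequence[str],
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> Callable[[], None]:
    """Start services only if they aren't already up; return a matching teardown."""
    if port_open(host, port):
        # Already running (e.g. a developer's long-lived compose stack): leave it alone.
        return lambda: None
    run(up_cmd)
    return lambda: run(down_cmd)
```

Returning the teardown from the same function keeps the rule "only shut down what we started" in one place, so a locally running stack is never torn down by a test run.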
We use [kubedock](https://github.com/joyrex2001/kubedock) to run testcontainers in kubernetes clusters. As long as you're only pulling the images, not building or loading them (explicitly not supported by kubedock), it works pretty well.
Why'd you run them in kubernetes? Seems like extreme overkill for launching a short lived container for an integration test. What could kubernetes possibly add to that?
Because we are a big company and would like to utilize resources better.
We also want homogeneity in tech when possible (we already heavily use kubernetes, we don't want to keep docker hosts anymore).
Teams of testers need to be accounted in terms of resource quotas and RBAC.
What exactly do you see as an overkill in wanting to run short-lived containers in kubernetes rather than in docker (if we already have kubernetes and "cook" it ourselves)?
That reasoning seems more like one from policy/cargo cult rather than reasoning specific to your org. For something short lived and meant to be isolated I wouldn't want to subject them to even more infrastructural dependencies outside their control.
It's overkill because these containers typically have a lifetime counted in single digit seconds, and it takes kubernetes not only more time but also more compute resources to decide where to allocate the pod than to actually just run the thing.
Been doing this for years, though I would not say this gets rid of testing.
Integration tests are significantly more complicated to write and take longer to run.
There are also race conditions that you need to account for programmatically, such as waiting for a db to come up and a schema to be applied, or waiting for a specific event to occur in the daemon.
That being said, this looks like a decent start. One thing that seems to be missing is the ability to tail logs and assert specific marks in the logs. Often you need to do an operation and wait until you see an event.
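A sketch of "do an operation, then wait until a specific line shows up in the logs", in plain Python. A background thread reads the log stream so that a silent stream can't block past the timeout. `wait_for_log` is a hypothetical helper; with a real container, `lines` could be the stdout of something like `subprocess.Popen(["docker", "logs", "-f", container_id], stdout=subprocess.PIPE, text=True)`, where `container_id` is a placeholder.

```python
import queue
import re
import threading
from typing import Iterable


def wait_for_log(lines: Iterable[str], pattern: str, timeout: float = 30.0) -> str:
    """Return the first line matching `pattern`, or raise TimeoutError.

    The reader thread stops consuming `lines` after the first match.
    """
    regex = re.compile(pattern)
    found: "queue.Queue[str]" = queue.Queue()

    def reader() -> None:
        for line in lines:
            if regex.search(line):
                found.put(line.rstrip("\n"))
                return

    threading.Thread(target=reader, daemon=True).start()
    try:
        return found.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no log line matched {pattern!r} within {timeout}s") from None
```

Returning the matching line (rather than just `True`) lets the test assert on details inside it, such as an ID or a port that the daemon logged.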