I am curious how much effort goes into creating and maintaining unit tests and fuzzing tests. Sometimes it takes longer / more lines of code to write thorough tests than it does to implement the core feature.
At that point, is it worth the time invested? Every new feature can take 2-3 times longer to deliver due to adding tests.
In the case of curl, the cost/benefit analysis is probably skewed by its massive deployment scale: any bug in curl has an unusually large impact. If your company's in-house CRM has an exploitable bug, that's bad, but the impact is limited to your company. If libcurl has an exploitable bug, that's millions of devices affected.
Code that isn't tested, isn't done. Tests not only verify the expectations, but also prevent future regression. Fuzzing is essential for code that accepts external inputs. Heartbleed was discoverable with a fuzzer.
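To make that concrete, here's roughly what a minimal libFuzzer entry point looks like. This is just a sketch: parse_header() is a hypothetical stand-in for whatever code accepts untrusted input.

    // Minimal libFuzzer harness -- a sketch, with parse_header() standing
    // in for whatever code accepts external input. The fuzzer calls this
    // entry point repeatedly with generated inputs; building with
    // AddressSanitizer catches Heartbleed-style out-of-bounds reads even
    // when they don't crash the program.
    #include <stddef.h>
    #include <stdint.h>

    int parse_header(const uint8_t *buf, size_t len); /* code under test */

    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        parse_header(data, size);
        return 0;
    }

    // Build and run (assuming a clang toolchain):
    //   clang -g -fsanitize=fuzzer,address harness.c parser.c && ./a.out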
> Code that isn't tested, isn't done. Tests not only verify the expectations, but also prevent future regression
A few weeks ago someone at the company implemented a feature without tests. It worked perfectly, everyone was happy. A few days later, a release of another new feature by a different team broke the first one :)
True, but cURL is in a very different situation compared to your average run-of-the-mill program. It's one of the most massively deployed pieces of open-source software out there. When you are not beholden to capricious shareholders and incompetent managers, you can actually focus on quality.
For libraries, tools, and frameworks, testing is crucial, because the code relying on them can only be as reliable as what it's leaning on. So, to answer your question: a lot of time.
In a business-oriented project (i.e., most jobs), code may undergo frequent changes due to requests from the business, so too heavy a focus on testing can slow development down if extensive tests are written for each change.
However, regression tests can still provide valuable insight and allow for faster development later in the project's life.
While many projects focus only on happy-path testing, the value of such tests is limited. Coupling them with negative testing, and better still boundary testing, compels developers to consider both valid and invalid inputs, helping to identify and address potential edge cases before they become bugs or security issues in production (a sketch follows below).
For instance, this [0] codebase has more tests than actual code, including fuzzing tests.
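Here's a sketch of what boundary plus negative testing looks like, using a toy parse_port() written purely for illustration (both the function and its tests are hypothetical):

    #include <assert.h>
    #include <ctype.h>

    /* Toy implementation for illustration: parse a decimal TCP port,
       returning -1 on any invalid input. */
    static int parse_port(const char *s) {
        if (*s == '\0') return -1;
        long v = 0;
        for (; *s; s++) {
            if (!isdigit((unsigned char)*s)) return -1;
            v = v * 10 + (*s - '0');
            if (v > 65535) return -1;         /* reject past the boundary */
        }
        return (int)v;
    }

    int main(void) {
        assert(parse_port("80") == 80);       /* happy path */
        assert(parse_port("0") == 0);         /* lower boundary */
        assert(parse_port("65535") == 65535); /* upper boundary */
        assert(parse_port("65536") == -1);    /* one past the boundary */
        assert(parse_port("-1") == -1);       /* negative input */
        assert(parse_port("") == -1);         /* empty string */
        assert(parse_port("8o") == -1);       /* non-numeric garbage */
        return 0;
    }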
There's a tradeoff for sure, but to fully evaluate that tradeoff you must also take into account the future time spent on the code base: the time needed to debug future features, adopt a new build environment, or let new team members muck around and see what does and does not work...
If it's a small project that won't be extended much, then perhaps re-runnable unit tests don't make the bar as a net-positive tradeoff. But some time must still be spent testing the code, even if those tests aren't written in code.
It depends on how risk tolerant you are. If you are at a startup, the growth of your startup is often largely dependent on how quickly you can add features / onboard customers. In that context, writing tests not only slows you down from adding new features, it might make it harder to modify existing features. In addition, early customers also tend to be accepting of small bugs -- they themselves already have "taken a risk" in trusting an early startup. So testing is not really valuable -- you want to remain agile, and you won't lose much money because of a bug.
On the other hand, if you are Google, you already have found a money-printing firehose, and you /don't/ want to take on any additional unnecessary risk. Any new code needs to /not/ break existing functionality -- if it does, you might lose out of millions of revenue. In addition, your product becomes so large that it is impossible to manually test every feature. In this case, tests actually help you move /faster/ because they help you at scale automatically ensure that a change does not break anything.
While cURL does not make any money, it is solidly on the mature/Google end of the testing spectrum. It has found a footing in the open source tooling "market" and people rely on it to maintain its existing functionality. In addition, it has accumulated a fairly large surface area, so manual testing is not really feasible for every feature. So testing similarly helps cURL developers move faster (in the long run), not slower.
> As of version 3.42.0 (2023-05-16), the SQLite library consists of approximately 155.8 KSLOC of C code [...] the project has 590 times as much test code and test scripts - 92053.1 KSLOC.
I once found a confirmed bug in SQLite, and it’s the highlight of all my OSS bug reports.
Interestingly enough, I found it while fuzz-testing a query builder and cross-testing it against PostgreSQL, MySQL, and an in-memory filtering engine I wrote.
What the fuck. If they're investing that much, why don't they just go straight to formal verification? That's what things like Frama-C (or whatever's popular now) are for.
The problem is that the effort required for formal verification grows exponentially beyond a certain point.
seL4 is around 10-12 KLoC, and it took a decade of effort from multiple people to make it happen.
At the size of SQLite, especially where they have to operate on platforms with different behavior (as an OS, seL4 is the platform), formal verification is just too much effort.
All that said, your reaction is totally understandable.
There's also an interesting thing where formal verification requires a formal specification, which afaik there isn't one for SQLite. One of the toughest problems that someone would run into trying to put together a formal specification for code as widely deployed as SQLite boils down to Hyrum's Law[1]: on a long enough time scale, all observable behaviours of your system become interfaces that someone, somewhere depends on.
That massive suite of test cases isn't a formal specification but given that it achieves 100% branch coverage that implies to me that it:
- pretty tightly bounds the interface without formally specifying it
- also pretty tightly constrains the implementation to match the current implementation
Which, when you have as many users as you do with SQLite, is probably a fair way of providing guarantees to your users that upgrading from 3.44.0 to 3.45.1 isn't going to break anything you're using unless you were relying on explicitly-identified buggy behaviour (you'd be able to see the delta in the test cases if you looked at the Fossil diffs).
Also a formal specification can have bugs. Formal verification checks that the code matches the spec, not that the spec implements all desired behaviors and no undesired behaviors.
You're right, though it's worth noting that you can also use formal verification to verify properties about the specification! For example you can verify that a security system doesn't allow privilege escalation.
Yep, though of course it's still on you to remember to verify everything you care about. Actually writing out all your requirements in a formal (or just programming) language is the hard part of programming.
Because formal verification doesn't scale at the moment. There are relatively few formal methods experts around the globe, and some of them would need to work on SQLite indefinitely unless you're okay with a one-off verification. Frama-C has a different background because it comes from Inria, an institution with many formal methods experts.
The best case, IMO, is a test suite where writing the test is almost as easy as (or maybe easier than) doing the equivalent manual test. The test code is maybe 2–4x longer than the change.
The worst case is… arbitrarily bad, to the point of being impossible. Test setup is hell because there are too many dependencies. You have to mock the “real world” since (again) things depend on other things too much and you can’t really do a simple string-manipulating change without setting up a whole environment. Also you introduce bugs in your tests that take longer to debug than the real change you are making.
What I feel we (current project) have fallen into the trap of is that there wasn’t a test suite from the start. Just manual testing. Then when you get to the stage where you feel you need one, the code base is not structured to make automated testing simple.
I would generally double whatever your expectations are for the initial feature development. Tests are essentially a second implementation from a different angle running in parallel, hoping the results match. Every feature change means changing 2 systems now. You save a bit of subsequent time with easier debugging when other features break tests, but that's somewhat eaten up by maintaining a system twice the size.
There are reasons many MVP developers and small teams, whose focus is more on rapid feature delivery than on large-team coordination or code stability, forego writing tests. It doesn't make sense in all circumstances. Generally, it's the more complex, less grokkable, more large-team-oriented, or public library code that needs testing.
Tests are only a second implementation if you use test doubles incorrectly. Test doubles should only be used for I/O outside the program under test that you can’t really run locally / is a network dependency (e.g. mocking a SaaS service) or for performance (mocking database responses vs spinning up a test database instance). If you do it right, most of your tests just test each layer and everything below it (sketch below).
I have yet to see a case where omitting tests actually helps you move meaningfully faster - you’re probably generating more heat than light and that makes you feel like you’re moving faster.
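A minimal sketch of what I mean, with hypothetical names throughout: in C the double can be injected as a function pointer, so only the network write is faked while all the real logic runs.

    // Sketch of a correctly-scoped test double (all names hypothetical).
    // Only the network I/O is doubled; the logic under test runs for real.
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    typedef long (*send_fn)(const char *buf, size_t len);

    /* Production code takes its transport as a parameter. */
    int deliver_greeting(send_fn tx, const char *name) {
        char msg[64];
        int n = snprintf(msg, sizeof msg, "HELLO %s\r\n", name);
        if (n < 0 || (size_t)n >= sizeof msg) return -1;
        return tx(msg, (size_t)n) == (long)n ? 0 : -1;
    }

    /* Test double: records what would have gone over the wire. */
    static char captured[64];
    static long fake_send(const char *buf, size_t len) {
        memcpy(captured, buf, len);
        captured[len] = '\0';
        return (long)len;
    }

    int main(void) {
        assert(deliver_greeting(fake_send, "curl") == 0);
        assert(strcmp(captured, "HELLO curl\r\n") == 0);
        return 0;
    }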
I guess I'm referring here to the unit-test standards that automated code review frameworks typically enforce, i.e. "100% coverage" checkers. With those, any subsequent change to the system requires understanding and modifying both the original code and the set of tests, and seems to end up being around double the effort. It's not literal duplication, but mocking expected inputs/outputs becomes a system of its own (and, in most frameworks, practically its own programming language), which is not always trivial. You may be referring to something different though.
In those situations, yes I stand by them being a costly overhead - which makes sense in large collaborative systems, but not so much in small agile MVPs.
Generally speaking, yes. It's not like most code isn't changed as a consequence of writing those tests, so the practice has immediate benefit, and future changes can be made swiftly and securely thanks to the confidence those tests should be giving you.
But also your time estimate does sound wonky; 2-3x sounds extreme. Maybe you need to improve your test-writing process?
It’s the 101 of talking about testing; it’s been discussed to death. As usual, “it’s a trade-off”.
You’re right that writing tests can make shipping a feature slower. However, already having tests makes shipping a feature faster (higher confidence it works, less manual testing of the feature and the adjacent ones). It also lowers the amount of bugs, which are going to slow down delivery of new features. They also increase maintenance cost though (tests are code that needs maintenance too).
Is it worth it? It depends, although it’s quite rare that no tests at all is the right trade-off.
It's an economic decision. If you're developing a new indie game with $10k of expected lifetime revenue, then it's probably not worth your time. But if it's core infrastructure for a multi-billion-dollar business, then yes, it's worth your time, since any non-trivial security incident may cost more than your annual salary.
Fuzzing isn't really a "2-3 times longer" thing. You have to set it up once (which for basic fuzzing is quite easy), you throw compute power at it, and it turns up bugs for you. As you make changes to the project the fuzzer will explore new paths mostly automatically.
You may want to spend some time looking at code coverage and doing some of the advanced things outlined in this article, especially for very high risk/reward projects like curl, but even that is not a lot of work.
> Every new feature can take 2-3 times longer to deliver due to adding tests.
alternate framing: you can have a feature done in half or a third of the time, if you don't care whether it actually works or not
...or whether it will continue working, as other new features are added to the system and the codebase evolves and grows. that sort of maintainability is one of the key things you get from investing in writing tests - you can refactor without fear of breaking some existing feature.
it also allows you to hire engineers and give them a "safety net" where they can make changes and be confident they're not breaking some crucial functionality of the unfamiliar codebase.
done correctly (an important caveat, because lots of people do testing poorly and then conclude that all testing is bad) that time spent writing tests is not wasted effort. cutting corners by skipping tests is very much a false economy in the long term.