Hacker News new | past | comments | ask | show | jobs | submit login
How did software get so reliable without proof? (surfingcomplexity.blog)
185 points by azhenley on March 3, 2020 | hide | past | favorite | 130 comments



How does software get reliable at all, ever? In my experience there are two ways:

1. You write reliable software from the start. It feels a lot like writing a mathematical proof, or at least sketching one. After a few dumb bugs are fixed the code continues to work without issue, year after year, and nobody ever looks at it again. (This isn't as hard as it sounds! It's not the right approach for everything, but it should be more widely used than it is.)

2. You write normal software, i.e. unreliable crap, but you look at the ways that it goes wrong and fix them. This requires that you put some work into detecting that things have gone wrong. A lot of software companies will trace all crashes, and aggregate stack traces and logs from them. In this kind of environment, you can improve software reliability by sticking a paranoid number of assertions in your code and obsessively checking to see what triggered them. After a while all the common problems will have been removed, and the remaining bugs will be less prevalent than hardware failures -- and then you can declare victory.


From my experience, the problem with no. 1 is that, in order to build something correct, you have to know what you need to build, and that is a can of worms (actually, a snakes nest) that too many people refuse to address.

The problem with no. 2, on the other side, is that it will eventually get correct, but you cannot predict when it will be reliable. And the same people who refuse to sign requirement documents are the same people who insist on having arbitrary deadlines.


> The problem with no. 2, on the other side, is that it will eventually get correct

Actually, it doesn't necessarily get correct. There are old programs out where the ratio of bugs fixed by a modification / bugs introduced is less than 1. For small systems this never becomes a problem, but huge complex systems it really is a thing.

The only way I know of to prevent it from happening is unit tests with excellent coverage and CI.


The sad thing is most companies I've seen don't do either 1 or 2. They do:

3. Your write normal software, i.e. unreliable crap, and you fix only the bugs that block ship. Once you ship, you either:

3a. Don't touch it. Move on to a different project, leaving the software as perpetual crap, or

3b. Cram in more features, only fixing new bugs that prevent shipping those new features. Repeat 3b until nobody can maintain it anymore, then blame some nebulous "technical debt" for why we need a rewrite.


I think what you're describing here may be more about which domains are being modeled by software than it is about approaches to writing the software.

You can't have a program that's like a mathematical proof if the concept for the program isn't itself like a mathematical structure.

I agree in essence with your statement that the first approach should be more widely used. What that comes down to though is that the set of program concepts that do resemble mathematical structures is larger than most programmers realize—but this a separate question from whether that set is large at all relative to the totality of software being written.

I'm pretty sure that, "It's not the right approach for everything," is putting it far too mildly. And the dichotomy designating software which isn't of this kind as "crap" is taking a shallow view of things: it's because the problem is more complex/irregular/unpredictable that corrective feedback must be relied on over a compact explicit model. The software isn't crap, the problem is just hard.

I think we'd agree that whether the label "crap" is appropriate or not depends on whether it falls into that set of programs where 1) it is amenable to compact mathematical representation but 2) feedback methods were used instead.

This is the set of Genuine Crap. Problem is, there is (ime) a much larger set of only apparent crap, which takes on that appearance as a direct consequence of the corrective feedback process.

All of this leads to a state of affairs where through conflation of Apparent Crap with Genuine Crap, a perception has been created that normal software engineers are mostly doing it wrong in some way and should feel bad.

I would encourage people to stave off that view until it's more definitively established that the sets of Apparent Crap and Genuine Crap actually have a large intersection.

And maybe more importantly: it should be considered whether the crappiness truly derives from engineering decisions or business requirements.

(I don't mean to rant too much at your comment specifically pjscott, it's just that this is a frequently expressed sentiment that I get tired of hearing and which seems to be possibly mistaken.)


There is a right approach for everything, That is the "Warehouse/Workshop Model" commonly used in industry. Enable software to be produced like integrated circuits.

https://github.com/linpengcheng/PurefunctionPipelineDataflow


In my experience the majority of enterprise service applications and smaller few-purpose systems are build following 1.

We’ve have an ASP Web Forms food ordering system that’s basically four forms and a simple CRUD admin system page that’s been running flawless for half a decade. It’s only needed upgrades to its .NET system as we moved up the TLS version ladder, and it has thousands of daily users. There are no tests, no real monitoring outside of “is it still up?” and no complaints.

Eventually we’ll have to clean up the database, eventually. But chances are someone will have build something new by then.


I doubt that would be 1 and not developed using 2 followed by a "don't touch" policy, tbh


We have one project under a don't touch policy. Sure it is probably the most stable project because no features are getting added but it is also becoming a maintenance nightmare the longer we accumulate interest on our technical debt. There is a version 2 planned that will add a lot more features and I'm sure we will spend the first weeks on upgrading the libraries it is using because they are a major version or two behind.


Maybe in this specific case there is also a factor of simplicity, simple designs, made with few elements, tend to be more robust, as a general rule.


We have 50 such systems, most don’t see the “thousands of users” a day though.


> How did software get so reliable without proof?

Simple, software started shipping with more bugs in backlogs :-P

— More seriously, software today is endemically crappy, and often poorly designed. I dunno why Hoare thought it was any better. The only saving grace is that (modern) software largely stays away from serious stuff. Eg: The airline industry is reluctant to upgrade software from decades ago, but happy to incrementally upgrade other pieces of the system on shorter cycles. Then of course, we have systems like the Boeing 737Max MCAS where the software did what it was supposed to (taken literally) but the software system was poorly designed.

EDIT: Just remembered this fantastic talk by Joe Armstrong https://youtu.be/lKXe3HUG2l4

The smooth running of every abstract system depends crucially on the (human) operator handling the point of contact with reality, in practice, often bending reality to make it tractable for the system. Any bureaucracy would grind to a halt if it wasn’t intelligent humans carrying out the processes! Software is no different.

Just like it’s hard to take a technology from “zero to one”, it is hard to take the amount of necessary human oversight from one to zero. For this reason, I would much rather think of software as amplifying the capabilities of that human, rather than automating away other humans. In practice, these systems will end up needing a highly skilled and trustworthy human operator to shepherd them — might as well design the system to make it maximally easy for those operators to understand/debug/tweak.


> software today is endemically crappy, and often poorly designed

So is every other engineering product that hasn't gone through years (decades) of evolutionary refinement.


I doubt software engineering is engineering discipline. At least not the way we are doing it now. We are not engineers, we are craftsmen. It is a miracle that the software that drives the world more or less works. So far there hasn't been any major catastrophe due to software but it will happen. Software sucks.


Over time, I think I am coming to agree with you.

I went to an engineering school. People who designed bridges had a wide, but constrained parameter space, and well-accepted design patterns.

I started out in semiconductor (MCU) systems and sometimes circuit design, and we had a broader (but not yuuuge!) parameter space, but it was growing as transistors got cheaper. Less well-accepted design patterns, because what you do with 50K transistors and 500K transistors and 5M transistors and 50M transistors - to use effectively - you need different patterns - and that changed so fast!

I did software-ish things with my h/w teams, and they would mock me because "software is easy". "You can do anything". And to do a rev, you didn't need to burn a $50K and 6 weeks (or whatever it is now) at the fab for each turn.

The problem with software is that it is SO unconstrained. You truly CAN do anything - except engineer in an unconstrained environment. I guess this realization (and Python blowing up on me in run-time environments) have taught me that: Constraints suck. But they are good. Software could use more of them, because they force discipline at dev time, at compile time, which reduces blow-ups at run-time.


> People who designed bridges had a wide, but constrained parameter space, and well-accepted design patterns.

And yet they still fall down due to stupid mistakes that nobody noticed.

https://www.usatoday.com/story/news/nation/2019/10/22/design...


Yes, they do. And they make the news.

I was always amazed about the stories of resonant frequency issues, not just from walking but also from wind! - https://en.wikipedia.org/wiki/Tacoma_Narrows_Bridge_(1940)


I think that's why frameworks are so popular: They extensively constrain the solution space. Folks lament their usefulness. ("You don't need React for that! Just use jQuery.", "Why not use raw SQL instead of an ORM?") I don't mean to belittle framework criticisms. Leaky abstractions, over-engineering, and peculiar implementations encouraged by a framework's model are legitimate commonplace warts. We've seemingly converged on these super-APIs because flowing through such a narrow (but not too narrow!) solution space is worth the relatively minor headaches.


> I went to an engineering school.

My father attended MIT in the 1940s. He said that the engineers at MIT designed the new chemistry building, with all the latest features to support laboratories. Only when the building was completed and scientists were moving in did anyone realize that there were no toilets.


Rumor has it that the U. of Waterloo was designed without including the weight of ... books. So the building is sinking.


This rumor also exists for the Robarts Library at UofT, and in both cases, I am quite certain, it is just that - a rumor.


Ha!

No women - what did they care? :-)


Engineering has a long tradition of trial and error. That is how the field of architecture got its current knowledge that currently probably focuses more on material research.

I don't see how software is different. We try new patterns, languages, deployment techniques and think about the optimal way to store data with specific operations in mind. And here you cannot just do anything if you want to solve a problem efficiently.

If we find better practices or more use cases, the environment becomes more restricted. We already have quite a few constrains for a relatively young field because there are many developers.

On the contrary I believe engineering to be a creative process. Of course lacking constrains might make it seem unfocused, but I still call that engineering.


> Engineering has a long tradition of trial and error. That is how the field of architecture got its current knowledge that currently probably focuses more on material research.

Imagine you are building a cathedral and every month the material you use changes to something that behaves totally different. While there is knowledge you can extract even when the material you work with and the fundament you work on seems to change every time, we live in a world that has given up to build systems that last.

In times where deploying a patch meant sending out CDs, software that didn't have to do that was quality software. Nowadays that feeling has flipped: if software doesn't offer updates at least once a month people feel like there is something wrong with it.

To me it feels like we collectively gave up on building things that last in software.


The grand cathedrals in Europe do sort of last, but not without continuing efforts to keep them from collapsing. Nearly everyone I've looked at had centuries of reinforcement of various kinds added.


Strangely, I am on the opposite site - that programming is more of a subset of mathematics (or at least should be) than engineering and since theoretically many parts of the program can be inferred without running, trial and error should not be as acceptable as it is (I don't mean it on a small scale like I do feel sometimes all I do is try different word combinations until the compiler is happy :D).


> Constraints suck. But they are good. Software could use more of them, because they force discipline at dev time, at compile time, which reduces blow-ups at run-time.

Can I take that to mean you're in favour of compile-time enforced strong and static typing?


Not the parent, but that would be very much a yes from me. Writing something safety-critical in a language like Python or JS would terrify me.


If you enjoy having a language which enforces some constraints on you, you may enjoy languages of the ML family.

Elm, for instances, enforces constraints such that the generated js never crashes your browser. Haskell has evolved a powerful system that lets you check web apis against their implementations, which can be pretty handy to enforce that clients should still be able to consume it.


I can recall a few articles where software felt like engineering. bits derived data needs, complexity based hardware sizing. Real solid constraints to use as foundations. rare


People have been building bridges for thousands of years. We've been doing software for barely two generations. Try to imagine what the first generation of bridge-building looked like. Don't worry: we'll catch up. Give us a few thousand years.


The fact that our CPU designs and languages keep changing every year doesn't help. I'm sure there are lots of Z80 experts that know how to develop high quality software for that processor but nowadays we need to support x86, ARM and soon RISC-V. This forces us to use Javascript, Java or Kotlin which are cross platform by default. However, Kotlin is a very new language. Javascript and Java are currently undergoing rapid iteration (Java 14 is out!).


It only takes one bit flip or a single typo to break your program. It only takes a single incorrectly wired transistor in your CPU and it all comes crashing down. Yet somehow it keeps working. The answer is that making decently reliable software is easier than you think.


> It is a miracle that the software that drives the world more or less works

It's more of a survivorship bias.


I don't buy it. I think the real problem is that software became one of the worlds biggest commodities practically overnight and developers are usually more incentivized to get stuff done faster rather than produce higher quality.

There's practically no material cost to software development, the proliferation of the internet makes patching trivial, and the overwhelming majority of software amounts to an nothing more than an inconvenience if it fails. Considering the high value of software and the relatively low cost of bugs, it's only natural that speed is valued so much more than quality.


Ironically it seems people as well as society as a whole fails to account that a zillion times inconvenience is death by a thousand cuts. This includes everyone writing code.

Goddammit when you take a long, hard look, objectively (like really) stepping back for a moment, every piece of software (and hardware to some extent as the line between both is getting blurry) around us is just a huge pile of Rube Goldberg machinery that barely happens to work on the happy path.

This process is not entirely unlike global warming and similar large scale psychological risk assessment failures.


And each individual piece might not be mission critical, but some mission critical component of the system relies on some non-mission critical piece of software or service.

The more I look at how fragile all human created systems are, I realize that humanity will perish with the equivalent of tripping over its shoelaces. Own goal!


I'd go even further with this.

I'd say that most software engineers are ultimately powerless to improve software beyond a certain point because doing so would cost more than their employer is willing to spend.

Given enough time, any software engineer could make a reasonably solid product/service that would stand the test of time. Most projects are run on a deficit of either time or resource, and as a result we end up with cut corners.


New categories of software are created every single day. Things like databases, compilers, filesystems that belong to the well known categories are all rock solid compared to your average iPhone or Android app whose app stores have existed for slightly more than a decade.


The problem I see is that we keep throwing away things that work well and are understood.

Relational databases will work well for most products, its understood how to design them well, optimize them, tune the servers. But nowadays we need to stick everything into microservices on a Kubernetes cluster with Mongo instead and we are told that this is progress.


Not really. When programming was done by real engineers (rather than "coders", "developers", or horribly misnamed "software engineers") - with proper engineering discipline which involved deliberate design and documentation rather than "agile" hacking, it was more reliable. By far.

My personal recent experience of actually doing software the old-fashioned way involved writing correctness critical high-performance code for a major data warehouse vendor, which mostly stayed with zero known bugs in production - greatly contributing to the vendor's reputation as reliable and dependable place to keep your data in.

And how do I know what the old-fashioned way to write code is? Well, I've been writing code professionally for nearly 40 years.


> it was more reliable. By far

[talking about non-software engineering] There's a huge difference between reading books about proper engineering and how it is actually practiced. Much of the sloppiness is covered by simply over-engineering it. Designs are constantly improved over time based on service experience. Heck, lots of times the first machine off the line cannot even be assembled because the design dimensions on the drawings are wrong.

The idea that non-software engineering is done by careful professionals following best practice and not making lots of mistakes is painfully wrong.


As an ex-machinist, I can confirm that bad drawings are a thing.

But then, non-software engineering is a wide field. There are products that you can afford to iterate on, and then there are very expensive (and potentially very dangerous) projects where you generally can't afford many slip ups.

If your engineers make lots of mistakes (which aren't caught in time) in a project that costs millions and can't be replaced by another unit off an assembly line, that's kind of a big deal. Thankfully, we don't hear about bridges, skyscrapers, heavy industrial lifts or nuclear power plants failing all that often.


The things you mention are over-engineered by wide margins to cover for mistakes.

The first nuke plants are pretty dangerous by modern standards, and we know how to fix them, but because the first ones are dangerous we are not allowed to build fixed ones.

The Fukushima plant, for example, had numerous engineering faults leading to the accident that would be easily and inexpensively corrected if we were iterating.

Airplanes are a good example of how good things can get if you're allowed to iterate and fix the mistakes.


The more I learn about modern software practices, the more I come to think that software is worse today because programmers are poorly disciplined and badly trained. Of course, that's likely because the barrier to entry is so much lower today than it used to be.


The barrier of entry to software has been zero ever since the 8080 was introduced.


Back then there were a lot less stackoverflow copy & paste mistakes.


There were also far fewer users to really discover the nasty edge case bugs.


> There were also far fewer users to really discover the nasty edge case bugs.

If you consider why a company makes something reliable or not, it's a relatively simple formula:

    = Number of Users x (Benefit of Getting it Right x Probability of Getting it Right) - (Cost of Getting it Wrong x (1 - Probability of Getting it Right)))
As the number of users in any system increases, the cost overall cost of getting it wrong also increases. You can then devote more fixed cost resources to improving probability of getting it right.


Old fashioned way?

You mean all that softwarte written in '90s with no security in mind?

I bet you're talking about outliers. Nowadays average developer practices are levels above that what was decades ago while being supported by great tooling.


I feel like there are two separate things going on here.

You can have an extremely reliable piece of software running say, an industrial lathe or a stamper or book printer or whatever - software which can run 24/7 for years if not decades, software which will never leak memory, enter some unknown state or put anyone in harms way - and yet have zero "security", because if you plug in a usb keyboard you can just change whatever and break it entirely. Software which has no user authentication of any kind, because if you are on the factory floor that means you already have access anyway because the authorization step happens elsewhere(at employee gates etc).

It's like people making fun out of old ATMs still running Windows XP, because it's "not secure". If the machine isn't connected to the internet, reliability is far more important - who cares windows XP is not "secure" if the ATM can run constantly for years and reliably dispense money as instructed and there isn't a remote way to exploit it.

I feel like that first kind of software(the reliable kind) is far rarer today - people just throw together a few python packages and rely on them being stable, without any kind of deeper understanding of how the system actually works, and they call themselves software engineers. The "security" part usually also comes as a side effect of using libraries or tools which are just built with "security" in mind, but without deeper understanding what having truly secure software entails.


It really depends on the scenario. When I wrote software for load balancing phone calls, it was minimal, had a well defined state machine, passed all the static testing I could throw at it, etc. At the same time, I wrote some crappy web service code which could fail and get retried later, because nobody would see that. If the worst thing that can happen is that one in a million visitors will get a white page, it doesn't economically make sense to do better. Even if you know how and have the tools.


I would find it very hard to bring myself to do that. I won't knowingly write incorrect code even if the chance of failure is very small. Luckily I don't have a boss breathing down my neck telling me not to waste time.


I don't think anyone knowingly writes incorrect code. But you can spend between 0 and infinite time thinking about whether the code is correct. At infinite you never release, so you don't need a boss to have some reasonable time limit. If this is your non-hobby work, you need to decide when to stop looking and accept the potential issues.


I've worked on software for some time and have seen a wide range of interesting failure modes. All of the interesting failures (some of which have had large monetary or visible outage impact) involved systems that broke in ways that aren't well-addressed by proof systems. Instead, they've been about things like statistical binning of parts, rare errors that escaped through middling checksums, and complex distributed systems. In most of these cases, there is some physical component, such as corrupted packets which pass CRC checksum, or a chip that occasionally miscomputes the hash of a string. Not sure how proof systems handle this, but in practice, those have been some of the most impactful errors, and they are typically dealt with using heavy but not comprehensive software testing.


In my experience the interesting failures are a tiny fraction of the failures; your mind just glosses over the vast sea of boring failures. The most impactful software flaws have not been subtle logic misalignments, they've been utterly basic errors like Cloudbleed or the Arianne 5 overflow. Maybe proof systems can only address 99% of software failures; my guess is that with experience we'll find ways to cover the rarer cases as well, but even if we don't, they're still worth using.


Well, maybe. Additional costs with these approaches are often ignored. I wouldn't want to give up proof systems as a tool, but too many treat them as a zero cost panacea. A proof system or type checker is not an excuse to not give a fk about your craft, but I've seen it that way more frequently than I'd like to admit.

Personally and honestly, outside of already well understood or defined systems, I'd say that people should invest their time in understanding their business domain, before they dig deep into proofs and related techniques.


> Well, maybe. Additional costs with these approaches are often ignored. I wouldn't want to give up proof systems as a tool, but too many treat them as a zero cost panacea. A proof system or type checker is not an excuse to not give a fk about your craft, but I've seen it that way more frequently than I'd like to admit.

The acceptable defect rate to the business is presumably fixed, so if a proof system or type system reduces you defect rate below that then that should be an "excuse" to write fewer tests and move faster, as icky as it feels to do that. I find the maintenance burden of tests is often underestimated; types in particular may not be free, but they get you more bang for your buck than any alternative, in my experience.

> Personally and honestly, outside of already well understood or defined systems, I'd say that people should invest their time in understanding their business domain, before they dig deep into proofs and related techniques.

Isn't it the same thing though? Encoding your business domain into the system you're using is the hardest part of using any formal system, but it's also an excellent way of forcing you to really understand the domain.


In my experience, people either truely care about the business domain, and are willing to be uncomfortable to delve into it, or not. Types and proofs seem to be orthogonal.

Some domains are worth it, some are not. Sometimes it fits well, sometimes it doesn't.

Just don't be the one sinking into the muck saying, "...but the math worked". The math will be far more reliable than your understanding of the domain.


I agree, the vast majority of failures I see are "operation timed out", for utterly banal reasons.

But I'd like to point out that the "impact" of Ariane 5 was very different from the "impact" of Cloudbleed. Space operations are definitely a situation where large amounts of advance planning and testing makes a huge difference, because you only get one shot to put a $500M satellite in orbit.


>complex distributed systems

amazon has been using TLA+ to prove/check correctness of production distributed algorithms and systems for about a decade


> amazon has been using TLA+ to prove/check correctness of production distributed algorithms and systems for about a decade

Such an in-your-face, smug and flippant comment but ...

On some projects.

In some teams.

And Aurora lost a lot of data initially in spite of whatever TLA+ they were using, somewhere at Amazon.


Sure, here's an example: https://lamport.azurewebsites.net/tla/formal-methods-amazon.... They still have to let users know about rare bugs that led to data corruption. You'll generally formal proof checking applied to simplified model systems of individual components, while interesting failures tend to involve ten or more components.


Isn't it evolution and Darwinian selection?

Bad pieces tend to die out, good pieces get replicated, "breed", mutated and incorporated[1]. Biological systems are not fault-tolerant. They are resilient on the average, under typical conditions.

[1] Among species, we get vertical transfer of genes (parent(s) to offsprings), horizontal (e.g. plasmids between bacteria). Getting as a piece of DNA as a submodule (this element-of relationship) is rare... but happens (e.g the mitochondrium). In software, it is one of the most common relationships.


>Isn't it evolution and Darwinian selection?

Yeah I find the question Hoare poses a little bit hilarious because it is so stereotypically academic and 'mathy'.

Software got reliable the same way not only biological systems but also markets produce reliable solutions without experts needing to write formal specifications down. By decentrally and through many iterations producing incremental improvements with price signals and adoption as a measure for success, and faulty software being replaced by better competition.


Selection depends on a fitness function. The fitness function for a lot of software is some weighted combination of profitability, mindshare, and novelty, not some abstract measure of quality and stability.


Not quite? Life is resilient under the worst case. If it weren't, the first time the worst case came around (drought year/early freeze/whatever) that form would become extinct.

Sure there's a distribution (gene pool) and some is merely resistant under normal conditions. That kind prospers, but keeps the 'worst case' genes handy. A few members of the population are the 'prepper' members, and if this year is the worst case then the species survives.

Anyway something like that.


Drought is still a "typical" condition. (I.e. one from its environment, from the last hundreds of thousands of years or so.) Selection pressure worked on that. Yes, it may be a condition that makes 10%, 50% or 90% percent of the population to die - and it is not a problem, in the grand scheme of things.

Not normal conditions are things that were never tested, or never tested to that extend. E.g. an invasive species from a different continent, a chemical compound, different pH of water, etc. Consequently, many species die out completely - as there is not enough time for the selection of better-fitted genes for this setting.

Also: compare with machine learning, and giving a sample out of the training dataset distribution (all photos are shot in good light, give one in the dark). Usually, the answer is meaningless.


Yeah I was going to compare it to the anthropic principle. It's the way it is because if it wasn't then we would have something different. No beautiful underlying reason.


More like fractal evolution.


How did proof itself get so reliable? The process of convincing yourself that a proof is correct isn't so different from the process of convincing yourself that a program is correct.


Yes, it is. This can be shown with theory: proving a program is correct is NP-hard, but verifying the proof is in P-space. It is way easier to show a proof is correct than to generate the proof in the first place.


True, at least for formal proofs. I was thinking more of traditional mathematical proofs, which are often as hard to formalize as programs are to prove correct. In both cases, convincing yourself that the proof or program is correct involves reading it and seeing whether each of its parts follows from or fits into the other parts to make a coherent whole.

Of course, there are ways to fool the reader, but I think people learn to avoid writing in ways that are likely to mask errors (e.g., not reusing the same symbol to mean completely different things, both in proofs and in programs).


And equivalently, if you had heavily-relied-on proofs that were as narrowly read and hard to follow as software, mathematics would be in just as bad a state.


Every piece of software defines an input space of all possible inputs for that software. This space is staggeringly, exponentially enormous. However, the space of inputs that are actually given to the software are generally an exponentially small subset of the possible inputs. Making this often very stereotypical and "friendly" space of inputs do mostly sensible things is at least polynomially, if not exponentially, simpler than making the software be entirely correct.

It is both true that software is a streaming pile of bugs, where the vast, vast majority of responses to possible inputs is somehow wrong, yet the vast, vast majority of inputs actually processed by the software are mostly correct (not entirely, but mostly) because they are in the generally polynomial subsets of the exponentially complicated space that the developers made correct... ish.


To expand on this, floating point works for 95% of programs. It is the 5% that needs exact accuracy/precision offered by arbitrary precision libraries. Most programs hardly exhaust the entire range of your typical floating point usage range. For example in a GUI you won't be able to easily tell the difference between 100% and 99.9999999% by eyeballing.


This is why model checking and directed fuzzing are so effective! There are situations so stupid only a machine, or reality, could come up with them.


I think I’ll use your comments as a response next time someone calls me out in a code review or says I deployed a bug. Specifically the polynomial exponentially complicated bits.

The look on their faces would be priceless. And they would forever think I’m a jerk... since they wouldn’t get the sarcasm lol.


Except that we continue to slowly automate away human intervention (must be done reasonably; it must still be possible, it is just less current in practice). Just look at the state of the art in field X at date T, then 10 or 20 years after.

We are getting better at some software, and why would we not? Not all SW, but some, and even probably a lot of. Likewise for VLSI (which in some aspects is quite similar to SW); there are some quirks, but it basically works despite the modern designs being of unprecedented complexity. With a reliability way higher that tons of high ends mechanical devices. Same story for off the shelf standard operating systems even for absolutely not critical software: given the application are somehow tolerant to crashes (the user just restart), it is remarkable that we get so few.

I'm a firm believer that there is no SW crisis. There are people or companies refusing to apply best practices here and there, or even not knowing they exist, but that is another story.


As a software person, I found Richard Feynman's paper "Personal Observations on Reliability of Shuttle" quite interesting.

https://www.history.nasa.gov/rogersrep/v2appf.htm

The part about software reliability stuck in my head, because he quickly came to the conclusion it was fine.

Now shuttle software is probably the most special type of software on the planet, but it made me think how reliable software is... compared to hardware (and possibly people).


Physical things tend to not yield to abstractions as much. While people talk about how difficult software debugging is when the illusion of a high level abstraction is broken, still it's more of an exception.


> The ultimate and very necessary defence of a real time system against arbitrary hardware error or operator error is the organisation of a rapid procedure for restarting the entire system.

"Just putting out the plug and stick it back in" is one common way nowadays of how to get out of an unforeseen state. It has quite some history and goes at least back to the "let it crash" philosophy of Erlang. Of course this still does not work for all kind of domains, especially when one is closer to the metal. But still, we may have found a sufficiency compromise between formal verified software (and thus, higher costs) and some kind of fault-tolerant software (increased productivity).


Or, in another word, "disposability". We have a lot of systems that aren't repairable, don't get debugged, don't have things fixed mid-flight.

And...it works, with respect to most existing challenges. Restarting and replacing is easy to scale up and produces clear interface boundaries.

One way in which it doesn't work, and which we still fail, is security. Security doesn't appear in most systems as a legible crash or a data loss or corruption, but as an intangible loss of trust, loss of identity, of privacy, of service quality. We don't know who ultimately uses the data we create, and the business response generally is, "why should you care?" The premise of so many of them, ever since we became highly connected, is to find profitable ways of ignoring and taking risks with security and to foster platforms that unilaterally determine one's identity and privileges, ensuring them a position as ultimate gatekeepers.


I'm already starting to get tired of all these frequently-restarting Nomad jobs, and I've only been at it for a few months.


Simply an absurd amount of hard work. Millions upon millions of person-hours spent banging foreheads against C++ and the like. Billions of gray hairs. But that’s progress for you.


The most trivial software system you can think of nowadays is already so large in scope and/or complex that you can’t prove it doesn’t work either, because there’s no complete specification on how it’s supposed to work - if it existed, it would be the bug-free software itself, because all software is specification.

In the end all that matters is whether the system is useful, in an economical sense (value delivered > defects over time).

So I would argue software did not get reliable, our bar for defects remained low, but the relative value delivered increased a lot because software has amazing leverage.


Moreover, when you get an existence proof of a bug, it's usually easy to change such a system to seem ok.

As Hoare also said, "There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors."


So I guess the answer is: because of the feedback loops that exist around the software systems.


Exactly. Software systems are reliable because they have economic value. We're incentivized to make them work, regardless of how much technical debt and duct tape lies under the hood.


> How did software get so reliable without proof?

Two words: survivorship bias!

The software field exploded in the past half a century, producing a huge volume of activity producing a huge amount of cruft. A lot of the unreliable stuff gets shaken out; so what we see is a biased sample consisting of the successes that remain. Software that is halfway successful gets to be maintained, and over many years or decades, the reliability improves.


Human operators surely play a part sometimes, but there are systems which eliminate or minimize that and are still surprisingly reliable. I think people have learned to build resilience into their systems, which expect and anticipate failures. These systems still fail and when they do it's often in very complex and surprising ways, but those failures are rare.


Interesting. I wonder if there is a point beyond which multiplying these abstractions doesn't really do anything to boost either performance or reliability i.e. will someone come up with a way to wrap and package kubernetes clusters into yet another russian doll abstraction?


I actually wasn't thinking in terms of abstractions although those can help too. Simple things like always validating your inputs, adding sanity checks and safety limits to your code, building in isolation, can go a long way towards catching a lot of errors and/or limiting the blast radius of failures.


Most of the software I have written for work has no near on no functional / unit tests. It is industrial process monitoring software that runs 247 on remote machines thousands of miles away. Simple fact is it has to work, no if's but's, even when there are faults in its environment.

This does not mean to say that the software has not been tested.

Testing is subjective to a softwares use. You could have 10000 tests and still have a series of bugs.

Testing for my work comes as two categories, "Tested by design" and "Environmental Testing". "Tested by design" involves developing your software by which the designing and writing the software inherits the tests as you go. When I go about designing a feature, I expand on a chosen solution of that feature and then methodically branch out on the uses of that feature, build a map of dependancies, scenarios, outcomes, consequences etc for that feature. These become the basis of how I write the code for that feature because I have inherantly put in mitigations for everything I have considered before hand. There is no point in testing something excessively if you can and/or have garanteed by code that it will perform exactly as designed. That may seam a contraversial thing to say.

With "Environmental Testing" we simulate the environment which the software is put into with real external factors. This is particular for the software I write. This is not always the best for another software project.

I've written my works software from scratch 3 years ago and it has had various incremental additions and changes. Yes there have been bugs. I will admit that entirely. However the key thing with work's software is that is has rarely crashed or rarely faulted while containing those bugs. Most faults that have been found have been configuration errors/issues.

I would be interested in how others approach testing strategies for the software they have written.


Having worked in some places that aren't software companies it is pretty incredible how something a developer who is no longer in the company set in the last decade is still kicking ass with no monitoring or documentation.

It's truly remarkable! All those databases that software depend on and go without people even realizing they exist until someone literally decomissions a datacenter only then to figure out lots of apps require that particular machine to be running... they're day in and day out working with 0 maintenance for years... impressive.

Postgres, mysql and mongodb are examples of databases I've seen like that! As well as inhouse apps that depend on those databases... at most they have a cron job for backing up and keeping at most the last X backups.


I don't think software is all that reliable. Almost every piece of major software that I use has one or more bugs and periodically locks up or crashes. We as users have been conditioned to accept this with the situation getting worse rather than better over the years.

The main reason software seems more reliable than in the past is because in a SaaS environment the crash doesn't happen on your computer anymore. But the effect is the same unless extreme care was taken on the server side to ensure that no data was lost. In most cases we perform the equivalent of a reboot or retry: we reload the page.


This article misses the elephant in the room: how to define "reliability". Most software perceived to be reliable (eg, MS Paint, bash) have numerous bugs, it's just that no one comes across them. Software that is perceived to be "reliable enough" (eg, MS Word, Chrome), have bugs that pop up regularly.

Here are some vague uptime reliability measures and what you need to get there:

- 70% reliability: you can often just free-code it and pray.

- 95% reliability: you better have a bunch of tests.

- 99.5%: you need automated testing, coverage tools, documentation, team coordination tools, and and solid merge/deployment process.

- 99.99%: all-of-the-above plus tests that aggressively attack and analyze production systems (eg, Netflix Chaos Monkey), coverage tests that do branch coverage and not just line coverage, simulating network failures, and/or test on a variety of hardware and user-configurations.

- 100%: The only way to be absolutely certain, is to convert each line of code into a mathematical operator and solve for a formal proof of correctness.

The 100% reliability standard is not just theoretical. The U.S. government issues software standards for safety critical devices (military, aviation, healthcare), and some of them do indeed require mathematical proof of correctness (EAL6+ being a key standard). They generally do not require 100% bug-free code, but just that some particular features are 100% bug free (e.g., the OS/scheduler will never deadlock).


> and some of them do indeed require mathematical proof of correctness > (EAL6+ being a key standard)

Nitpick: EAL6 is still semi-formal, EAL7 is formally verified design and tested. EAL7 is very rare, on the CC certified products page [1] there are only 2 EAL7 certified products: a hardware diode, and a virtual machine OS.

[1] https://www.commoncriteriaportal.org/products/


> Nitpick: EAL6 is still semi-formal, EAL7 is formally verified design and tested. EAL7 is very rare, on the CC certified products page [1] there are only 2 EAL7 certified products: a hardware diode, and a virtual machine OS.

You are definitely correct. Also, disclosure... I worked on that EAL7 Virtual OS. When I left it was only EAL6+, likely leading to my bias towards it.


1. Proof costs. Like, a lot. If you don't need the reliability, why would you pay for it?

2. How much reliability is needed? Not 100% (usually). If my OS crashes, I'm unhappy. If the OS on my company's production server crashes, that's a lot worse. But our production server is actually a redundant cluster, so if one machine crashes, people are unhappy, but nothing horrible happens. (In point 1, I talked about the cost of proven-to-be-perfect software. Needing to have a redundant cluster is part of the cost of not having perfect software, which also must be considered. But because the hardware isn't 100% reliable either, we needed a redundant cluster anyway...)

3. Formal methods are often playing catch-up. Now we have to worry about timing side-channel attacks through the cache. There may now be formal methods to prove that such things are not possible in a piece of software; I'm pretty sure there were not such formal methods five years ago. We keep finding new areas that can fail. (One advocate of formal methods replied to me some weeks ago that formal methods proved that the software would never do anything that it wasn't specified to do. I doubt that claim with respect to something like side-channel attacks. Even if it's true, though, I suspect that often the specification didn't include "no side channel attacks" until certain recent events brought it to more popular attention.)

Proofs are therefore suffering from a double whammy: Not only are they painfully difficult and expensive, but they're incomplete. When something comes up that wasn't covered by the proof of correctness, you now need to fix it. And then you have to re-run all the proofs all over again, because your fix may have invalidated them.

4. All proofs have to start somewhere, usually with a formal specification. If the specification is wrong (looking at you, MCAS), all the formal proofs in the world won't save you. (Well... a system-level formal proof might. But that's even harder and more expensive. And even that fails if the system spec is flawed.)

TL;DR: Real proofs are too expensive for the benefit they provide, in almost all cases. The cost-benefit just isn't there. Hope lies in making them easier and more automated to use (better type systems, for example).


Maybe it has something to do with most software not only being not right but not being wrong either (I’m paraphrasing Wolfgang Paul).

We do put a lot of effort into writing correct programs for some definition of correct. But a lot of our specifications don’t include things that aren’t obviously wrong like, “not leaking secrets,” wrong.

I share the enthusiasm of SPJ who said, “it’s amazing that it works at all let alone so well.” It’s easy to forget that that little white box you type a few words into is more impressive in scope and complexity than the tallest buildings humans have built. And yet it is so largely invisible!

I think the more important question is how much longer have we got left before regulators and insurers demand we assume more liability for things when they do go wrong.


I actually think that the FOSS movement and licensing is really progressive in this space. This thing comes with NO Warranty etc is really great because you own the risk of the systems you build. If only that got surfaced more to the users so they realized that a T&C page isn't really protection for anything.

Rather than force devs to use formal methods, I'd really like it if software got to the point where regulators and insurers demand that we treat personal information like radioactive waste and try and handle the least amount possible.


> Rather than force devs to use formal methods

I don't think professional liability would require all developers to use formal methods any more than it would require all practicing software developers to become licensed engineers.

It may require companies to hire a licensed engineer to sign off on projects and it could require those engineers to be insured so they may want to use formal methods on a risky project to keep their rates down.

The end result, I think, is the same: pressure to do better by the standards of the state of the art and not skimp on privacy and security.


If you compare K&R C and C++17, everything that has been added is some form or another of 'proofs'.

So I take issue with the 'without proof' part; even if they're not proofs that adhere to strict mathematical forms, they're still there.


In 2020, software is reliable?


most software today is a mess. only when you include the user in the loop with his version-3-new-final backups and willingness/need to retry in the event of error you get something that's workable.

we've effectively shifted partial blame on the users, the famous "operator error" class of bugs, made sure to help then save often and substituted reliability with resilience

but it doesn't mean software is reliable. the whole system as a whole and including users corrective actions is somewhat resilient enough for productive work


I think the answer to the question posed may just be "it's like natural selection".

A lot of software is horrifically bad. But not all flaws are created equal. A software that runs an online shop, but crashes 50% of the time during checkout will probably not create a successful company and thus won't survive long.

However a software that has comparably many flaws and fails in all kinds of rare corner cases, but still gets the job done in most average cases, may just survive, even though it's equally bad.


How did people get the impression software was so reliable?


What strikes me in both the original article and the post is the failure to recognise that a strong form of selective pressure is being applied upon software, and that adaptive processes (of which evolution is but a single example) tend to optimise for the criterion that is being selected for even in absence of an overarching design (‘proof’). I recommend reading Darwin Among the Machines for comparison to hardware.


By fixing everything that goes wrong noticeably and by having no idea that many features are currently actually not working as they are supposed to?

To actually create reliable software currently the best bet is to have automated tests for as many things as possible and exploratory manual testing around any feature that changes.


By focusing on the critical path through the call tree. It's shocking how many fatal errors static analysis finds in my typical C/C++/C# code but the call graph that those errors are on simply never gets exercised. The mathematical proof would include all those unused paths.


Probably money? If there is money to be made somebody is going to find a solution that mostly works


When it comes to writing reliable software the best paper I have read is “Design By Contract”. That one paper combined with proper usage of exceptions will go a long way in preventing bugs. Of course there is also a need for more rigor in requirements.


Fascinating how perception has changed since then; I think most people today would have the opposite sentiment: "How did software get so unreliable?"

My guess is: software creation and distribution became exponentially more accessible around that time (1996) with the advent of the web. Not only were many less-qualified individuals creating software (which isn't intrinsically a bad thing!), many more people were consuming software (making consumer software a much more lucrative market), and technology changes allowed companies to "move [much] fast[er] and break [more] things".

Businesses were always optimizing for profits, and the dot com boom simply changed the equation to where reliability was no longer profitable in the majority of the field.


Richard Feynman said it best:

1. Write down the problem

2. Think real hard

3. Write down the solution

http://wiki.c2.com/?FeynmanAlgorithm


Short explainer video covering Hoare's original paper:

https://youtu.be/eY8AyCZ5uUg


> How did software get so reliable

Talk about begging the question!


How many HTTP request does nginx usually serve before crashing? How long can a Linux kernel stay up before panicking? Some complex software is startlingly reliable.


By begging the question!


> How did software get so reliable without proof?

Because most software is just moving data from one place to another, combined with some very simple business logic.


It didn't.


The proof is in the pudding.


See Betteridge's law of headlines.


The main reason I've found for programs being 'correct' is that the range of actual inputs for programs is severely limited.

Most programs see inputs that are similar enough to previous inputs that the latent bugs are never triggered, or at worst triggered one at a time.

Running programs far outside their expected parameters on the other hand has nearly always lead to disaster.


Would you say that a typical web-browser (a complicated piece of software) always sees the same kind of input?


Pretty much.

Throwing [0] to chrome and firefox has made made them hang for 5 minutes so far. Generating the HTML took under a second and 13MB, large, but not larger than the fatter webpages out there.

In terms of input browsers don't see but should be able to parse you can't get much simpler than this.

[0] In Guile2.x:

    (define print 
      (lambda (x)
        (display x)
        (newline)))
    
    (define div-hell
      (lambda (n)
        (cond
          ((eq? n 0) (print "TEXT"))
          (else
            (print "<div>")
            (div-hell (- n 1))
            (print "</div>")))))
    
    (print "<html><header><title>DIV HELL</title></header><body>")
    (div-hell 1000000)
    (print "</body></html>")


>The problem with the question “How did software get so reliable without proof?” is that it’s asking the wrong question. It’s not that software got so reliable without proof: it’s that systems that include software got so reliable without proof.

No. This person isn't sounding smart enough. The real question isn't how systems that include software got reliable without proof but instead the reverse: how did software that includes systems get so reliable without proof? That's how you should conclude an article with some hand wavy technical jargon.


Fixed a ton of bugs. And it's not that reliable, and super reliable software in airplanes is proved.


The same way bridges, aeroplanes and medicine happened. What makes software so special?


Brilliant analysis.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: