A short fable of software engineering vs. regular engineering (bham.ac.uk)
111 points by theaeolist on Jan 10, 2016 | 91 comments


Automotive engineering is antediluvian wrt process compared to software engineering.

We can manufacture billions of copies of our product with error rates that are almost too small to measure. Six sigma? Hah!

Our design processes are far in advance of the automotive industry. Continuous design is normal practice. It's only the most careful (or most backward) shops that'll take months to update a design.

The automotive industry, by comparison, regularly takes years to update a design even by a minor version. The best shops at this, i.e. the F1 race teams, can fix bugs in days, do minor releases in months, and, if this season is anything to go by, struggle horribly with major version upgrades.

TL;DR it's fallacious to compare manufacturing to programming. The latter is a design process even if it feels superficially like an assembly line sometimes.

One final point: I would very much have liked to read an analysis of the different approaches of the engineers vs. the craftsman. This is an interesting topic and does have crossover between disciplines. Oh well...


> Automotive engineering is antediluvian wrt process compared to software engineering.

I couldn't disagree more strongly.

> Our design processes are far in advance of the automotive industry. Continuous design is normal practice. It's only the most careful (or most backward) shops that'll take months to update a design.

I'd say the latest trends in software engineering processes are actually adaptations of what has been practiced in the automotive sector (and other manufacturing sectors) for a long time, but outside of regulated industries, practiced with far less rigor. I don't think the claim that development processes in software are "more advanced" holds water (what does that even mean?).

It's impractical to practice continuous design on a physical product like a car, so I think that's a poor point, but kaizen (which I will argue serves an analogous role) is certainly used throughout manufacturing and has its origins in the Toyota Production System.

> The automotive industry, by comparison, regularly takes years to update a design even by a minor version. The best shops at this, i.e. the F1 race teams, can fix bugs in days, do minor releases in months, and, if this season is anything to go by, struggle horribly with major version upgrades.

Ignoring regulatory and documentation concerns, part of the reason it takes years to update a design is that the costs associated with producing a physical product are completely different from a pure software product. It's more cost effective to do good design work and heavy V&V of your design before you order tooling, rather than trying to modify your tooling (if it's even possible) after an error is discovered.


Exactly! The auto industry works like it does because it HAS to, not because it is better. Software design is much better and more advanced, because our industry allows us to choose better practices.


You know you have a programmer's mindset when you think flailing around is a "better practice" than actually designing your product. :-p


I don't think we're on the same page.

> Software design is much better and more advanced, because our industry allows us to choose better practices.

I don't know what "much better and more advanced" is specifically referring to, but in the general sense, I see SDLC processes used in the stereotypical tech startup to be cargo cult appropriation from other more mature industries.


"better"

What does that word even mean?


Number of errors fixed. Software engineers fix more errors in their designs, ergo, they must be using better practices! /s


Human safety adds massive overhead to any mechanical engineering project. I worked in aerospace. In probably 1-2 days, we could conceivably change a part drawing, send it to the shop, and have it start being manufactured en masse. Why don't we do that? Because the part drawing must be reviewed by the analysis group first. Altering part geometry can change load paths: we might have enlarged something to strengthen it, but now it requires a larger hole elsewhere. It can take days to months to analyze the part change.

We do manual calculations, physical simulations, and physical testing. None of those is at the point where we could hand it to a strong AI; they're all part art and part science, requiring an engineer's manual, informed decisions. Correcting experimental data, deriving new calculations, even just making diagrams and describing all of this along the way takes time. We must document everything. We might have to design and build a test fixture from scratch (airplane wing test fixtures aren't exactly off the shelf). Experiments get run wrong: the wrong loads get set, a tech installs a sensor an inch off, and we have to throw out a month's worth of data. Humans make mistakes; we do our best to catch them.

Only after all this can we get regulatory approval (FAA in our case) and start pushing to production. Honestly, the attitude that software engineering is "light years ahead" is infuriating. Mechanical engineering is rife with time-consuming checks and triple checks so that people do not die.


Have them ride a plane designed using an Agile process.


When my pilot announced last night we were doing an autopilot landing, that's exactly what I was thinking.


Why should this discussion be limited to out-and-out software companies? Think NASA. Think New Horizons. Think about nuclear reactors and what would happen if a sensor gave a wrong reading for reaching a critical level. These are some pretty serious use cases where software engineering processes are highly developed and rigorous.

The computing systems used in a car itself (what do you think triggers the airbag in an accident?) can be trusted because of maturity in software engineering.

Startups choose not to follow these methods in order to keep costs low and stay lean. I say that in the broadest sense, because a banking startup will be different from your average SaaS startup.


I agree - I think the reason they took it aside and investigated the root cause was that they couldn't afford to produce very many defective parts at £7,000 apiece, whereas in software each defective instance is (nearly) free.

That said, I've worked in teams who were developing software that would be boxed, shipped, and installed on millions of remote devices, and I've worked on teams whose main focus was "keep the service running at all costs". The former definitely emphasized root cause analysis, because screw-ups were insanely expensive and difficult to fix once they were shipped. The latter - not so much.


People who write software shouldn't brag about making billions of perfect copies, or being able to make changes in hours, because that is just an aspect of their field which is inherently easier than automotive engineering. It isn't any process improvement that has made that possible, and automotive engineering can't simply copy the software engineering process to reap the same benefits.

Working in an easier field doesn't mean your field is more mature. Especially when there is such dramatic uncertainty in the schedule and the output is so flaky.


Yeah, call me when your software crashes less often than a car does (due to design or manufacturing defects).


As our world intersects with more and more lives on the line, as with self-driving cars, we need to understand how these things interact in terms of development cycles and process.

We (as software developers) don't need to adopt their models and processes, but we need to figure out how to work with them to ensure higher reliability across the board.

The alternative will be formal certifications/licensing pushed on us by legislators who don't know how to send an email and think mathematics is a weapon to be regulated.


As a counterpoint, I'm a software tester on a project right now that has:

* Several of the component programs read from multiple config files

* Manual registry settings

* Config files have True/False, On/Off, and Yes/No settings.

* The version is reported to a DB when the program runs, but the version is read from an XML file, not from the program itself.

It is true that a new car program can take 5 years or more to roll out, but at least it actually happens. This system has been in use for at least 15 years, and it just keeps getting kludged.
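
Just to illustrate the mess (a minimal Python sketch; the names and values are hypothetical, not from the actual system): normalizing the True/False, On/Off, and Yes/No variants ends up needing something like this.

    # Hypothetical sketch: one parser for the boolean spellings
    # scattered across this system's config files.
    TRUTHY = {"true", "on", "yes", "1"}
    FALSY = {"false", "off", "no", "0"}

    def parse_flag(raw):
        value = raw.strip().lower()
        if value in TRUTHY:
            return True
        if value in FALSY:
            return False
        raise ValueError("unrecognized boolean setting: %r" % raw)

    assert parse_flag("Yes") and parse_flag("on") and not parse_flag("FALSE")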


First, I appreciate this story, but it seems the author compares the build process of software to the production process of the car line.

It's comparing the build stage vs the literal produce-the-thing stage.

I also think the stoppage of the production line happens in software.

If my code breaks while in production, I grab as many log files and stats as I can, roll back the code, deploy, and then study exactly why everything failed and whether we recovered post-rollback.

That seems like a similar case to the car production line - haul the faulty model aside (get the logs/metrics), figure out exactly what happened, fix it.

Who knows how the actual assembly line was put together. Maybe they iterated hundreds of times.

I don't expect my code to fail in production, otherwise I wouldn't have put it in production, but I understand it _may_ fail. That's why I put in logging and metrics collection, the same way those engineers hauled aside the car.


A more accurate comparison would be between "creating software", which is all design, and "the process by which the production line is spec'd, planned, designed, and created".

Executing a program is analogous to running the production line.

It's intellectually dishonest to compare design with production.


I agree completely with your last point and think that all Programmer vs Engineer[1] discussions should at least take it into account.

[1] https://bensu.github.io/abstracting-engineering-away/


I hate these articles that compare software engineering to traditional engineering because the typical software engineer has very different requirements wrt failure and budget than does the traditional engineer. There are examples [1] of software being written up to the standards of traditional engineering. It can be done, it's just very expensive and it almost always makes more sense financially to write software that's more likely to have bugs. When that doesn't make sense, more disciplined practices are used.

[1] http://www.fastcompany.com/28121/they-write-right-stuff


With respect to the linked article, that software development team might as well live on another planet. Sure, they develop software for a living, and so do many of us here, but their approach is so different from ours that they are effectively practitioners of a different profession. It is dishonest to insinuate that any programmer could perform well at a job consisting of developing error-free software.


The Fast Company article is interesting; thanks. Readers should know that it's from 1996 (something not revealed until you reach the bottom).


    [...] a whole bunch of engineers tried to figure out not how to
    fix the problem but why the problem happened in the first
    place.
This is exactly what debugging is.

    My friend was quite obviously an accomplished hacker.
    His hacker mentality would have fared quite well in
    software production, where debugging is part of the
    normal workflow [...]
Finding a workaround in production is not debugging even if it's useful.


"This is exactly what debugging is"

Yes and no. Debugging (in the sense that programmers do it) stops when you learn "X wasn't initialized" or "that buffer isn't long enough for the trailing zero".

This is root cause analysis (https://en.wikipedia.org/wiki/Root_cause_analysis). It goes way deeper into "why the problem happened". The buffer may be undersized because the programmer isn't educated well enough, because (s)he was too tired to think properly, because the spec specifying the maximum size was wrong, etc.

Each reason leads to a different fix, often through more iterations of asking "why?"

The programmer's lack of education can be caused by a problem in hiring practices, by low quality of internal courses, etc.

The programmer may be tired because of long working hours, because of troubles at home, because (s)he goes out too much, etc.

That buffer size might be a typo, an OCR error, somebody declaring the maximum size to be N because, otherwise, the data couldn't be fit in the current hardware, etc.

Each explanation comes with a different way to prevent the problem from occurring again (something that traditional debugging does nothing about: initializing that variable makes the problem go away, but in itself does nothing to ensure you'll write code with initialized variables in the future).


Debugging is different from looking at the process and identifying how the bug even got there through the lifecycle of the module at hand. The article, to me, seemed to be talking about that rather than debugging a live issue.

Debugging locally is exactly the same as measuring things as you work, i.e. the width of a bolt and what the error is. Incidentally, debugging locally should be replaced with unit tests so you get a consistent automated check. Debugging is only really useful when you don't fully understand either an algorithm or a set of frameworks. The equivalent in engineering would be to just start throwing different components together and measuring the effects, which, by the way, I doubt would ever be done using materials destined for live.
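
For instance, a one-off local debugging session can be captured as a repeatable check like this (a minimal Python sketch; the function and tolerance are hypothetical):

    import unittest

    # Hypothetical example: the software analogue of measuring a bolt
    # against its spec, captured as an automated check.
    def bolt_width_error(measured_mm, spec_mm=10.0):
        return measured_mm - spec_mm

    class BoltWidthTest(unittest.TestCase):
        def test_within_tolerance(self):
            self.assertLess(abs(bolt_width_error(10.03)), 0.05)

        def test_out_of_tolerance_is_detected(self):
            self.assertGreaterEqual(abs(bolt_width_error(10.20)), 0.05)

    if __name__ == "__main__":
        unittest.main()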


Humans in advanced production lines are "actuators". They are not there to think, but to stand in for a hypothetical machine that it's not yet possible to build. We have those magical machines in software; we call them compilers, interpreters, operating systems, file systems, etc.

I still have memories of my time at university, and of how a Software Engineering professor constantly referred to programmers/developers as "coders" or "monkeys". He never stopped repeating to us, day after day, that we should never program. Our job as engineers was to plan, design, assess, and manage those "monkeys".

More than 15 years of professional career later, my initial suspicion has been confirmed time after time: those university professors need a good deal of fresh air and to get in touch with reality, instead of spending the day pontificating, like celibate Catholic priests, about what you can and cannot do in your bedroom.

I know that many people here love Dijkstra and people like him, and even enjoy feeling the intellectual whip on their minds, reading those essays and feeling bad about themselves as the essays constantly repeat how everything is wrong and broken in the software profession.

For those people, I suggest reading "To Engineer is Human" by Petroski to learn how real engineers really work.


> Humans in advanced production lines are "actuators". They are not there to think, but to stand in for a hypothetical machine that it's not yet possible to build. We have those magical machines in software; we call them compilers, interpreters, operating systems, file systems, etc.

Unfortunately this view of people that work in manufacturing is pervasive, but it is also incredibly naive. That view is essentially the reason why American car companies (along with many other industries) got their clocks cleaned by the Japanese car companies over the last 3 decades. I've been involved in a lot of companies where we worked really hard to correct that mentality and the results were always obviously positive, but incredibly difficult to achieve. Engineers just love to look down their noses at everyone else.

There is nothing worse than some new kid that just graduated with a BS in engineering pretending like he's smarter than a guy that has been building the product we're designing for 30 years. Unfortunately some people never grow beyond the 'new kid' stage.


I don't get your point.

My comment tried (possibly not very clearly) to talk about the differences between software and the other engineering disciplines. People keep thinking that programmers are like factory employees, and that's so wrong. Each engineering endeavour is different. In software, our factory workers are the tools that do the job; what counts as design elsewhere is, in software, just programming. In my view (and I'm not alone in this), people keep applying the manufacturing metaphor to software for the wrong reasons.

In construction, the distance between design and implementation is so huge that construction workers save or doom projects all the time. Spend some time on a big construction site, in its meetings, and you will discover why those projects take so long to finish. The workers matter because what the designer produces is just a specification for the project; it's a map, not the territory. An ideal brick in a CAD program is not a brick. Your perfect design for the air-conditioning unit in that room gets destroyed by a worker's on-the-spot decision to change the wiring in a wall, ignoring the plan. Through tremendous effort over centuries, construction has accumulated more or less accurate models for bricks, but the workers do so many things that the architects don't know how to do that the distance persists, and many times they just ignore the plans.

In manufacturing, you have the Ford model of car production. It has been employed in nearly every single factory in the world. Yes, the Japanese empowered their employees more than other countries did, but hey, everyone took note and introduced some of those practices. As of today, American companies are as competitive as the Japanese ones, without all the mumbo jumbo and black zen magic.

My point here is that each engineering discipline is different. Different factors forced each one to discover what works for it and what doesn't. That knowledge accumulated in industry, and depending on the discipline, the feedback loop from industry back into the university can vary a lot. In software, the distance between the two is huge, due to the lack of transparency and the secrecy that most software companies exhibit.


I guess that treating factory workers as "code monkeys" is also wrong :-)

I'd like to work in a flat(ish) meritocracy. Maybe I have to start my own company...


Ironically, the people above the engineers consider the engineers to be monkeys too, and so on up the line.


If you treat people working on the production line as mere automatons, human robots, you're a fool.

Here's a good article explaining how Toyota makes use of its workers to develop new ways of building and to improve production:

http://www.japantimes.co.jp/news/2014/04/07/business/gods-ed...


Comparing the production line for F1 with your average software shop is an unfair comparison.

If you are ready to pay the same amount of money - you can get the same level of quality in software too.

The problem is most people do not even think software should cost anything.


Not only that... I don't think they would "refactor" the engine in the middle of production, or add new features to the transmission right before a deadline.


The boss saw an article in "racing weekly" about using square wheels and our salesman told him you'll have it done by tomorrow.


Exactly. We know how to build reliable (nearly) bug-free software.

The problem is, for most things, it's too expensive. What we have is good enough.


> We know how to build reliable (nearly) bug-free software.

Do we? Let's say, starting tomorrow, you have to write rocket-guiding software. If your program has bugs, you will be summarily executed. What are the odds that you will survive the process? I wouldn't estimate mine at anything significantly above zero.


How long can I take?


I have no idea how much time this would take in the real world. Pulling a number totally out of my ass, let's say five years. You may not use subcontractors to develop the software.


Well, then I'm pretty confident I could survive at least five years... :P


Which I guess is enough time to buy life insurance. At least your family won't have money problems. :-)


To be honest, for commensurate reward, I'd probably be sufficiently confident that I could get something done bug-free in 5 years if it was something I felt I could hack out bug-full in a week. Much of that time would go toward task-specific verification-related tooling...


It's not F1, it's small-batch production of sports cars - in F1 you effectively build a pair of new cars for each race/test.


Software is created once. If it runs on more than one computer, it's just copies of the actual result of that one time creation. If it's improved with new features, then that's a new unit, itself created once. The effort goes into the creation of the one thing, not the copies. Version 1.1 is not version 1.2.

This observation is often used to point out that "software is different" from cars and other hardware. Cars and hardware have lots of chances for defects, but software copies are essentially perfect. Assembly line products are not copies like a package or disk is a copy; each finished unit has lots of effort in it.

But if you still want to use the hardware example, find what the equivalent products are. A lot of effort goes into physically assembling a car, and then the car cranks out transportation.

A lot of effort goes into physically assembling a software team, and then the team cranks out software.

And while of course there are profound differences between a car and a human team, it might be useful to put engineering resources into designing and producing a team.


I try to take this kind of approach with my own work. My workload is really light, so with every task I get, I take the opportunity to dig into the task and the business needs surrounding it, so that I can come up with a solution that works, is stable, can be extended when needed, and doesn't require a lot of maintenance.

When an issue comes up, the first thing to do is to put together a workaround, a way for business to continue in absence of a proper fix. I then, without the need to rush, dig into the issue, figuring out a way to reproduce it, then isolate it, then fix it in a way that makes the whole system more robust and not less.

Having worked on the codebase / infrastructure this way for two years, my workflow is pretty well refined. For my current issue, I'm devising a command-line tool to download and parse the log files to give me exactly the info I need to run down web issues. I can yak-shave to my heart's content.
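
The tool itself is nothing fancy; something along these lines (a minimal Python sketch; the log format and field names are hypothetical, not my actual system):

    import argparse
    import re

    # Hypothetical sketch: pull timestamp, status, and request line for
    # error responses out of a combined-format access log.
    LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
                      r'"(?P<req>[^"]*)" (?P<status>\d{3})')

    def main():
        p = argparse.ArgumentParser(description="Summarize error responses.")
        p.add_argument("logfile")
        p.add_argument("--min-status", type=int, default=500)
        args = p.parse_args()
        with open(args.logfile) as f:
            for line in f:
                m = LINE.match(line)
                if m and int(m.group("status")) >= args.min_status:
                    print(m.group("ts"), m.group("status"), m.group("req"))

    if __name__ == "__main__":
        main()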


Always thought formal methods were an attempt to achieve this.

However the biggest obvious difference that prevents dev becoming like an engineering assembly line is that the engineers on the line are not surrounded by car customers changing how the car should or shouldn't work/look/cost every two weeks.

Not that cleaning up the actual process of development so it's more easily and naturally verifiable wouldn't be a good thing.


I run a company which develops medical devices, many of which combine mechanical, electronic and software engineering. It's well known (and a running joke) that software engineers from non-medical fields think it is acceptable to ship their product with serious defects. It takes a long time and a lot of training to get them up to the level that we need, more so than for mechies and sparkies.


Do you have any solid books, etc., on the kind of processes you use?


For software in medical devices, the process accepted by the regulatory authorities is IEC 62304: http://www.iso.org/iso/catalogue_detail.htm?csnumber=38421

That's a good start.


I think the author misses the point of their own story. This isn't a software vs. regular engineer thing; this is a problem with monotonous, static jobs.

His friend's job is to fit one small body panel on the chassis, over and over again. So when he gets one that doesn't fit, his instinct is to make it fit, as opposed to asking why it doesn't. His job isn't to ask questions; his job is to fit it on.

When he was working in his own body shop, this hacker wouldn't have stood for his tools putting out misprints and mistakes - but then his job was managing the whole shop.

Static monotony gets static monotonous thinking.


The engineers probably realized that cutting and bending the body panels to make them fit weakened the structure and invited corrosion. That's why they spent days trying to figure out how to correct it.


This is more about production lines vs one offs than engineering vs hacking.

Civil engineering is real engineering but jobs are usually unique. Problems in civil engineering projects are usually 'debugged' and 'patched' on site without going all the way back and revisiting the original design.


Are you suggesting that "rolling with the punches" on a civil engineering project is happening without going through some sort of an engineering change order process?


There's an interesting story in Atul Gawande's The Checklist Manifesto about fixing a skyscraper problem that threatened the whole project - if memory serves, it was moving too much. That was definitely a 'one-off' solution. It's used in the book as an example of how the automation of the building process freed up the real engineers to solve the problems specific to that site.


It was the Citicorp building in New York City. In 1978, a year after it was built, Diane Hartley (a student at the time) discovered that aerodynamic calculations of wind loads on the skyscraper had been vastly underestimated. She contacted William LeMessurier, the structural engineer, and he re-did the analysis considering quartering winds. When it became clear the building would structurally fail in the next big storm, emergency repairs were done, welders working in secret at night, on almost every floor, installing extra cross braces. The building was saved and became an engineering ethics example.

Diane Hartley got no credit for her rôle in saving the building until nearly twenty years later.


Thanks! My copy was lent to someone a while back...


No, there will still be engineering input but the solution will be akin to the assembly worker hacking the part into place.


The production line itself is the software; the product is the data.


Engineering, at its heart, is both an applied science and a methodology for its application. So yes, software engineering is a valid form of engineering. We're taking computer science principles and applying them to practical problems. We don't write papers, but we use the knowledge to drive industry.

Then there's the topic of failure analysis, and we don't always do this well in the world of software engineering. We don't analyze our failures as much as we probably should, possibly because we find them embarrassing, or perhaps because they might be bad PR if they were known. So we just want to get the fix done and move on.

But the thing is, for most software it doesn't matter how we deal with bugs, because they cause only minor problems. When it comes to loss of life or money, though, bugs do matter. Doctors and lawyers have professional boards that oversee actions and disputes. Airplanes and trains have the NTSB, which investigates accidents and engineering failures. Civil engineers study the collapse of bridges as part of an effort to make bridges safer.

Failures in those cases are bad.

If my software solution is off by a few bugs, meh, so what? It usually doesn't cost the company money, it probably didn't kill anyone, and the fix was probably just for some small CSS issue that looked great on Firefox but bad on Chrome...


Software engineering isn't an assembly line. In our case, the assembler is quite literally a program that virtually never makes mistakes. When a program won't compile - which would be analogous to this McLaren case - we do stop everything and figure out why.


> I wonder how many years until software production will reach the level of maturity where we don’t just fix bugs but, when bugs happen, we stop in our tracks, shocked that bugs happened at all in the first place

This happens all the time in software engineering. Less because we're shocked that bugs could happen, and more because when something you assumed to be impossible actually happens, it means your mental model of what is going on inside the system is wrong in some way.

There are the bugs you understand (or think you understand), which you can safely allow workarounds for in most cases, because you understand the scope of the problem. Then there are the bugs you don't understand, and it's important to be very careful and give them the attention they deserve, because while they may be simple fixes, they could just as easily be an insidious case of data corruption or loss that you weren't aware of.


The big difference is that the software industry still gets away with the consumer paying for defects. This is where the auto industry was in about 1952.


An assembly line isn't comparable to software engineering - it's closer to a compiler.


Having done both mechanical engineering & software engineering professionally, I think the assembly line is a lot closer to software operations.

If you take the same code and put it through a compiler over and over, you will get roughly the same result. If you run the same code in production day after day, some days you will get different results because of the interplay between various systems or conditions.

If you take the same design and put it through a computer controlled machine you will likely get the same results unless something breaks. If you take the same design and put it through an assembly line, you will get different results on different days because of the subtle interplay between various variations in processes.

The distinction is critical, because while you generally qualify a single operation in an assembly line much like we would run tests on a piece of software, you need to monitor an assembly line just as we need to monitor our software in production.
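
To make the contrast concrete (a minimal Python sketch; the metric, baseline, and tolerance are hypothetical): a qualification test runs once and passes or fails, whereas monitoring is a rolling check against drift.

    import statistics

    # Hypothetical sketch: a rolling check on a production metric, the
    # software analogue of watching the line rather than qualifying a
    # single operation once.
    def check_drift(samples, baseline_ms=120.0, tolerance=0.25):
        median = statistics.median(samples)
        drift = (median - baseline_ms) / baseline_ms
        return abs(drift) <= tolerance, median

    ok, median = check_drift([118.0, 131.5, 125.2, 140.8])
    print("ok" if ok else "ALERT", "rolling median %.1f ms" % median)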


Yes, that is a better analogy.


I've been in dev shops where, when a NULL turned up in a database column where it wasn't expected, the site was rolled back to a previous version, people were called in on the weekend, and a group of engineers spent hours trying to figure out why that NULL got there and how to stop it from happening again. It's not like a DBA couldn't have just run a quick UPDATE query to clean up the null in a matter of seconds...

I see this as indicative that this kind of stop-everything root-cause investigation happens in software engineering contexts as well, in spite of the fact that we're always told that their processes and procedures are so much more mature and robust than ours.


"He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible..."

Software engineering is so far removed from other facets of engineering that comparisons to poetry make for better discussion.


I think the author is mistaken. No two car companies or software companies are the same.

The software written for mission critical systems, such as nuclear reactors and even cars (think direct injection, ABS, AWD and now auto-cruise control systems) goes through an extensive series of evaluations. Same goes for the software that helped New Horizons reach Pluto and beyond. It's just not possible to achieve such feats without a thorough software development process.

Agreed that the same cannot be said for startups, but comparing a startup to a multi-billion dollar car company with tonnes of regulatory and compliance checks is not a fair comparison to say the least.


I think that this also happens in every place where mission critical software is developed.

But here is our dirty little secret - none of the software 99% of us write is mission critical. And probably 75% of it is never used.


This was actually "debugging the program": the running program is the manufacturing process, and the coding and development are the engineering and design.

Still, software systems are both more fragile and easily more complex than real-world systems, so the real naïveté is in trying to apply engineering processes with rigid "design" phases and handoffs to complex software processes.

(As seen in the software, in just about every car.)


As a civil engineer, I see a lot of ignorance of the physical sciences on the part of the hackers here, especially the author.

Software debugging has 2 parts:

a) find the bug - lucky for you a machine does this step automatically

b) fix the bug - once the bug is found, this is self-explanatory

Industrial debugging has 2 parts:

a) find the bug - requires going on the production floor and measuring things

b) fix the bug - once the bug is found, this is self-explanatory

How are they different?


I'm a little confused on what you're trying to say. If I read your post correctly, you think software bugs fix themselves because computers find them automatically?

Neither is true.


Dick Smith screwed customers with gift vouchers - you used to work for them - that should be a career-ending move for you, as you worked for such a scummy organisation!!!


Wow, have I struck such a nerve with you that you're following the comment up a second time, in the wrong thread?

I've answered you in the other thread.


Seems like the hacker wanted to patch the problem, whereas the engineers wanted to fix the root cause. I don't think either is wrong; you might want to patch the issue first and then fix the root cause. Or you might suspect the root cause is simple enough to fix and not spend extra time on the workaround. Or you may never find the root cause.

The reason there are a lot of bugs in software is that software is soft. People know that changes can be tested with relative ease, they have a good idea of how wrong things will go if they do go wrong, and quite often an error is tolerable. This means we can often play a bit more loosely with correctness and reliability in order to gain some speed.

There's plenty of software problems where this isn't the case, and you end up with a very different process.


The OP is mistaking manufacturing for design. The manufacturing phase of software - compilation - is hugely efficient, reliable, and cheap. The industry certainly does "drop tools" and see what has happened when a compiler like GCC screws up, which is a better analogy here.


More on this idea in a talk I came across a while back:

https://youtu.be/9IPn5Gk_OiM


When issues occur in my code, I identify the root cause, but then also try to see whether it may occur elsewhere in my code and how to prevent similar issues in the future. And I know lots of coworkers who do the same. I don't think this is a major engineering insight but common sense.


Some big software companies operate like the engineering companies described in the article but it's often not a good thing.

I've worked for a software company where it would take almost a week to add one new property to a JSON object, because there were three different sets of tests I had to write or update across multiple code repos, and it involved the coordination of multiple teams across multiple cities.

I've also worked for startups where it would take me an hour to make a similar change on my own, but with lighter (albeit just as effective) test coverage.

It all comes down to company processes and approach to risk.

I think the lean approach is much more efficient if you have a good/experienced engineering team which you trust and who actually care about the product.


The assembly-line story would be way more like software dev if it had a prelude where the installer repeatedly pointed out that the panels didn't fit and was told "well, we bought them from the vendor so use them anyways". :)


> I wonder how many years until software production will reach the level of maturity where we don’t just fix bugs but,

If Google tried to figure out why the hard drives fail in every single server, how many years would it be before we could type McLaren into the search box?

Instead they embraced a different truth - things fail, so we'll build our systems to handle that.

And I think this is the reason we can get away with bugs in software - because we can isolate and prepare for component failures and imperfections.

So instead of trying to absolutely NOT introduce any bugs, we can instead focus on making our systems fault-tolerant and introspective and handle all errors in a graceful way.
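
In code, that posture looks something like this (a minimal Python sketch; the failing component and the retry limits are hypothetical):

    import time

    def read_sensor():
        # Stand-in for a flaky component; here it always fails.
        raise IOError("transient fault")

    def read_with_fallback(retries=3, delay_s=0.1, fallback=None):
        for attempt in range(retries):
            try:
                return read_sensor()
            except IOError:
                time.sleep(delay_s * (2 ** attempt))  # back off and retry
        return fallback  # degrade gracefully instead of crashing

    print(read_with_fallback(fallback="last-known-good"))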


How exactly do hard disk failures support your argument for tolerating bugs in software? Seems like a non sequitur.


Cryptocurrency bugs can be quite expensive. Maybe that will make software engineering more like other types of engineering.


Formal proof in software engineering has been a thing for a very long time; no need for cryptocurrency for that. Control systems for rockets, autonomous subways [1], and many life-critical devices are built using such methodologies.

In a startup environment you are first and foremost trying to figure out what your customers want. Formal proof is not going to help solve this problem. I'm sure McLaren is also not spending days trying to understand why a panel is off by a few millimeters when they assemble a prototype.

[1] Paris subway (line 14) is controlled by a software written using a formal proof method called the B-Method: https://en.wikipedia.org/wiki/B-Method


The problem is that unlike with "regular" engineering, there are mathematical theorems that basically say that writing bug-free code is somewhere between prohibitively expensive and impossible. The cost of proving a non-trivial program property has a proven lower bound of O(S) where S is the number of states the program may take; alternatively, we say that proving correctness is at least PSPACE-complete (if not EXPTIME) in the size of the program. Not saying we can't do our best and use various tools, but this cost cannot -- in general -- be reduced to anything acceptable, and therefore we will never be shocked when a bug does occur.

It is easier to achieve strong AI than to write bug-free code, and even that won't help: a super-intelligence will still have bugs in the code it writes.


This is what happens when the compiler glitches or the OS fails; people dig in to find out why. Normal code is much more like doing a service: if the brakes don't fit, it's because the person fitting them has screwed up, and you take them off and fit them (or a new set) back on again. Maybe there's an inquest in the team about using spanners properly; maybe the old fella in the shop tries to communicate some wisdom about checking things first.

At the end of the day it's about money and risk.

But the interesting difference between software and mechanical engineering isn't the attitude to faults; it's the fact that software isn't designed to work within tolerances. We don't build software around dimensions or stresses; we expect it to always work, like maths, but it isn't, quite.


What's a meaningful definition of "tolerance" for a discrete system?


Don't fix a production issue by 'hacking' in the solution live on production. Figure out what caused it and fix that. At least, that's what we do for our software.


Having worked as a software dev for Formula 1, and as a hardware engineer ditto, I can verify this.

You see it all. You see people hacking in a bugfix in one day on the F1 site, but that fix will not survive: some shops restore from their backups nightly, and you will certainly face a lot of meetings and reports to get your one-line fix back into production, even on R&D problems, which mostly describes F1.

The case reported here was on the assembly line, where different rules apply.

If a strange misproduction happens, you certainly want to be able to evaluate the risks properly. The analogy is a Heisenbug in the production compiler: how often does it happen, how expensive is the problem, how expensive is the fix? There's not just one car at risk; the whole line is in question. The new robot? The software or the hardware? Wrong measurement, wrong planning, a broken sensor, a broken motor? Or a real Heisenbug, which appears in hardware much more often than in software.

A magneto-electric cabling problem, or sometimes even light. The cable guy bending an overlong fibre cable to less than a 30cm radius. A fibre with a bad cut at one end, causing all sorts of crazy effects and crazy hackish fixes. Missing CAN terminators (analogous to missing SCSI terminators on hard drive chains) in certain F1 cars: it might work, but I wouldn't want to sit in that thing. A new 140,000 rpm high-speed rotor behind the driver's back, making him uncomfortable, so he starts making bad mistakes out of fear and would rather pull the thing out. Completely crazy effects from outside, such as a cleaning lady turning on a bad fan, causing a jitter or a spike with effects elsewhere. Or a bug - a real one - in the system, especially in Asia or Australia.

I don't see many differences between SW and HW engineering. The biggest difference I know about is the methodology.

Agile in car manufacturing, or in HW engineering at all, only happens at Toyota, nowhere else. Properly planned waterfall, with excellent and high-powered middle management, is the norm, whether in Asia, Europe, or the US.

There do exist some very small HW shops, with 1-4 engineers, 2 managers/sales people, and 2 support staff; there, development is of course engineering-driven. And there are some exceptions, such as Toyota or VW with their crazy assembly optimizations, or e.g. Honda, whose boss is one of their best engineers (they are lucky). But eventually Honda ran out of luck with their funding and had to sell their very best test site, which was then taken over by Daimler. That's why Daimler is leading F1 currently.

Another difference is proper training. These good engineering companies train their people a lot - maybe 10x more than in normal SW engineering.


If software were like that hacked jeep, it wouldn't be so bad. A lot of software looks like this: https://benahrens.com/wp-content/uploads/2014/03/how-to-be-m...


One not-terrible way to start on your bug-free crusade would be to switch to functional programming languages and styles and "pure" functions.

Having done 10 years of OO before I began coding in a functional style and realized all the benefits, this is my experience talking.
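
A minimal Python sketch of what I mean by "pure" (hypothetical example, not from any real codebase): same inputs, same outputs, no hidden state to debug.

    # Impure: reads and mutates hidden state, so behavior varies
    # between calls and is hard to test.
    _cart = []

    def add_item_impure(item):
        _cart.append(item)
        return len(_cart)

    # Pure: the result depends only on the arguments and the input
    # is left untouched, so every call is trivially testable.
    def add_item_pure(cart, item):
        return cart + [item]

    assert add_item_pure([], "wheel") == ["wheel"]
    assert add_item_pure(["wheel"], "panel") == ["wheel", "panel"]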


Hacking != Software Engineering



