Yagni (2015) (martinfowler.com)
101 points by mooreds on Aug 12, 2018 | 59 comments



Like any generalization, it's not always true.

Boss decides to add y,x,z options "just in case"? - YAGNI

Engineer wants to get the datamodel correct up front to avoid costly rework and data migration in the future? - Not YAGNI

Fowler fortunately states this distinction: "Yagni only applies to capabilities built into the software to support a presumptive feature, it does not apply to effort to make the software easier to modify."


> Engineer wants to get the datamodel correct up front to avoid costly rework and data migration in the future? - Not YAGNI

The key here is avoiding costly rework. The counterpart to YAGNI is Avoid Painting Yourself Into A Corner: APYIAC. Actually, that comes first and foremost, and YAGNI is secondary. How do we know this? Because we know change is universal and unavoidable. Planning as if change won't ever occur is like planning without acknowledging the weather.


That's a bit of a cop-out, though. How much software development isn't either implementing the actual features or making the software easier to modify?

This one is a bit like the TDD advocacy that says you shouldn't need to do much design work up-front because you can let the tests drive the design along with everything else.*

*Except for the part where you refactor your code, which by definition shouldn't be changing its behaviour and therefore can't be driven by adding tests to your test suite, where exactly the same design issues will immediately arise.


I always thought tests were supposed to make refactoring easy since they ensure correctness. In practice I've only seen that a couple of times since the tests usually end up tightly coupled to the implementation.


It's easy for tests to pick up structural artifacts from the code, creating friction for making changes instead of reducing it.

At the end of the day it's a matter of the quality of the tests. Bad tests can be worse than none (because at least you know to be afraid).


In my experience, a great deal of software development, or indeed of human activity in general, is wasted on things no one wants.


> Except for the part where you refactor your code, which by definition shouldn't be changing its behaviour and therefore can't be driven by adding tests to your test suite, where exactly the same design issues will immediately arise.

By comparison to Smalltalk, most environments seem broken to a seasoned Smalltalker. The main refactoring tool in Smalltalk would happily refactor your tests, so you'd just do your refactoring and re-run the tests. No interference with any reasonable implementation of TDD needed at all. You just counted refactoring as writing the tests and then immediately ran the unit tests.


Perhaps my analogy wasn't clear. I wasn't really commenting on refactoring. I was arguing against the common claim that you don't need to do much design work if you're using TDD. In reality, most of the same work does still happen, it's just being brushed under the carpet to make the process look simpler and more efficient than it actually is.


> How much software development isn't either implementing the actual features or making the software easier to modify?

The key is the adjective: presumptive. YAGNI is about features that aren't needed now, where it's uncertain whether they ever will be.


This seems awfully close to a No True Scotsman argument. The Fowler quote was,

"Yagni only applies to capabilities built into the software to support a presumptive feature, it does not apply to effort to make the software easier to modify."

If we exclude features we definitely do want from consideration, so we're only talking about presumptive features, and if we exclude actually implementing those presumptive features, so we're only talking about "capabilities built into the software to support a presumptive feature", and if we then further exclude things that "make the software easier to modify", then how much development is really left for YAGNI to affect?

I can't speak for anyone else, but when I write new code, it is generally either implementing some specific functionality that we want right now or improving the common foundations of the system in some way. The usual reasons for the latter would be either gaining some immediate benefit, such as improved performance or robustness, or making the code easier to modify so we can then gain some benefit. By Fowler's argument, I don't see much of a middle ground where YAGNI would mean anything, unless maybe a developer is in the habit of starting to develop code for some specific feature when there isn't actually any requirement for it yet, but why would anyone do that anyway?


Fear. I worked with a programmer who did it a lot. What if the customer wants this, or what if the customer wants that? And his reason was just what the article highlights: "I programmed it now so I won't have to program it later." It doesn't make sense unless you are trying to get a program done for good and hope never to have to work on it again afterward.


I'm still struggling to understand how that happens. You've got a professional programmer who's just randomly writing code for imaginary requirements they made up? I mean sure, obviously that's bad practice and unnecessary, but who does that, and why do their managers and technical leaders allow it?

I've always taken YAGNI to be a cautionary principle about over-engineering and over-generalising software designs before there's sufficient expectation that the extra flexibility will justify the implementation cost. I'm not sure I've ever seen it used (or needed) in the context of someone just randomly adding arbitrary functionality because they felt like it, and it had never previously occurred to me that someone might restrict the term to only that scenario as Fowler seems to be doing in the quoted comment.


The features aren't random. They're usually related to other features. They might be variations on a feature. For example, what if the user doesn't like the app's fonts and colors? Let's let them customize the look and feel to whatever they want. Or we've made this app that lets you send text messages to people, but what if it needs to be sent to an international number? Suppose this feature is unnecessary right now because the app is just for sending text messages among employees at a company that is situated in Wichita.

These aren't crazy features, but they're unnecessary and may never be necessary. As to "why do their managers and technical leaders allow it," often they are the very people who spearhead it. Or in my case, our team was embedded in a non-software company, and our leaders didn't understand exactly what we did, so they were hands-off.


Fair enough. If you've seen examples of that sort of thing actually happening in practice, I defer to your experience, and we can certainly agree that they are unnecessary and effort shouldn't be spent on them!


As a resident, Wichita* :)


I'm not satisfied by his criteria.

There's a particular design failure mode where an emphasis on 'easier to modify' results in a codebase that's difficult to follow. You're making some future code change easier at the cost of all current code changes.

You can read a concern about the overall cost of modification into his statement, and claim that such a person is actually increasing the cost of code modification, but it's dangerous to leave that unsaid.


The best examples of YAGNI are usually when you say, "I'm adding this data structure even though we don't need it. We're going to implement feature X next month that does need it, so I'm adding it while I'm here". It happens quite a lot. And then feature X gets cancelled and you end up with this weird data structure in your code that doesn't do anything except complicate the code base.

There are other, more subtle issues as well. For example, you might want to add an iterator for something, even though your current task doesn't need iteration. However, you have strong feelings about how the iterator should work, so you add it because it helps you take control of the development. In those cases, it's easy for the rest of the group to say, "Thank you for that nice spike. However, that functionality is YAGNI right now. Let's take it out and we can discuss how it should operate when we're working on that feature". Basically, it's a way to stop people from bullying by intentionally creeping the scope of their work.

And finally, sometimes you honestly think something is needed and someone points out that it isn't necessarily true. It might be needed, but without getting more details it's hard to know right now. In those cases it's often better to remove the code as "YAGNI", keeping the design simpler. This allows it to move in multiple different directions -- some of which may obviate the need for that code. If you leave the code in, then it becomes a bit of a manifest destiny -- you write your code consistent with that extra complexity, even though you may not need it at all.

The above hopefully answers your actual question. Below is a very long discussion of my opinion of how XP development should ideally be practiced with respect to design. I hope you find it interesting -- or at least enough of a departure from what you've seen before to make it interesting for you.

YAGNI, TDD, and avoiding up-front design all dovetail in an XP-style development team, but my experience has been that the dovetail becomes a monstrosity if you are not actually doing XP. This is possibly why you seemingly have a negative opinion of it.

The main thing I tell people when I'm explaining XP is that "no big design up front" is not usually discussing the kind of design that they are thinking about (for example a week or so of trying to understand how to implement something and coming to agreements on design by drawing diagrams or writing example code). It's more about avoiding the kind of old-school design phases that lasted 6 months or more. When I was first starting out, we had minimum 18 month development cycles (and at one place we had a 5 year development cycle!). So we did 6 months of requirements analysis, 6 months of design, 3 months of development and 3 months of QA. But in reality, when you started development, you realised that the chunk you bit off was so big that it was basically just fiction. It turned into 12 months of patting ourselves on the back for being brilliant analysts and designers followed by 3 months of hacking and 3 months of absolute pure panic.

XP attempts to reduce the size of these phases and to distribute them more evenly into the development effort. It does not attempt to avoid them. With our old-school ways of doing things, our analysis and design was completely wasted because it was too disconnected with the actual development. With XP, you try to (as much as possible) marry the analysis, design and coding.

Of course that's usually not possible. One of my biggest problems with "Agile" teams is that they do so little analysis up front that they have no idea what they are building -- and it ends up being the same big hack we did in the old days. They avoid the panic because they just don't do QA and bury their heads in the sand thinking that their hack-a-thons are producing brilliant solutions.

Ideally you have a good 2-3 month backlog of stories that have been thought out enough that it only takes a day or so to implement each one. The key is that these stories are changeable and you "groom the backlog" at least once a week to see how your understanding of what you are building has changed since the last grooming session. By the time you commit to a sprint (very complicated discussion, but I like 2 week sprints), all of the stories should be implementable with only minor details remaining. While a "story" is a "reminder to have a discussion", if you haven't had the discussion before you started writing code, you are pretty much screwed.

Additionally, there is no way you can write stories of that size without having a basic understanding of the overall design that you are going for. Again, if you get to the point where you are going to put the story into the sprint and the developers don't know (or don't agree) how it should be implemented, then you need to insert a spike story instead.

This handles the "strategic" level of design. There is no "design phase". Rather it is spread out over development and updated over and over again. Any overall guidelines should be captured in a "coding standard" (which is not just a document describing how to indent your code, but rather a document that describes the overall "flavour" of design and the kinds of typical solutions that are preferred).

"Tactical" design is dealt with very differently, though. The idea is that code is design. It's just a very formal description of the design. Importantly, it's the only design that counts. It doesn't matter what your design diagrams look like if your code is not implemented that way. In most XP teams, diagrams, etc are usually considered transitory artefacts. The design is embedded in the code and you should write the code in such a way that the design is easy to see. That is not to say that you shouldn't draw diagrams, or utilise other kinds of design tools! It's just that you shouldn't attempt to persist them in any volume or (heaven forbid!) generate your code from them. The maintenance is just too expensive and error prone. Get used to understanding design by reading code -- you have to do that anyway.

The design aspect of TDD (if you are reading it as Test Driven Design) is a particular method. I could easily double or treble the size of this post and still leave you none the wiser. However, the basic thing to understand is that test code is very hard to write if you did not provide enough access in the design to write the test. Very, very often, as you expand the system the access that you provided for the tests makes it easy to access from the production side. Similarly the design is malleable because there is less coupling, though sometimes at the cost of cohesion. One very wise person (whose name I completely forget) suggested that a good way to understand how this works is to remove all global variables from your code. After doing this, start removing as many instance variables from your objects as you can. Once you've done that, consider ways of removing even more. Observe how that impacts your ability to easily write tests and to refactor the overall design of the system.
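
A tiny sketch of that exercise (hypothetical names, not from anyone's actual codebase): the less state an object hangs onto, the less setup a test needs.

    def load_rows_from_db():
        # stand-in for a real database call
        return [{'amount': 2}, {'amount': 3}]

    class ReportBefore:
        def __init__(self):
            self.rows = load_rows_from_db()   # hidden dependency baked into the object

        def total(self):
            return sum(r['amount'] for r in self.rows)

    def report_total(rows):
        # same behaviour with the instance state removed; a test just passes data in
        return sum(r['amount'] for r in rows)

    assert ReportBefore().total() == 5
    assert report_total([{'amount': 2}, {'amount': 3}]) == 5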

Although you always have to consider your strategic moves in design, coding (and especially TDD) is about tactical moves -- If I do X, then it allows me to do Y. At some point, you realise that this gives you an ability to move strategically. But that strategic decision is almost always better to discuss outside of the code (or in a spike), because it affects the whole team.

One of the main goals for XP style development is to end up with a design where it is easy to change direction. This is great for organisations where changing direction is important. Sometimes you are in a big stodgy company that literally doesn't care what the software does and never asks you to change directions. In that case, I probably would not worry about it so much. I've worked in groups like that and it doesn't matter what kind of millstone you strap around your neck -- you won't get into problems. But in most places (and especially startups!) being able to change directions at the drop of a hat is extremely valuable -- sometimes to the point of rescuing a company. When you have discussions like "I know we asked you to do Y 3 weeks ago, but we were completely wrong. We need Z now. Is that possible?" and you reply, "Yep. No problem. Changing software is our job", then you know you've got it right. Getting it right is super hard, though ;-)

I think one of the reasons you don't see people talking about strategic ideas of design in XP is simply because they take it for granted. If you don't have a strategy for what you are doing, you are doomed. This is one of the reasons why I don't really like doing outside-in approaches, except if I'm spiking. Certainly, I consider advice to always do outside in as being exceptionally bad advice. It's a great technique to help you discover what you need to do, but I don't recommend it otherwise.

Anyway, I realised I launched a book off of your 2 sentence off-the-cuff comment, but I often find that people take popular misconceptions of good ideas and then treat the good idea as if it were the equivalent to the misconception. Now, I'm not going to say that my interpretation is mainstream at all. I'm also not Kent Beck, so I have no authority to say what XP is supposed to mean, but especially when you think an idea won't work, it's easy to choose the worst possible interpretation of it as being the most authentic. Hopefully this gives you food for thought.


The best phrase I've heard in this space is "Reversible decisions".

Any decisions that can be unmade later with relative ease, should be made as quickly as possible just to get them off the board (opportunity cost). If Do X vs Don't Do X is reversible, then YAGNI can apply - do whichever is cheaper and fix it later if you're wrong.

What pisses people off is when people try to block conversation on an irreversible decision by shouting YAGNI at you. There are a lot of adjectives for this behavior and none of them are good.

Reversible Decisions+YAGNI is pretty much your answer to the bikeshedding problem. Instead of wasting hours trying to decide what color to paint the shed, just budget painting it 5 times, and get back to figuring out if you can find a cheaper vendor to provide you millions of dollars worth of ultra high quality concrete. Or whether that crack the geologists found in the bedrock means you need a different site altogether.


An essential distinction is the difficulty differential.

How much more expensive are options X, Y, and Z in the future compared to now?

How much more expensive is a correct data model in the future compared to now?

If it has a 50% chance of being needed in the next two years and will be 10% harder by then, YAGNI.

If it has a 50% chance of being needed in the next two years and will be 1000% harder by then, do it now.
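
As a rough sketch of that expected-cost arithmetic (the probabilities and multipliers are just the ones above; read "1000% harder" as roughly an 11x cost):

    def build_now(p_needed, cost_now, later_multiplier):
        # build now only if paying cost_now today beats the expected cost of deferring
        expected_cost_later = p_needed * cost_now * later_multiplier
        return cost_now < expected_cost_later

    print(build_now(0.5, 1.0, 1.1))    # False -> YAGNI, defer it
    print(build_now(0.5, 1.0, 11.0))   # True  -> do it now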


Generalizations are always wrong?


The problem I've seen when YAGNI is applied is that not building something now leads to a costly data migration down the road when you actually do need it.

I run a site that stores audio files. For the first half of its life, it simply stored the URL of the file on S3. I could have used objects representing audio assets with metadata to store the reference, but I didn't think I'd need it. When I was building another feature that _did_ need that functionality, the migration took far longer to write, test, and run than the process of building the asset objects feature and the feature I needed them for combined.

Additionally, the new system made it easier to debug customer issues. I didn't know that I actually needed this feature the whole time.
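
For concreteness, a hypothetical sketch of the two data-model shapes being contrasted here (all class and field names invented); the expensive part described above is backfilling the second shape for every existing record:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class TrackV1:
        audio_url: str                        # first version: just the raw S3 URL

    @dataclass
    class AudioAsset:
        s3_key: str                           # the reference now lives behind an asset record
        content_type: str
        byte_size: int
        replaced_by: Optional[str] = None     # e.g. lets you trace replaced/orphaned blobs

    @dataclass
    class TrackV2:
        asset: AudioAsset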


I would still argue you did the right thing. Sure it would be nice to always have enough domain expertise to pick the right solution in the first place, but you didn't have that knowledge at the time. This experience cost you a little pain but your system is working and you are now smarter.

A developer will be presented with thousands of situations like this across her career. If she takes the YAGNI/under-engineering approach, it almost always ends up a similar story - some growing pains, some re-engineering, a lesson, and a working system.

On the other hand, if she errs on the side of over-engineering, there's no bound on how far off the rails she can go. At best she ends up with some extra useless code that someone will delete later. At worst it becomes a time sink that consumes the entire budget and schedule.

YAGNI forces you to always hew to the side of under-engineering. And if there really are design considerations that you know are easier to bake in early than retrofit, it's almost always possible to arrange dev stories in such a way to force those design decisions out early.


> The problem I've seen when yagni is applied is not building something now leads to a costly data migration down the road when you actually do need it.

I see YAGNI fail when it's used as an excuse to not think about the future at all, rather than a reason to delay building parts of a system that aren't needed yet. Relative to building software, thinking about it is cheap.

I usually think about:

    * how needs might change in the future
    * what would have to change in the current system in order to satisfy those future needs
    * how difficult changing the system to meet those future needs would be.
Ultimately, I know whether the current software design can easily adapt in the future.


With hindsight, all yagni justifications look either correct or stupid.


Yeah, though this is one area where experience really helps.


The only exceptions are APIs and data serialization/storage. Your cost of change on these is huge, so it's better to eat some extra cost now to future-proof.


I'm pretty sure the original rule recognizes no exceptions, and certainly not on something so broad as an entire category. It may be that much narrower exceptions could exist, but future-proofing sounds like it could simply be justified as a feature actually needed now, if not just good overall coding practice [1].

The danger of allowing such a broad exception is that it doesn't take a huge leap to imagine a situation where it's used as permission to do something like a custom implementation of ACID in the app backed by a NoSQL key-value store (or "sharded" MySQL, before NoSQL was a thing).

[1] YAGNI isn't a standalone rule but one of many in the "XP" collection. Any one of them, out of context, in complete isolation, could just as (if not more) easily require a long list of caveats/exceptions to make sense in the general case.


Yeah, although there is a difference between planning for a specific future which may not arrive, and planning to make any future change less painful.


But what if you built out all the metadata but did it wrong? Then you'd have to migrate from one complex model to another, which is usually more difficult than migrating from a simple one to a complex one.


I built the same thing I'd planned originally, so the point is moot. YAGNI does nothing to stop you from misunderstanding the problem. And in the case of my problem, the metadata was fresh anyway. Generating it from scratch would have the same migration cost, but all the original tooling existed so I wouldn't be building the feature from scratch.

If the point is "what if the whole concept was wrong?" then sure, you got me. But the point is that I _did_ follow a YAGNI mindset and paid a considerable cost. Even if I had built the wrong thing, that doesn't negate the fact that technical debt had accumulated and prevented me from doing the thing I actually needed to do.


> YAGNI does nothing to stop you from misunderstanding the problem.

A lot of times it does, because you usually gain more information about your users and problem domains as time goes on. So if you build a feature when you actually need it, you often understand it better than you would have beforehand.

But I think we're all in agreement that if you understand a domain very well, think there is a very high chance a feature will be required, and it's a good deal more expensive to build later than now, then you should build it now.


To be clear, did your debugging problem come down to the fact that the metadata for the objects was on "the other side of" the S3 URL, being held as object metadata headers in S3 that you would need to do a HEAD request to retrieve per object, rather than as e.g. columns in a local RDBMS—such that you couldn't do aggregate queries on it to figure out what a customer's files "looked like" in a statistical sense?

If so, I hate to say it, but doesn't YAGNI still apply here?

This was, essentially, a scaling problem: the O(N) time-cost of querying metadata for N S3 objects was too high. You still could have written code to query that metadata out the "naive" way anyway, and it would have worked for low N. But it wouldn't have worked "at scale."
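
A minimal sketch of that "naive" O(N) approach, assuming boto3 and an invented bucket name -- one HEAD per object, fine for small N, painful at scale:

    import boto3

    s3 = boto3.client('s3')
    bucket = 'example-audio-bucket'   # hypothetical bucket

    mismatched = 0
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):
            # one HEAD request per object just to read its metadata
            head = s3.head_object(Bucket=bucket, Key=obj['Key'])
            if head.get('ContentType') != 'audio/mp4':
                mismatched += 1

    print(mismatched, 'objects with an unexpected Content-Type')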

But there are two [point five] ways to solve a scaling problem:

1. When you need to scale, write more code, customizing your logic to make it more performant, add locality or caching, etc. (This is what you did, and the way most software engineers think.)

1.5. Anticipate the need to scale, and write code "the more performant way" in advance. (This is where the YAGNI admonishment comes from.)

2. When you need to scale, attempt to find an infrastructure-level solution that involves writing no code. This is the way ops people tend to think, since they don't write code (or at least, they trust their knowledge of infrastructure solutions better than their coding abilities.)

An example of approach #2, in this case, would be something like "put Varnish between you and S3, and configure it to only cache HEAD requests (and to synthesize HEAD response cache entries from proxied GET requests, without caching the GET response itself.)" Then your existing O(N) S3-metadata-querying code would—after warming the cache—suddenly be faster; it would probably be fast enough to answer whatever sort of debugging questions you'd like.

The reason people say "YAGNI" is that, often, an ops person can take your developed software as a black box, and solve its scaling problems without touching the box. And this is often the optimal way to solve these problems: you probably can't write a caching layer for your web app, inside your web-app's process, that will work half as well as Memcached. Or a logging system that will work half as well as rsyslog. Or a web server that will work half as well as Nginx. Neither can the people who develop packages for your programming language's ecosystem. The real experts in "what you need for production-scale" converge on an infrastructure component and develop that, rather than contributing to FooLang's logging implementation.

And, since these scaling problems can be solved without ever touching your code, it's especially silly to try to anticipate scaling problems you might have and solve them in your code, early. The only scaling problems you should worry about at design time are the ones that can't be solved by IPCing infrastructure components together. (Such as, for example, the cost of a huge number of concurrent threads. Erlang/Akka/etc. exist because they solve a particular scaling problem that is intractable to anything but a process-architectural solution.)


> or at least, they trust their knowledge of infrastructure solutions better than their coding abilities

Or we trust our knowledge of such solutions more than even your (anyone's) coding abilities :)

I realize, of course, that you're using "trust" as a metaphor for expertise, as well as that we all have a tendency to be biased in favor of solutions that favor that expertise/familiarity.

The trouble is, programmers outnumber ops people by a vast, vast margin, especially here on HN.

> often, an ops person can take your developed software as a black box, and solve its scaling problems without touching the box.

Not often enough, not any more. It seems that this skillset has been sufficiently devalued in the industry that, however rare it was originally, it hasn't spread, which means there may well be a significant plurality of coders out there who have never worked with such an ops person.

> And this is often the optimal way to solve these problems: you probably can't write a caching layer for your web app, inside your web-app's process, that will work half as well as Memcached

This goes back to my original, partly facetious, comment about trusting anyone's coding. My real point is that I don't need any kind of faith in something like memcache, since I've seen it work many times before, whereas someone's coding ability (even my own) does require a leap of faith, since it's totally unknown how effective the effort will be or on what time scale.

> And, since these scaling problems can be solved without ever touching your code, it's especially silly to try to anticipate scaling problems you might have and solve them in your code

That's a message that's difficult to hear for management made up entirely of programmers (as at many software startups, even after they've grown larger). Witness the willingness to pay huge premiums for cloud ("infrastructure as code"!) solutions that eliminate huge low-level scaling opportunities.


> To be clear, did your debugging problem come down to the fact that the metadata for the objects was on "the other side of" the S3 URL, being held as object metadata headers in S3 that you would need to do a HEAD request to retrieve per object, rather than as e.g. columns in a local RDBMS—such that you couldn't do aggregate queries on it to figure out what a customer's files "looked like" in a statistical sense?

No. In this case, the metadata included some of that data, but it also stored information about past application state or data that was otherwise stored in the blob itself.

For instance, when a file was replaced, it created an orphaned blob in S3. Tracking that down meant using rollbacks in Heroku postgres on at least one occasion. Keeping track of which files replaced other files is new functionality that was not possible using any reasonable means, but would have come for free with the feature I designed.

Reading data out of S3 is painfully slow and I don't consider it to be a reasonable solution for anything that involves looking at more than one file. Consider the question "how many M4A files have a Content-Type set that isn't M4A?" Iterating hundreds of thousands of files and checking their magic number (ignoring ID3) and Content-Type is not only slow but also expensive. Storing the content type in advance would at least allow that process to be shaved down to a small fraction of what it otherwise would be. (and before you ask, file extension is a worthless heuristic when dealing with user-supplied data).
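
For contrast, a sketch of what the same question costs once the metadata is captured locally at upload time (SQLite and this schema are invented purely for illustration):

    import sqlite3

    conn = sqlite3.connect('app.db')   # hypothetical local metadata store
    conn.execute("""
        CREATE TABLE IF NOT EXISTS audio_assets (
            s3_key       TEXT PRIMARY KEY,
            content_type TEXT,   -- Content-Type recorded at upload time
            magic_type   TEXT    -- type detected from the file's magic number
        )
    """)

    # "How many files that are really M4A have a Content-Type that says otherwise?"
    count, = conn.execute("""
        SELECT COUNT(*) FROM audio_assets
        WHERE magic_type = 'audio/mp4' AND content_type != 'audio/mp4'
    """).fetchone()
    print(count)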

But of course, I didn't know that storing an object with metadata would solve these problems, even though I knew that I should do it.

Saying things like "put Varnish in front of S3" is suggesting putting your finger in the dike. The cost of setting up and maintaining Varnish (and paying for it) is far higher than just doing the thing the first time. And in fact, would have been far more costly than me putting my foot down and doing the one-time migration, as I did. Besides, our applications are on Heroku, where something like this would have been non-trivial anyway.

My point is that YAGNI purism optimizes for short term wins. If you never do anything until you can make a compelling argument that it's something you NEED to do NOW, you're going to end up with Dr Seuss-esque systems that do everything possible to actually avoid solving the problem itself, especially if there's always a (less than ideal) workaround. This is one of the easiest ways for tech debt to accumulate.


> The cost of setting up and maintaining Varnish (and paying for it) is far higher than just doing the thing the first time.

Are you sure? What is your time worth?

I mean, there's certainly a comparative-advantage thing here. A software developer setting up a Varnish instance themselves is going to take more time than a software developer writing code, because software developers know code and not Varnish.

But my point was that most software isn't run by single do-everything DevOps people. In bigcorp production-scale systems (where all this software engineering advice both comes from and is targeted toward), you've got a dev team creating a particular software component, and then either internal or external ops teams running it as just one component of their system/solution. Those ops people know Varnish better than they know code.

Picture a ticket triage that goes through a support person, and then gets elevated to an ops person. (Because this is what ops people are for, whereas the dev-team's time is far too valuable to the company to spend on things that could be handled by the pre-provisioned capacity of the ops team.) The ops person isn't going to turn around and ask a dev to solve the problem by writing code if he can at-all help it. The ops person is going to try to solve the problem themselves, with an infrastructure-level solution. And, given

1. the overhead of having to go to the dev-team and get the ops-supporting feature added into their priority list, and then work with the support person to interact with the customer while the dev-team maybe eventually fixes the problem (and where then the ops-team will have to both deploy, and understand how to maintain, the software in its newly stretched state, where it probably now has extra ops-time needs like cache storage!);

vs.

2. the ease with which existing deployments of infrastructure components like Varnish, which were stood up to solve previous scaling challenges, can simply have their cluster-configurations extended to support new scaling challenges with no marginal increased maintenance burden;

the infrastructure-level solution will win every time.

Yes, sure, if you're one dev and you do your own ops and you don't actually know much about ops (in the comparative-advantage sense), then solving all your problems in code might have the highest ROI.

YAGNI—and pretty much any other software-engineering principle—isn't targeted at you. You're doing artisanal software development—making the hand-woven wicker chairs of the software world. Most such principles focus on decreasing the Total Cost of Ownership of software, taking into account inter-departmental collaboration overheads, maintenance of the codebase after the original developers leave, etc. You've got none of those concerns.

If you're a dev-team of one and you deploy your own code, go wild: write your whole system in a macro-heavy DSL dialect of Common Lisp with your own custom logging; or write your system as a single 46kb x86-64 assembler unikernel. It doesn't matter, because you still understand it, and can modify it just fine. (Just don't expect to sell your company down the line!)

---

> Consider the question "how many M4A files have a Content-Type set that isn't M4A?" Iterating hundreds of thousands of files and checking their magic number (ignoring ID3) and Content-Type is not only slow but also expensive. Storing the content type in advance would at least allow that process to be shaved down to a small fraction of what it otherwise would be. (and before you ask, file extension is a worthless heuristic when dealing with user-supplied data).

Ah, yeah, that's a different problem than I had assumed. In my mental model, you already had a proxy for handling your object creation, where audio files users uploaded would first go to you; you'd extract the file's indexable metadata (magic, ID3); and then your server would add said metadata into the S3 PUT request as headers (Content-Type & co, and then x-amz-meta- headers for anything else.)

Under that model, you get back everything you need from just doing a HEAD request to your object. S3 HEAD requests are cheap. They're just not fast/highly concurrent, so the point of Varnish here is to make them so.
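
A sketch of that model with boto3 (bucket, key, and metadata values are made up): the indexable bits ride along as headers on the PUT, and a later HEAD returns them without touching the body.

    import boto3

    s3 = boto3.client('s3')
    bucket, key = 'example-audio-bucket', 'uploads/song.m4a'   # hypothetical

    # proxying the upload: extract what's indexable, attach it as object headers
    with open('song.m4a', 'rb') as f:
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=f,
            ContentType='audio/mp4',
            Metadata={'duration-seconds': '214', 'id3-title': 'Example Song'},
        )

    # later: Content-Type and the x-amz-meta-* headers come back from a cheap HEAD
    head = s3.head_object(Bucket=bucket, Key=key)
    print(head['ContentType'], head['Metadata'])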

S3 is annoying, though, in that you can't mutate metadata on an object without re-uploading the object. So, if you were allowing people to do direct object uploads to S3 using signed URLs, and then doing the indexing from there, it makes sense that you wouldn't want to write the resulting metadata back to the object, and would instead want to keep the canonical copy of it local.

In that case, though, there's still an infrastructure-level solution. Replace S3 with https://www.minio.io backed by S3, with Minio's metadata storage pointed at your existing RDBMS. It's essentially just the same thing you built, without writing code. Metadata lives in Minio (and therefore your DB, where you can query it); object bodies get uploaded asynchronously to S3. Though, with this solution, you can optionally have the metadata get proxied through Minio during the PUT, and thus do the metadata-generating content analysis step asynchronously against that stream (as a Minio plugin), rather than having to wait for the upload to complete and then GET the result back to your business layer.

Though, again, for your devs=1 use-case, maybe this has higher TCO than just writing code to track metadata within your system.


What if you'd never built the feature that needed the metadata/reference? What if you had abandoned your project? How long did you run with the much simpler, non abstract S3 location and what did getting to launch sooner buy you?


1. The new feature increased the reliability of a user onboarding function. If I hadn't built it, I likely would not have remained in business.

2. The company would go bankrupt? I don't see your point.

3. One and a half years. It didn't get me anything, because I didn't plan for the added metadata until about nine months in. If the question is "what did the nine months between when you came up with the idea and when you were forced to build it get you?" the answer is "a world of hurt." At the time, the cost of migrating was low. When my users 100xed, the cost of migrating had increased over a thousand-fold.


Yup. There's also the cost of inventory -- of software costs tied up in features that are not (yet) useful. Check out this article by Joel Spolsky. https://www.joelonsoftware.com/2012/07/09/software-inventory...


Excellent article, particularly about triaging bugs, too.


YAGNI is a sad example of the state of software engineering: our field follows rules of thumb, based on anecdotes as evidence. When will experimental validation become best practice, like in the other engineering disciplines?


There is a whole field of study--what was originally called software engineering--that does quantitative, empirical analysis of software development. Its practitioners are often called by managers entering into large projects that have tight quality requirements so that the optimal choices can be made in terms of tooling and techniques. (For example, large-scale automotive software, where there is far less tolerance for error than in IT.)


Do you have some pointers to important papers of this field of study?


[1] Article from Dr. Dobb's gives a very high-level overview of the kind of data that is available.

[2] is one of the foundational books in this area. Its analysis of testing effectiveness is fascinating because it's so revealing. For example, comparing defect rates in projects that use TDD vs. non-TDD; the effectiveness of static analysis; etc.

[1] http://www.drdobbs.com/225701139 [2] https://www.amazon.com/dp/0132582201/


Thanks a lot!


After having read a bit, I'm a bit disappointed. While the intent goes in the direction I would like to see, the actual execution does not follow scientific principles. "Consultant collects some data, derives conclusions, and writes a book" is actually again only anecdotal. You need to think about your sample before collecting the data, otherwise your outcome is biased by who gave you a consulting gig.

Consequently, the work has never been published as a peer-reviewed paper with experiments and/or real-world data, so that someone else could validate or reproduce the results. Is that assessment correct, or did I overlook scientific papers?


> the actual execution does not follow scientific principles.

It can't really. The practitioners are generally brought into a project under heavy non-disclosure and generally allowed to keep stats on the project as long as it is completely anonymized. This makes it hard to publish data in a scientific research mode, because it's hard to reproduce or validate independently.

Most of the time the practitioners are called in (hypothetical example) b/c a manager says, "I have the requirements for this project, I have 11 developers on the team, and we need to produce a working model of X by January, which is sufficiently defect free that we can put it in autos for initial test without losing a lot of cars." Thresholds are identified. Then the practitioner goes to his/her database of similar projects and says, "OK, when using these tools, we found you can get this level of defects on projects of x LOCs within the time frame you're going for. However, if you forgo Y, you can get higher levels in 10% less time. Etc." That can be valuable information, rather than just throwing developers at a project and hoping for the best.

The books that you dismiss as consultants just publishing their anecdotal info are often the product of hundreds of projects. Sure, the data is empirical, but it doesn't mean it's not rigorous. You'll note that the numbers in books and articles are often presented as ranges, which is due to the sensitivity of one factor to the presence of other factors (size of project being a particularly prominent factor).

Going to your original point, all established practitioners agree on the leading causes of defects. And this is where I fault most developers today--they plain don't know what those are. They'll guess, but they don't know. Which is what I think I hear you lamenting at the top of this thread.

Disclaimer: I'm not a specialist in this field nor a consultant, but I find its research interesting and have spoken to practitioners.


Thank you for taking the time to explain the background.

I totally agree that consultants working from quantitative data about past projects are much better than the usual cargo cults.

However, there are a lot of dangers in the process:

- If you are a consultant in automotive embedded software, you will be contracted for these projects, so your data will be quite useless to guide startup Web application development (for example, LOC can make sense for embedded C, but not for languages like Scala or Haskell, where you can implement the same feature elegantly in 100 lines or clumsily in 1000).

- As a consultant, you want to generate revenue, so there is a strong incentive to oversell how solid your insights are. There is no counter-force in place to balance that bias out.

- While scientific publication is somewhat broken, it is (in CS) a quite good quality control with respect to scientific method. The book did not undergo any quality control by other experts. So now it becomes a matter of personal trust towards the author.

To summarize, I think that practitioners' data collections and their personal reports about them are useful in some cases, but they cannot replace scientific empirical research.


Those from that field have also often failed in order-of-magnitude ways.


> often failed in order-of-magnitude ways

so, there should be lots of examples, right? References?


these folks were the first to push CASE tools, which are a footnote in software history because they failed rather completely in actual production usage. they invented COCOMO, which never gets used, for the simple reason that it has routine order-of-magnitude margins of error (60% in studies, 600% in practice...). they were the biggest pushers of UML and graphical programming, which dies and lives again and again in increasingly zombified states...

have a dealie about cocomo specifically, I suppose. http://shape-of-code.coding-guidelines.com/2016/05/19/cocomo...


> Yagni only applies to capabilities built into the software to support a presumptive feature, it does not apply to effort to make the software easier to modify. Yagni is only a viable strategy if the code is easy to change, so expending effort on refactoring isn't a violation of yagni

This doesn't accord with my understanding of the term. My YAGNI encounters are MUCH more often around technical design decisions that will theoretically make software easier to modify down the track, but in practice we have no idea whether the design will look the same by then, and hence whether the benefits will be realised is almost impossible to estimate.

Things like "It would be better if this configuration was stored in a database table rather than a text file" or "these modules should be refactored to implement the same interface so they can share code" etc.
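
For instance, those two suggestions together might amount to something like the sketch below (all names invented); the YAGNI question is whether the shared interface and the database-backed variant earn their keep before they're actually needed:

    from abc import ABC, abstractmethod

    class ConfigSource(ABC):
        # the proposed shared interface
        @abstractmethod
        def get(self, key: str) -> str: ...

    class TextFileConfig(ConfigSource):
        def __init__(self, path):
            self._values = {}
            with open(path) as f:
                for line in f:
                    k, _, v = line.partition('=')
                    self._values[k.strip()] = v.strip()

        def get(self, key):
            return self._values[key]

    class DatabaseConfig(ConfigSource):
        def __init__(self, conn):
            self._conn = conn   # e.g. a sqlite3 connection

        def get(self, key):
            row = self._conn.execute(
                'SELECT value FROM config WHERE key = ?', (key,)
            ).fetchone()
            return row[0]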


I would argue there is another important benefit beyond the takeaway suggested in the second sentence of [1]: paying attention to which architecture choices allow for simple versus complex (costly) refactoring later can engender more forward-looking awareness in the team's culture.

[1] "One approach I use when mentoring developers in this situation is to ask them to imagine the refactoring they would have to do later to introduce the capability when it's needed. Often that thought experiment is enough to convince them that it won't be significantly more expensive to add it later."


YAGNI is very useful. But :-

It's mostly useful in situations where the problems YAGNI was meant to address are actually happening: namely, when people ask for everything that might possibly be needed up front, because changing your mind later will cost a lot more. For developers, it was meant to get them thinking vertically through software rather than horizontally, and to stop designing large layers of goop instead of focusing on working software that you build out incrementally, which means a HUGE focus on composable, modular software.

It's also not a justification by itself. Lisa: "Let's do X" Bart: "YAGNI"

There needs to be a bunch of thought about things, and YAGNI's purpose is really just to challenge what it is that is going to get built.

Most of the criticism of YAGNI comes down to YAGNIing something that was actually needed, or using YAGNI to compromise a software design. But that's not a fault of YAGNI; it's either because you let YAGNI hold too much power over your decision making, because you felt its pressure to stop you doing something, or because you just weren't able, at decision-making time, to work out what you really needed. YAGNI is just a focusing tool for working out what you are going to need, and it's only as strong or as weak as the people making the decisions.


YAGNI is no excuse to design shitty systems, however. Make your systems as modular and componentized as is sensible. It'll pay off immediately as well as down the road when you inevitably need to make modifications. However, it is wise to avoid adding in too many features that you merely think are a good idea but don't actually have proof are useful in practice.

One thing to be aware of is that YAGNI can often be the cry of someone who doesn't want to admit they are accumulating mountains of technical debt. And it's very possible to go effectively "technically bankrupt". Rewrites are expensive and take a long time, in order to do them properly it usually requires launching projects in parallel so that the rewrite can get done while the old stuff is still keeping the lights on, and so that the rewrite can attain the same level of operational maturity. You see this even at multi-billion dollar companies, changing architectures is a difficult process. But what happens if you are stuck with a mess of bad code and a business that is only sufficiently profitable to continue supporting the old code but couldn't afford doing a parallel rewrite? Well, then you could very easily find your company stuck riding that bad code forever. Given the dynamic nature of tech businesses, being unable to respond to market changes because you are mired in technical debt is a huge competitive disadvantage and likely to lead to a much shorter lifespan for your company.

So be careful making too many short-term decisions. YAGNIs add up and up and up, you don't want to get to a point where when you do "need it" it's now effectively impossible (due to requiring an exorbitant budget to implement it).


Everyone likes to pretend they can predict the future for a while, to the point where they will stop at nothing to make it conform to their predictions. Then comes the part where they will defend their choices to death, no matter how much evidence to the contrary is piling up. Once you start seeing the entire chain of consequences, it becomes easier to discipline yourself; it's a process of growing up and taking responsibility.


It’s been my experience that most programmers who have worked on many different projects across many different employers for many years can intuit whether something will be needed with a very high degree of accuracy.


Previous discussion, with lots of comments:

https://news.ycombinator.com/item?id=9605733


Take this with a pinch of salt. With consultants it's YAGNI until you do need it, at which point you will be billed at their contingency rate! You don't need to go full waterfall to understand your requirements up front and make the choice not to wait until something becomes a showstopper because "YAGNI and it won't fit into this sprint anyway". Trust your experienced, in-house engineers for what you are or aren't gonna need.


I think coherency of implementation is a lot more important than YAGNI. For example:

  class Car:

      def honk(self):
          print('Honk!')
Cool, now we have a car, but it only honks. But we only needed it to honk, so that's fine.

  class RoadTrip:

      def __init__(self, car, destination):
          self.car = car
          self.destination = destination

      def go(self):
          self.car.start()
          self.car.drive_to(self.destination)
Oh no, we built a car but it does nothing that a car does. Let's add this functionality.

  class Car:

      def __init__(self):
          self.started = False
          self.gps = []
          self.location = cool_app.get_current_location()

      def honk(self):
          print('Honk!')

      def start(self):
          self.started = True

      @property
      def location(self):
          return self.gps[-1]

      @location.setter
      def location(self, new_location):
          self.gps.append(new_location)

      def drive_to(self, location):
          self.location = location
Super easy. Except a different user of our library implemented this functionality already for a different reason, but in a different way.

  class CarEngine:

      def __init__(self, car):
          self.car = car
          self.started = False

      def start(self):
          if self.started:
              raise Exception('Already started!')
          self.started = True

      def turn_off(self):
          if not self.started:
              raise Exception('Not started!')
          self.started = False

  class CarGasTank:

      def __init__(self, car, gallons):
          self.car = car
          self.capacity = gallons
          self.level = gallons

      @property
      def empty(self):
          return self.level == 0

  class MotorCar:

      def __init__(self, engine, gas_tank, mpg, location):
          self.engine = engine
          self.gas_tank = gas_tank
          self.location = location
          self.mpg = mpg

      @property
      def started(self):
          return self.engine.started

      def start(self):
          return self.engine.start()

      def drive_to(self, location):
          distance = location - self.location
          fuel_required = distance / self.mpg
          if fuel_required > self.gas_tank.level:
              raise Exception('Not enough gas!')
          self.gas_tank.level -= fuel_required
          self.location = location
I've lost count of the number of times I've seen stuff like this. Sometimes the original implementation is in a different library that's hard to change. Sometimes other code relies on specific details of the original implementation, so changing it requires changing that code too. Sometimes additions seem "out of scope", so they are actively pushed out to dependencies.

It might feel like this is a process problem -- like there should have been better upfront design or communication -- but these "failure cases" are actually the success cases for code. You want code to rely on your libraries. You want to consider all users of your libraries when making changes.

The problem is actually YAGNI, because that mentality encourages us to churn out big, incoherent bags of functions when we should be thinking about code responsibilities and designing whole systems.

You can see this in action with JavaScript most infamously. Its standard library is extremely YAGNI, and it led to whole new programming languages being built on top of it because it was so anemic. The complexity has to live somewhere, and if you don't deal with it in a coherent and orthogonal way, someone else will have to deal with it and their options will be a lot more limited than yours.

Following simple mantras like DRY, YAGNI, and whatever else fits on a poster is the surest way to mess up your design. Good design requires thoughtfulness and experience, and there are no shortcuts.



