Check out Beyond The Goal and Beyond The Phoenix Project for a deeper dive into this area.
I work in cyber security and use many of the concepts often. The root causes of many poor outcomes are poor assumptions, prioritizing ideology over customer value, and misaligned shared mental models.
I use a simple doc format to address this. It's based on evaporating clouds.
1. What's the shared understanding of the current state, with a focus on objectivity?
2. What are the problems with current state? Subjectivity is OK here.
3. What's desired state?
4. What are the experiments we can run ASAP to learn whether our understanding of the desired state is correct, and how we can get closer?
As computer people, I believe we have access to more information on this through the field of Queueing theory.
One of the aspects of Queueing theory is responsiveness, and a system with a saturated queue has none. I see this play out over and over again in both machine and human capacity planning. Even with Agile we can’t get stuff done in a satisfactory time frame, because we always carry a backlog sized for a team twice the size of the one we have, even though responsiveness is maximized when the system is running at 50% of maximum throughput.
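A toy M/M/1 sketch makes the point concrete (this assumes Poisson arrivals and exponential service, which real teams only loosely resemble, and the rates are invented for illustration): mean time in the system is 1/(mu - lam), which blows up as utilization approaches 100%.

```python
# Mean time a task spends in an M/M/1 system: W = 1 / (mu - lam), valid for lam < mu.
def mean_response_time(lam, mu):
    """lam = arrival rate, mu = service rate (tasks per week)."""
    if lam >= mu:
        return float("inf")  # saturated: time in system grows without bound
    return 1.0 / (mu - lam)

mu = 10.0  # hypothetical team: finishes 10 tasks/week at full throughput
for utilization in (0.5, 0.8, 0.9, 0.99):
    w = mean_response_time(utilization * mu, mu)
    print(f"{utilization:.0%} utilized -> {w:.1f} weeks per task")
```

At 50% utilization a task spends 0.2 weeks in the system; at 99% it spends 10 weeks, a 50x degradation for a 2x gain in utilization.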
One of my mentors was really into Goldratt and Ohno, but The Goal got stuck in my tsundoku pile for years. I’m a third of the way through it now (I’m using audiobooks to get through books I “should” read but never do), and it’s starting to read like thinly veiled queueing theory, though from what my mentor said, Goldratt frames it instead through the metaphor of drum-buffer-rope. But there’s a lot more to this field, and as I said before, you can apply it to our applications directly, not just to the building of them.
> Even with Agile we can’t get stuff done in a satisfactory time frame, because we always carry a backlog sized for a team twice the size of the one we have, even though responsiveness is maximized when the system is running at 50% of maximum throughput.
Agile falls down when people misapply it, same as anything else. It's not just the size of the backlog; it's being able to limit work in progress so that you have the ability to adjust. What's more, management needs to get on board with the idea of probabilistic forecasting that's continually revisited, as opposed to trying to stuff complex work into Gantt charts and deadlines. Sadly, most of modern management refuses to make these changes, and too many folks in the trenches don't want to take ownership of their work and just want to be told what to do.
There’s a modification to Gantt charts that uses Monte Carlo simulation to come up with a more believable timeline, but nobody likes bad news so it’s a fringe Agile thing instead of mainstream.
Great companies are few and far between. Everyone else thrives on self deception.
But if you're doing Monte Carlo, you might as well just iterate and keep re-running the Monte Carlo as you burn through the backlog, because that's a better way of having up-to-date information than using any kind of Gantt chart.
You don't have to pick one or the other. A Gantt chart is just a pretty, easy-to-read graph based on a topological sort of activities, with their dependencies and durations laid out in time. They also aren't meant to be static: unless you're working for a badly managed org that uses Waterfall, the Gantt chart gets updated as things progress and new information comes in.
If you have a backlog, and you don't mark what is dependent on what (in progress or also in the backlog) you're just hurting yourself. Once you add that information and some very basic estimation (even just scale of expected effort is enough) you can generate a Gantt chart and use Monte Carlo simulations to get an understanding of your time estimates.
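A minimal sketch of that idea (the task names and three-point estimates are made up for illustration): model the backlog as a dependency graph, sample each task's duration from its optimistic/likely/pessimistic range, and take percentiles of the simulated finish dates instead of a single deadline.

```python
import random

# Hypothetical mini-backlog: name -> (dependencies, (optimistic, likely, pessimistic) days)
tasks = {
    "design":   ([],                      (2, 4, 10)),
    "backend":  (["design"],              (5, 8, 20)),
    "frontend": (["design"],              (3, 6, 15)),
    "ship":     (["backend", "frontend"], (1, 2, 5)),
}

def project_duration(durations):
    """Finish time of the project = longest dependency path through the DAG."""
    done = {}
    def finish(name):
        if name not in done:
            deps, _ = tasks[name]
            done[name] = max((finish(d) for d in deps), default=0.0) + durations[name]
        return done[name]
    return max(finish(t) for t in tasks)

def simulate(n=10_000, seed=1):
    rng = random.Random(seed)
    return sorted(
        project_duration({t: rng.triangular(lo, hi, mode)
                          for t, (_, (lo, mode, hi)) in tasks.items()})
        for _ in range(n)
    )

runs = simulate()
p50, p85 = runs[len(runs) // 2], runs[int(len(runs) * 0.85)]
print(f"50% confidence: {p50:.0f} days; 85% confidence: {p85:.0f} days")
```

The output is a distribution, not a date: you report "85% chance of shipping within N days" and re-run the simulation as tasks complete and estimates change.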
In my experience, estimates are the root cause of quality problems and expectation mismatches. No one treats them as estimates but as actual calendar times.
I've been fortunate to steer my company towards simply prioritizing work and communicating the prioritization to the rest of the company and, more importantly, our customers. We don't give time estimates or timelines to customers, but provide constant updates on where something is. No one has complained about this in general.
Of course, there are always exceptions - we resist them all we can, and that too is reflected as reprioritization of backlog.
> No one treats them as estimates but as actual calendar times.
It's true, though I suspect this is partially because it's what the management chain wants: a date to report for when it will be done (e.g., will my OKR for this quarter be met or not?)
> estimates are the root cause of quality problems and expectation mismatches
I have seen this over and over: most quality issues and incidents are caused by decent programmers rushing to meet an immovable deploy/release window... because "you're not gonna make your estimate"...
> it's what the management chain wants: a date to report for when it will be done (e.g., will my OKR for this quarter be met or not?)
Not just OKRs directly; a software project is usually itself just a node in a larger dependency graph. For example, the company wants the release to coincide with an industry event at some time T, which means that the project must be done by time T, and also must be mostly done (in some well-defined way) at time T-6 months, so the marketing people can be brought in to do their marketing things in time for the event. In many cases (video games come to mind), if there's not a reasonable certainty the project can hit those goals, the company may be better off scrapping it altogether, rather than wasting millions on development, and millions on marketing, only to miss the target entirely.
In general, management wants dates, because most activities that aren't software projects are coordinated through calendar dates, and software projects don't exist in isolation.
> In general, management wants dates, because most activities that aren't software projects are coordinated through calendar dates, and software projects don't exist in isolation.
Very true, though I wonder how agile (small a) fits into all of this. Can an organization that is working iteratively on all fronts (including management) avoid (or at least reduce) these dependencies?
Yes. For us these are exceptions rather than drivers of software delivery. External events are certainly hard dependencies, but more often than not, the deliverables are more marketing than software. That is, a proof of concept or something behind a feature flag for the event that will not be used in production. We do this routinely, by placing a PoC on top of the backlog.
Certainly not for all organizations, but I have had the fortune to shape this model company-wide.
> Sadly, most of modern management refuses to make these changes, and too many folks in the trenches don't want to take ownership of their work and just want to be told what to do.
Interestingly enough this is something I see in management books and operations research books going back to the 70s. It's a lesson that hasn't been learned.
As for the ownership - I think that makes a lot of sense. People in the trenches know very well that they don't really own the thing, but are just at best responsible for it. I think that is perfectly fine, and the whole "ownership" language tends to obscure very real power dynamics.
> management needs to get on board with the idea of probabilistic forecasting that's continually revisited
From the manager's pov, though, that just sounds like guesswork. "When will my house be built?" "Eh, not sure, but there's a 60% chance the framing will be up by July."
Development managers need to learn to communicate on the same wavelength as their customers, and vice versa. It rarely happens.
The thing is, construction people do talk like that.
I think that’s why rich people often make terrible customers. They are just as grouchy at plumbers and general contractors as they are at us.
Which reminds me, one of my life goals is to get a full rundown of GC tricks to apply to software development. I’m running out of time for that to make a quality of life difference.
You're conflating the complicated with the complex. Construction workers don't need Agile methods, which is why "there's a 60% chance the framing will be up by July" sounds so dumb. The physical properties of wood framing, electrical wire, shingles, and drywall haven't changed in decades. You can make detailed plans around these known facts, and workers generally know predictably what it takes to build a house.
Software is not like that. Codebases are too big, especially counting third-party dependencies. Tech debt is lurking everywhere. Customers don't know what they want until they see it. So yes, in enterprise-sized software, you need probabilistic forecasting precisely because you're NOT building a building. It's impossible to know things in enough detail up front to make big up-front plans that won't largely change, the way you could if you were building a house.
I say this with love, but you've never worked in construction, have you?
Architects' drawings are little more than nicely descriptive hopes and sketches of an intended idea, which a good contractor has to turn into an actual plan of work.
You want to see chaos? Go talk to the person running the development of a high-end property in New York.
I have a way of getting people to talk to me about their professions. At the tail end of a water damage repair, the GC confessed to me that they flood customers with trivial choices to distract them from the lack of real choice in other areas, like the physics of plumbing dictating where sinks can go.
As I mentioned up thread, I want to buy a GC beers and get them to tell me more, before I ever take another contracting gig.
You need probabilistic forecasting for everything. I used it with great success to forecast the costs of the general renovation of an apartment we were moving into. Seeing the shapes and ranges of distributions was very informative (and in particular informed the decision on whether and when to take an extra loan). Can't imagine doing it any other way now, even though I had to hack my way into doing it, because approximately none of the tools I know of support this out of the box.
(I ended up using Guesstimate for it - https://www.getguesstimate.com/ - pushing it to the limit of nearly hanging my browser.)
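A stripped-down sketch of what that workflow looks like without a dedicated tool (the line items and ranges are invented for illustration, and uniform distributions keep it simple where lognormal would be more realistic): express each cost as a range, sample totals, and read off percentiles.

```python
import random

rng = random.Random(0)

# Hypothetical renovation line items as (low, high) cost ranges
line_items = {
    "flooring":  (3000, 7000),
    "plumbing":  (2000, 9000),
    "painting":  (1000, 2500),
    "surprises": (500, 8000),   # the long tail every renovation has
}

def sample_total():
    # uniform within each range; Guesstimate-style tools let you pick richer distributions
    return sum(rng.uniform(lo, hi) for lo, hi in line_items.values())

totals = sorted(sample_total() for _ in range(20_000))
p10, p50, p90 = (totals[int(len(totals) * q)] for q in (0.10, 0.50, 0.90))
print(f"Budget range: {p10:,.0f} (p10) / {p50:,.0f} (p50) / {p90:,.0f} (p90)")
```

The p10/p90 spread is the informative part: it tells you how large a buffer (or loan) you need, which a single point estimate hides.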
Problem is, most people seem to be overwhelmed by those ideas. It's not hard, but then again multiplication isn't hard either, and most people are afraid of that too. This is a problem because software tends to target the lowest common denominator, which is how we get a million Trello clones, but no tools that understand that work breaks into DAGs, not 2-level-deep trees, or that Gantt charts are good to have, or that probabilistic Gantt charts would be even better.
> You're conflating the complicated with the complex. Construction workers don't need Agile methods, which is why "there's a 60% chance the framing will be up by July" sounds so dumb. The physical properties of wood framing, electrical wire, shingles, and drywall haven't changed in decades.
When's the last time you saw a construction project that landed on time? Construction projects are a classic example of forecasting difficulty. Lots of things have to go smoothly at the right time, including supply chain, coordinating work from multiple organizations (including local governments), and the weather has to play nice.
A backlog is not a queue. It's just a mutable list. A principle of Agile is that you can change your priorities and reorganize the backlog to put the currently highest-priority stuff first.
(This isn't unique to Agile - any bug database could do this, though they don't typically stack-rank things.)
Putting high-priority tasks first increases responsiveness for them, but will starve lower-priority tasks. That's inherent, but at least management gets to prioritize.
That's only slightly true, and it's a dangerous assertion from a process standpoint to say it is so. Everything is queues. Work in progress, undeployed code, dark features, sales pipelines.
There is a queue of requirements we have defined but haven't acted upon. That's contained within the backlog, along with bug reports and wishful thinking. The backlog is approximately a superset of the incoming feature request queue (modulo anything that skips the backlog and goes straight into WIP).
To expand on your point: Queueing theory applies whether the thing is a proper FIFO queue, a LIFO stack, a statically prioritized priority queue, a dynamically prioritized priority queue, or a bunch of cards grabbed randomly out of a hat. If things keep coming in faster than they can be handled, the queue, no matter its form, will just continue to grow. That a backlog is not a proper FIFO queue in most orgs doesn't change this fact.
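A trivial sketch of that fact (the rates are invented for illustration): if arrivals exceed capacity, the backlog grows linearly no matter how you order it.

```python
def backlog_after(weeks, arrivals_per_week, capacity_per_week, start=0):
    """Backlog size after `weeks`, ignoring serving order entirely."""
    backlog = start
    for _ in range(weeks):
        backlog = max(0, backlog + arrivals_per_week - capacity_per_week)
    return backlog

# 8 new tickets a week against capacity for 5: grows by 3/week regardless of prioritization
print(backlog_after(52, 8, 5))
```

Prioritization decides *which* items wait forever, not *whether* items wait forever; only changing the arrival or service rate fixes the growth.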
I still think "mutable list" describes the situation better than "queue," at least for programmers familiar with common data structures. No argument that queuing theory is useful.
Defining "responsiveness" in terms of everything that anyone has ever wished for seems like a bad thing, though. We can have a brainstorming session that comes up with a long list of features that would be nice to have, end up throwing them out, and that's perfectly fine.
Attacking someone's knowledge level does not dismiss their argument, which is that you might be imprecise in your use of the term responsivity in your first statement, and could be using it to impress rather than inform your audience.
Let's get to the bottom of this in an illuminating and polite way, without further insults.
You made two statements about responsivity:
1. If your queue is 100% saturated you have 0 responsivity.
2. It's common to see maximum responsivity if the team is only pushing 50% of their maximum throughput.
I'm assuming that you are either
1. defining a responsivity metric for backlog items and using it consistently in both cases
2. using responsivity in the imprecise business-consultant sense of the word, which is being criticized
Case 1: You're a serious queueing theorist
If you're doing it in the first sense, then could you please give us a more technical explanation of which responsivity metric you're talking about and how these relationships hold?
Some questions there:
1) I can't find a responsivity metric in basic queueing theory. I find average occupancy, throughput, waiting time, service time, etc. - all metrics which describe /aspects of/ responsivity. Can you point us to the more specific and holistic responsivity metric you have in mind? What makes it the "right" metric to capture something as abstract as responsivity?
2) How can a saturated queue have no responsivity? It seems that a saturated queue - which I define as a full queue that is still serving requests - still responds within the average waiting time, plus now there's a chance that a request is rejected by the full queue. Assuming the queue is still serving, there should be some nonzero probability that a request will be served, because a position in line opens up, for a brief period, every time a request is completed. So responsivity can't be strictly zero, can it? It would make sense to me here if our responsivity metric drops so close to zero, compared to normal functioning, as to make no effective difference. For example, if 95% of our requests are being rejected, and all accepted requests have awful waiting times, then it makes sense that responsivity should be considered effectively nil.
3) Where can I learn about the model where maximum responsivity requires some sacrifice of throughput? Usually for this kind of result we have some curve, we take its derivative, and it turns out the critical point is some equilibrium or maximum. Do we really have the ability to do this for coding teams and their backlogs? That could give us serious objective power in negotiating our work throughput! But it requires a trustworthy model.
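On question 2, the textbook finite-buffer model supports that intuition: a full-but-still-serving queue rejects some arrivals and serves the rest, so responsiveness degrades rather than hitting strictly zero. A sketch assuming the standard M/M/1/K model (system capacity K, utilization rho = arrival rate / service rate), using its closed-form rejection probability:

```python
def blocking_probability(rho, K):
    """M/M/1/K: probability an arrival finds all K slots full and is rejected."""
    if rho == 1.0:
        return 1.0 / (K + 1)
    return (1 - rho) * rho**K / (1 - rho**(K + 1))

for rho in (0.5, 1.0, 2.0):
    print(f"rho={rho}: {blocking_probability(rho, 10):.4f} of arrivals rejected")
```

Even at rho=2 (arrivals at twice the service rate) roughly half of the arrivals still get served, so "zero responsivity" only holds as an effective, not literal, statement.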
Case 2: You're a business consultant and you're laying it on a little too thick
Finally, if you're doing it in the second sense, then you are using the term in the sense that is being criticized. In particular, using it without a formal definition, in a way such that a business audience would hear "responsivity" as whatever metrics they care about most. That would mean you use queuing theory terminology to misinform and manipulate those unfamiliar with queuing theory - probably the most used application of queuing theory in the industry. If you're intentionally doing this, and willing to insult people who call you out for it, we probably couldn't persuade you to own up to it or stop. But if you're not aware of it, you might reconsider how you use queuing theory terms, and commit to being accurate and objective. I state this not only for your benefit but for my own benefit, and for the benefit of anyone who might use technical terms imprecisely! It's always best to be able to ground technical statements in objective theory.
Conclusion
Assuming that you are using the term accurately, with a particular metric in mind, I'd love to learn about it and analyze the metric in our two pet cases. If you're being shady, I hope you'll recant and improve the accuracy and honesty of your language. Thanks for your patience and dialogue!
Wait till you realize optimizing data flow through an API process serving multiple requests is the same meta problem as optimizing value flow through a development team. :)
For a period in my life I was very taken by the promise of a node-based graph visualisation of projects, enabling you to quickly track dependencies, constraints, next steps, redundancies and so on.
And for argument graphs. Premises, therefores/lemmas, conclusions.
I don't get heavily into CRT/FRT, at least not the most disciplined versions. I find that in practice the kinds of problems that justify that in-depth level of analysis are few and far between.
I'm slightly surprised that there's not more mention of Flying Logic on here. How do you decide when to use it, and what other tools/frameworks do you use in conjunction?
Do you share or collaborate on these models with anyone? I find that it's very easy to assume that people will look at the model and see what you're seeing but mostly their eyes glaze over.
I use it when I want to draw a graphical data structure that will visually auto-update as I change it. That answer sounds kind of pat, since you basically need to gather enough experience to know when that is useful. I tend to think it's useful in far more cases than people usually appreciate.
I often screenshare. You can also share the *.xlogic files, even via git, although the company has made it harder to download a free version of the app that can read them read-only. Last I checked it seemed they require a credit card even if you don't buy.
The natural representation of a recipe is a DAG! ;-)
It's not really moving on - I'd still love to have something like this, but it's an entire paradigm shift that I haven't had the capacity for, not to mention buy-in from other parties.
ericalexander0's comment above touches on some of it ("poor assumptions, prioritizing ideology over customer value, and misaligned shared mental models")
In my particular case, I was expanding a business and starting a new one and had just discovered the whole "productivity" scene and had naive notions of using task management tools and Notion wikis to achieve some latent superpowers. But I got into a rabbit hole where nothing was good enough, there was always some element of lossiness as you moved between tools, and all the tools in the world are not a substitute for having clear mental models and actually just getting on with the job rather than thinking endlessly about the most beautiful and intuitive ways of getting it done.
Separately from these meta-concerns, building and navigating a model was not as fluid as a Workflowy/Dynalist situation (the latency was small but annoying - like the early days of Notion), and as your model built up and reorganised it was easy to lose track of things. There's still some value in graph-based knowledge management (e.g. Obsidian), but it's also important to remember that storing and having access to information (however aesthetically pleasing it might be) is not the same as knowing something.
Possibly a larger conversation lurking somewhere about productivity, management, the meaning of work, ADHD, and friction.
Reading "The Goal" and contemplating how to apply the Theory of Constraints to complex software systems a couple of decades ago became part of the secret sauce of my career as a software engineer. Especially helpful was the insight that one may make good progress tuning one part of a system only to discover that a larger constraint prevails as the true bottleneck.
Here's a practical example. For those who may still wrangle fleets of actual cloud instances (instead of going the managed cluster or serverless route), a common mistake is to imagine that all that is needed is a group of the smallest compute units available. But with AWS, for example, this can bite you when the smallest instances also have dinky allowances for network and/or disk I/O compared to medium or larger instances. People end up wondering why their queues aren't draining quickly or their DB writes or log pipelines are backing up, etc.
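The arithmetic behind that mistake fits in a few lines (the stage names and numbers are invented for illustration): whichever stage has the least capacity caps the whole pipeline, so tuning any other stage buys nothing until the constraint moves.

```python
# Per-stage capacity of a hypothetical instance, in requests/sec it can sustain
stages = {"cpu": 900, "network": 250, "disk_io": 120}

# The system's throughput is the minimum across stages: the Theory of Constraints bottleneck
bottleneck = min(stages, key=stages.get)
print(f"System tops out at {stages[bottleneck]}/s, limited by {bottleneck}")
# Doubling cpu capacity changes nothing here until disk_io is addressed
```

On tiny instances, disk and network allowances are often the hidden minimum, which is exactly why "just add more small instances" can leave queues undrained.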
> one may make good progress tuning one part of a system only to discover that a larger constraint prevails as the true bottleneck.
In my latest job this was exactly the case. We were modernising a good old big ball of mud, and we made very decent progress on a part that was supposed to read from an API. That was nice, but it turned out that the API itself was poorly designed, and the vast majority of problems and slowdowns were coming from the other side. Then the true bottleneck was that the team responsible for the data was not very well prepared, so there was little design, poor coding, etc. And what really turned out to be the bottleneck is that the entire department just has a problem with processes, with hiring and retaining qualified people, and with upskilling the existing ones...
Everything in this book applies not only to management, but also optimizing distributed systems.
E.g.
- optimizing parts of the system for their own metrics often leads to degradation of the whole system performance; any optimization must be done w.r.t. the context of the system as a whole
- focus on optimizing the bottlenecks
- approaches to identifying the bottleneck
- minimize the amount of stuff waiting in queues
I love this book. It laid out in an approachable way my observations w.r.t how humans fail to organize efficiently by just not being good at thinking in terms of a larger system.
Highly recommend reading or listening to The Goal. The audiobook feels like a cheesy training video, but sort of in a good way, and the concepts are extremely useful. Plus, you learn the language your business counterparts are more likely to know, as opposed to talking about backpressure or queuing theory, which they may not connect with.
I use the concepts from this book all the time at work to justify a prioritized backlog and defend against a lot of work in process.
+1 vote for The Goal. Our operations research prof showed us the movie in class one day, and it's one of the few things I distinctly remember from my courses. Its teachings are crystal clear to me years later, although unfortunately/ironically I haven't been able to implement it in my life so well.
"The Goal" has a programming offshoot called "The Phoenix Project". It may make your blood boil because a lot of the situations are tooootally realistic.
The Phoenix Project should be required reading for anyone in the management chain of software development, IT, devops, etc. Especially non-technical CEOs.
The Goal is also told in narrative form. It’s from the perspective of a plant manager whose company is spiraling down the drain and whose plant has been given 3 months by an exec to improve its numbers.
When we first meet the exec he’s poking the hornet’s nest to get a rush order done, and he pisses off a machine tech who quits in a huff, accidentally breaking the one machine they need the most in the process.
Which is way too close to home for some of us. The guy bitching the loudest about the problems is usually the source of at least a third of your problems.
I've read enough business "fables" to say that it's about average for that category of writing. It communicates its point through the fable, but the novel itself is ok at best and certainly not the reason to read it. It does read easily since it's not trying to be high literature so it's readable in under a week if you can dedicate an hour or two each day to it.
I'm surprised that there is no mention of Operations Research in this article.
I'm actually curious if any businesses use Operations Research for real, vs. managers just guessing. I mean do something like Linear Programming to optimize profits based on various constraints.
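For a sense of what that looks like, here is a toy product-mix LP (all numbers invented for illustration): maximize profit 40x + 30y subject to a labor constraint x + y <= 40 and a machine constraint 2x + y <= 60. With only two variables, we can skip an LP solver and just enumerate the vertices of the feasible region, since an LP optimum always sits at a vertex.

```python
from itertools import combinations

constraints = [  # each row means a*x + b*y <= c
    (1, 1, 40),   # labor hours
    (2, 1, 60),   # machine hours
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

def intersect(c1, c2):
    """Solve the 2x2 system where both constraints hold with equality."""
    (a1, b1, d1), (a2, b2, d2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel constraints, no vertex
    return ((d1 * b2 - d2 * b1) / det, (a1 * d2 - a2 * d1) / det)

def feasible(p):
    return all(a * p[0] + b * p[1] <= c + 1e-9 for a, b, c in constraints)

vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) and feasible(p)]
best = max(vertices, key=lambda p: 40 * p[0] + 30 * p[1])
print(f"Make {best[0]:.0f} of x and {best[1]:.0f} of y for profit {40 * best[0] + 30 * best[1]:.0f}")
```

Real OR work uses solvers (simplex, interior point) on problems with thousands of variables, but the structure is the same: an objective, linear constraints, and an optimum at a corner of the feasible region.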
There is the Franz Edelman award from INFORMS [1] that used to publish accessible articles about how OR techniques are being used in industry.
The MIT LGO program has some theses published online that show how OR techniques are applied at smaller scale in industry.
I’ve been involved in some smaller scale projects that used mathematical programming techniques to help with scheduling manufacturing lines and call center shifts. We’ve used simulation to help understand how improvements can be made in call centers and warehouses.
Lots of this stuff is under industrial engineering at the operational level.
At the supply chain level, often a company is using the services of someone else. Sometimes they’ll have industrial engineers working with these techniques.
I've been rather nerdy about Operations Research. It seems that today it's a mostly unknown field limited to a few companies. Maybe it turned out to be too difficult for ordinary managers, who just wanted dashboards; maybe people weren't really ready for it; maybe it got crushed by the agile-industrial complex... but if you look at writings from the 70s and 80s there seems to be agreement that for an OR department to be truly successful, it would require a mandate to truly transform the business and disrupt common power structures - which is why instead they often get relegated to just doing dashboards.
Over time they became data analysts and then the data scientists of today.
I personally think that there is a LOT of very useful stuff in the field.
A kind of radical belief that I hold is that basically everyone would benefit from working as a planner for a year or two, especially if you can spend a good amount of time away from your desk and out on the floor. The lessons you can learn by trying to coordinate a production floor are simultaneously very general, very practical, and oddly difficult to learn elsewhere.
The Goal was a fun listen/read. Wait, if the water fills up at 5 liters an hour but the pipe can only support a throughput of 2 liters an hour… then we have a whatchamacallit!
Simple concept. Works great.