Story points are pointless, measure queues (brightball.com)
345 points by brightball 3 months ago | 288 comments



My personal experience with story points is that the number never really mattered, but the process of the team discussing how to rate the complexity of a task was very useful. In terms of utility for estimating how long something will take, I personally have never been able to translate story points into a reliable indicator, for many reasons (e.g. team changes, domain changes, variability in operational load outside of development work).

Overall I tend to avoid using story points, but on the few teams I worked on that really wanted to use them, I always framed them as a way of building shared understanding rather than as a metric that is actually useful for estimating work.


I used story points for years with my teams and they worked "as advertised", which is to say they helped the teams understand the effort, complexity, and risk involved in each story.

When there was disagreement, it helped them dig deeper to understand why, and usually revealed somebody's incorrect assumptions.

It helped make sure teams didn't overcommit on the number of stories they stuffed into a sprint, avoiding either burnout or, worse, normalizing unfinished sprints and the negative impact that has on morale/motivation. (For some reason my teams often thought they could do more than the points implied!)

Most importantly, when large projects were proposed or were in progress, we were able to give realistic estimates to the various stakeholders about when to expect the various milestones to arrive, which bought us engineers a ton of credibility, trust, and respect with the rest of the company.

And yes, management wanted to see the story points and measure the team against them. I told them to F-off. Nicely. Kinda.

It helped that I was either a CTO or a senior enough exec in those cases with 3-8 agile teams. I essentially was the middle management and could put a stop to any destructive practices like evaluating teams against their velocity.


Absolutely my experience too. Story points have been an effective tool for me with multiple teams in several different companies over the past two decades. They aren't, by themselves, the complete answer to any problem - they need to be applied within the context of a healthy team and engineering culture. And some senior management persistently misunderstand them and want to do insane things like compare velocities between teams. But they are a good and useful tool in the hands of a good team.


Healthy engineering culture is a cure for many, many issues.


So story points work when they are used as an internal tool for a team to understand themselves.

And story points don't work when they are a tool used to communicate to the external world outside the team.


They are good for 'hey can I get this crap done in a sprint'. Once someone starts measuring it for 'how good a team is' that is when it falls apart.

One team's points almost never equal another team's points, either.

Agile has a TON of anti-patterns that look good and are enticing to adopt, but in the end are self-destructive. They usually make it about the process instead of 'I have X amount of work and Y people; how much can I get done in Z time?'

For example, velocity. I measure it so I do not overcommit; trying to do 50 points when 20 is the norm means something will happen that we do not want. But once you have a measurable number, some manager will want to brag about it (it is their job to brag about you), and it ends up on some spreadsheet presented to some other manager. It becomes a score to measure you against other teams, and an anti-pattern. Actually testing whether something is productive is hard, but you can get all sorts of numbers out of agile, which leads straight to anti-patterns.


Story Points are inherently team specific, as is Velocity. Trying to normalize them across an org for the purposes of a performance metric is folly. Velocity should be used internally on the team when doing timeline estimations, but exposing it outside the team is, again, folly. Even on a single team, SP and V are subject to small drift or large corrections based on team makeup, time of year, or numerous other factors. They are simply a planning estimation tool. As others have pointed out, the act of assigning SPs is a useful tool in itself, as it requires a team to collaboratively estimate complexity, and it frequently helps surface miscommunication and missing details.


> It helped that I was either a CTO or a senior enough exec in those cases with 3-8 agile teams. I essentially was the middle management and could put a stop to any destructive practices like evaluating teams against their velocity.

I am sure the teams quite appreciated you shielding them from overzealous management. But here is a thought: doesn't this stand or fall with you being there or leaving? Will the next middle manager be as capable and as committed to shielding the teams from destructive influence? Why not change the system, so that middle management does not need to shield the engineers?


A manager cannot protect teams after they leave. The new manager can change all the existing process when they take over, and you're back to square one.


I understand that. So the question arises: Is it more difficult for a new manager to ruin the work processes through inaction (not shielding the team) or by reworking established processes? My bet is on it being very easy through inaction.


Most of the time, I find the new manager was brought in to be a yes-person, because the previous manager quit due to conflicts with their management that maybe you didn't see.


You sound like an excellent technical leader and that fixes a lot.

One of the main drivers for writing this down is to make it easy to pass along for people who aren't as aware of the problems that come from those anti-patterns, as well as to explain why they are so destructive. The hope is to raise some awareness for people in tougher situations.


>In terms of utility for estimating how long something will take, I personally have never been able to translate story points into a reliable indicator, for many reasons (e.g. team changes, domain changes, variability in operational load outside of development work).

If your measure of the utility of story points is how well they help you estimate time, then you're right, they're useless to you. If you're on a scrum team, they're a useful back-of-the-envelope way to estimate which bits of validated functionality you're going to be able to get into the codebase this sprint. No one outside the scrum team should care about story points, and they certainly shouldn't be used to generate reports. Velocity is for the team's benefit, as a tool to help manage workload and schedule.


>back-of-the-envelope way to estimate which bits of validated functionality you're going to be able to get into the codebase this sprint

>help you estimate time

How are these different?


They aren’t. I’ve seen this argument rehashed a thousand times - “we aren't estimating time, just complexity so we can estimate how much we can do in a sprint. which is two weeks. wink wink”


Which is just time and effort estimations with extra steps and cargo culting.


Sadly, if you avoid the extra steps and say "I think this will require 10 working days", some manager will reply "are you absolutely 100% sure that it couldn't be done in 9 working days?".

For some reason, calling it "10 story points" does not provoke the same instinctive response.


The assumption is that it's already unreasonable to expect developers to estimate how many hours something will take, and then, due to meetings and such, it's effectively impossible for them to turn those already-bad numbers into calendar days. So instead, the proper answer is to bypass all that by having developers estimate in made-up units that can be added up and trivially converted into calendar days.


The assumption that anyone can estimate work they have never done with a reasonable degree of accuracy is just plain wrong. That is why management pretends they aren't doing that either.


One is a fuzzy and subjective measure of how much software a team can develop in a sprint, the other is a precise measurement in a single dimension. I could sit there and try to calculate how many expertise-adjusted person-hours each team member represents, then estimate our capacity that way, but why fool ourselves with that kind of false precision? Just eyeball it the first sprint and then observe how much we get done. Explicitly tying story points to time just invites abuse of the velocity by managers who don't understand how scrum works.


I frequently hear about this abuse of estimates and even ran into it myself when I was a more junior engineer. In my opinion, part of maturing as an engineer is learning to call your shots and properly communicate when you aren't sure how accurate you are. It's one thing to say "I promise this will be done tomorrow". It's another thing to say "we're not positive, but within the month seems likely".


Given that people are generally bad at giving accurate estimates, story points ask the question in a different way: "what's the likely complexity of implementing this?" rather than "how long would it take to implement this?", because people are likely to be too optimistic on the time estimate, and more likely to be accurate on the complexity estimate.

It is a multi-step process intended to arrive at more accurate estimates:

1. Estimate complexity via story points; filter out the features that will be implemented according to the "complexity budget".
2. Break those features down into sub-tasks.
3. Estimate the time it takes to do each task.


Only one requires an overpaid clown (SCRUM master/Program Manager/etc) to do the unit conversion.


Same way that throughput and latency are different. You can use it to predict what the team can deliver in a quarter, for example. But it is difficult to give ETAs to individual stakeholders.


if you are at a point where you employ such factory-throughput thinking for information workers, then you're misusing information workers, in my opinion. Focusing on a throughput metric is just as nonsensical as focusing on how many lines of code somebody creates; in fact, it's astonishingly similar in how nonsensical it is. Bug reports, issues and features are not just lumps of coal that you need somebody with a pickaxe to beat on. If you treat employees like cogs, then they will mold themselves into empty, unthinking cogs. They will not think of outside solutions anymore, they will not care anymore, they will just take the next issue and beat on it as ordered.

In my opinion it could even be seen as the biggest red flag for a team when they start using story points. It fundamentally means that this team has started to measure their work in terms of raw issue throughput instead of real value. It may work for a while. Maybe your managers are just so awesome that all the issues they create are perfect and great for the business and you never have to think for yourself at all. But inevitably there will come a point where the company would be better off if everyone was using their full potential, but by then you are stuck with a bunch of people you have molded into cogs over the years.


ah, we are not using it as a measure of team performance, like throughput, though. It's just used to predict what the team can deliver in a sprint or a quarter etc. and to make business decisions based on that.


That is kinda the same thing. With a story, either we roughly know how to do it or we need to do some research. In the first case, estimates will be wrong because of edge cases and contextual differences.

Business decisions should be based on things like feature requests from customers, not the amount of “points” a team can get done.


It’s extremely useful for a business to have a rough idea of when they’ll be able to deliver something. Budgets, contracts, client relationships, marketing, etc are affected by it.

So should Engineering give a best effort to estimate that, or just throw up their hands and say “it’s done when it’s done”?


Counterpoint: it's extremely useful for the Trisolarans to know when their next stable period will be, but that doesn't make it reasonable to demand an estimate and factor it into a contract unless you're really really clear about the uncertainty in the estimate.


Yes, we should all live in reality and use our best judgement. If estimating based on story point velocity is literally useless, then no one should be doing it. However if it does help make planning more accurate, even by a little, then it might be worth the cost of doing so. It’s a conversation to be had between different functions, hopefully based on empirical evidence.

I feel like a lot of engineers overlook that there’s more to a viable business than just producing high-quality software. People don’t ask for estimates just to annoy developers.


> People don’t ask for estimates just to annoy developers.

No, I know, there's just a systemic difficulty with scheduling dev items that are difficult to estimate.

My team is currently working on performance improvements for a certain service in order to get it to a level that our biggest client is happy with. Based on some profiling and some intuition, I built some Jira cards with rough ideas of what changes might make some improvements and added some rough estimates of how long it would take to trial each idea. Of course, what's actually happening is that we try one idea, it counter-intuitively makes performance worse, we dig into why that's the case and refine the idea, then try a different version of it. It's fundamentally impossible to give an estimate for when we will have run through all the ideas (plus new ones we think up along the way) and how much of an improvement it will make.

I just came out of a slightly painful standup where the dev manager repeatedly asked "Is this doable by mid-August?", and I repeatedly answered that it's not possible to give such an accurate estimate, that we would do what we can in that time and hope to give a better idea of what's possible by the end of this month. Of course it's not great for the dev manager to hear, because they have a client who needs to know how quickly they can scale up their use of the service. There's a conflict between what we can possibly know and what the client "needs" to know. I wish that I knew how to resolve it. It feels wrong to agree to deliver something with such a level of uncertainty, since it just leads to a ridiculous amount of pressure like this.


That’s all fair. It’s an inherently difficult problem. In a healthy organization, leadership is aware of this, has reasonable expectations and factors in the risk. Unfortunately not all organizations are mature in this way.


> No one outside the scrum team should care about story points, and they certainly shouldn't be used to generate reports. Velocity is for the team's benefit, as a tool to help manage workload and schedule

These two sentences are somewhat in conflict, imo. Workload and schedule are tied into capacity planning and resourcing, which happen at a higher level than the scrum team. The people having those conversations need something to go off of, so it's pretty easy to see why they latch onto story points and velocity: those are numbers that their teams are already producing!

I do agree with you that this is a misuse of velocity and story points, but I don't think it's possible to keep such things from being used (abused?) by upper managers above your team


But at that point you should be at a higher level of abstraction, looking at the number of sprints that will probably be needed to clear the backlog (hopefully understanding that the backlog itself should always be regarded as incomplete). The discussion then should be something like, "Hey, we're four sprints in, the backlog already has about seven more worth of stories, and we only have four more sprints until our target deployment date. Do we cut features, move the date, or add more team members?"

In the pathological case where management is constantly monitoring the team velocity and converting that to individual productivity, you'll get managers who (often to cover their own recklessness in setting an unrealistic date, or procrastinating on starting the project) will decree that the team is going too slow. I know that happens a lot, but it's not the fault of scrum, it's just poor management. Scrum can't make a bad manager good, but avoiding scrum won't make them good either.


There are other numbers that keep getting ignored, like bug count, backlog size, time lost to tooling and meetings… maybe those can help with planning and resourcing too.

It’s always “measure the team’s productivity”, never “make the team productive”.


Apart from the impact on team motivation and the incentive to inflate estimates, another problem with that approach is that the story points might not correspond cleanly with how much time was actually spent on the task. Story points are, in any case, an example of multidimensional data squeezed into one dimension, losing valuable information in the process.

Are there teams out there that correct the story points based on actual amount of work and complexity?


>Apart from the impact on team motivation and the incentive to inflate estimates...

Why would story points affect team motivation, and why would a team have any incentive to inflate estimates? The team isn't judged externally by the story points they burn, they're judged by the software they deliver. The team itself should be the sole consumers of their own story points, so inflating them accomplishes nothing from their perspective.

>...story points might not correspond cleanly with how much time was actually spent on the task.

If something turns out to be easier than expected, then the team should take that as a lesson for the next time they see something similar. This happens all the time, it's the kind of thing to bring up in the retrospective. I tell my teams that every sprint has two products: the validated software that gets produced, and the team that produced it. The team generally gets better as the project goes on, as they learn more about the project and their own capabilities.


Story points are used to plan a sprint. They are an honest estimate of how much work a task involves and how difficult it is going to be. If they are used by management, they turn into a tool to exert undue pressure on the team.

> The team isn't judged externally by the story points they burn

Are you really sure about that? Never got a question from the customer why the team couldn't finish as many story points as in the last sprint? Or why during sprint planning the team hasn't committed to a given amount of story points?

There is indeed a learning process. Once the above questions get asked, it is not difficult to see why a team would start to inflate story points.

> The team itself should be the sole consumers of their own story points, so inflating them accomplishes nothing from their perspective.

As indicated by the comment I was replying to, this is not the world we live in.


The points are needed because it's convenient to do reports based on them. If we used estimation language such as "easy, medium, hard, dummy-thicc", they'd still need to assign points to those labels so they can do math on reports and graphs to watch your performance.

The biggest sin of course is then trying to predict velocity, but the consequences of that usually just make people doing the reporting look silly for no reason. I think even the slow developers rarely get fired, and you also get nothing for clearing more story points than other developers.

No bonuses for higher velocity is the real reason no one takes it seriously.


> points are needed because it's convenient to do reports based on them

So, you’re saying it’s convenient for management, yes? Making things convenient for management at the expense of the front-line workers inverts the priorities of a company. The managers ought to be working harder to make things as efficient and convenient as possible for the workers to build the right thing. Putting management first is the stupidest thing you can do for your company. You immediately separate yourself from quality, and by and by your company will wither and rot. If it doesn’t die, it certainly will never be great.


>So, you’re saying it’s convenient for management, yes? Making things convenient for management at the expense of the front-line workers inverts the priorities of a company.

You'd end up having a points system whether you like it or not. If it wasn't story points, it'd be days/weeks/months estimates.

Almost nowhere will you find "it's done when it's done, don't bother me until it's done" as an option for a developer.

If you want an even more utopian arrangement, management would know how to look at git and use the software once in a while to see how things are progressing.


You, like tbrownaw, are arguing against something I didn't say. (Please go read the sibling thread.) I didn't say that planning, goals, etc. are bad. What I said was that story points lead to bad plans because they oversimplify the complexities of software development. A project timeline can be fine—but only if it is born out of a nuanced discussion and not by adding up a bunch of unitless points.

Developers (at least, ones like me) despise story points because they ask developers to collapse a ton of nuance and detail into a single scalar. It’s about as meaningful as this comparison of food recipes: compress the cook time, number of ingredients, cost of ingredients, quality of ingredients, whether or not some of the ingredients are allergens, and how much you like the food independent of the weather and your mood into a single scalar value, and use that to plan what you will eat during the week. Madness!

Story points can enable corporate despotism. Bad managers who demand developers make up story points, and then turn around and use that to dictate what developers will work on—without engaging in a thoughtful discussion on what is feasible and what will keep the product stable—are on a power trip and are treating developers like cogs in a machine. They know the price of everything, but the value of nothing.

The good managers that I have had in the past regularly discussed with the dev team what was needed and—with the developers' input and without artificially compressing the complexities of the task at hand—set good goals that gave the dev team focus and direction.

Using story points is lazy management.


The purpose of a company is to make money. Even if you hate shareholders (like for example your own 401k) and think invested capital deserves zero return, making money is still what allows the company to pay your salary.

Most companies do this by selling things to customers, whether services or products.

If you're building software for a specific customer, they'll want to know when they can start using it.

If you're building something for retail, marketing will want to know what release date to advertise.

These are the kind of things that management uses those reports for. Saying that actually customers don't need to be able to make plans, or marketing doesn't need to be able to advertise, is effectively saying that your employer doesn't need to make any money.

Which is quite a silly thing to say, at least if you recognize that devs don't exist in isolation and maybe even aren't the center of the universe.


> Saying that actually customers don't need to be able to make plans, or marketing doesn't need to be able to advertise, is effectively saying that your employer doesn't need to make any money.

That's not at all what I said. I said, if you choose to make management more efficient to the detriment of the people making the thing your company sells, you've made a bad choice. Bad managers like story points because it usually doesn't hurt their heads too much to add and compare scalar values. But a story point doesn't come anywhere close to representing the complexity of building any product. I've had one or two bad managers who wanted to make all things subservient to their precious Excel file. When that manager and the story points process were introduced, the productivity of the team plummeted, and the entire team quit within a month of each other.

The best managers I have had have worked by discussing customer and business needs with engineering and walking away with a richer picture of the tasks involved and an estimated timeline. It is very hard work to do this—it usually takes someone with a little knowledge of how software engineering works. I've had good product managers like this and I've been able to deliver some of the highest-value code I've written (measured strictly monetarily) thanks to managers who took a rich, nuanced view of the process.

I am not advocating for no management or no planning. I am advocating for management not requiring over-simplified metrics because it makes their life easier. Management does not exist to make its own tasks easier. It exists to make it easier for the people making the product to build the right thing at the right time and sell it. You can make plans without story points.


Having worked at a megacorp that attempted to normalise points and incentivise velocity, I can tell you there is a fatal flaw: the people doing the work estimate the size of the work! Increased “velocity” will happen automatically. Actual work throughput will remain the same. Prediction becomes impossible.

The most damning problem with “points” is that you can’t even do basic math with them. Does 4 points take twice as long as 2? No, no it doesn’t. The whole thing is just a giant waste of time.


My team needs to get 17 points done daily, but no-shows-no-calls keep kneecapping efforts. When they do show up, they complain incessantly about how management makes everything a 0.5 no matter the list of features in the story, and why can't they get more than 29 hours a week so they can have insurance. But it's attitudes like that which keep them from getting raises on their yearly performance reviews. It's a good thing scrum was invented so we could finally measure the consistent low performance of you computer science nerds.


> No bonuses for higher velocity is the real reason no one takes it seriously.

Bonuses for higher velocity based on guessed story points would be an even bigger reason for not taking it seriously - or rather to overestimate everything in order to gain the impression of higher velocity and the bonus associated.


Non-developers might have to learn something about development so they can push back on inflated estimates, which honestly would be helpful to everyone.

If I quoted you $1,000 to unclog your kitchen sink trap, you'd call bullshit because you can probably find out how easy it is to do that. Likewise, if I quote you 13 points to add a generic error toast message because a POST went badly, you should call bullshit on that too.


And yet Fred will take 2 weeks to do the work, and the team is dysfunctional so only certain people work in certain areas.

Fred sucks. We can say his work is easy, and continually make him feel bad without firing him, or we can pretend it’s two weeks.


that really is the truth behind story points: somebody wants a way to create pressure. Which is also why story points are so bad, because now you are pressuring everybody, even the good workers, despite only wanting pressure on a couple of people. And on top of that you are wasting the good workers' time with the whole story-point estimation dance.


The problem with your kitchen sink example is that the plumber has unclogged 7k kitchen sinks, so they can give you a great estimate.

If we'd built the same software before... we wouldn't have to build anything; we'd just use the previously built software.

Software isn't built like a kitchen; it's built by exploring an infinite list of design choices, choosing one way, and implementing it, hopefully without impacting existing architecture. And it seldom stays static - the name of the game is constant change. Then add group dynamics on top of that.

A better example is to call a plumber and ask how much would it be to add a "rainbow spout to the horse-drawn wheelie-do" oh and there will be several other teams of plumbers there as well. I bet you get a few questions.


this is nonsensical. first of all, most programming these days is hardly investigative; it is literally plumbing. attach this service to this queue. translate this request to this api. compile this code under a new version. put this thing in a container.

secondly, and I've done a lot of this, it's entirely possible to build useful estimates around greenfield projects like 'define a new language', even though there are lots of variables in play. one useful technique is to work backwards: 'if we don't have a working draft spec in 2 weeks then we can't really start the parser, so we're gonna say that, and if we can't hit it, we know we're in trouble'

if your ability to assign semantics and use your experience to plan out the work stops at 'horse-drawn wheelie-do', i wonder if this is the right profession for you.

if as a culture we just throw up our hands ('whelp, it's all unknowable, we'll just do the best we can and that's all you can ask for'), then we can't really criticize our customers for being frustrated with us and not 'appreciating our brilliance and how hard this is'.

and the truth is, we can and have done better than that.


I'm a big fan of evidence-based scheduling by Joel Spolsky [0].

Story points hold very little value in estimation; the discussion that occurs, however, is extremely useful.

[0] https://www.joelonsoftware.com/2007/10/26/evidence-based-sch...
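
For anyone unfamiliar, the heart of EBS is a Monte Carlo simulation over each developer's historical estimate-to-actual ratios (their "velocity"). A minimal sketch of the idea, with made-up numbers rather than Joel's actual implementation:

    import random

    # One developer's history of (estimated hours / actual hours) ratios,
    # their "velocity" in EBS terms. Made-up data for illustration.
    velocity_history = [1.0, 0.8, 0.6, 1.2, 0.5]

    # Hour estimates for the tasks still left in the project (also made up).
    remaining_estimates = [4, 8, 16, 2]

    # Monte Carlo: in each simulated future, divide every estimate by a randomly
    # sampled historical velocity (actual = estimate / velocity), then look at
    # the distribution of the totals instead of a single number.
    totals = sorted(
        sum(est / random.choice(velocity_history) for est in remaining_estimates)
        for _ in range(10_000)
    )
    print(f"50th percentile: {totals[5_000]:.0f}h, 95th percentile: {totals[9_500]:.0f}h")

The output is a confidence curve ("50% chance we're done in X hours, 95% in Y") rather than a point estimate, which is exactly the part most estimation processes throw away.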


I rarely see EBS in use. Why do you think it is not more widely used?


Oh come on. There are certainly a bunch of things which nobody on a team has done before in any given project, an error toast is not one of them. Almost every project is using previously built software for them, and the way they fit together is reasonably well known.


It's also easy to game the system. Turn a couple of 3's into 5's. Your "velocity" goes up for no real increase in work.


To your own point, the reports serve little value aside from a fabricated narrative that some companies like to build feel-goods around.

Discuss the complexities and needs. Break the work into small chunks. Define what progress means. Set expectations for making progress. Regularly and honestly review why you/the team are or aren’t meeting expectations. Rinse and repeat.

Perhaps this is oversimplifying it, but these are the tried-and-true high notes in my experience. If at any point one of those steps isn’t feasible, then it’s a larger issue that no implementation process is likely to solve, so the “to measure velocity or not” point seems moot.


In addition, I like the whole "Full Queues Amplify Variability" chart.

It's intuitively obvious, but I never realized it was that bad.

I'm going to have to run that down and see if it's actually backed by real data. Too many business books are complete flimflam.


Reinertsen's books are very, very good. The most recent and most up-to-date is Principles of Product Development Flow, which that chart came from. The previous one, Managing the Design Factory, is pretty similar and I think it's a better read.

Another great takeaway from it is to prioritize things by cost of delay, or even better, what he calls Weighted Shortest Job First, where you divide the cost of delay by the expected length of the task. The only problem is that the same people that want to get oddly formal and inappropriately rigorous with things like story points will want to turn cost-of-delay into an accounting exercise - or object that it's impossible because they don't have the accounting system for it - which misses the point entirely.
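
For the unfamiliar, WSJF is just a ratio you sort by: cost of delay divided by expected duration, worked highest first. A toy sketch with hypothetical dollar figures (not from the book):

    # Hypothetical backlog: (name, cost of delay in $/week, expected weeks of work).
    backlog = [
        ("Compliance fix",   50_000,  2),
        ("New dashboard",    30_000,  6),
        ("Checkout rewrite", 80_000, 10),
    ]

    # Weighted Shortest Job First: schedule in descending order of
    # cost-of-delay / duration, so short urgent work jumps the queue.
    for name, cod, weeks in sorted(backlog, key=lambda t: t[1] / t[2], reverse=True):
        print(f"{name}: WSJF score = {cod / weeks:,.0f}")

Note how the two-week compliance fix outranks the rewrite with the biggest total cost of delay; that reordering is the whole point.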


Every CTO needs to read that book.


Yes, and WSJF gives you two units of data: priority and sequencing - because everybody's pet project is priority 1. Much better than MoSCoW.


That is very true. I'm going to write something up about WSJF in practice in the future, but it has some real benefits to fixing that issue.

Particularly when you get the decision makers in a room and get them all to agree on the estimated value of each item. Not only does it remove the numerous priority #1's, it also gets everybody aligned on what the real priority #1 is and why.

I've seen it done where people are surveyed separately and it only works well when the people involved are forced to have a conversation to put real numbers to their assumptions, coming out with agreement. The other side effect is that it solves the squeaky wheel problem.


I hadn't specifically seen MoSCoW before. Some people need to have everything spelled out for them, I guess.


The shape is about right for the variance of the M/M/1 queue (what is an M/M/1/∞ queue??), but it's a little disingenuous since variance is a higher-order moment. Standard deviation is probably more sensible to think about here, since that brings it back down to the same scale as the queue length.

> Thus the average number of customers in the system is ρ/(1 − ρ) and the variance of number of customers in the system is ρ/(1 − ρ)². This result holds for any work conserving service regime, such as processor sharing.

https://en.wikipedia.org/wiki/M/M/1_queue#Average_number_of_...

In the cited example, going from 75% utilization -> 95% utilization drags your standard deviation from a modest 3.5 items in the queue all the way up to 19.5 items. And, in both cases, that's not far off from your average queue length, either.
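
For anyone who wants to check those numbers, they drop straight out of the formulas quoted above; a quick Python sanity check:

    import math

    # M/M/1 queue: mean items in system = p/(1-p), variance = p/(1-p)^2,
    # where p is utilization (per the Wikipedia link above).
    for rho in (0.75, 0.95):
        mean = rho / (1 - rho)
        std = math.sqrt(rho / (1 - rho) ** 2)
        print(f"utilization {rho:.0%}: mean {mean:.1f} items, std dev {std:.1f} items")

    # utilization 75%: mean 3.0 items, std dev 3.5 items
    # utilization 95%: mean 19.0 items, std dev 19.5 items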


> what is an M/M/1/∞ queue??

Infinite queue length. M/M/1/k would be a system where once k items are enqueued, some work shedding policy takes over (either rejecting new work or often preferably dropping at the head of the queue.)


This is my experience as well. Story points in and of themselves are worthless, but as an excuse to discuss the project as a team, they can have some value.


My thing with any agile methodology is that the self-organizing team aspect really is the most important part; any time-like metric going up the chain is gonna be painful.

I liked story points when the team used them to get aligned on the work being done. I did not like story points when middle management had them on a chart they would review.


Those items after your "e.g." are part of the reason why story points don't work for you.

You really do need to control the amount of time your devs get distracted by other things. If they're solely focusing on that one project, you'll be fine.

It's why I personally appreciate XP more than plain old Scrum.


This, 100%. When engineers have focus and no context switching, things get done. When they're interrupted with useless Zoom meetings every 2 hours, productivity drops off a cliff. If you have 2 meetings 1 hour apart, that 1 hour is basically useless: by the time you're in the zone, you're interrupted again.


And then when we're given a nice long stretch of interruption-free time, we ruin it by hitting HN.


Story points were never meant for forecasting. To do that you can use Monte Carlo analysis https://observablehq.com/@danielfrey/monte-carlo-simulation-...
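
For anyone who doesn't want to click through: the gist is to sample your own historical sprint throughput to get a distribution of completion dates rather than a single number. A toy version (made-up throughput data, not the linked notebook's code):

    import random

    throughput_history = [3, 5, 2, 6, 4, 4]  # stories finished in past sprints
    backlog_size = 30                        # stories left to deliver

    def sprints_needed() -> int:
        done = sprints = 0
        while done < backlog_size:
            done += random.choice(throughput_history)  # sample a plausible sprint
            sprints += 1
        return sprints

    runs = sorted(sprints_needed() for _ in range(10_000))
    print(f"50% confidence: {runs[5_000]} sprints, 85% confidence: {runs[8_500]} sprints")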


Please don't


That's a very healthy approach to it and deserves applause. Kudos.

The goal of mobbing around task breakdowns is to drive the communication and build that shared understanding, with the output being that most of it gets written down in a way that makes it easy to track progress and approximate the size at the same time.


They're also useful in retro for the simple fact that something was written down. If your story point estimate was way off for a particular task, you can pull up the original conversation, identify what assumptions were violated or what new info surfaced that wasn't present at the original estimation, and decide whether there are learnings/process changes to be made.

Of course, retro is usually the first thing to go in a deluded attempt to increase velocity and story points hang on as this vestigial tail, contributing to more cargo cult software engineering.

edit: One scenario that might play out:

The team all agreed that tasks A, B & C were worth 3, 5 & 1 points respectively, but while Steve and Mei thought task D was worth 5 points, Carol thought it should be 13 because it involved integration of an external API, and her experience was that other API integrations in the past had exposed hidden complexity.

Task D ultimately was not completed because during the process of integration, it was discovered that the API did not support a key feature that was needed, and instead an in-house alternative had to be built.

It was decided that going forward, the team would instead assign a 1 point task to building a toy app for any new API integrations that would then feed into the process of deciding the story points for features requiring that integration.


"Plans are useless, but planning is indispensable."


Eisenhower. Here's another from the great philosopher Mike Tyson: "Everybody has a plan until they get punched in the face."

Point being, have a continuously ready backlog consumed by short iterations with great telemetry - like SpaceX.


> I personally have never been able to translate story points into a reliable indicator

You haven't discovered the secret formula: make your estimate and then mindlessly triple it.


Multiplying estimates by pi has proven to be a scarily accurate rule of thumb over the course of my career.


Yeah, the discussion can be really illuminating. If the points weren't recorded or mentioned afterwards I think it would be a net positive in some cases.


That's one of the main points of the approach mapped out.

You still get the discussion value, you just end up documenting things better as a practice and make a more concerted effort to add clarity where needed.


Software does not exist in a vacuum. You are probably right in thinking that story points should not matter to you, as a developer; but they do matter to external stakeholders.

Forecasting and management of expectations is necessary because software often has external real-world dependencies: available funding for a first release, marketing materials, hardware that can't be released without software, trade shows, developer conferences, yearly retail release parties, OEM partners, cyclic stable releases for enterprise customers who won't push software into production without extensive pre-testing, graphics adapters that can't be released without drivers, rockets that won't launch, cars that won't drive, etc.

All of these things require some degree of forecasting and appropriately-evolving management of expectations. Here's where we stand. Here's what we can commit to. Here's what we might deliver, but are willing to defer to a future release. Here are the (low value) items under consideration that we will definitely defer to a future release; here are the features you will need to drop to get feature B in this release.

The purpose of story points is to help provide forecasting and management of expectations (with appropriately limited commitments) to stakeholders, on the understanding that forecasts are approximate.

Calibrated burndown of story points is pretty much the only basis on which forecasting and management of expectations can be done in an agile process. The key is to make sure that stakeholders understand the difference between forecasting and commitment, and to make sure your development team is appropriately protected by a healthy development process.

As for the author's claim that you get better forecasts by just counting stories instead of summing story points... color me skeptical. I do get that it prevents some obvious abuses of process, while enabling others that are just as bad. If somebody is using story points as a developer performance metric (which they shouldn't), there's nothing that prevents them from using completed stories as a developer performance metric (which they shouldn't). The corresponding abuse of process to combat that metric would be to hyper-decompose stories.


> Calibrated burndown of story points is pretty much the only basis on which forecasting and management of expectations can be done in an agile process.

Do you mean in an Agile process?

Sorry I have a visceral reaction to this, having seen teams try to accomplish large things by breaking them down into story-level tasks and then sum up the estimates, and then watch the slow motion train wreck as gaps emerge, requirements evolve, learnings accumulate, and everyone throws their hands in the air and points to the long paper trail of tickets that prove they did their job.

Scrum and story points are a reasonable way to drive local incremental improvements when you have no greater ambitions than local optimizations of an established system, but they have a low ceiling for achieving anything ambitious in a large system. At best you'll get solid utilization of engineering resources when you have a steady backlog of small requests, at worst you'll redirect engineers' attention off of the essential details that make or break difficult projects towards administrative overhead and ticket shuffling. I understand why this might be the way to go in a low trust environment, but it's really no way to live for an experienced and talented cross-functional team.


I would love to hear your view of handling ambitious changes in a large system.

(In the middle of this scenario. Mix of experienced/talented and low trust environment.)


> The corresponding abuse of process to combat that metric would be to hyper-decompose stories.

You're right about that, but that's also one of the benefits to the approach. Inflating a point estimate is easy and there's no real audit trail for why the value is what it is.

On the other hand, if a team creates tasks like "Commit the code" or "save the file" it's pretty easy to identify as fluff.


'save the file' is not fluff.


There was a part of the article that discussed that you create tasks, not stories.

You can group the tasks into stories or milestones or iterations or epics.

In general, he’s saying you should always keep breaking down a task until it’s a 1, since 1s are easy to estimate.

The key part that may be missed is the “mob programming” or “pair programming” aspect, where all the engineers on a team sit together and work through a story or milestone or epic to come up with a list of 1-point tasks.

Obviously this still can't always be done, so the only effective solution is maximum pair/mob programming, unless all tasks in an iteration are accounted for and broken down into easily understandable and estimable bits of work.

There is at least some truth to the notion that if you use mob programming, estimating becomes pointless.


> The key part that may be missed is the “mob programming” or “pair programming” aspect, where all the engineers on a team sit together and work through a story or milestone or epic to come up with a list of 1-point tasks.

My issue with this has always been that once an issue is straightforward enough to estimate as a 1 point task, you could’ve implemented the task already during the estimation process. The unknown effort is almost never in the writing code part, but figuring out the complexities around business rules & externalities.

But this doesn’t fix the process, it just moves the variability of effort & time into a different part of the process.


I have actually always had the same feeling about 1 pointers, but in this case you're talking about a collection of small tasks that make up a larger feature.

The larger feature probably couldn't have been implemented during the estimation process, but a single isolated small task could have.


Story points aren't time (as OP states). They're relative complexity and uncertainty (hence the Fibonacci sequence building uncertainty into larger numbers). And stories should be able to be sized with big numbers. I've never been on a team comfortable with more than a 7, at least not since my first agile experience, where we all took an agile/scrum training together for a few days. There we'd frequently give things like 21 or 30 or 50 points, as appropriate. That's the only place I've ever seen a burndown chart that looked like it should. Everywhere else, it's flat until the last day and then drops to zero as all those "it's a 7 I promise" stories get carried over to the next sprint for the 3rd time.


I haven't done story points estimating in years, but at the time, an 8 was rarely acceptable, a 13 surely was not. Estimates that high were basically saying "this story is too big or too poorly defined to estimate accurately" and we'd try to break it down into several stories of 5 points or less.

The vast majority of our stories were 2, 3, or 5 points.


Now you are making exactly the same mistake the article starts out with: comparing story points outside your own team, when you have zero context for how much 8 points represents.

How big your points are means nothing at all outside your own team. It is a relative measurement. 8 could mean 8 lines of code, 8 REST endpoints, 8 database columns, or 8 interviews with customers. It certainly should not mean 8 days.


I guess I am, but the point I was trying to make is that there was a fairly small number of point values that we were fairly confident in, and above that it was quickly into the "we can't even take a guess" territory.

Correct that the absolute point values aren't relevant, but it would seem odd to me with Fibonacci increments that teams would find values such as 13, 21, or higher to really be useful unless they put a lot of research into their number. For us, it was "read the story card, and give your estimate" so it was entirely a gut feel sort of thing.

And yes, when you really boiled it down (though rarely admitted), for most people 1 point = 1 day. So anything over 5 was unlikely to get done in a week, therefore it needed to be broken down as we ran one week sprints.

I'm not endorsing any of that, by the way. I thought planning poker was pretty arbitrary, but it was the gospel and not to be questioned.


Not really.

The point there is the granularity. If 8 points to you is fixing a minor spelling mistake in your docs, what value is there in having anything smaller than 8?

If 1 is "build the entire backend" then how can you represent anything smaller?


You're just proving their point


I agree - countless times I have had to beat it into people's heads, even people I would consider smart, that "points <> time".

After the sprint you can kind of infer the time, but it should not be a guideline for the next estimations, unless the tasks are things like "fix typos".


Ultimately, we are bound by time, not complexity. Why does it matter how complex a task is? The product managers and customers won't care how hard we as engineers have to think or reason about a problem; to them, the only thing that matters is time until delivery.


So they are bad product managers and customers.

Time until delivery for good managers and customers is a range. Can you estimate getting 10 kg of potatoes from a grocery store that is a 35-minute round trip away? Can you say it will be exactly 40 minutes because you can pick up and pay in 5 minutes? I can't; I can say it will take between 40 minutes and 2 hours. There are always things: the card terminal stops working, or you get stuck in traffic because of an accident.

Complexity in that example is uncertainty: I expect high traffic and there might be an accident, but if there is less traffic and I hit all green lights, 40 minutes is going to be easy.

We all know bad managers and bad customers will expect me to get that bag of potatoes in 37 minutes, and then ask 10 times why I did not drive over the police officer who was stopping traffic because of an accident, to get their potatoes on time.


I don't follow how we go from "they are bad product managers and customers" to "therefore time estimates are bad". I do not think it is unreasonable for our primary stakeholders to ultimately care about time. I also do not think it is unreasonable to give error bars in estimates, like "this project is uncertain, therefore estimates will be variable".


The bad managers are those who ask you for an estimate and then take it as a commitment.

.

"How much time will this take?"

"Not sure; approximately 30 minutes, but could also be 20 or 40 minutes. If we are very lucky then 10 minutes, but if we are very unlucky, maybe an hour or more."

"Spare me the details, I just need one number for the report."

"Uhm, 40 minutes?"

"You just said that it would be approximately 30 minutes, didn't you?"

"Yeah, but I wanted to add some safety margin..."

"If we keep adding large safety margins to everything, then the project will take forever. As you said, some tasks are completed faster, some tasks are completed slower, on average it will cancel out. I need your best estimate."

"Uh, okay, then 30 minutes. On average."

...the next week...

"So, you guys told me this would take 30 minutes, but it actually took 35. I think we need to have a serious talk about your performance."


"Time until delivery for good managers and customers is a range."

Sometimes it's not. In the gaming industry Christmas is a hard deadline.


No, it's still a range, the difference is simply that a good manager would plan such that Christmas is at the very far end of the range. Bad managers will plan with the optimistic end of the range, and then expect crunch time from exploited workers following their passion.


Story points aren't useful outside the team. They're for the team to help it figure out roughly how much stuff it can do each sprint. They shouldn't leak out of the team.


A sprint is a unit of time. How does measuring "complexity" - whatever that is - help in figuring out how much stuff you can fit into *time*?


I understand what you're saying - of course in some sense they're convertible. But the point is to not think about time when estimating, because if you estimate time you don't factor in things like other tasks, or holiday, or anything else. Or if you do, you have to spend ages trying to account perfectly for time.

Instead, if you estimate complexity (e.g. I think this task is a 3, just as a starting point, then this task is roughly the same, so it's also a 3, then this one is similar but will take almost as much testing due to its difficulty, so we'll call it a 5, then this one is very simple, not even half as difficult as the first one, so it's a 1, etc), then try and keep that up for a few sprints, then figure out how many points fit into a sprint, you automatically factor in other factors (like "I have to log into Okta ten times a day", or "people keep getting pulled into meetings") through practical observation of what got done, and you get better at predicting what you'll be able to achieve in a sprint.

It's not perfect; it just removes the need for certain entire jobs devoted to accounting for time, which you can spend on another developer instead, while also being a reasonable measure of what you'll get done, and only takes about an hour every two weeks.


If the problem is "we're not accounting for holidays in our time estimates", I can't see how the solution could possibly be "time is obviously a flawed measure, so we'll use this other measure which has this hazy relationship with time, but we all agree that it's definitely not time, although we have trouble saying what it is"


personally, I find it much easier to say 'normally, I would get this to you by wednesday of next week, but we have that offsite and my wife's parents are visiting, so does friday work for you?'. than 'this is a 3', so, I guess this fits in this sprint?

changing units and names of things really seems like a deliberate attempt to rob the discussion of any actual meaning. just a comfortable empty formalism that masks the fact that we aren't trying to come to grips with the most difficult parts of our job


But you don't estimate holidays in ... you estimate how long it would take if someone picks up the task Monday morning and works on it full time.

If someone picks up a task on Monday and then has 20 other meetings, the estimation is still the same; he just continues after those 20 meetings, and you just don't care about that when estimating.

The only thing is, if at the end of the sprint the dude says "I started X, then I had 20 meetings, so I did not make it" - well, you either accept that or you don't put the guy into 20 meetings.


> But you don't estimate holidays in ... you estimate how long it would take if someone picks up the task Monday morning and works on it full time.

You estimate tasks that way, but you estimate capacity to do tasks on your actual track record, which will include holidays and other things.


Functionally you've just created extra steps and confusion by not calling it a time estimate, or at least something equivalent to time. Even with a real time estimate you shouldn't be planning by going "well, you work 80 hours this sprint, so we plan 80 hours" - you should be doing the same exercise of looking at how many "hours" were completed in the last few sprints and planning based on that number. If we're in agreement that these numbers are for the team only, then it shouldn't matter if they consistently under- or over-estimate the hours; nobody outside the team should know or care how many "hours" they complete in a sprint.

The confusion part is that by calling it "complexity" and saying it's not a time estimate, you've muddied the waters on what it is; people will debate the definition and intentionally differentiate it from actual time. I've seen this before: the points-per-sprint never stabilizes because teams have cards where "that's a 1-point card because it's simple, but it will probably take a week". And then suddenly they're ignoring the points during planning to instead come up with actual time estimates (which also don't work, because they don't track those against multiple sprints).


I think time variability increases with the level of complexity. In this context I see the idea of task complexity being related to uncertainty in the time estimate. This makes it fit nicely with the Fibonacci sequence.


Time variability also increases with the time estimate for a task. If a task is "about two weeks", then it might be 1.5 weeks or it might be 4 weeks.

But if a task is about 1 day, it may take 4 hours or 4 days, but it will almost certainly not be 1 month.

Points are always just a proxy for time, and work the same way. No matter what anyone claims, as long as you use points to plan time-bound sprints, points are directly a measure of time.


Yep, that's completely accurate.

At the same time, that's one of the reasons to prioritize removing as much uncertainty as possible.


Points are about uncertainty, they're the difference between:

- 3-4 weeks

- 3-9 weeks

Product managers can get their head around that.


Yes, project managers easily can have that level of comprehension, but it is rare to meet a project manager that understands that a time range is something like a confidence interval. That is, if it is estimated a task will take 3-9 weeks, with some probability (say like 10%) it will take an even shorter or longer amount of time. There is uncertainty encoded in the time range, but the time range itself is also uncertain.

Fundamentally, the problem is that project managers set deadlines based on statistical estimates from developers. Despite the fact that they set the deadline and do not understand the dispersion, they want developers to be responsible for misses. Sometimes, people mistakenly believe that there is some magical practice that can eliminate the uncertainty from estimation. You can make predictions with things like story points and achieve a certain amount of accuracy with a certain amount of dispersion. Statistically, it is the longitudinal behavior that can be predicted (sprint success rate at a specific velocity on a stable team), but we focus on cross-sectional details (we missed this sprint).

Project management is generally not considered a field requiring statistical expertise but modeling reality of the work requires it.


Why not just say that then? It’ll take 3-9 weeks. You can then just add all the min and max and get a full range.


No, points mash together size and uncertainty. A task that's "3-9 days" will have fewer points than a task that's "3-9 weeks". And a task that's "3 months give or take a week" will have more points than either.

Of course, there's actually no such thing as a "3 months give or take a week" estimate for a task. It's basically impossible in programming to have a task that takes that long with that low a level of uncertainty. So in reality, time estimates have the same properties as points: the higher a time estimate, the more uncertainty it represents.


If they're not time, why use numbers? Use fruits: easy peasy, it's a lemon. A really tough story, a coconut. You can't add them in any case, because they're not time.


I like this, I'd advocate for a fruit-based task system - although I suppose the exact fruit ranking would depend on the team.

- easy peasy: lemon

- easy but needs careful handling: kiwi

- regular but boring: red delicious

- regular, who wouldn't want to take one of these?: mango

- large task, risk of splash damage if mishandled: watermelon

- tough to crack, needs time or a hammer: coconut

- technically we'll do this, but not really our job: tomato

Edit: I am sad that emojis aren't allowed in comments, though it's understandable.


Just because they're not time, doesn't mean you don't want to add them. Imagine you have an empty basket, and you're not sure how much fruit you can toss in it. So the first time, you just start tossing stuff in until it's full. The exact number of lemons, coconuts, etc will vary. But after a few rounds, you'll get a feel for how much of each you can get into the basket. That's story points. You feel your way into a groove where the team gets a more concrete sense about how much it can get done in a sprint, given the variability of the work, external projects/distractions, and the makeup of the team.

Story points get a bad rap because a lot of engineering managers don't get scrum and just see a convenient way to measure productivity. Which story points are absolutely not meant to do, outside of the team itself setting its own sprint goals.


My experience is that story points get a bad rep because they don't mean anything unless you use them as an explicit proxy for time. There's no way to say in reality "this task is small" unless you have some idea of how long it takes. Additionally, this concept of velocity makes no sense because a task that's big for me might be small for someone else on the team, so then we either pre-assign tasks and set points based on assignment (and then have problems if we switch the assignee for any reason), or we assign "generic points" and that ends up not meaning anything at all if the team is not very uniform (e.g. all seniors with similar skills and ownership of most of the same code, or all juniors).

Additionally, all methodologies tend to discourage correcting point values after the fact. That makes the process of deriving time estimates (velocity) even more error-prone, because it conflates uncertainty with mistakes. That is, you can correctly estimate a task at 8 points and finish it in four weeks; or you can incorrectly estimate a task as 3 points and finish it in four weeks. That doesn't mean the team has a velocity of about 1.4 points/week; it means it has a velocity of about 2 points/week, but made a mistake with one task.
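
Spelling out the arithmetic from that example:

    # Two stories, each finished in 4 weeks; the second was mis-pointed.
    naive  = (8 + 3) / (4 + 4)   # 1.375 "points/week", poisoned by the mistake
    honest = (8 + 8) / (4 + 4)   # 2.0 points/week, after re-pointing 3 -> 8
    print(naive, honest)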


>My experience is that story points get a bad rep because they don't mean anything unless you use them as an explicit proxy for time.

So when you shop for clothes, Small/Medium/Large are useless? You require precise measurements for every item, and they have to be exactly the same size across manufacturers, or else sizes have no utility for you? The reality is that a Large can be large on you in different ways, even if it's a t-shirt. And software complexity has a lot more dimensions than a t-shirt. The utility of story points is that they allow a team to create a rough idea of their capacity over a sprint, so that they don't consistently under- (or more commonly) over-commit.

If you try to use story points purely as a uniform proxy for time, of course they're going to be useless, because you can always just use time instead.


Of course small/medium/large mean something; they are an approximation of size/dimensions. But story points, adherents claim, are not a measure of time at all! They are not "an approximation of time"; they are, so it is claimed, supposed to be unrelated to time and related instead to "complexity".

I agree that a task can be large either because you know what must be done and there is a lot of work, or because you're not sure what needs to be done yet. But conflating those two things as "8 points" or whatever is just not helpful.


Story points are also "an approximation of size/dimensions." If my team has consistently deployed 25-35 story points per sprint for the last three sprints, it's reasonable for me to assume that next sprint they will also be able to complete about 30 story points of work. By contrast, knowing that they worked a combined total of 300 hours on average doesn't help me at all. And accounting for uncertainty is important, which is one reason a Fibonacci sequence is commonly used. The general rule is to go up one in the sequence if the team is uncertain. The whole purpose of story points is to avoid having to track things like uncertainty separately. It's like the Markov assumption, the information to get to the current point estimate is baked in. It is useful (essential, even) to incorporate fuzzy concepts like perceived complexity or uncertainty without bogging the team down trying to measure them precisely and explicitly.
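
A minimal sketch of that kind of capacity forecast (sprint history invented for illustration):

    # Hypothetical history: points completed in the last three sprints.
    history = [25, 32, 28]

    velocity = sum(history) / len(history)   # ~28.3 points/sprint
    print(f"plan for ~{velocity:.0f} points next sprint "
          f"(observed range {min(history)}-{max(history)})")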


> If my team has consistently deployed 25-35 story points per sprint for the last three sprints, it's reasonable for me to assume that next sprint they will also be able to complete about 30 story points of work.

And if the last few sprints they've completed between 5 and 30 points, do you believe they'll complete around 17.5 points next sprint?

Now, if the team is good at estimating (which they are if they get consistent results between sprints), they can tell you that feature X is 8 points, Y is 5 points, and Z is 15 points, and you can conclude that they will finish X, Y, and Z next sprint. But they can exactly as well tell you that X will take around 3 days, Y around 2 days, and Z around 5 days, and you can reach the same conclusion.


>And if the last few sprints they've completed between 5 and 30 points, do you believe they'll complete around 17.5 points next sprint?

I don't know, what did the team figure out in retro? Was the big difference a real underestimation, or was there some kind of unforeseen blocker? I've never seen that big a variation, but anything's possible.

If it makes you feel better to measure team velocity in something you call "days" instead of story points and it works for your team, more power to you. But don't fool yourself that you're talking about actual days. At best you're talking about "probable days", and how many days it actually takes will depend on a lot of things, including unknowns and who takes the story (are "Bob-days" the same as "Carol-days"?). So you'll end up with a measure of days that is very team- and uncertainty-dependent, and at that point it's better to just use story points and admit that it's not a universal measure and doesn't need to be. Not to mention that by using days you'll invite confusion between calendar time and time-on-task.


If you try this (and I have, just not with fruits), someone will complain they can't graph fruits. You'll tell them that's the point. They won't listen, so they'll map fruits to numbers, and now you have the same problem anyway.

My personal preference is to use time estimates with some uncertainty. A day or less. 2-3 days. A week at most.


> My personal preference is to use time estimates with some uncertainty. A day or less. 2-3 days. A week at most.

In the project management world, there is an assumption that overestimated and underestimated tasks will even themselves out so that the total estimate equals the actual time needed. Sad to say, the accuracy of estimates doesn't follow a normal distribution in software development.


I had one manager who used time in orders of magnitude. He’d ask, “is it a day, a week, a month, or a year?”


I've done exactly this in the past, and then when someone asks how long the project as a whole is going to take, it's easy enough to give a range. If someone says a task is going to take hours, that's a range of 1-6 hours; if they say it's months, that's 1-12 months. If you want more certainty in your estimate, then you're going to have to give us some time to break things down.


That is one of the complications: one would think developers are good at abstract thinking, so they should be able to accept (just like all the other numbers humanity made up) that "the numbers we call story points cannot be added together and are not convertible to time".

Properties of Whole Numbers:

    Whole numbers are closed under addition and multiplication.
    Zero is the additive identity element of the whole numbers.
    1 is the multiplicative identity element.
    They obey the commutative and associative properties of addition and multiplication.
    They satisfy the distributive property of multiplication over addition and vice versa.


Why use a measure that creates this footgun? Why should our task estimation require this much abstract thinking? Why not invent the measure so that it's not misleading? Numbers generally have well-understood properties. Using them in a way where the properties don't apply is asking to be misunderstood.


And this is how the entire math gets done in 5 simple JIRA tasks. :D


I use Halo difficulty levels: Easy, Normal, Heroic, Legendary.


I continue to treat story points as a measure of time, despite being told repeatedly they're definitely not time. I will continue doing this until someone can explain to me, in a way I can understand, what they actually are that is not time.


Because in the end, they are a proxy for time. We can call them "complexity" or whatever, but that doesn't help much with planning a time-boxed period of activity. So they end up meaning "time".


Having points not be a measure of time is a means of estimating how much work you think the team as a whole will accomplish, while diminishing the risk that a given estimate (delivered as a time range) will mutate into a 'promise'.

It's also a good way of communicating what you think the blend of known unknowns and unknown unknowns is.


Difficulty level?

You can't promise that beating a game on Hard takes 2x as long as on Normal, or that Easy is 3x faster.


Estimates never represented a promise in the first place. If you have someone who is holding you to your estimates, you have to address that.

Regarding difficulty, easy things aren't even expected to be faster than hard things. I'd rate a backflip as much harder than counting to 100,000, even though it wouldn't take nearly as long.


The identity of story points depends on what information you have. If you don't know your team's velocity then story points are only relative complexity. Once you have your team's velocity you can use that information to convert to time.


What is relative complexity? How can you compare the complexity of <changing the colors of one button> with <implementing a sorting algorithm>, other than by how long they might take?


Then it would be how long they might take relative to each other. To give a specific time you would have to know the stage of development for the product, the health and ease-of-use of the CI/CD pipeline, the current interviewing burden of the team, the rate at which production incidents are occurring, the number and length of meetings that the team members are in, etc., etc., etc. More junior developers generally won't do a great job of that. But more junior developers can usually estimate the time it would take them to do the task if they had absolutely nothing else to do and if their build/test/deploy pipeline were optimal. So with story points that's all they need to really worry about. In that way story points are a tool to help the team make better estimates in the face of varying degrees of expertise in individual contributors estimating time-to-complete.

By judging complexity and measuring velocity you get an estimate of time that intrinsically takes all of the variables into account. It's a powerful tool when used right.
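
A toy sketch of that points-to-time conversion, with made-up numbers:

    # Convert relative points to calendar time via measured velocity.
    recent_sprints = [27, 31, 29]                         # points completed
    velocity = sum(recent_sprints) / len(recent_sprints)  # 29.0 points/sprint

    backlog_points = 85
    print(f"~{backlog_points / velocity:.1f} sprints at current velocity")
    # Velocity is measured from what actually shipped, so meetings,
    # incidents, and CI friction are already priced in.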


> To give a specific time you would have to know the stage of development for the product, the health and ease-of-use of the CI/CD pipeline, the current interviewing burden of the team, the rate at which production incidents are occurring, the number and length of meetings that the team members are in, etc., etc., etc.

This is a strawman. When asked to estimate a task, people essentially always think in terms of "how long would it take if this were the only thing I was working on". Of course, when a junior dev gives an estimate like this, you don't put it into a Gantt chart and start planning release celebrations based on it: you add appropriate buffers and uncertainty based on who made the estimate.

> By judging complexity and measuring velocity you get an estimate of time that intrinsically takes all of the variables into account. It's a powerful tool when used right.

Again I ask, what is complexity, other than an estimate of time taken?

Also, "velocity" is just an average across people and sprints. This would only converge to a meaningful estimate of time IF people are consistently failing their estimates in the same way. If the error bar on the estimates varies wildly, taking the average doesn't do anything meaningful. I think this is much more common than consistently miss-estimating in the same way.

Not to mention, if these estimates of "complexity" don't take into account external factors, then they'll always be off by unpredictable amounts. Velocity measurements also fail to take this into account - so, when the team had a bad sprint because a member fell ill, or because there were extended disk failures, or whatever other external event, that goes into the velocity, as if this is some recurring event.


I did story points with a team of 12 over a two-year period of time, and it worked.


I did them too, with two different teams of 8-10 people, once for two years and the second time for about one year. We didn't lose anything when we gave up on points and simply went for time based estimates.


Points are almost worthless as they can be gamed.

We're judged on delivery. Measured by time. Complexity is arbitrary.

If points aren't time-bound, why am I limited in the number of points I can take? Every team had a max point load. If there's no stick for rollovers, then you're doing Kanban.


> Story points aren't time...

> ...burndown chart...

The x-axis of a burndown chart is time, right? So if you create a chart that measures points/time then you encourage the idea that a certain number of points can/should be completed in a day, ergo that points are a proxy for units of time. Otherwise what's the point in the chart?


If you're measuring foo/time, then foo is probably not time. Unless you're measuring some kind of acceleration.

Charts are supposed to be pretty and reassuring and go up and to the right. That keeps the managers happy!


Yeah, it seems fairly common for people/teams to follow the idea that any story of 8 or more points should be broken down into tasks of 5 or fewer. This simply doesn't make sense to me. If the simplest task is 1 point, is your most complex task really allowed to be only 5 times as complex? Story points usually follow an exponential increase for a reason; enforcing staying in the mostly linear portion is just pretending the complexity and uncertainty have been decreased.


The idea is that if a task is that large, can you really not break it down into smaller steps? Do we understand the problem well enough to implement, or are we hand-waving over likely areas of complexity? Maybe if you tried to break it down you'd realise that the 21-point card is actually more like ten or more 5-point tasks, and you had just thought "big", not "I know what needs to be done and can size this accurately".

Doesn't mean these cases never occur but it's worth seeing if it's actually smaller related pieces of work.


> Maybe if you tried to break it down you'd realise that the 21 point card is actually more like 10+ 5 points tasks

It didn't even occur to me to think of it this way, because the times I've been exposed to breaking down tasks, the total number of points stayed constant: a 13-pointer becoming an 8 and a 5, and the 8-pointer becoming a 5 and a 3.


> Do we understand the problem well enough to implement or are we hand waving over likely areas of complexity?

Nailed it. That is exactly the right question to ask.


The use of the Fibonacci sequence is so pseudo-intellectual. It's completely arbitrary, but use of the Fibonacci sequence makes it sound smarter or justified somehow.


This article argues against story points but then concludes that the solution is breaking down work packages into atoms called tasks and measuring the queue length. Those are two different dimensions. No matter how hard you try to break things down, a task queue will still have items in it with different sizes.

There is nothing saying you can't refine work packages together in your team while still using story points. That's actually how it's done almost everywhere. When items end up with a very high estimate, there will be a push to refine them into something smaller. Something you should do, but only as long as it still makes sense.

In fact, the worst place I ever worked at was where we were given strict orders to break down every story until they all became 1 story point (still using story points...). Doesn't take a genius to figure out what happened next. All packages started having pointless micro-tasks with thousands of cross-dependencies between them: "open the editor", "write a function", "write a unit test", "commit the code", "review the code". How am I supposed to write a unit test before the function signature has even been made? How am I supposed to iterate when finding bugs? More complex tasks still overran their estimates by a factor of 10, in fact even worse than before; some things just can't be split, yet they still needed a 1-point estimate.

Using the queue length and the impact on variability is still an interesting concept; I just don't think you should connect it with breaking down everything into single-sized items.


I wish the article had spent longer discussing the queuing view and less time pointing out the flaws of story points, but I guess I'm biased because the flaws are already "obvious" to me.

I think there's an inherent tradeoff between the overhead and misery of breaking down a task into granular subtasks and the variance of task completion time. In practice, using a queue-style form of tracking would mean that you trust your team to break down work and do time-bounded investigation into unknowns. Then you look at your task completion rate. Measuring this as a random variable (RV) gives you not just the average task completion time, as we reduce to using Little's Law, but also the variance of task completion time. If we find task completion time to have too much variance despite the input queue of tasks not actually growing very much (i.e. the arrival rate is staying stable and low), then it's probably worth having the team break down tasks in a more granular fashion (or maybe it's a single person who keeps making giant tickets or something). On the other hand, if the team keeps complaining about straitjacket ticket discipline, it's probably worth letting folks be looser with task creation. There's a human element here, but there always is, since it's humans who are doing the work, and that's fine.

I've always argued that output per person on a team should be modeled as RVs, but I really like this queuing approach and it's something I may bring up on my team. Again in practice I'd probably just track task completion times on a weekly basis. This would be much simpler than story points and instead of the bickering that comes with trying to break a ticket up, it would give engineers more autonomy over task creation.
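
As a rough sketch of what that tracking could look like (all numbers invented; Little's Law relates work-in-progress, arrival rate, and time in system):

    # Little's Law: L = lambda * W
    from statistics import mean, pstdev

    arrivals_per_week = 12                      # lambda: tickets entering
    completion_days   = [1, 2, 2, 3, 5, 1, 8]   # W samples: days per ticket

    W = mean(completion_days) / 5               # avg time in system, in weeks
    L = arrivals_per_week * W                   # expected work-in-progress
    print(f"avg completion {mean(completion_days):.1f}d "
          f"(sd {pstdev(completion_days):.1f}), expected WIP ~{L:.1f}")
    # High variance with stable arrivals -> break work down further;
    # low variance plus ticket-discipline complaints -> loosen up.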

I really like the idea.


I’ll do that for a followup post. Watching the discussion here it seems like that would be beneficial. I struggled with where to put the emphasis on this write up but ended up deciding that I need to clearly make the case for Story Point flaws first as a warning reference for people experiencing these issues in their current environments.

I would have separated queue management into its own post but thought writing about a problem without presenting the solution would result in a lot of “okay, so what should I do instead?” But that created a length complication where I had to keep it succinct because it was already a long post.

Anyway, there will be a followup with more detail on queues. Probably the biggest complication for them is the lack of good reporting built around them in most existing systems.


The bit where it broke down for me was where they initially make the (great) point that agile falls down when small story points for simple tasks (which are accurate) are added to large story points for complex tasks (which are much less accurate), leading to a net inaccurate estimate. So the takeaway is to break down complex tasks.

Then they introduce queues, which are made up of small tasks.

I admit I stopped reading at this point. Is the useful thing the queue, or the fact everything is now broken down into small tasks?


> Is the useful thing the queue, or the fact everything is now broken down into small tasks?

Yes. :-)

There are numerous benefits to both that are explained in more detail.

The short version is that the small tasks will give you a more accurate rate of progress. The exercise to break things down that way will benefit the entire team's communication and understanding of the problem. Writing them down will keep a record in place for new developers who join where a point measure wouldn't provide any real context.

The queue gives you a clearer picture of job sizing that can be measured and naturally changes with scope changes. It gives you a leading indicator of additional complications so that you can pivot earlier in the process if needed.

Awareness of variability amplification from the queuing process also makes people conscious of flow control of work, as well as the extreme negative consequences that happen when that work is over-scheduled.

They work together to create multiple benefits.


Thanks for the reply, I appreciate it.

I'll admit to still not quite understanding the queue part in concrete terms. I'll go back and read the article fully, but it still sounds like a queue is a series of small tasks which are added together?


Yea. I'm using his terminology, but a queue here is just like a queue anywhere else in programming. It's an ordered list of work to be done. Really not any different from a backlog.

He’s using queue more specifically because he’s also referencing Queuing Theory so it keeps the terminology consistent.


Story points were only ever an approximate unit of time. Fitting them into a clear unit of time such as a sprint is a clear indicator of this. To call them complexity or anything else is misleading - transcribing a dictionary by hand into a CSV file is super simple (1 pt) and takes a long time (how many points is 3 FTE-months in your calibration?).

Estimating effort-time vs. completion time are quite different exercises, serving different stakeholders. A story that takes 1 point of effort by anyone's pointing system could still take a week due to any number of factors (a crucial collaborator gets sick, a laptop gets crunched by a car, a ransomware attack, whatever). The only really estimable aspect is how long the developer will spend on the work, not when it will be done. Air speed, not ground speed.

That said, it's not clear how queue analysis helps when you haven't spent any time saying how long you might expect each task in the queue to be, or what the dependencies are between tasks within and across teams. Given engaged team members, I've gotten very good results predicting the pace of progress for sprints, and none of it ever mattered because everything needed to be shipped. About 4-6 weeks before each X was to be completed, we could say with confidence that X would be ready in 4-6 weeks. Not terribly useful.


I just don't get what the new terminology is for. Like how is this helping anybody?

Man-hours = story points
Task = story
Subtask = sprint
Upcoming tasks = backlog
Task turnaround time = ??
Project goal(s) = epic(s)

I'm probably not even doing this right, but what are we doing here anyway?!

It gets real fun when the project involves software/firmware + mechanical engineering (think machines, robotics, etc), gotta love their faces when you teach them the special magic advanced project words for special software people.


The broader point with all this is every field comes up with jargon because it's a form of compression rather than repeating the same phrases again and again. The differences between what you've written there and how they've been used in the teams I've been in are the point here. Instead of saying "manhours but with a conversion factor that we figure out over time and an understanding of the variation" you say "story points".

Story points aren't hours. People are really bad at estimating with time, but they're much better if you just get them to compare things and say which one seems like it'll take longer.

Stories aren't tasks, either. A task is a single unit of work that you can do and complete. Stories are groups of things that solve a particular problem.

For example, adding a password reset is a story. But that might be distinct tasks that different people can take on, or should be coded and released independently. Maybe that requires setting up an email service or server, UI changes in the frontend, backend changes, is there design that needs doing for it, etc.

> Subtask = sprint

Sprint is a bit of a weird one sure but it's a short length of time. Different for different teams, typically 1-4 weeks.

> Upcoming tasks = backlog

Backlog isn't a very custom term here is it? Also it's not the upcoming tasks, it's things that probably should be done but not right now. New idea? New feature? Cool, backlog, doesn't interrupt the current set of work for the next couple of weeks.

Every field can do this with every other field. Doctors with their fancy words like anterior, why don't they just say "the bit at the front"?

I know there can be cringey project managers, yes. On the other hand, I've also seen highly skilled engineers scoff at these kinds of things then spend way too long building stuff that doesn't actually address what the user needs, misses out key parts because they never thought about who was actually tracking those ancillary pieces of work and making sure they're done, and fail to deliver.

Oh and finally if someone wants to come along and say "well we did it differently", if that worked for you then great! The classic point of agile was that you should try things and keep what works. None of these concepts are particularly complex imo.


The idea was to not use direct measures of time to estimate because someone above you will start assuming those estimates are promises and then life gets unpleasant.

Story points, much as I personally dislike them, were invented for developer defense.


I've never seen such systems emerge from the team as a tool for self-empowerment. They are always foisted upon the team by managers who are trying to, ultimately, turn estimates into promises.


There has always been a perfectly fine word for that which means exactly the same thing: man hours.

I remember learning in school (way before the hocus pocus fancy new words came out) that the development of an Intel CPU cost around 1000 man-years.

I don't understand how expressing this as "it took 1000 kilo-story points" or whatever would bring any advantage whatsoever.


>I just don't get what the new terminology is for. Like how is this helping anybody?

The corporate world likes to come up with new terminology for old stuff and brands it as some new profound discovery. The result being that the new young generation of employees believe that they are living in a new enlightened age (bestowed by their corporate overlords) compared to their older counterparts who were living in the intellectual un-enlightened, dark ages.


Also, the whole thing about "velocity" is just weird. It's as if some marketing goon heard the word "velocity" and decided to overload the term in a manner that doesn't mean velocity in the mathematical sense. Like in calculus: the velocity curve is the first derivative of the position graph, and the acceleration curve is the second derivative. And for that matter, who cares about velocity? We have always wanted acceleration, which in turn yields velocity. Velocity on its own is a dumb idea, because if I drop a rock from a tall building, it builds velocity as it free-falls, and so it goes - you can have different kinds of velocity, but you only really have one kind of acceleration.

Another issue, as the blog points out, is that story points are a simple scalar number, and you cannot easily say how that sum factors in risk, complexity, effort, etc. So why even have it? Clearly a story needs to be a kind of vector with these properties (risk, complexity, effort...) set to positive or negative values; then do the typical vector math to plot the direction and magnitude of the story in the n-dimensional space. Then one could actually calculate stupid ideas like velocity or acceleration to speak in the language of project/program managers.

Ultimately the managers need a way to quantize the units of work done by development staff to better plan, and that's understandable, yet really hard. I think putting the cart before the horse is usually a dumb idea, and things should go back to measuring performance after the work is done, instead of estimating performance before work begins. Nobody wants to commit to a performance contract for each and every task ad nauseam, and that's what happens during sprints.
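
For what it's worth, a toy version of that vector idea (dimensions and scores invented):

    # Score each dimension separately instead of flattening to one scalar.
    from math import sqrt

    story = {"risk": 3, "complexity": 5, "effort": 2}
    magnitude = sqrt(sum(v * v for v in story.values()))
    print(f"magnitude {magnitude:.1f}, components {story}")
    # Unlike a single point value, the components survive, so you can
    # still see *why* a story is big, not just that it is.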


All of these shenanigans are just dancing around the intractable problem which is - nobody ever figured out a way to produce reliable estimates.


That's because you're dealing with a pile of abstractions on top of the CPU, data storage, and the channels that connect them all. You're perched on top of the tower of Babel, trying to figure out where to put the next stone without everything crumbling around you. And you don't even know the shape of the stone. You only have a vague description of it.

Just too many unknowns.


But you have !sprints! And if a laptop gets crunched, either another dev takes over - so something else doesn't get done - or the story of the dev with the crunched laptop won't be delivered. That's just life, and I also understand how disconnected management can be - but a lot of the time you cannot say "exactly" when something will be done, because that is just not possible.


I don't understand people like this author. I guess I do. It's all marketing flamebait to get attention.

He knows exactly what story points are; he goes through them exhaustively, but strangely deriding them the whole time.

Then he concludes by purporting to invent the very practice you ALWAYS were supposed to have been doing to make story points work. You have to find a set of repeatable work to compare new stories to for reference. That's the whole game. That's his "tasks" idea. That has always been part of every implementation and lesson on story points I've been exposed to.

It's a completely nonsensical article.

Story points have always been about queues and implementing Little's Law. Always.

Yes, it sucks to be on teams that just argue about points and don't work to refer to standard architectures for building blocks. That doesn't mean story points are broken; it points to something else in your organization being broken.

A lot of real criticism in this comment, but props to the author for writing at least. It's more than I do as a part-time internet complainer.


I can see that perspective, I suppose, but it's certainly not marketing flamebait.

Whatever story points were always supposed to be, they aren't. Numerous people's real-world experiences go sideways because of the way they are designed. You're set up for failure and confusion.

I never claimed to invent anything. I'm highlighting Donald Reinertsen's work that more people should be following.

The purpose of the article is to remind people of why everything is broken so that they can identify it and fix it, including examples.


So this may not be fair, but you lost me quite a bit with the positive take on SAFe, which is the worst, most unproductive experience of any process I’ve had.

I also do not understand the queues and capacity issue. I have never been in an environment where we do not have so much work that we cannot meaningfully see past the end of the queue. I don’t necessarily view that as a bad thing.


Finally, Queues and Capacity.

There will always be something else to do, that is definitely true. In this context we are talking about taking a chunk of that and trying to create an expectation of when it can be ready.

In a Sprint, for example, you take a slice of work from that queue, a few stories with the goal of trying to get it completed in the next couple of weeks (hopefully). That creates a new queue for the sprint, essentially.

You've controlled the flow of work based on the team's historic capacity using some metric, which has previously been velocity.

Overloading that capacity is what happens when new work sneaks in. Support items. Requests from people outside the team. Long meetings. A team member unexpectedly being out. Either the amount of work has gone up or capacity has gone down. If anything that was planned is much larger than initially expected, you're going to end up way behind. That will push everything else farther behind at the same time.

The moment you start scrambling to try to get it all done, you're going to start making mistakes. People are not going to take the time to refine things. Errors will happen. They'll take longer to fix or, worse, they'll end up in production and create new problems... that will create more unplanned work for the next time.

Hope that helps?


Yeah, that helps. I think sprints are a bad approach to work, so probably some fundamental disagreement there.

I also strongly believe that all processes are downstream from people, team culture, and organizational incentives. If you don’t get those right, your processes won’t matter, and if you do get those right, your processes won’t matter very much.

I guess I’m not your target audience. That’s ok.


Oh, I agree on all points except that processes are inevitable in a sufficiently sized organization. Understanding how they affect people is critical to making them work.

I was just using sprints as an example here.


I’ve read the horror stories on here and I understand. There’s what it is supposed to be, what is taught and how it ends up.

Rigidity is probably the biggest issue. It’s supposed to be adapted to an organization leveraging what works well, handing more control to developers and addressing some communication gaps.

When people try to implement it strictly and force the company into the example template it creates a lot of friction.

When I’ve previously explained on here what it should look like, it’s typically nowhere close to that in the horror stories. Developers should have significantly more control in a SAFe environment fwiw. I’ll explain it more if you like though.

I have to run out but I’ll come back to explain the queue stuff too.


I do totally agree on the SAFe bits. I've seen it implemented extremely well when 1) everyone got regular training (biannually or quarterly) and 2) senior leaders adopted their appropriate SAFe roles.

Quite often I see organizations do a SAFe kickoff and then nobody ever learns more about it, and senior leaders view it as a team-level thing only. It doesn't work then because nobody's actually doing it.


Having senior leadership on board is critical to the entire process.


Alright, comment part 2 since I have a little more time now.

First, the SAFe explanation. I've been a software developer for a little over 22 years. My first experience managing a team was 12 years ago and I shifted to it full time in 2018. My entire motivation for doing this was living through environments that were painful and unproductive for everyone involved.

I found Reinertsen during a lot of reading when I was trying to learn the best ways to do things and get a clearer picture of things that I didn't understand because I hadn't had a view of them in my dev/arch/ops roles previously. When I picked up that book I periodically shouted, "YES! EXACTLY!" while I was reading because it showed the math to prove just about everything I'd experienced. Then I used his methodologies to lead two teams with a great deal of success from my perspective.

What I was not doing a good job of was communicating with the business side of the company regarding what was prioritized, why, and how. That led to a lot of back channel grumbling from other people in leadership who had different priorities. I'd just spent months talking with everybody about what their priorities were and mapped out a plan to have it all done in about 6 months. I was extremely confident in this plan at the time. It was beautiful.

One of the senior guys, who I will call "Squeaky Wheel" torpedoed the plan because his top priority wasn't first. That was the moment that I realized I needed to find a better way to get these folks involved in the process that wouldn't blow up all of the great things we had happening.

Long story short, I found SAFe and the CEO sent me for training after I explained my rationale. Here's what appealed to me about it.

1. WSJF - All facets of the company come together to agree upon the value of different initiatives. Sales, Support, Product, Legal, Marketing and other senior leadership when necessary. Following the discussion all of the back channel "we should be doing this" stops because everybody knows exactly why we're doing what we're doing.

Then you combine that with an effort estimate to get a projection about the best value for your time. This will naturally end up prioritizing a bunch of small stuff that has been ignored for a while first, but once you get through that you start more cohesive conversations about larger initiatives. In that first experience, Squeaky Wheel had a low value high effort project that went to the bottom. Aside from him, it went beautifully for everybody else.

2. The PI process - After people have put in the effort to agree on priorities, PI planning happens, where Dev gets to lay out the plan for the best way to achieve those priorities. You spend a couple of days planning it, presenting, and reviewing, and everybody comes out knowing what the priorities are for the next 8-12 weeks. Then as a dev you get to execute without dramatic shifts in direction. Now, when somebody in sales doesn't close and tries to make a case to change everything you're doing, he's directed to make a case to product for WSJF consideration for the next PI. This brings sanity to the dev process.

The PI plan also involves capacity planning so you build in a natural buffer too. As a side benefit, you find out most of the vacation plans of your teams over the next quarter well in advance. One of the big goals of PI Planning is building camaraderie and skipping future meetings with people who may need to provide input. This factor is significantly more difficult in a remote environment.

3. The "Solution" Backlog - Part of the process is that dev gets to keep and prioritize its own backlog of technical priorities for the next PI. You typically split the capacity planning 70/30 or 80/20 across the Feature/Solution backlog. This gives the tech side of the house the ability to ensure work that they know needs to be done, gets done without having to make a case to the business about why. This gives time to improve processes, automation, reduce tech debt, refactor, performance tune, etc if it hasn't already been built in.

When those 3 things happen every developer's life should get easier. Unfortunately, it doesn't always work that way and you'll end up with more layers of control + dictation rather than handing over control to interested parties.


The fundamental assumption here is that you are in such a fantastically stable environment that you can spend days planning the next 8-12 weeks, and then that plan is actually reliable.

My experience was that maybe 20-40 percent of the plan ended up working (blocked by external factors, shifting market landscape, missed or unexpected work coming up), and then there was a huge amount of replanning and time wasted.

I just cannot see how 8-12 week cycles are in any way conducive to agile development.


Personally, I’m a huge fan of the 37 Signals approach.

They do 6 weeks planned, 2 weeks unplanned as their regular cadence.


As far as my current employer is concerned, 3 story points equal 8 hours of work.

The previous one considered only complexity, so a simple change that needs adjustments all over the project would still be considered 1 story point, even if you needed multiple days to get it done.

The PO in the job before that kept asking "but how much time will you need for that" until he got an answer, ultimately making story points redundant.

Really, ymmv


That's like saying a system that generally yields bad results isn't to blame, it's the people.

For a vast majority of managers in "software companies" things like story points are about asserting control over what is created, getting commitments from various folks, and then increasing stress to have you "sprint" constantly "behind schedule" so they can inject additional requirements or pivot to the new thing they want to do.


Story points are a useful exercise when trying to discuss the complexity of a task, but IME they completely fall apart when used as a metric to determine anything "useful" like velocity.

I had a terrible experience with them once. I was a relatively new, enthusiastic engineer on a struggling team of guys who'd been at the company a long time and were pretty burnt out. Inevitably I started getting all the "hard" stories with a lot of points, til it got to some stupid point where I was outputting about 80-90% of our team's combined story points. Management caught wind of it, didn't like it, so what they decided to do was adjust my points "downward" to be more in line with the rest of my team's output. It really irritated me, because it'd result in absurd situations where my teammates would get a "3 point" ticket that was like, just updating some field in a config file to an already-known value and checking it in somewhere, and I'd get this whole-ass project wrapped in a single ticket and they'd give me the same amount of points for it. And of course this was tied in to performance reviews, adding to how annoying it was.

Another super irritating thing that would happen is I'd be asked to estimate complexity on some super vaguely defined ticket describing a problem that would take a lot of detective work to even figure out how to solve. So how am I supposed to give an accurate estimate of complexity? If I knew that much, I'd probably have already fixed whatever the issue was.


When my company first started using Scrum, it was with Rally. It had story points, but also estimated time and actual time. We used them all, and I found the estimated and actual time to be the most useful. We then went to Jira, which kept the points and dropped the time.

Since we were tracking the actual time things took as well, our estimates got better over time. The team would help to keep others honest. For example, writing documentation always took at least 3x longer than anyone expected, so we'd always make people add more time to those stories.

Once we had reasonably accurate time estimates, there was also a feature for available hours. If someone was on vacation, we’d subtract those hours while planning the sprint. It was then easy to see who was over committed, balance the work, and to make sure we were being realistic.

We worked like this for about 2 years. It was probably the best 2 years I had in the job. We got a lot done, we were being strategic about our work rather than reactive, and while we would push to get things wrapped for the end of the sprint and the demo, it never involved all-nighters or heroics. We pushed because people on the team wanted to do more and go faster, not because of pressure from the outside. I noticed people got more done when they tried to do less, so I'd often push back on people trying to overload the sprint. If they finished everything, we could always add more. Outside the team, our VP told us we were 3+ months ahead of everyone and that if I wanted to go hang out in a cafe in Europe for a few months, go.

This is all a distant memory now. So many lessons learned, but they all fall on deaf ears in the current organization.


The value of story points is that it acknowledges that not all stories are equally time-consuming. Importantly, story points also provide a process for identifying stories that should be further decomposed.

In my experience, story points allow forecasting that's as good as any forecasting method I've ever used. And I've used pretty much every schedule forecasting method over my long career.

The author touches on many of the reasons why story points don't work. And pretty much every reason he gives is something that you are not supposed to do.

The key to getting them to work is trust, and a commitment to never use story points as metrics to measure the performance of developers. Any attempt to do so will result in gaming of the system. The tradeoff that stories provide is a lack of precision in exchange for not having to spend 50% of your development cycle up front doing the detailed analysis required to provide detailed estimates (which never worked anyway).

Things you must also never do:

- compare calibrated burndown factors between teams.

- Ask why the calibration factor isn't N. The calibration factor is what it is.

- Have stories with more than N story points (where N is 3 or 5). Decompose them.

- Introduce the least bit of stress.

- Use burndown rates to generate commitments, instead of forecasts.

- Use forecast results to justify asking developers to work overtime.

The last point, I think, is particularly interesting. The manager who was my first scrum master made the following commitment to us: you will never work overtime again. And voluntarily working overtime will be considered a bad thing, not a good thing, since it impairs the predictability of the team's productivity. "I know you don't believe me", he said. But he was right. We never worked overtime again.


Is there anyone else out there still doing waterfall and estimating in hours and everything is just going fine, or am I just that lucky?

I work on a small professional services team customizing a couple of our products for our customers. A few times a month we get a request for an estimate to add a new feature or workflow. We do a high level customer requirements doc, discuss it as a team, the seniors from each area (design, development, qa) each provide an estimate in a range of hours. This all gets wrapped up into a final price to the customer. If they approve it, we dive into detailed design and have them sign off on the result. Then we go into development for weeks to months, qa, and then release. Our processes really haven’t changed in over 20 years. We’re constantly ranked as one of the most productive teams, and get high scores on our post-project surveys from our customers.


What does QA do when you guys are building the software?


They’re working on other projects. Whether it is testing a general release for a customer, testing bug fixes, or testing a recently completed customization project.


So multiple projects all doing Waterfall?


Yep


Just estimate time. Whenever I worked with story points in various companies, even though developers, Project Managers, or Scrum Masters would often state that Story Points are a measure of complexity, the Velocity for a given sprint was still measured in Story Points. So, in the end, a Story Point is equal to an amount of time in a sprint.

This is also stated in the article:

> Story points do not represent Time, yet the Velocity metric they are usually combined with defacto converts them to time, sabotaging everyone from the start by doing the thing that you can't do with a precise number and a range...adding them together.

Better yet, just don't bother with SCRUM and all its pointless and time-consuming ceremonies and just get shit done. This is my preferred mode of working, and I've been lucky to be able to work like this for the last couple of years.


I disagree with the first part. The only thing less useful than estimating points is estimating hours.


I mean, I agree, but that's because you should be estimating in days.

No task significant enough to warrant a ticket in a queue and design effort takes less than a day. Even if you swear it's "done", the more likely outcome is you'll be dealing with it for hours later that week when some sort of issue crops up.

I've seen so many people "bid" 0.5 and 0.25 day units of time (in points or whatever) and then act offended when I challenge them on that, yet 3-4 days later they're still plugging away at the same task due to "complications".


Yep, I've been guilty of this in the past. Any estimate shorter than a day is for a task that wasn't worth the time to break it down and estimate it as a single unit. Like "modify firewall rules to allow traffic to port 1234" -- no, that's not a separate task, that's part of whatever task requires you to do that to get the whole thing to work.

And any task that is meaty enough to be a task will take at least a day, and probably longer.

The one place where I will more or less disagree is when bug-fixing. There are always some bugs where you pop open the debugger and have it solved in an hour or two. I've been at places where bug report tickets were handled differently from stories/tasks, though, so maybe this is fine to just think about separately.


If story points convert to "hours"/"days"/"time" at the end of the day, why is estimating in time less useful?

Asked in another way - why is it more useful to estimate in an unit that's more abstract/distant-from-reality?


The real question is “how to get shit done”?

Scrum, agile, SAFe, etc. are ways to get shit done that all target the measurement.

Estimations are that measurement.

The nugget from this article that seems to be missed by many is the subtle but strong advocacy for XP-style mob/pair programming.


Say no to SAFe. It's a middle-management orgy of mediocrity.


I believe this latter suggestion is known as the “programming, motherfucker” methodology [0]

[0] https://programming-motherfucker.com/


Ugh, yes, thank you.

I wish management types would get it through their heads that you just cannot reliably estimate most software development projects. (I said "most" -- there are of course exceptions.) You can't evaluate employee performance by looking at a burn-down chart. You can't show pretty graphs at the end of every sprint and expect that to predict the future of the project.

What you can do is set a reasonable deadline with your team, have them work toward it, and allow them to adjust your expectations on what exactly you will be getting by that deadline. Yes, establishing that deadline in the first place requires some sort of estimation, but story points, t-shirt sizes, etc. are useless for that. Everyone on the team sitting down, breaking things down into as-small-as-possible tasks, and coming up with time ranges for each task is the way to do that. Then you add up all the minimums and maximums and you have a time range for the whole project (see the sketch below). But that range is still only a guess, and can't be taken as gospel. And it may be wild, like "somewhere between 6 weeks and 6 months", and you have to accept that.
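
Concretely, the roll-up is nothing fancier than this (task names and numbers hypothetical):

    # Per-task (min, max) estimates in days, summed into a project range.
    tasks = {
        "auth flow":      (3, 8),
        "billing export": (2, 10),
        "data migration": (5, 20),
    }
    lo = sum(mn for mn, _ in tasks.values())
    hi = sum(mx for _, mx in tasks.values())
    print(f"project estimate: {lo}-{hi} days")   # 10-38 days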

That's it. That's the best you can do. As the project carries on, the only thing you can reasonably report on is the list of features or functionality that's been implemented so far, and the new range of the estimate based on what's remaining to do. You can also look at the completed work, and map out where in the per-task estimate range the team ended up hitting, but that still can't predict the future.

You especially can't evaluate performance based on this stuff. That requires being an involved (but not micro-manager-y) manager who knows the team and can identify when their people are shining bright, and when they are struggling (or just slacking off). It's called people management for a reason; you have to involve the humans in that process and can't evaluate them based on some made-up numbers and dodgy, hand-wavy math.


Story points are pointless on their own, but estimation meetings are invaluable. In those meetings, story points serve as shortcuts for expressing gut feelings. However, after the meetings, they become completely useless, and even harmful, because, as the article mentions, people start treating them like precise numbers and do arithmetic with them.


Much like one huge difference between the string "42" and the number 42 is that the latter can have math done to it. If you want to avoid the arithmetic, label tasks with words like "simple", "small", "quick", or my favorite, "Won't fix".


> Much like one huge difference between string "42" and the number 42 is that the latter can have math done to it

I actually agree with your comment but this part made me laugh out loud because JavaScript :) ("42" - 1 === 41 but "42" + 1 === "421")


Good ol overloading of the + operator at work there


As someone looking at algorithms for managing buffer bloat, this resonates.

In IP routers, the goal is to keep the congested link busy; i.e., idle time from a momentary hiccup is wasted time. You need a small buffer to do this, but piling on more data adds latency without actually doing any good.

The insight behind algorithms like CoDel was that the signals a lot of previous attempts relied on were noisy as heck. Minimum latency through the queue is the signal that makes sense. Everything else is misleading or gives inaccurate predictions. Why should it be any different for managing tasks for human workers?
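
A rough sketch of what transplanting that CoDel intuition onto a task queue might look like (window and threshold invented):

    # Track the *minimum* time-in-queue over a sliding window; if even
    # the best case is slow, the queue is overloaded, not just noisy.
    from collections import deque

    WINDOW = 10          # last N completed tasks
    TARGET_DAYS = 5      # acceptable floor for time spent queued

    recent = deque(maxlen=WINDOW)

    def task_completed(days_in_queue):
        recent.append(days_in_queue)
        if len(recent) == WINDOW and min(recent) > TARGET_DAYS:
            print("sustained overload: stop admitting new work")

    for d in [6, 7, 6, 8, 9, 6, 7, 8, 6, 7]:
        task_completed(d)   # fires once the window fills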

[1] https://en.wikipedia.org/wiki/CoDel


I've personally had great success with story points.

Success with story points comes when everybody realizes they are useless for anything outside of a dev cycle, and when you realize that the effort put into making them somewhat accurate is the valuable part.


The article lost me at a certain point, somewhere around "solving the conundrum".

It lost me because we have two estimations - an overall size guess of an epic and an actual implementation estimation of an epic. The overall size guess is just 2-3 seniors looking at an issue and wondering if it will take days, weeks, months, or years to implement.

The actual implementation discussion is however what the article is talking about. We get most or all of the team into a meeting and we talk through what needs to be done, and structure all of that into individual concrete tasks everyone can have an idea of implementing them. And then we estimate those tasks.

And this estimation in turn is communication to management. Like, we've realized that about 21 is what one of us can do in a usual monthly iteration outside of massive outages and such (we're an operational team). So if an epic turns out to require some 3 21's and 3 13's... that can easily take 6-12 months unless we put exceptional focus on it. With high focus... as a team of 4-5, that will still take 3-6 months to do.

On the other hand, something that falls into a bunch of 5's and 9's and such tends to be muddled and struggled through regardless of whatever crap happens in the team much more reliably. It needs smaller chunks of overall attention to get done.

And note that this communication is not deadlines. This is more of a bottom-up estimation of how much more or less uninterrupted engineering time it takes to do something. A 21 in our place by now means that other teams have to explicitly make room for the assigned person to have enough headspace to do that. Throw two interruptions at them and that task won't happen.

It's more bin-packing than adding, tbh.


I've never understood the point of the points (pun intended). You can try to convince yourself they're measuring "complexity" or whatever, but what they're measuring is time, so why not cut out the middleman?

I'm not saying to estimate tasks accurately to the hour. Just rough ballparks: "I think I can do these 3 on Friday." "Give me a week, it's gonna take a pile of tests to make sure we get it right." "Hmm, I probably need the whole sprint for that; we don't have a clear view of the impact, it sounds like a small change but it upends our current architecture."

These are the real discussions, the numbers are a silly abstraction on top of this and are unnecessary.

It should also be 100% expected that the estimates will be wrong. New bugs will show up, regressions will be introduced, requirements will change last minute. It should also be expected that some tasks that were supposed to get done won't be, and other tasks that weren't in the planning get snuck in. You are "Agile", are you not?

If you're not, then just go back to waterfall and stop dragging the Agile Manifesto through the mud, thanks.

The entire point of the exercise is the discussion: it gives you some idea of what is likely to go well and be finished, and what is a risk factor. If you're measuring "velocity", God help you.


I think some of the problems and issues we're discussing here derive from confused project management methodologies that call themselves 'agile' yet require detailed estimation, measurement and reporting of task timing.

Story point allocation can be useful to give a quick and easy 'good enough' estimation of time/effort required for a significant chunk of work (epic).

I find that this approximate approach is almost always more accurate than trying to estimate every little task.

If the project manager and engineers try to break down a project into small granular tasks with time estimates, then it's almost inevitable that the effort will be underestimated, because it's virtually impossible to anticipate every subtask, blocker, unforeseen delay, etc. (and then there's the extra time it takes to manage all these micro tasks in your PM system!).

In such situations the old project manager trick of doubling all estimates tends to provide a more accurate timeframe.

This is why story points can be more accurate: because you are estimating the effort it takes to do something relative to your previous experience of similar workloads.

So, if you avoid estimating granular tasks and keep your estimates as the approximate amount of effort relative to something you've done before, then you will end up with a more realistic timeframe. Story points can help with this mindset. Also your team will not have to waste time faffing around in Jira too much, or whatever system you use.


To me story points, planning poker, scrum, and a bunch of other "agile" artifacts are funny concepts.

Making 4-5 highly paid professionals sit around in a circle with a bunch of cards trying to estimate how long/complex/??? the task is, writing it down, and doing that every two weeks, to dubious results, is exactly what one can expect out of today's businesses.

My guess is management doesn't trust that engineers aren't just fiddling around, so they hire a bunch of management-like people to watch over the drones and show them who's in control. And how else can you show you are in control than by introducing magical rituals, tracking useless metrics, and making grown people participate? That way, all those meeting rooms can be occupied, and the owners can feel like something important is happening and that they are getting their money's worth.


The fundamental issue isn't addressed, but it sure is hinted at. Scheduling is an NP problem. As queue size, or backlog, or (insert thing here that tracks work to be done) grows, it takes n^p calculations to schedule it optimally. This is hit on when the article mentions small teams hitting their estimates. They can do this because their task list is small enough to go through all the permutations and actually come up with an accurate estimate. The only way to keep an n^p problem under control is to divide and conquer it. The leaves of that process must not go beyond a fixed size, and the task divisions can't be recombined prematurely. Everything else is just yet another management idea that will fall apart when the task has too many pieces. Once agile or any other management methodology acknowledges the fundamental mathematics at the core of things, I may actually take them more seriously.


> Scheduling is an NP problem. [...] it takes n^p calculations to schedule it optimally

That isn't what NP means. (An O(n^p) algorithm would in fact be in P.)


Yep, I wrote without thinking. c^n would be better, right?
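
For a feel of the blow-up: brute-forcing an optimal ordering of n tasks is n! (even worse than c^n), while dividing into fixed-size groups keeps it tame. A quick illustration:

    from math import factorial

    for n in (5, 10, 15, 20):
        print(n, factorial(n))  # 120, 3628800, ..., ~2.4e18 orderings

    # Divide and conquer: 20 tasks split into 4 independent groups of 5
    # means examining 4 * 5! = 480 within-group orderings instead of
    # 20! overall.
    print(4 * factorial(5), "vs", factorial(20))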


I've always thought of story points as ideally* working like a kind of international currency exchange, where it is normal and expected that Team Teal Dollars will not have any permanent or consistent relationship to Team Maroon Dollars, nor to Actual Time. (The saying "time is money" notwithstanding.)

The points will inflate or deflate over the course of months, or even abruptly change with team composition or shifting to new technologies or even vague morale issues. All that is normal, it captures important facets of work, and trying to stop it from happening only creates other problems.

What matters is that somebody is looking at near-past behavior and using it to make a near-future estimate.

* If someone tries to mandate a fixed arbitrary correspondence between points and real world time, that would be a non-ideal scenario for various reasons.


I'm not sure I agree with the framing of this article.

It seems to be comparing two different things: 1. points as a method for attributing difficulty to a work item, vs. 2. the collaboration process of creating a task list as a method of breaking down work.

Running a team, I found pointing useful, more useful than the individual line engineers did, as we had different goals. My goal, in optimizing the team, was to go through all our work, get everyone in a room and create a relatively optimal plan for the next couple of weeks.

By going through the exercise of pointing, we often found that something one person thought was hard, another found easy. The act of estimating would reveal knowledge one person had that another didn't, knowledge that made the task easier. Without that process, the work might have been harder to do, because the easy way was never revealed. We also adhered to an "8 points is too many" philosophy, meaning any item that hard needed to be decomposed into simpler tasks and repointed.

The "queuing" section basically implies the planning process should decompose tasks all the way to nothing but 1 point stories (or at least similarly sized work items within some variance). It's basically the same process as pointing, except not calling out some things are chunkier than others because it all comes out in the wash.

TL;DR: this article's definition of queuing is basically pointing where every task is a 1-point item.


But my managers say I have to allocate story points! I'm sure they know what's best and not just making me jump through hoops because of their artificial imposition of JIRA under the title of being supposedly "agile"! Right?


Remember, for any project -- software or otherwise -- in any business, the business is going to need to know the answers to two key questions:

* When is it going to be done?

* How much is it going to cost?

You should therefore estimate in units of time because you will be held accountable for time. (And time is money.) This is straightforward if you break the work down into components of manageable size beforehand and prepare a bill of materials itemizing each component that the final product will require. Using the BOM approach means that completion progress for the system as a whole can be tracked in terms of which materials are complete and ready to go, not just a number or percentage of how complete someone thinks it is.

Now what if your team gives one set of estimates and the estimates are off? Well, what managers have found is that usually there's a pretty consistent ratio between the time a programmer estimates a certain task will take vs. the time it takes them to actually perform a task. Therefore, the correct way to go about time accounting is, for each task, the developer should give a time estimate before starting the task and record the time it actually took them to complete it when they are finished. The ratio between the two, as it stabilizes over time, will give project management increasingly good information about how long it will actually take a given developer to do a given task, based on their estimate, and enable them to calculate schedules accordingly.
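
A minimal sketch of that bookkeeping, with made-up numbers (real systems like Joel Spolsky's Evidence Based Scheduling run a Monte Carlo simulation over the whole distribution of past ratios rather than using a single average):

    # Hypothetical history: (estimated_hours, actual_hours) per finished task.
    history = {
        "alice": [(4, 6), (8, 13), (2, 3)],
        "bob":   [(4, 4), (16, 15), (5, 6)],
    }

    def ratio(dev):
        """Average actual-to-estimate ratio over a developer's past tasks."""
        pairs = history[dev]
        return sum(actual / est for est, actual in pairs) / len(pairs)

    def calibrated(dev, raw_estimate_hours):
        """Scale a new raw estimate by the developer's historical ratio."""
        return raw_estimate_hours * ratio(dev)

    print(round(calibrated("alice", 10), 1))  # ~15.4 hours, not 10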


This is a problem from a management perspective, not a developer one.

Story points are usable, but they're a "Wild Ass Guess" metric, not a true metric of actual work, and treating them as anything other than a guess is fraught with problems. A 13-point story could fall anywhere between an 8 and a 21.

Some stories we guessed at 8 ended up being more like 21 (because the dev had to fight unexpected politics, or the API was billed as "easy to use" but was anything but).

All of this is fine, and should be fine, to developers; just not to management.


In this example, task sizes are assumed to be the same level of effort. That's not necessarily an easy thing to do, and it sort of falls into the same problems as pointing stories.

If your tasks aren't the same level of effort, then one task could take a week, and the next takes an hour. The author seems so sure, I'm almost sure I missed something, but unless you can reconcile how you can have 1-hour tasks and 1-week tasks and consider them the same, well... I'm just confused.


The problem with the proposed solution is that many teams can't accurately break down tasks small enough. And, even those smaller tasks have variability in how long they take to do. What I've seen is that most teams don't really know how far "stories" break down into their simplest tasks until they actually do the work.

So, that puts us back at square one, for the most part. A story point may not tell you a lot, but tasks in this form often present the same problem.


Story points never make sense to me.

It's not only because of the schizophrenic process of estimating tasks' complexity (not time) to understand how many of them can be added to a two-week time window.

In teams that focus on points:

1. you spend time giving points to tasks in a meeting with the whole team

2. to calculate how many points can be added to a sprint based on the points given in the past, without any adjustment for the cases where the points allocated to a task were wrong

3. to determine the group of tasks that will be assigned to each developer

4. to finally, discuss why the number of points assigned to a sprint was not fully delivered (it's rare to find sprints where the tasks are delivered earlier; Parkinson's law explains that)

It's unlikely that any stakeholder would prioritize a team's precise task estimation over the rapid delivery of features or projects.

In this sense, a kanban process will have the same outcome with much less energy spent, and small tasks will give you the same statistical value without spending hours estimating "complexity."

The Sisyphean job of trying to assign the correct number of tasks to a sprint is only there to comfort team members who don't want to consider the actual value of the processes they use.


The proposed queue system seems somewhat orthogonal to story points to me. You could easily do the queue thing with or without story points, or vice versa.

With the pure queue solution, you still need to make sure the tasks are all "small". The problem of defining "small" is the same problem that makes story points so unreliable and confusing. So I suspect the kicker here is just having a group of motivated and competent people.


Regardless of what anybody says to the contrary, story points are and have always been used as an obfuscated measure of time. When we can never estimate plan timelines correctly in the first place, why would anybody think that we should be able to measure story points with any accuracy? It is just another stupid PM gimmick to create useless management metrics.


> what if we substitute something else for story points?

Then that new thing becomes the new measure, you change what you do to meet that measure, and things are screwed up again.

The author's final paragraphs describe implementing a complex process that an intelligent team has to use in a nuanced way. And somehow simultaneously declares that the original problem was story points, one single aspect of a complex process, and not the fact that there's a complex process that nobody understands or follows correctly.

You know how Toyota gets TPS to work so well? They do one thing, well, at a time, repeatedly.

Product development sucks because it's trying to do a million things at once, constantly changes its mind, doesn't train its workers, and conflates designing, engineering, assembling, and operating into one giant "thing". Then it wonders why it can't keep track of time.


I don't really understand why the author needed to use all those words to suggest t-shirt sizes instead of story points.

All the stuff about breaking down tasks, watching a backlog queue to monitor cadence, and having regular meetings is already happening with or without story points.

People overthink this stuff all the time. Every team should figure out what works best for them, even down to project by project. Getting shit done isn't hard to monitor. You have a bucket of well-defined tasks, have sprint meetings, look for blockers and assumptions, and watch work flowing through. It's not really that difficult. Whether you use story points or some other estimation tool is really just an exercise in calibration; it's not gospel. The estimation process is the important thing: discuss as a team, agree on complexity, make sure the task is bite-sized, etc.


What about function points? (https://en.wikipedia.org/wiki/Function_point) They are basically a formalized lines-of-code measure, but one with a precise definition, not just "whatever it takes" like a story point.


I'm not sure I get it. If you're doing story points, you are doing big-A Agile, yes?

In that case, you absolutely always have an impossibly long queue. The things that don't get done keep going into technical debt, which as a rule doesn't decrease. Until at some point you declare technical bankruptcy.

If you reduce this to just the queue for the current sprint, which is generally smaller, the queue length is determined by... the story points.

So to manage the queue without story points, your job is to break the stories into equal-sized tasks, so that the queue size has a meaning. However, you cannot break a story up that way (this requires estimation of task complexity -- the premise of the article, and our own battle-tested experience, is that we don't know how to do that), and even if you could, you run into the infinite queue I mentioned.

> I am certified to teach it as a SAFe Practice Consultant (SPC)

:sus:


I find that the requirement to agree on a score together is a good incentive to stay focused on what the topic is, try to grasp it, and ask questions that start to shine some light on the darkest corners of the task.

I couldn't care less about the resulting quantification. What matters is how well we communicate in the team, how helpful we are with each other, and how well we can make progress and keep up motivation while struggling with the huge hindrances paving the way. The points are not the point, but they are not pointless.

The map is not the territory. Everyone can have a different map and a different metric system if any. If at the end of the day people inhabiting the territory do it in a satisfying way, all good. It doesn't matter much what the plan on the paper might look like: it will always be a mere epiphenomenal artifact of the actual human processes at play.


Story points are UBI for Product Managers


> Now, if you're paying attention and reading this with a critical eye something in your brain just told you "Wait, this is just waterfall! We can't know everything up front!"

The article presents the "story points" problem to be caused by an ignorant or presumptive misunderstanding of how the system works by outside parties.

I think articles like this are an interesting and necessary part of the overall project-management discourse. However, they tend to imply that the problem of ignorance can be resolved by the management framework directly, with which I disagree.

Ignorance, regardless of project-management, has to be addressed by clear communication and boundaries.


I am reading this slowly and there’s some good stuff in here, but it’s really long and I don’t have 1-2hrs to sit down and read it properly.

I liked the suggestion of having a dedicated architecture team to break things down into work items.

Comment your favourite parts/highlights?


I struggled with the length but couldn’t find a good way to shorten it more unfortunately. That’s the main reason I included a TLDR.

Thanks for giving it your time though.


You could trim down the parts dunking on story points a lot. It starts to feel redundant and most of it is dubiously relevant. If we care about the topic at all, we probably already have an idea of what story points are and what's wrong with them, and are mostly interested in your proposal to replace them. You only need to refresh our memory about the existing system, and maybe link to other critiques.


Fair critique. I was mostly trying to make a thorough case first.

I’ll definitely have a followup focused more on queues.


In my opinion, assigning days to story points and breaking down tasks/sub-tasks to fit within a max of 5 days solves most of the problems. No, not every task can be estimated correctly, but it does help in tracking things.


I think if you have a clear and compelling vision, along with motivated people, story points, scrum, etc. can all work well. If you don't have that, then no process works. So, get the former and don't worry too much about the latter.


Story points are in fact time, and I'm tired of pretending they're not.

You can sugar-coat it all you want and say they represent complexity, but at the end of the day (or sprint), the higher the complexity, the more time it takes to complete.


I don't think anyone is really saying that story points aren't time. It's just that you don't know what the story point <-> time conversion factor is until your team is calibrated.
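
Which, if you wanted to, is trivial to compute from history; a sketch with invented numbers:

    # Hypothetical sprint history for one team: (points_done, working_days).
    sprints = [(21, 10), (18, 10), (25, 10), (20, 10)]

    points_per_day = sum(p for p, d in sprints) / sum(d for p, d in sprints)
    print(points_per_day)       # ~2.1 points/day, valid for THIS team only
    print(13 / points_per_day)  # so a "13" runs roughly 6 working days here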


Fixing the machine at the factory can be quite complex, but you might have a technician in and out in a day or two to get it done.

Assembling 5,000 identical widgets is not complex, but it might take you weeks or months.

Complexity and wall time occasionally move the same way on the graph (generally with wall time climbing much faster than our view of complexity), but they’re not necessarily or always so.

I tend to explain “complexity” more in terms of “at what skill level of employee would we stop seeing substantial gains in quality/speed/maintainability/etc when we assign this work out”.

Something that a senior could do substantially better/faster than an intermediate is “high” complexity. Something that the intermediate could do substantially better/faster than the junior is “medium” complexity.

Adding some fields to a form is low complexity—an intermediate or senior won’t do a substantially different job than the junior—but doing 10 fields versus 100 fields will change the amount of time it takes quite a bit. Architecting a new service will see gains to senior and beyond and is high complexity but may not actually take all that long.

Ultimately, this boils down to “how many decisions remain to be made”. Most tasks can be made lower complexity by making those decisions in detail. “Rearchitect this module” becomes medium complexity when someone turns that into “rearchitect this module following X pattern” and low when someone turns it into “move methods A, B, C into new class X and split method D into E and F along this line”.

This view of complexity doesn’t directly drive wall time, but _does_ very directly impact the variability of that estimate. The more decisions remaining and the more unknowns up front, the wider the range of possible outcomes. Reducing the complexity will reduce the range of estimates.


The problem therein is that that time depends on who is given the story. Perhaps that should be part of the estimation itself? (Who is working on it.)


Don't the tasks in queues have the same issues that tasks from story points do? What if the tasks are not defined well enough? Some tasks in the queue might take a day to complete; others could take months.


A task that takes months is a project. I view tasks as something I could get done in one day with full focus. Then at the 2 day mark I reevaluate. The additional day can always act as a buffer.


Story points seem to be a reaction to a process that doesn’t include the proper amount of time needed to actually make a proper estimate. They seem designed to obscure how long something will take to accomplish. I think people like this because estimating is really hard and obscuring is seen as the solution.

The alternative approach is to do more detailed estimates. In addition to any design work this can require time boxing a “spike” to better understand things. This approach works well for estimating but leaves the sprint uncommitted until the spikes are complete.


Story points don't help me do my job. They do a great job of wasting my time and making me less productive. It's not just story points, but the whole agile/scrum methodology feels terribly wasteful. The gross amount of time we waste setting estimates, refining stories, sprint planning. To what end? So management can make some meaningless charts? So that the "decision makers" can use these contrived numbers to figure out who gets cut in the next round of layoffs?


My experience is that when story points are used across team members as a level of effort, discussed with each story, given a high-confidence vote (fist of five), and then measured against each team member's capacity (measured over several sprints), they serve as a valuable way to manage work.

They do not equate to hours and cannot be boxed by management expectations.

Management can separately track actuals (hours), but that really should never concern the team.


Very interesting take and definitely an article I will come back to, but in the end, the size of tasks is relative, and the author counts them. How's that better than counting story points? Feels like a lot of folklore for a similar result. In the end, like with story points, what matters is the effort put into splitting a problem to better understand it and plan accordingly.


Project Management should be called Project Leadership, and the tools of the former should inform – but not dictate – the decisions of the latter.


> Scrum tried to fix this in 2011 when “Commitment” was changed to “Forecast”

I've checked scrum guides released in 2010 and in 2011. The word "commitment" does not appear in the pre-2011 version. And, as far as I remember, the word "forecast" replaced the word "estimate" in the 2020 edition of the scrum guide compared to the 2017 edition.


In my world, on my team, everything is a 3, and if it's not a 3, let's figure out why. I.e., let's have a fairly sized piece of work that we scope for most tickets. If it's bigger than that, let's discuss it, see if it's worth breaking down, and if not, let's just agree that it's a larger piece than a 3. That way we can just keep an eye on the relative size of issues.


That's pretty close to what the article is describing, if I understand it right. They functionally define "tasks", the ones put in the queues they suggest measuring, as bits of work small enough that most of the uncertainty is gone. So until that's proven wrong (when reality smashes a task into a bunch more tasks), it's more-or-less equivalent to all your stories having the same number of story points.


My team doesn't really have story points. But we work kind of similarly to you: is this too big? If it is, let's see if we can break it down into smaller tasks. I think it works much better than arguing over story points.


Story points are primarily a social tool: senior management politics requires that something gets measured; story points make them happy and provide a quantitative way of communicating with the rest of the business. Until we come up with a way of measuring programmer productivity (which is not a queue), it is hard to improve on story points.


> which is not a queue

Why not a queue?


I've been on lots of teams using story points. What I've found is that what one point is worth is never consistent between teams. A 3 for one team could be a 1 for another. Or a 1 could be a 0.5 or a 0.25.

In one team, 3 points might be equivalent to 3 days work. In another it might be 1 day.


Yeah, and that's fine; there is literally no need for them to be consistent between teams.


What stops management from committing to Feature A being delivered after 250 tasks are completed? Yes, the fact that Feature A expanded to 500 tasks, and the reasons why, will be documented, but who will care? It will still be a "delay" from the management point of view.


To quote Douglas Adams: "I love deadlines. I like the whooshing sound they make as they fly by."


Estimate using time (hours, days, weeks, months, years).

Reflect your uncertainty in your estimates using confidence intervals.

If your confidence intervals are too wide, break down the work and estimate the smaller tasks and/or spend some time doing the work necessary to narrow your confidence intervals.
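
A sketch of how the rollup might work, assuming independent tasks and a roughly normal total (both generous assumptions, which is exactly why wide intervals should trigger a breakdown):

    from math import sqrt

    # Hypothetical tasks: (expected_days, standard_deviation_days).
    tasks = [(3, 1.0), (5, 2.0), (2, 0.5), (8, 3.0)]

    mean = sum(m for m, s in tasks)
    sd = sqrt(sum(s * s for m, s in tasks))  # variances add when independent

    # Rough 90% interval for the total (1.64 sigma either side).
    print(f"{mean} days, 90% CI: {mean - 1.64 * sd:.1f} to {mean + 1.64 * sd:.1f}")
    # -> 18 days, 90% CI: 11.8 to 24.2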


I can estimate rather accurately how much time it will take me to cook a meal assuming that:

- I have already mastered the recipe, having prepared the very same one many times in the past

- I already checked that I have all the ingredients required at hand

- I will cook in my own kitchen

- no entropy engine (aka family members) was let loose in the kitchen beforehand to put utensils somewhere other than the place where I tidily store them (admittedly, without conducting formal training of the rest of the disruption forces)

- no one will interfere because the way I do it is not the super fancy other way in which it could also be done

- no external catastrophe happens which obviously needs to be taken care of immediately, right at the point in the recipe where, by the time I come back, most of what I have done so far is better started again from scratch, because the underlying laws of physics I was assuming have changed so significantly that the remaining work is now utterly incompatible with the divergences the universe has bumped into

Meanwhile, in the supposedly easier, transparent, stationary landscape of software development, whenever I came up with a to-my-mind-credible estimate for a non-trivial endeavor, it was systematically rejected and superseded by a soon-to-be-blown-away deadline.


But what exactly is a "task"? It still seems like an arbitrary unit.


Well, history in IT repeats itself in cycles. Look up #NoEstimates for good and thoughtful content about this topic, and various anecdotes for inspiration. Then derive your own conclusions.


Are there any companies that have no estimation, no pointing, etc.? Just a "figure out what needs to be implemented and implement it" type of situation?


Startups that have their spec aligned to chaotic good.


You forgot to add "on bro"


"Measure queues" sounds like "Work In Progress limits" from Kanban. Cleverly, the article dodges mentioning those terms!
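
For anyone unfamiliar, a WIP limit is just a cap on how much work a stage may hold at once; a toy sketch with invented columns and limits:

    # Hypothetical per-column caps and current board state.
    WIP_LIMITS = {"doing": 3, "review": 2}
    board = {"doing": ["T-101", "T-102", "T-103"], "review": ["T-099"]}

    def can_pull(column):
        """May the team pull another task into this column?"""
        return len(board[column]) < WIP_LIMITS[column]

    print(can_pull("doing"))   # False: finish something before starting more
    print(can_pull("review"))  # True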


When I was doing kanban there was, very intentionally, little to no forecasting done. We broke a feature up a bit and started chugging along. Every question from mgmt like "when will it be done?" was met with "we are working on that right now, so ASAP", and that flew. However, that was a stable, mature product with quite a few customers paying an annual license who were mostly happy, though of course appreciative of new features. Right now I'm working on a bigger product that fails to capture customers in a highly competitive sector, and everything is about time and estimates. Middle mgmt is pretty stressed out, and the latest bid is to estimate t-shirt sizes. But really, it should be translated into time/increments.


Completely missing the point, which often happens with xor thinking. The beauty of story points is that they rely on the human ability to compare things quickly. It's almost instinctive. E.g., you're being chased by a rhinoceros through the jungle. You come upon a tree with branches and a boulder with handholds. 2 seconds: which do you choose?

Story points are just a warmup for more elaboration, not an xor decision. This article is making a single-level decision, which completely misses the point of using story points. The work effort really required will be discovered in further elaboration. Story points just give you a live-or-die measurement, that's it.


Could somebody explain how queue theory fits into agile processes? Where are these unmanageable queues that need to be emptied coming from?

Queue theory only becomes a problem when (1) stories are being added to the active story queue at a furious rate; and (2) nothing gets shipped until the active story queue is emptied. I don't think either of those things is supposed to be true in an agile process, especially the last.

It sounds suspiciously like a symptom of gamification to me (if new stories are being added by the development team). Or a broken process where field defects (which are supposed to go to the top of the queue) are so numerous that they completely overwhelm active development, which is an entirely different issue, requiring an entirely different response from management.

If a story's points are so wrong that it no longer fits in a sprint, it seems reasonable to split the story. In my experience, I don't think I've seen it happen more than a handful of times. How often is an initial story point estimate so wrong that it has to be revised upward to the point that the story no longer fits in a sprint? If it's wrong but still fits in a sprint, just do it. It makes me wonder whether there's gamification going on around using story points to evaluate developer performance.

(Assuming that stories are converted to tasks at the start-of-sprint meeting).
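
For what it's worth, the "measure queues" pitch presumably leans on Little's Law (L = lambda * W): the number of items in a stable system equals the arrival rate times the average time each item spends in the system. Rearranged, it yields a forecast from nothing but counts; a toy example with invented numbers:

    # Little's Law: L = arrival_rate * time_in_system, so W = L / throughput
    # (valid for a stable system, averaged over a long enough window).
    backlog_tasks = 120   # L: tasks currently queued (hypothetical)
    throughput = 15       # tasks the team actually finishes per week

    weeks_to_drain = backlog_tasks / throughput
    print(weeks_to_drain)  # 8.0 weeks of committed work, on average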


So basically a “kanban board” with priorities attached to each unit of work. Gotcha


Or just lose Agile altogether


Good lord. Get to the point.


The system doesn't matter per se. What matters is that you have one


In my experience people mainly like to complain or exist within a state of conflict where they are against something.

Story points are relative value as defined by the team, for the purpose of forecasting and reporting. They really can be anything the team decides, as long as they're a reliable measure. Ultimately they exist to answer "How long will this take?", which is the most dependable question from the people writing the cheques.

People asking that question don't care if it's Fibonacci or hours or t-shirt sizes or anything at all.

"Is x bigger (or smaller) than x", followed by "is this more important than this" should be brain-dead easy and logical for everyone involved.

In my 10+ years I've learned that it's ultimately just that people don't want to be told what to do and/or need tribalism (an "other") to feel at balance with their environment.

There's no escape from prioritization and sizing. Throw scrum in the garbage and you'll still be doing it by another name.


The point of all of this is to communicate to management how long a task is going to take.

I would suggest that a good manager doesn't need this communication.

A good manager will have already done the task, or something similar, to already know how quick it can be done.

If they haven't, they're probably inexperienced in the task. They were appointed because they were a "people person", but this has actually introduced more friction to the team (needing the story points, yada yada) than just assigning a simple deadline from the get-go would have.

Once a deadline is introduced, Parkinson's Law kicks in and you get more or less efficient work depending on how aggressive the deadline is.

I'm advocating for experienced managers, not deadlines per se, since that is just 1 tool in the toolbox.

I'm also advocating against the countless asinine ways to communicate to lousy managers. Throw the points in the trash and start hiring cracked programmers as your engineering team's standard bearer.

See the company grow and have happier employees.


I’ve never had a problem estimating projects using time units.


We should be thankful that it's commonplace to use points instead of units of time like days even if points are horseshit. Estimates aren't that useful most of the time, and the last thing I would need is a manager telling me they're concerned because my ticket took 3 days instead of the two that the whole team estimated.


Yeah, sure, everything eventually averages out with enough sampling, and maybe queue-based forecasts converge faster than velocity- and point-based forecasts.

Why? I don't know; the whole article reeks of gut feeling, which is strange, as the data should be available, since the whole point of scrum masters is clerking things out.

The problem with a queue system is not in the averages though, but in the exceptions: say the sprint or project is late; which features give you the largest impact when moved around or canned? If the customer wants to reduce cost, how do you give visibility into feature effort? Or are we in the toxic agile version with fixed scope, cost, and deadline? Because then just ditch the overhead and waterfall your way through.

Oh, and by the way, this system is not gaming-resistant: people may pick the smaller tasks first to meet queue-processing frequency, and then you'd get a massive frequency drop at the end.


... but $OTHER_TEAM is going through 20% more points each sprint than your team.

I hate story points; in fact, I hate most automated issue trackers, generally speaking. I don't mind that there's a backlog and that features should have some details. But I'm now stuck in an org where we're doing SAFe/PI planning 6+ months ahead of when the devs will actually do the work, producing designs that likely won't represent the true solution in practice, across dozens of teams in a very large company.

It's kind of insane.


The post is correct about some problems but completely wrong about solutions. Breaking down tasks that are already sub-sprint-sized in a meeting is the wrong way to do it; you'll put the boundaries in the wrong place and end up duplicating work, and then going overtime when the integration stage takes longer than you thought it would. The right place to do that breakdown is agile, just-in-time: give one person or pair/mob responsibility for a piece of user-facing functionality (that they've already agreed is sub-sprint-sized) and let them do whatever breakdown makes sense for that. If they get confused or stuck they can always raise it in standup (that's why we have them!).

Similarly:

> What happens when the team has turnover? What happens a few months down the line when this work comes back up based on the points that were given previously?

Why would you ever put points on it a few months ahead of time? You do estimation in the sprint planning when it's a candidate for that sprint. There's no need to write down the reasoning for the estimation because the estimation is only relevant for the duration of that meeting (as you prioritise stories for that sprint), and maybe in the retrospective two weeks later if the estimate was way off.

I can see the argument for t-shirt sizes. The "queue" idea is the opposite, and has all the problems of point/time estimation.

> When anyone not directly involved with the project asks why it's taking longer than they thought, the answer will be spelled out in the tasks list. These changes were added on these dates, for these reasons based on this feedback from these people. There is no "your estimate was wrong" situation. There is no "re-estimating" process. There's not even an ask to approve if you can change the point value. It just happens.

Guess what? They're going to ask for dates. They're going to ask why the estimate changed, and not care about the answer because they just want to blame you for their estimates being off. And your "tasks" have just become the same time tracking that you were (rightly) scared of; you have the same problem of having to do a bunch of pointless busywork to justify that you were actually working. (Suppose a "task" is suddenly twice as complex as you thought it was; now you've got to file a second "task" with a fake description to justify why you only did 3 tasks this week when Bob did 4).

The problems the article identifies are: spending too much time and effort on estimation, estimating too far in advance (and then having the team and/or circumstances change), and treating estimates as deadlines. These are all real problems. But they're not problems with story points (indeed story points are actively helpful on the last one, since everyone has to at least pretend to admit that story points are not time estimates), and they're just as easy or difficult to solve whether you use story points or something else.


> Why would you ever put points on it a few months ahead of time? You do estimation in the sprint planning when it's a candidate for that sprint.

Because people (customers and managers in particular, but not just them) want to plan ahead, they can't escape the optimistic (and usually wrong) planning mode of BDUF projects. They fear uncertainty and want to know, at a glance, how long the work will take based on their current backlog/queue/whatever. Customers don't like to be told "We'll deliver when we deliver" so managers (salespeople) give an optimistic schedule now based on today's staffing (and optimistic assumptions about future staffing levels and future staff abilities).

If they'd spend 5 seconds thinking they'd realize they can produce and deliver most (but not all) systems in an incremental fashion that will satisfy the customers while leaving key decisions and estimations to be made when there's enough information to actually make them. But that takes 5 seconds and that's too damned long.


People take these kinds of things way too literally. There is no golden solution here. What gets repeated over and over continues to be true: teams should choose a system that works for them. And ideally that system is measurable, so the team can evaluate progress, improve its own performance, and align itself better with other teams and the business.

But in terms of scrum and points here's my take:

I've seen points work on some teams and not work so well on other teams. It's imperfect, but if you just accept that, you can make it work quite well.

The reason it's helpful to estimate complexity as opposed to time is that people with different experience levels would give different estimates based on their abilities. Complexity allows you to rally around a common understanding of a solution regardless of how fast one team member might be able to complete it versus another.

Does complexity have some relationship to time? Absolutely. Everybody knows this. That doesn't mean that we should be using time instead.

So how can a team estimate accurately? You will hear from some people that their estimates were wildly off, or that it's impossible to estimate a project, or that they felt pressure to under-estimate. If your estimate is too broad, you need to do the mental work of breaking it down into smaller chunks that are easier to estimate. If you feel under pressure to ship on an unrealistic schedule, that's not a points/scrum problem. But "it's done when it's done" is not realistic either.

The idea that the estimate has to be 100% spot on is also not true. Again, it's imperfect and that is ok. But you'll find that the better a team knows their codebase and knows the product, the better they'll get over time at estimating. But if the work is too vague, the team should push back until they have enough information to more accurately break things down. This process makes for better software, especially when the team does it together.

Another missing aspect I see a lot is having a feedback mechanism. If you as a team are discussing why a task took longer than the estimate, or track metrics over time, you can all get together and figure out where problems on the team are. For example: maybe there are too many bugs that are hindering product work? Why? Maybe you're moving too fast vis-a-vis the expected quality bar. Some sort of feedback mechanism (e.g. retros) is crucial - the team as a whole should aim to deliver what it says it would and understand why it couldn't.

The whole point of these things is that as a team you can deliver consistently not more speedily. Consistency comes before speed. The other important thing is having a way to continually improve. You want to use each sprint as a way to measure the team so it can get better.

When I've seen teams that did this well, they were dramatically more productive than the teams that didn't do it well.


Given the size of my scroll bar, I rate reading this article a 13. My PM has decided I shouldn't bother reading it, as the likely value isn't worth the effort, because whenever we take on a 13, it ends up dragging on for like 4 sprints, and preventing a lot of other higher value/lower effort stuff from getting done.

Maybe at some point in the future, we'll break the article up and have a few people on the team each read part of it for lower effort, and then synthesize their takeaways.


I feel personally attacked by this comment. This hits so close to home. I think I have some personal reflection to do.


Totally get it. :-)

I did include a helpful TLDR near the beginning with the highlights though.


I found the tldr really confusing and next-to-useless. It’s a really really long article and I just wanted to read about queuing since yeah story points can suck.


I’ll see what I can do to improve it.


This article was not written for the reader, but written because the writer loves to hear himself write.



