We have known this for a long time, but it is hard to have an alternative system which scales well with huge organizations. For startups and small companies, I could see an informal system working pretty well, but as the company grows to hundreds or thousands of employees, it becomes necessary to standardize and have some kind of metrics used for reporting and evaluations. This will inevitably shift the company's culture towards gaming those metrics.
Somewhat like grading systems in education. High grades don't necessarily mean you will generate more value for society than someone with average or even low grades, and students often become good at improving their grades without that actually adding much value. But there is a correlation. And we don't have many better (non-experimental) alternatives that I'm aware of.
> This will inevitably shift the company's culture towards gaming those metrics.
Yes, but that's only the beginning of the story. People gaming metrics is a type of security problem, in that "attackers" try to game the metrics while "defenders" try to make them less game-able by improving the accuracy and precision of how the metrics are gathered so that the final numbers continue to tell a valuable story over time.
The issue isn't that metrics can be gamed; it's that organizations which pride themselves on being data-driven rarely make the investment in hiring blue teams and red teams to defend and attack the metrics. If you appreciate that investing in cyberdefense is key to protecting your company from cybersecurity threats, why can't you appreciate that investing in "metricsecurity" is key to protecting your company from "metricsecurity" threats?
I think this is a rubbish excuse made by people who don't understand the role of such a "blue team". If the organization initially adopts metrics that are counter-productive (e.g. measuring feature completion and not technical debt), it is the role of the blue team to change the metrics such that the neglected areas are properly accounted for in final performance metrics. No metrics should be final; only iteratively tuned to achieve results that are more and more indicative of the underlying performance.
It is difficult, but still possible, to measure technical debt and other "hard" metrics. It is precisely the job of the blue team to deal with that.
Just because you've made a team and given them the job of doing something (incredibly) difficult, doesn't mean you've actually solved that problem or even should expect them to solve it most of the time.
You're absolutely right that having a "blue team" is much better than not having one - but it doesn't mean that calling out the reality that many organizational activities can't be easily measured is a "rubbish excuse made by people who don't understand".
And how do you measure the person who takes 15 minutes out of their day to help a colleague in another department who is having a tough time understanding an issue, boosting morale and overall company cohesion?
Or should they be penalised for wasting 15 minutes?
First of all, not every company wishes to incentivize this behavior. Peopleware reminds us that phone calls and other real-time interruptions are big drags on productivity for knowledge workers who need to concentrate. Every time somebody has a verbal conversation to clear something up is a time the answer wasn't recorded into some kind of documentation that would help future people with the same confusion.
But say you do wish to incentivize that. You can, if you can track the medium of exchange. Take everyone's phone records and reward short conversations, but disincentivize conversations that are too short ("sorry, not now, bye") or too long (social chatting in place of productivity). If you can cost-effectively put them through some kind of ML classifier that could tell you whether the conversations were helpful internal support, personal, etc., then all the better. Translate that into some kind of score and factor it into whatever formula produces personal KPIs.
Not saying it's easy. Just saying it's possible, and it's realistic if you have a team whose full-time job is to come up with these kinds of solutions.
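To make that concrete, here's a minimal sketch of the kind of scoring rule I mean. Everything in it is invented for illustration: the thresholds, the weights, and the `helpful_prob` value standing in for a classifier's output.

```python
from dataclasses import dataclass

@dataclass
class Call:
    duration_min: float   # length of the call in minutes
    helpful_prob: float   # hypothetical classifier output: P(call was helpful support)

def call_score(call: Call) -> float:
    """Toy scoring rule: reward mid-length, likely-helpful calls.

    Thresholds and weights are invented; a real system would tune them
    (and the classifier) iteratively.
    """
    if call.duration_min < 2:     # "sorry, not now, bye"
        return 0.0
    if call.duration_min > 30:    # likely social chatting in place of work
        return -0.5 * call.helpful_prob
    return call.helpful_prob      # mid-length, likely-helpful calls score highest

def kpi_component(calls: list[Call]) -> float:
    """Aggregate call scores into one number to feed the KPI formula."""
    return sum(call_score(c) for c in calls)

# Two good support calls and one long social call:
calls = [Call(12, 0.9), Call(8, 0.8), Call(45, 0.7)]
print(kpi_component(calls))  # 0.9 + 0.8 - 0.35 = 1.35
```

The point isn't this particular rule; it's that once the medium of exchange is tracked, turning it into a KPI component is ordinary engineering work.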
Do you seriously want to incentivize productive behavior during people's work breaks? Because that doesn't sound to me like something you want to incentivize (risk of burnout etc.).
No, I'm taking a quick break from my "work" to help a valued colleague solve a problem informally, in a way that actually adds more value to the organisation than if I had been sat at my desk. I enjoy the human interaction; it's good for my sense of wellbeing. My colleague has his stress levels reduced. We learn a bit about each other's jobs in those 15 minutes, and a product is delivered a week earlier than it would otherwise have been.
On the other hand, if this is a priority and incentivized, what's to stop it from going too far, where employees get extra credit for chatting for entertainment?
Depends: do you want that to happen or not? You seem to be making the assumption that this is good, but in some organizations this would be a bad thing, one worth penalizing. I'm not sure why you would do that in engineering, but it is important to acknowledge that this isn't a universal good, so maybe your company wants to discourage it for some reason.
Assuming you want people to help each other, you need to capture metrics on it. A few years back I had a metric of helping n people in a different department: I kept track of those interactions so I had something to report at the end of the year.
Was that a personal metric? Metrics created for yourself are subject to less gaming, because when you start lying to yourself about those, you will start to wonder why you keep those metrics at all.
If it was a company-issued, top-down metric, I hope it wasn't defined literally as "helping n people in a different department", because that has enough wiggle room to sail an aircraft carrier through. The difficulty of creating a good metric here comes from the difficulty of defining what exactly it means, in the company's context, to "help other people" - and also what it explicitly doesn't mean.
I had to report it to my boss. It was top down, but only a few interactions were required, and it wasn't reported farther up the chain. Because I had to report to my boss, he knew me well enough to judge if it was enough. It was just enough of a metric to ensure people looked for something to bridge a communication gap, without being hard enough that people tried to game it much.
There is a philosophical debate underlying this. Take the analogy of an ML algorithm:
We know many algorithms are DESIGNED to be a black box, to be unexplainable. The red team iterates on an incredibly effective algorithm that produces the desired outputs (with unseen risk built up as well). The blue team, in order to manage risk, is tasked with... explaining the unexplainable process? An impossible task.
Is it possible that human/organizational processes can be also unexplainable?
In practice, you wouldn't really need a metric for the blue team, for the same reason you don't need metrics in a 5-person startup: being close to the blue team replaces being "close" to thousands of people, and because management is close to the blue team, it can judge the team's output without needing a formal metric.
Maybe if you had an organization big enough to require several blue teams (a military or government?), then you'd need a metric for blue teams. Such a metric would probably compare the sub-KPIs of each sub-organization that each blue team was responsible for, including metrics on customer satisfaction, and warrant investigation if those metrics dropped.
The blue teams can't really game that metric without the entire organization falling over, and if that happened, the executives would be to blame, not the blue team.
I know a girl who works for United Airlines' "blue team." It's a small (5-8 people?) inward-facing operations consulting group that reports directly to Oscar, the CEO.
It's composed of engineers and they analyze existing processes and create new metrics all day long.
They do worry about their own careers/promotions etc but the group is too small for there really to be any opportunities to "game" anything beyond basic politics.
I trust it's just habitual, but I think it's important to describe her as a woman and not a girl. When a company is casually described as being made up of men and girls, it sends an insidious message to everyone about who's taken more seriously.
The role of metrics is to grant visibility. Any policies governing outcomes (e.g. pay bonuses for high metrics) are set and iteratively improved by people working off intuition.
Doesn't mean the metrics are bullshit, just means they're a tool.
The problem, though, is that executives need some set of a few hundred numbers they can use to track the state of a company. For just my job alone I could generate more metrics than that to properly characterize our state and problem space -- but then an exec would need to deal with thousands and thousands of numbers.
Sorta sucks, but that's how it goes. Good execs manage a sufficiently decentralized system, but they still need SOME set of summary numbers.
Good theory, but in practice challenging the wisdom of metrics that really important managers have put into place tends to be a career-limiting move.
The very important managers themselves should care about evaluating the metrics they impose, but typical unspoken manager performance metrics include "episodes of dissent" (addressed by discouraging advice), "displays of weakness" (addressed by threats and aggressive attitude), "time management" (addressed by not thinking matters through), and so on.
>why can't you appreciate that investing in "metricsecurity" is key to protecting your company from "metricsecurity" threats?
Because if I'm a C-level or EVP-level person responsible for this type of decision, why would I want to spend money on a team of people fighting against my ability to get a big fat bonus?
> ...but as the company grows to hundreds or thousands of employees, it becomes necessary to standardize and have some kind of metrics used for reporting and evaluations.
Does it? Is it inconceivable that each part works to its own goals and metrics consistent both with its own values and those of the wider organisation?
Trying to group 1000 people together and measure all of them, sure that seems insurmountable for an informal system.
Taking that same group of 1000, splitting them into subgroups of 7 and giving those individual groups their own goals and autonomy to pursue them may again allow for informal performance measurement.
> Beyond a certain number of people, absolutely it is.
that's because of the lack of trust, and the flow of responsibility and control.
If you structure a company such that the employees themselves have to be responsible for their output, in such a way that higher output leads to more money for them, you'd not have this problem. For example, contractors who work on a results basis.
I think you'd prove the problem, though, because everybody would want to work on projects/bugs/teams that have clearly measurable results. I work at a company with clearly(ish) defined “this is what a Level X engineer does” levels, so you can measure yourself against the current and next level for promotion's sake. I've mentioned to my manager that I struggle with authentically choosing projects/tasks: I want to work on things that I see as valuable to me and/or the company, but I also want to be promoted, and some projects are clearly better promotion material while not necessarily being as valuable... I strive for value but probably overthink the situation :p
> everybody would want to work on projects/bugs/teams that have clearly measurable results.
So let that happen. Those who can't can leave, and see how the company actually fares. When they find that some crucial roles aren't taken and the company starts "failing", then they will surely admit that said role is needed, and reward it accordingly.
But perhaps there are indeed roles that aren't useful and could actually have been eliminated but for the stigma - so maybe this is the way forward.
> employees themselves have to be responsible for their output in such a way that higher output leads to more money for them
But what metric would you use to measure output that solves the gamification problem?
Even for contractors or sales people (where you could use the sales volume), this could lead them to favor short term results and compromise the long-term health of the company (e.g. by favoring quick, low-quality solutions by contractors, or selling features that don't yet exist and creating unsustainable roadmaps by sales people).
You could reward them in the same way that executives increasingly receive performance-based pay: in shares or other financial instruments, with an enforced holding period, linked to the health of the business or business unit as appropriate.
While that's an interesting approach for compensation which might mitigate knowingly bad/irresponsible decisions, it doesn't look like it would address the core issue here of having to choose a metric to base compensation on.
Maybe the gaming effect would be lessened by that compensation approach, but at a very large scale org, I doubt that it would. Although, it would be interesting to see real life studies of this, in case such practices have already been tried out.
>that's because of the lack of trust, and the flow of responsibility and control.
I currently work for a company that has been transitioning from being tiny (I was employee #24) to pretty big (we're close to 60 now). One thing I've learned, much to my disappointment, is that once you get past a certain size it gets harder and harder to recruit people worth trusting with that kind of responsibility. The supply of such people is too limited, and they tend to get poached quickly.
>If you structure a company such that the employees themselves have to be responsible for their output in such a way that higher output leads to more money for them
Oh so all you need to do is fairly and reliably measure "output" in an ungameable way? Easy! /s
50-60 is a transition point for small businesses. That's the point at which it literally becomes impossible to be indifferent to process or structure.
Prior to that size companies can kind of get by on luck, skill, or the "heroic efforts" of individual contributors to carry them along. Once you hit 50 FTE that approach starts to fail more and fail harder. This is why tons of small businesses flame out when they hit this threshold.
500 is just getting into medium size. I'd say you need to approach 10,000 before you can say large. You need tens of millions before I could call you huge (i.e. a country, which, depending on how you define "organization", might need to be a dictatorship to count).
I agree with your point, but I think your scale factors for size are off by orders of magnitude.
> You need tens of millions before I could call you huge
That seems like an absurd standard to me. The three largest employers in the world (by number of employees) are 1. The U.S. Department of Defense, 2. The People's Liberation Army (China), and 3. Walmart. Each of these three largest employers in the world has between 2 and 3 million employees, which is much less than your standard of "tens of millions".
Thanks for bringing this up; I was a bit disheartened that an article basically talking about Goodhart's Law doesn't mention it.
EDIT to add: To be fair, the author's book (on which the article is based) mentions the term twice (and twice in the references). Still feels like this is inadequate, though.
As I read through the 50-odd comments, I wound up with a kanban / 5 whys level thought. All these points of discussion revolve around pluses and minuses of metrics, but the next level is "Why establish metrics at all?"
At a deeper (very generalized) level, we've been infected by production line, Deming style, efficiency focus.
If you are handling previously defined routines, great. Optimizing can help you.
As soon as you are working on adding value, be it in code, education, or product development, metrics are all premature optimization.
Promote the idea that you can't apply optimization techniques to creative steps and encourage managers to get off the numerical crutches and make value judgments. Takes a better manager but then that's the point anyway, right?
"we've been infected by production line, Deming style, efficiency focus"
Could you say more about this? I'm trying to form an opinion on this Deming-ism. My impression is that the actual Mr. Deming gave different advice: "Eliminate slogans, exhortations, and targets . . . Eliminate quotas, numbers, numerical goals."
Generally I think the problem isn't in goals and measures, but in poor goals and measures. Deming said "The aim defines the system". So if you focus on short sighted metrics you will define a poor system. Metrics and reward systems need to be carefully considered and well thought out with regard to the larger scale system outcomes.
That, in turn, tends to have a lot to do with the incentive structures of the person coming up with the metrics.
If you've got someone who's been tasked with improving large-scale system outcomes, and they're developing metrics that are meant to be a means to that end, then you might expect them to come up with a well-crafted set of metrics. Of course, you can only expect them to implement those metrics to the extent that they're able to alter the system in a way that allows for collecting them. The best plan in the world isn't worth much if nobody has the power to execute on it.
If, on the other hand, they're being designed by someone whose task was, "We need to be more data-driven; come up with some metrics and give me a dashboard," they're going to do exactly that. No more, no less.
If it were only premature optimization, it wouldn't be that bad. But it tends to lead to actively counterproductive optimization. Teams get incentivized to do things that increase their KPIs even when what they're doing delivers negative value.
For example, I've definitely seen this happen on Agile teams where management makes KPIs based on velocity or ticket flow: Suddenly, lots and lots and lots of waste and technical debt are generated due to implementing unnecessary things. Because unquestioningly implementing tickets as fast as you can increases velocity, while spending time to stop and talk about whether or not something is a good idea, or can be solved in a better way, decreases it.
Metrics are defined by committee, which in turn answers to another strategic leadership committee. No single person is responsible for failure. Mediocre managers recite the entrenched KPI mantra because it cannot fail.
2: Differentiability.
If you can't measure how well you're performing, you can't tell if you're improving or regressing with any given change. Total revenue or other coarse metrics are so far divorced from any given decision that they can't provide feedback about it. Just like gradient descent needs the underlying neural network to be differentiable, a business strategy needs the performance to be differentiable. KPIs are an attempt at that.
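To make the analogy concrete, here's a toy sketch (all numbers invented): a finite-difference probe of how a metric responds when you nudge a single decision variable. The coarse metric barely moves, so it provides almost no usable "gradient" for that decision.

```python
def finite_difference(metric, x, eps=1e-3):
    """Central-difference estimate of d(metric)/dx: does the number
    we track respond at all when we nudge one decision?"""
    return (metric(x + eps) - metric(x - eps)) / (2 * eps)

# Invented toy model: x = spend on onboarding docs, arbitrary units.
def tickets_deflected(x):   # local metric, peaks at x = 3
    return 100 - (x - 3) ** 2

def total_revenue(x):       # coarse metric, dominated by everything else
    return 1_000_000 + 0.0001 * tickets_deflected(x)

print(finite_difference(tickets_deflected, 2.0))  # 2.0: a clear, usable signal
print(finite_difference(total_revenue, 2.0))      # ~0.0002: signal drowned out
```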
Ultimately, KPIs suck but give a fairly predictable average outcome. Not doing KPIs is a risk that most leadership is unwilling to take. KPIs sit in the same category of concepts as airport security theatre. They're a product of (find something you can measure and try to manipulate that number, no matter how meaningless | something must be done. This solution is something. Something is better than nothing. Therefore let's do this).
> If you can't measure how well you're performing,
There are a lot of assumptions baked into that statement, not least among them: (1) that being unable to put a number on it means you can't tell if you're improving or not; (2) that this all boils down to a number anyway (or a reasonable set of numbers); (3) That such numbers are actually predictive.
There are two primary reasons people complain about metrics in my experience: (1) They're difficult/tedious to get and/or (2) they lead companies to focus on the wrong actions. These would belie the claims you're making. (I can't tell if you're making them personally, or just echoing what you've heard, so I'll argue against the generic theory)
Turn your scenario around: We have a system. Experts in the field agree the system is more likely to cause poor decisions than not. Should we continue the system? (And trying to gather metrics on whether the system works or not is not the solution, because the question at hand is if metrics accurately guide decision-making.)
In the expected environment, such as a large corporation, the desire for an objective (even if bad) metric is driven by the fact that, yes, being unable to put a number on it does mean you can't tell if you're improving or not. In every case you'll have some people making points for why they believe it's improving and other people making points for why it's getting worse, with the arguments mainly driven by politics. In the absence of such numbers, any claims about a suborganization are even less reliable than bad metrics: bad metrics can be manipulated, but no metrics means claims can have no relation to reality at all.
And regarding your latter paragraph, the answer to "Should we continue the system?" depends on what alternative systems can plausibly be implemented for making the same decisions. If the current system causes poor decisions, but the alternatives are even worse (or there are better alternatives which we are unable to implement), then we should keep the current system.
Yes, the whole system would work better if people could just agree on what best needs to be done, and which activities are useful for achieving these goals, but often they can't. Metrics are a solution to a particular problem - an environment with limited trust and plenty of subjective politics, where any non-metric system for evaluating success and assigning benefits will be gamed as well. There possibly can be other solutions, but "simply" not having that problem isn't a solution.
You are preaching to the choir. I am just explaining why KPIs are prevalent and probably not going away any time soon. The business world is full of mediocre people who will cling to mediocre but predictable methodologies rather than take any kind of risk, especially if that risk is based in their own competence.
I'm not sure how to train machine learning models before defining metrics. I wonder how many other approaches/tools make metrics unavoidable. I think the key here is not marrying the metrics, and being happy to swap them out.
People manage to survive without planning out the number of breaths they will take every day. A competent engineering team is going to do well even without target lines of code per second goals.
I would argue that's counter to your intended point though. If you go into the E.R. or the hospital, one of the vitals they measure is in fact the number of breaths taken per unit of time. In a normally functioning human we don't have to worry about that metric because it is self regulating. If on the other hand you are in critical condition, measuring your breaths is a very important metric.
So as I've said elsewhere, the problem isn't measurement. The problem is as Deming said "The aim defines the system". Well thought out measures in accordance with clear goals and aims can be valuable. Like anything though, it requires careful consideration and may need to be revised based on outcomes.
I see a lot of straw man arguments in this thread. There are bad metrics - I think that’s uncontroversial. But you (and others) appear to be making the argument that there are no good metrics - and pointing to bad metrics is irrelevant there.
When it comes to creative endeavors, metrics rarely assist (ignoring metrics like resident memory size for code, or sales for movies).
You focus on your goal and try to take the best path there. LoC written or test coverage are distractions, imho. Trust the skilled, talented people you've hired, as they are motivated to make the correct choices.
In education, there is a practice known as "backwards design", described in the book "Understanding by Design". Here, goals are set; behaviours that are evidence of achieving the goals are set (these are the assessments); and then instructional activities are designed that "move the needle" on the assessments. But the critical fourth step in this iterative process is continuous alignment between the goals, assessments, and instructional activities. This alignment process is intended to prevent the negative outcomes of teaching to the test.
From that perspective, any metric oriented organisation might want to have an explicit alignment process that ensures that the metrics continue to capture the desired outcome.
> ...any metric oriented organisation might want to have an explicit alignment process that ensures...
Yes, I think many folks would agree that metrics aren't an easy thing to use properly.
One huge problem, however, is that organizations that are apt to rely on metrics usually aren't sensitive enough to realize that there's more to their use than coming up with a wish-list and then dreaming up bullshit KPIs which are somehow intended to "move the needle" towards correct outcomes. And when it doesn't work, someone just bullshits their way around the failure instead of doing the hard work needed to honestly address mistakes and problems.
That's, I think, what happens a lot in dysfunctional school districts, though many of us might see this comedy of errors show up in orgs with a matrix management structure.
A combination of self-assessed competence and average actual performance vs. expectations: oversight required, time to complete, lead time on flagging delays, effectiveness at coaching junior members, effectiveness of upwards management, pleasantness to work with.
Why do you do the work they pay you for, if no one can measure it? Seems you've got a sweet deal: get paid to do whatever you want, regardless of result.
The only way your outcome is unmeasurable is if your outcome had no observable impact.
If your presence had any impact, then it can be observed in some quantifiable way. Perhaps it's an increase in customer satisfaction, or a bump in revenue, or an increase in team velocity. All of this can be quantified in some form or another.
> The only way your outcome is unmeasurable is if your outcome had no observable impact.
I don't think this is correct?
Your work having observable impact and being able to show exactly what observable impact your work had in a quantifiable manner are two very different things.
I feel this is a huge problem at Google, where performance reviews/promotions are supposed to be based on objective measures. If you fix a lot of bugs, the end user might be much happier. However, to quantify exactly how much happier, you need to have some existing infrastructure; and even if you do have that infrastructure, how do you know whether they are happy because of the bugs you fixed or the bugs someone else fixed?
It may be possible but is not really worth the effort for a company to invest in. So, what ends up happening is people (who care about perf/promotions) fight for the projects which will give them the most easily measurable impact, and ignore other things which may well be very important to a project or company but are difficult to measure.
Ideally, your performance should only be rated on things that were actually in your control. Admittedly, I don't know how one would do this though.
Not unless it's an extreme case. I found writing performance reviews excruciating because I don't pay that much attention to what other people do and hate to judge people.
$: I can measure how many dollars in sales I get after releasing your work. Of course you probably work on a team so I can't say how much of those dollars come from you...
It's time again for me to recommend Measuring and Managing Performance in Organizations by Austin[0].
Don't let the underwhelming title fool you: it's largely a book on why it can't be done. Austin takes classical principal-agent models and extends them to add a third participant: the client. He then shows that difficult-to-observe work at first improves and then worsens under any metrics regime.
No single metric can be found that will give the desired performance. The basket of metrics that can successfully create the desired outcome is one where the principal has total visibility into the agent's work -- which contradicts the original problem of difficult-to-observe work (i.e., anything requiring skill).
Has it occurred to anyone else that the HN karma system is a performance measurement technique that undermines the goals of HN? I've stopped commenting, largely because for any comment I might make, I can imagine the cynical karma-harvesting comment that someone will make in response, the comment that would never have been made if there were no karma system here. And when I do comment, I waste a lot of time trying to harden my comments to such opportunism.
I've largely stopped commenting due to karma also. It's not just that wording the same sentiment differently can result in either up- or downvotes; it's that I don't want to have to care about getting points on the internet.
Right on. I'm a photographer as well as a programmer, and I've largely stopped sharing photos on Instagram because I don't want to measure my work based on the number of likes each receives, and I don't want the people I care about who "follow" me there to feel the need to like my work. I've watched friends go through their various networks' feeds and mechanically like, react, whatever lest their friends think they're giving them the cold shoulder.
For the same reason, I've come to find using Uber exhausting. Why do I care about my 4.75 (was 4.77) rating? I take taxis whenever practical because I don't want to endanger my rating by making too much small talk or not making enough small talk.
A thought experiment: What if HN changed nothing except that it never displayed point counts for posts, comments, or users? I think it would tamp down gadfly behavior. It may also, however, do the same to behavior that we want to encourage. There are no easy answers.
Maybe, but consider how terrible the conversation is on most Internet forums. Since it's hard to prove a counterfactual, we don't know for sure whether the point system is making things better or worse.
Looking at upvoted versus downvoted comments, my impression is that it makes things better overall with some subtle downsides, but it is hard to show this.
Meh, it's just a fact of the internet. Many sites often turn into an echo chamber and you can choose to either get with it, ignore it, or just get off the internet. This site isn't too bad (compared to say, Reddit) so I stick around despite the flaws.
You comment on someone else's comment, or the article. What you will get as replies, if anything, is unknown at the time of writing. Thus, it need not be and cannot be part of the process of writing the comment in the first place.
A good manager will use metrics as information to help in the assessment of work, not as the sole measurement. It's very important for any manager to coach people on the real organizational concern behind the number.
If you've got more than n=5 things, at a certain point you _need_ metrics to make sense of how things are going. Are we getting our product to customers when they want it (on-time delivery)? Being able to say "yes, 90% of the time" is very helpful compared to saying “who knows, we don’t do metrics here.”
“But the most dramatic negative effect of metric fixation is its propensity to incentivise gaming: that is, encouraging professionals to maximise the metrics in ways that are at odds with the larger purpose of the organisation”
Maybe there is some truth in it. But I also feel that you might get a better result if you let humans figure out what needs to be done to achieve a goal, instead of telling them through a metric.
The feature that distinguishes a well designed metric is exactly that. Specific enough to ensure alignment with broader strategic objectives, general enough not to dictate exactly how.
I think the point of the article is summed up in the sentence:
"The key components of metric fixation are the belief that it is possible – and desirable – to replace professional judgment (acquired through personal experience and talent) with numerical indicators of comparative performance based upon standardised data (metrics)"
Basically, it is a problem of trust. You're trying to replace trust in people with a metric which, in theory, lets you not have to trust them.
Personally, I think it is a foolish goal. You are effectively replacing trust in people (which, admittedly, is vague) with trust in the metric (whose connection to reality is vague). The vagueness has a reason: reality is complicated.
Why not just tell people what the "broader strategic objective" is, instead of trying to come up with a metric that is an exact (and so necessarily wrong) description of it?
Trust is the wrong word; it's more about alignment. I do trust that employees can exercise their professional judgement. I don't expect that every employee can perfectly align themselves behind a strategic objective without guidance.
If my broader strategic objective is to cut costs by 15% to allow more competitive pricing, the last thing I want is for every individual to define what that means to them and hope the math works out in the end.
"I do trust that employees can exercise their professional judgement. I don't expect that every employee can perfectly align themselves behind a strategic objective without guidance."
That's strange to me, because the former surely seems to be much more difficult than the latter.
Sure, say a company desires to broaden their customer base so they are less reliant on a small set of large customers. A goal could be "30% of net revenue should come from companies with less than $100M in revenue".
It's strategic, specific, and measurable (we are at 5% and need to get to 30%), but it doesn't say how. Focus on a specific industry? Launch a new product? Restructure the sales org?
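A goal like that is also directly computable from the books. A minimal sketch with invented numbers (arranged to match the 5% figure above):

```python
# Each record: (customer, customer's annual revenue in $M, our net revenue from them in $M)
sales = [
    ("MegaCorp",  5_000, 45.0),
    ("BigCo",     1_200, 31.0),
    ("SmallShop",    20,  2.5),
    ("Boutique",     60,  1.5),
]

small = sum(net for _, cust_rev, net in sales if cust_rev < 100)
total = sum(net for _, _, net in sales)
print(f"Share of net revenue from <$100M customers: {small / total:.0%}")  # 5%
```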
I think this article conflates Management by Metrics with an intense focus on short-term results. Management by Metrics is just as compatible with a long-term focus or goal.
Now on the issue of people gaming metrics, that is a real concern and I typically address it with "counter metrics" intended to protect against gaming. It is not perfect, but it works well enough when you take the time to choose your metrics (and counter metrics) carefully.
So instead of "improve A by 20%" we get "improve A by 20% without impacting B"
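As a minimal sketch of that pattern (numbers invented), the counter-metric on B acts purely as a guard; it is never itself a target to maximize:

```python
def goal_met(a_before, a_after, b_before, b_after,
             target_lift=0.20, b_tolerance=0.02):
    """'Improve A by 20% without impacting B', where both metrics are
    higher-is-better and b_tolerance allows small noise in B."""
    a_improved = a_after >= a_before * (1 + target_lift)
    b_held = b_after >= b_before * (1 - b_tolerance)
    return a_improved and b_held

# Tickets closed rose 25%, but the quality score cratered: no credit.
print(goal_met(a_before=100, a_after=125, b_before=0.95, b_after=0.80))  # False
# Same lift with B essentially held: goal met.
print(goal_met(a_before=100, a_after=125, b_before=0.95, b_after=0.94))  # True
```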
> Now on the issue of people gaming metrics, that is a real concern and I typically address it with "counter metrics" intended to protect against gaming.
From the article:
> In an attempt to staunch the flow of faulty metrics through gaming, cheating and goal diversion, organisations often institute a cascade of rules, even as complying with them further slows down the institution’s functioning and diminishes its efficiency.
This reads to me like a culture problem. A good engineering culture should help a lot to insulate against people who just want to game the system and aren't really invested in the success of the project.
Of course, how you get there is anything but simple.
Agreed. If it's measured as a metric, like bugs per check-in, number of check-ins, or lines of code, then reducing the work down to those numbers does nothing to achieve the actual goal: the success of the product.
I'd - cautiously - argue it actually favours success of the individual over success of the product.
Every measurement you make that is advertised will change behaviour. It is very important when making measurements to take that into consideration. Similarly, there is a difference between measurements and metrics. A measurement is simply a measure of something. A metric is a measurement that you make that is fed back into the process improvement cycle: in other words it is a measurement you make to drive changes to the official process.
Things like targets or published measurements that have no particular purpose in your process improvement cycle are risky because they will change behaviour in undefined ways. You should always avoid them if you can.
Measurements should be taken in order to answer a question. You should avoid taking measurements if you have no questions. Similarly, you should try to measure the minimum that you can that will answer your question. This will minimise the change in behaviour.
Before you can have metrics you need a defined process (because by definition a metric is a measurement that is used to change your defined process). Avoid taking metrics when you are unsure of your existing process. When you change your process as a result of a metric, use measurements to answer the question, "Did we affect things we didn't expect to affect". If the answer is "Yes", then take more measurements to answer the question, "Is this change something we need to do something about".
In this way you will be able to control your process. One last thing. Do not attempt to protect against gaming other than to not measure things. People naturally optimise and will do it whether they are aware of it or not. Simply observe the effect of your measurements and decide if they are what you want. If not, then remove the measurement.
>Things like targets or published measurements that have no particular purpose in your process improvement cycle are risky because they will change behaviour in undefined ways. You should always avoid them if you can.
The problem is that a lot of businesses focus hard on metrics and this doesn't translate to a process improvement cycle; rather, it's a "person improvement" cycle - making raises, bonuses, promotions, etc. dependent on those numbers.
A perfect example of how this is problematic is in areas like Customer Support, where the number of handled tickets doesn't equate to the value of the support provided.
As a hypothetical, consider two Support Agents working support for 'X' product at 'Y' company over the course of a quarter. The product and the company have been around and established for quite some time, so there's no need to quibble over staffing availability here. Support Agent 1 works a case that takes 3 months, but it's a large product bug whose fix impacts millions of users around the world.
Support Agent 1's queue is, of course, considerably smaller as a result. Meanwhile, Support Agent 2 keeps picking up the low-hanging fruit, because it's a numbers game and they know it.
What should happen is that management sees the disproportionate numbers and tries to find out why (and/or attempts to automate the low-hanging fruit out of the equation). They should also look at the impact each agent's work had on the overall user base.
Instead, what more than likely happens is that because Support Agent 1's numbers were so low, their manager's numbers look bad, and that goes up the chain. Upper management often doesn't care about the nuance behind the numbers, so shit rolls downhill (as the colloquialism goes), and now Support Agent 1 is marked "Unsatisfactory/Needs Improvement" for their review cycle, losing raises, bonuses, promotions, etc.
I think that would demoralize Support Agent 1.
Since this is a majority-American board, consider the wonderful hell that is known as Wal-Mart. From what I understand, it also utilises metrics across the board. If you think on it, can you name a person "happy" to work there? I'd reckon you couldn't. (Again, this is my understanding of it, not anecdotal, first-hand experience with the company or anyone who works for it. Aside from Asda, in the UK, Wal-Mart isn't a "thing" here.)
So, more than just being fruitless, measurement that doesn't feed into the process improvement cycle racks up pretty high numbers in damaging overall morale, I'd say.
That example is a measurement, not a metric (even though it is mistakenly called a metric by people who don't study this stuff). It is also a bad measurement to take, as you point out.
This gets into very complicated discussions. How do you measure productivity? How do you reward good performance? Can you use a reward for good (measured) performance as extrinsic motivation?
Anybody who tells you they have a good way of doing this that will work across a wide range of different tasks is either deluded or lying (or both). Lots of successful businesses are run by deluded/lying people.
How do you measure intangibles like knowledge or customer service or innovation? How do you weigh any number against another?
Surely a single number, or even just its sign, is insufficient for such broad areas, much less a predictor based on one of those numbers. Which is why... measures lie. People lie less often, and in a more detectable way.
The tangible results of these things can be measured, but there are multiple variables affecting them: politics, skill, talent, and finally luck. If any of those elements fails, you get nothing, despite intangible gains that may provide a long-term advantage.
>How do you measure intangibles like knowledge or customer service or innovation?
You figure out proxies as best you can. How do you know "customer service" or "innovation" are important, are improving, exist, or are what you think they are, without figuring out a way to measure them? This is precisely the bane of buzzwords. "We focus on innovation here!" but no one can determine what that is or if the company is any good at it.
>The tangible results of them can be measured but there are multiple variables affecting it, from politics, through skill, talent and finally luck.
There are a few comments in this thread that seem to come from the same premise: that measuring is hard because "multiple variables" come into play when you measure things. I hate to break it to you, but those factors are in play whether you are measuring things or not. The purpose of measuring is to minimize the unknowns and reduce variance. All of the variety of factors that are in play in a massive organization are precisely why you need to measure things, not a reason to cast measurement aside.
>The intelligence analysts who ultimately located Osama bin Laden worked on the problem for years. If measured at any point, the productivity of those analysts would have been zero. Month after month, their failure rate was 100 per cent, until they achieved success.
I guess if the metric was "caught Osama Bin Laden or not", then yeah, I agree.
But who on earth would measure the analyst's productivity just so? Nothing but a strawman.
>The source of the trouble is that when people are judged by performance metrics they are incentivised to do what the metrics measure, and what the metrics measure will be some established goal. But that impedes innovation, which means doing something not yet established, indeed that hasn’t even been tried out.
Why? If a (admittedly weak) metric like "# of paying customers" is used as a goal, how does that "impede innovation"? How are people being stopped from coming up with creative ways to get more customers? Why do we assume the ways in which they improve the metric will be bad, and not innovative?
> I guess if the metric was "caught Osama Bin Laden or not", then yeah, I agree.
> But who on earth would measure the analyst's productivity just so? Nothing but a strawman.
Is that much different than measuring LoC, tickets closed, or features implemented? Some features just take time -- sometimes the research alone to implement something could take a few weeks in which literally nothing measurable happens. That's the point: completion metrics aren't useful for larger or open-ended projects. There was no Gantt chart, no schedule, for finding Osama bin Laden; similarly, you sometimes cannot plan out a large software project from the very start.
I really disliked that example. The Pakistanis knew where he was for years, and the CIA sent teams of "vaccinators" to conduct a hunt for him, making many people in need of vaccinations distrust modern medicine - a crime that no one will answer for, and one that could easily cost more lives than bin Laden ever did.
This is OT, but if you're interested in understanding this epochal event in American media narrative, there's no one better than Sy Hersh to listen to.
Is there any reason to think that people are worse at gaming qualitative evaluations than they are at metrics, in general? Things like making friends with one's boss, appearing to work hard, and so forth.
The people who are already gaming probably aren't. However, the people who are not trying to game the system might decide to do it if there is an objective goal.
But perhaps you're right. In capitalism, the most common reason companies exist is to game the system (i.e. make profit). I don't see why, in that case, employees shouldn't try to game the people who game them (i.e. the capitalists).
However, in companies that are collectively owned, this becomes less clear.
>However, in companies that are collectively owned, this becomes less clear.
Unless it has no management, there is still a management class with disproportionate decision-making power, and thus the antagonistic relationship necessary for there to be a clear case for gaming continues to exist. Humans are really bad at removing hierarchies, to the extent that I would assume any place without visible hierarchies has them in a harder-to-notice form.
No, but performance metrics actively damage the work of the people who aren't gaming the system. If you can't make things better, you can at least not make them worse.
I think metrics can be very useful but as with many things, you need to know how to interpret the data.
I'm in the restaurant industry, so my #1 metric is profit (over the long run anyhow). Now, the next metric is sales, since profit is a % of sales. So to get those sales, we're looking at how many new guests are coming through the door, repeat guests, average guest check, etc... After that, it's margins. Sales mixes play into that (relating to COGS), and labour. Now, to determine what we're getting out of our labour, more metrics. How much each server is selling, what products they're selling, how many guests they can serve in a night, etc... For cooks, it's about productivity and consistency.
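To make that cascade concrete, here's a toy sketch with invented numbers for one week at one location (every operator's targets differ; these are purely illustrative):

```python
# Invented week of numbers for one location
sales  = 52_000.00   # gross sales
cogs   = 15_600.00   # cost of goods sold
labour = 16_640.00   # total labour cost
guests = 1_300
repeat = 420         # returning guests

print(f"COGS %:       {cogs / sales:.0%}")             # 30%
print(f"Labour %:     {labour / sales:.0%}")           # 32%
print(f"Prime cost %: {(cogs + labour) / sales:.0%}")  # 62%
print(f"Avg check:    ${sales / guests:.2f}")          # $40.00
print(f"Repeat rate:  {repeat / guests:.0%}")          # 32%
```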
Anyhow, there are a lot of shit restaurants and workers out there (due to the low barrier of entry, education-wise), and lots of people misinterpret the data, leading to a feedback loop of shit managers and employees. Often restaurants will focus too much on one of those metrics, leading to places that just gouge guests, or others that entertain guests who are fishing for free shit.
Now, as for metrics in the programming world which probably don't matter, I always hear about lines of code written. Obviously it's easy to game and hard to discern, as code can be too terse or too verbose. Programmers are also usually so far removed from the sales part of the business that there's no objective sales metric to use either. And that's where you need good managers. I'm sure there's a good set of metrics that give some idea of performance, but they need to be interpreted by someone.
And all of this reminds me of stats/economics (what I did in university), where you're bombarded with data and need to interpret it. Like GDP per capita can indicate general well being, but then you adjust it to PPP to get a better metric of quality of life, and add inequality calculations to understand how it affects different segments of the population, and you can go even deeper with specific stats underlining quality of life (amount of disposable income, amount spent on housing, amount spent on entertainment, etc...).
Anyhow, tldr here is that metrics matter, but interpreting them is a skill and makes all the difference.
All valid points. Of course though, you can't rail against metrics as a whole. We all use them every day and always will.
We don't personally verify a financial advisor is a genius, we look at their annual rate of return.
We trust that if unit tests fail, code isn't good.
We trust the SATs, which are a better and more scalable and fair measure of aptitude than any group of humans I could imagine.
The piece rails against metrics, which often backfire, but doesn't even mention the alternative (trusting those in power to make any choice they desire with no justification), which is also often abused.
I don't have an attribution for this quote, so if someone knows it please share...
If you want an opinion, let's go with mine.
Otherwise, let's look at the data.
It's true that many quantitative metrics can incentivize people in the wrong way. There's a funny story about a city trying to get rid of rats that incentivized people to hunt rats and turn in the tails... but people started farming the rats to collect the reward instead, while the project leaders patted each other's backs.
However, as an engineering leader you should not discount the value of qualitative metrics, or metrics altogether. I try to only make a decision "from the gut" when there is no other choice and a decision must be made. Data is key. How do you know how well your code review culture is thriving if you do not see, across the entire team, metrics reporting the number of open pull requests, how long they stay open on average, how many comments they're getting, etc.? How do you know if one of your engineers is struggling if you're not watching what gets checked in, what code is being rewritten frequently, etc.? How do you gain leverage for your team with management if there are no facts - metrics - to back up your arguments?
Just a reminder that you should use metrics. Maybe avoid using them as reward mechanisms and use them as leverage instead.
I don't work for GitPrime but you should definitely check out what they're up to.
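For a sense of the raw numbers behind that kind of dashboard, here's a minimal sketch pulling open-PR stats from the public GitHub REST API. To be clear, this is not GitPrime's product or API, the repo is just an example, and a real version would need pagination and authentication:

```python
from datetime import datetime, timezone

import requests

def open_pr_stats(owner: str, repo: str) -> dict:
    """Count open PRs and their average age in days (first page only)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    prs = requests.get(url, params={"state": "open", "per_page": 100},
                       timeout=10).json()
    now = datetime.now(timezone.utc)
    ages = [
        (now - datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))).days
        for pr in prs
    ]
    return {"open_prs": len(ages),
            "avg_age_days": sum(ages) / len(ages) if ages else 0.0}

print(open_pr_stats("python", "cpython"))
```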
Well if you want to play that game then how do we know the person who's relying on their instincts and intuitions isn't lying either?
Who do we believe?
I didn't suggest using metrics to make anyone accountable but using them as leverage to empower your team.
What would be the point of gaming those metrics? Nobody would profit from lies, and if you're checking the provenance of your data - in the case of GitPrime, your own repository - then identifying a culprit would be rather easy.
> Contrary to commonsense belief, attempts to measure productivity through performance metrics discourage initiative, innovation and risk-taking.
Is this really contrary to commonsense belief? I think most business leaders are very aware of this, however, the alternative is chaos. That's why certain specialist teams are given leeway.
I think awareness is very bimodal. I've sat in rooms with people who really thought they could transform a business with 5% YOY improvements.
(That's not always the wrong view, either — a key part of this is recognizing what kind of situation you're in and whether you really have the trust and resources to bank on a revolutionary change)
The author in the article focuses only on one-half of the metrics within a given process, the lagging results. There is another whole dimension to performance metrics which is the causal dimension, your leading metrics or leading indicators.
You can only effectively use a lagging indicator–such as time to restore–once you understand the activities that make up the world of restoring service. Does the product have an SOP document? Is there monitoring in place to alert technicians quickly? What is the training level of your employees? Are your employees empowered and engaged? All of these things matter far more than lagging outputs.
Sure, we can disparage lagging indicators all day, but we can't throw them out just because they can be gamed. You have to push deeper and that requires time, process knowledge, and building a healthy working environment.
Finding useful performance metrics for evaluation is difficult, and it assumes somebody knows what is best for the dept/company/division, that the evaluators are well trained, and that the chosen metrics point the correct way. There are useful metrics if you are playing golf, baseball, poker, bridge, or chess, but even then individual performance varies year to year, and even the stars have slumps. The normal distribution puts 95% of employees within ±2 standard deviations of the mean, with only 2.5% excelling and 2.5% needing help in a given year, and in the next year everyone will have a different ranking. Performance metrics are mostly a waste of valuable time for both the employee and the management.
Compared to the other sciences, it seems to me that management science, as an entire discipline, is a nearly complete failure. This is a field where startup experiments by complete newbies lead the conversation against a network of business schools and long-standing organizations. There seems to be no consensus whatsoever on the best way to manage people. It’s not even clear, to me at least, that modern management practices at many large corporations are provably better than some kind of naive baseline like a pure democracy or pure dictatorship. Expert texts are nothing more than pontification and hand-picked anecdotes. Does anybody really know what works?
Even worse, in the same source basic concepts have multiple definitions, partially overlapping, partially contradictory.
This is exactly why I believe C-level officers are grossly overpaid. Previous success does not guarantee future success, especially in a different industry, so they don't know some magic formula to turn their salary into 1000x profits for the shareholders any more than a mid-level manager does.
It's happening all over my company, with dashboards for all kinds of useless metrics.
The worst part is gaming unit test coverage to get over the 80% mark.
I keep trying to say I would rather have 40% coverage from well-written tests than 80% from crappy tests.
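For example, under a typical gate like pytest-cov's --cov-fail-under=80, both of the tests below move the coverage number identically, but only one of them can ever fail:

```python
# pricing.py
def discounted_total(prices, discount):
    total = sum(prices)
    return total * (1 - discount)

# The coverage-gaming test: executes every line, asserts nothing.
def test_discounted_total_runs():
    discounted_total([10, 20], 0.5)

# The real test: same coverage contribution, actual verification.
def test_discounted_total_value():
    assert discounted_total([10, 20], 0.5) == 15.0
```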
Management via metrics is a really useful shortcut to generating alignment, but it is a shortcut. If you tell people that they need to move a number, there isn't a lot of ambiguity, and you can just keep beating the same drum over and over. Trying to align people on the whole business context is much harder and takes a lot more time, though it is obviously much more valuable.
> to replace professional judgment (acquired through personal experience and talent)
The thing is, there is no magical way to measure this. How do you define personal experience and talent? By the number of years spent in the industry? You're still back to another metric.
Metrics do matter. Common sense does also matter. Making decisions in a chaotic world is indeed difficult.
What this article misses out is that the metrics will be present and used whether they are articulated or not. By making them explicit they can be improved, validated, debated, and removed. Standards are consistent. When they're implicit, you get none of those benefits and instead promote an insider culture of unspoken biases that get no scrutiny at all.
The choice isn't between metrics and human intuition, it's between explicit and implicit metrics.
Of course there are bad metrics; to use an example from the article - measuring the output of analysts finding OBL on a binary metric is simply a bad metric. But if we start from there, we can improve it. How many leads do they have at the current time? What stage is each lead at? What intelligence has it generated to this point?
> Economists [..] report that in recent years the only increase in total-factor productivity in the US economy has been in the information technology-producing industries.
And guess how those industries did that? Metrics.
Metrics are a simple tool. They are almost worthless by themselves and can even be a detriment. But like any tool, if you pick the right ones and use them right, it's much better than not having the tool. There are tons of ways business processes can improve by analyzing metrics, it's not just an employee productivity stick.
Reminds me of a great book about managing expectations instead of managing the company for real, and how the system is rigged to select for short-term gains, which in most cases backfires.
[1] https://en.wikipedia.org/wiki/Goodhart%27s_law