When the flight instructor story and regression to the mean were discussed on HN a year ago[1], user "cortesoft" had a brilliant way of making the point:
Yeah, I always make an example with coin flips to show how this is true... let's say heads is success and tails is failure.
Flip 100 coins. Take the ones that 'failed' (landed tails) and scold them. Flip them again. Half improved! Praise the ones that got heads the first time. Flip them again. Half got worse :(
Clearly, scolding is more effective than praising.
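A quick simulation makes the coin-flip version concrete; this is only a sketch of the experiment described above, with the count of 100, the seed, and the output format chosen for illustration:

    import random

    random.seed(0)
    N = 100
    first = [random.choice(["heads", "tails"]) for _ in range(N)]

    # "Scold" every coin that landed tails and "praise" every coin that landed
    # heads, then flip them all again. The feedback cannot affect the coins.
    second = [random.choice(["heads", "tails"]) for _ in range(N)]

    scolded_improved = sum(1 for f, s in zip(first, second)
                           if f == "tails" and s == "heads")
    praised_worsened = sum(1 for f, s in zip(first, second)
                           if f == "heads" and s == "tails")

    print(f"Scolded coins that 'improved':  {scolded_improved} of {first.count('tails')}")
    print(f"Praised coins that 'got worse': {praised_worsened} of {first.count('heads')}")
    # Roughly half of each group changes, in exactly the pattern that makes
    # scolding look effective and praise look harmful.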
Really brilliant. I had an experience in basic training shooting a 9 millimeter pistol at a target using a 'laser trainer'. I got a perfect bullseye the first shot. The drill sergeants were highly impressed. One particularly savvy sergeant (must have read this article?) immediately said, "do that again". And I did not make a bullseye again, and everyone went about their business yelling at privates, and getting yelled at. That particular drill sergeant knew something about probabilities apparently.
This, and the original link, base this concept on the idea that the task being monitored is subject to random chance. But the original link is talking about an act that requires skill and something that a person is trained for. So the analogy of the coin toss doesn't make sense to me, nor does the original link.
It's like saying that just because I was able to parallel park successfully this time in one maneuver without hitting the curb or either of the cars in front or behind me that I have some sort of increased chance of not being successful the next time. That seems kind of absurd, doesn't it?
Don't equate things that aren't 50-50 with not having a degree of chance. Parallel parking and aerial maneuvers are not devoid of chance playing a role in success. The impact of chance does decrease with skill, but skill doesn't remove the element of chance.
So the coin is an extreme example, not an irrelevant example.
One of the most pervasive myths we humans cling to is the idea of agency; that our skill and will can overcome random chance. While we might be able to undertake methodologies that skew chance distributions in our favor, everything around us is a game of probability we can only gently influence.
Methods are MUCH more important for a career than any individual execution.
The point of this example is that there is (obviously) no difference in effectiveness. However, compare this simplified example with the following citation from the article:
>“On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver, and in general when they try it again, they do worse. On the other hand, I have often screamed at cadets for bad execution, and in general they do better the next time. So please don't tell us that reinforcement works and punishment does not, because the opposite is the case.”
With praise, half got worse and half stayed the same.
With scolding, half got better and half stayed the same.
Scolding is obviously better, in terms of effect on outcomes.
Though it's probably a better parallel to the flight instructor case if you have a six-sided die, and you praise on 5-6, and scold on 1-2: most of the praised get worse, most of the scolded get better, those who get neither scolding nor praise have mixed results, and a few of those scolded get worse and likewise a few praised get better.
The coin flip example, while in one sense more clearly showing the problem, simplifies to the point where the connection with realistic scenarios is harder to see, IMO.
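A sketch of that die variant, under the same no-effect assumption (the sample size and seed are arbitrary):

    import random

    random.seed(1)
    N = 10_000
    first = [random.randint(1, 6) for _ in range(N)]
    second = [random.randint(1, 6) for _ in range(N)]   # feedback has no effect

    def summarize(label, pairs):
        better = sum(1 for a, b in pairs if b > a)
        worse = sum(1 for a, b in pairs if b < a)
        same = len(pairs) - better - worse
        print(f"{label}: {better} better, {same} same, {worse} worse")

    summarize("Praised (rolled 5-6)", [(a, b) for a, b in zip(first, second) if a >= 5])
    summarize("Scolded (rolled 1-2)", [(a, b) for a, b in zip(first, second) if a <= 2])
    summarize("Neither (rolled 3-4)", [(a, b) for a, b in zip(first, second) if a in (3, 4)])
    # Most of the praised get worse, most of the scolded get better, and the
    # middle group is mixed -- all with zero causal effect from the feedback.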
In other words, Gaussian distributions will reliably revert to the mean, whereas feedback always pushes toward the right. Those praised, already on the right, will move against the desired direction, left, toward the mean. Those scolded, already on the left, will move in the desired direction, right, also toward the mean.
I think they summed up multiple coin tosses (say 10), then "scolded" or "praised" the losers/winners with the lowest/highest sum of the 10 tosses. (So you get a probability distribution and regression towards the mean.)
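That summed version is easy to sketch as well, assuming 10 tosses per "cadet" and arbitrary top/bottom cutoffs for praise and scolding:

    import random

    random.seed(2)

    def score():                 # number of heads in 10 tosses
        return sum(random.randint(0, 1) for _ in range(10))

    N = 10_000
    first = [score() for _ in range(N)]
    second = [score() for _ in range(N)]   # feedback has no effect on the coins

    praised = [(a, b) for a, b in zip(first, second) if a >= 8]   # high scorers
    scolded = [(a, b) for a, b in zip(first, second) if a <= 2]   # low scorers

    def avg(xs):
        return sum(xs) / len(xs)

    print("Praised: first round", avg([a for a, _ in praised]),
          "-> second round", avg([b for _, b in praised]))
    print("Scolded: first round", avg([a for a, _ in scolded]),
          "-> second round", avg([b for _, b in scolded]))
    # Both groups regress toward the overall mean of 5, so praise appears to
    # hurt and scolding appears to help.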
What if it wasn't 50/50 though, and the flight instructor took actual measurements and found that 80% of praised cadets did worse on the second try (vs. no praise) and 80% of scolded cadets did better on the second try (vs. no scolding)?
Wouldn't that indicate that indeed no praise + lots of scolding is the best approach?
It might, if the flight instructor also controlled for the circumstances under which he hands out praise and scolding.
If it's a Gaussian distribution of performance, and he praises only performance above the 99th percentile and scolds only performance below the 1st percentile, that 80%-80% pattern might very well occur. But again, it'd be expected to occur independent of the signal he's putting into the system, because the candidates he's scolding are staggeringly unlikely to get worse, and the candidates he's praising are staggeringly unlikely to get better.
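A sketch of exactly that setup, with Gaussian performance and feedback reserved for the extreme tails (the cutoffs and sample size are illustrative):

    import random

    random.seed(3)
    N = 100_000
    first = [random.gauss(0, 1) for _ in range(N)]
    second = [random.gauss(0, 1) for _ in range(N)]   # no causal feedback effect

    PRAISE_CUTOFF = 2.33    # roughly the top 1% of a standard normal
    SCOLD_CUTOFF = -2.33    # roughly the bottom 1%

    praised = [(a, b) for a, b in zip(first, second) if a > PRAISE_CUTOFF]
    scolded = [(a, b) for a, b in zip(first, second) if a < SCOLD_CUTOFF]

    worse_after_praise = sum(1 for a, b in praised if b < a) / len(praised)
    better_after_scold = sum(1 for a, b in scolded if b > a) / len(scolded)

    print(f"Praised cadets who did worse next time:  {worse_after_praise:.0%}")
    print(f"Scolded cadets who did better next time: {better_after_scold:.0%}")
    # Both come out near 99%, comfortably past the hypothetical 80%, purely
    # from regression to the mean.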
Man, that example knocks it out of the park. I had my own retort to the basic concept behind the article, but that one tears it to shreds.
Anyway, the article's tendency is to push negative reinforcement, not just anecdotally but as a continuous prevailing message, instead of alternating with degrees of praise even if only on occasion. There are at least three things wrong with that on a behavioral level: (a) resentment as an inversion of Stockholm syndrome, (b) learned helplessness, and (c) simple fatigue.
In other words, people will hate your idea of leadership, conclude that you can never be pleased, and realize that they'll be worked to death on a treadmill of demands, if they don't jump ship sooner or later.
It's more of a cautionary tale about dealing with people--I can't count the number of times I've been in some kind of work situation where the people above me believed in some absolute folklore that was detrimental to the execution of the work but they would 100% argue with anyone trying to be reasonable about what was really going on. It's like a learned helplessness, or some kind of odd comfort with the status quo. They would invent from whole cloth a whole belief system about how some software worked that was completely irrational. I include IT admin people here, not just non-techies. And the thing is, I've seen it happen more than once amongst otherwise intelligent people!
So then you're in a situation where the believers begin to exert their authority over you in order to control the situation and back up their beliefs instead of trying to solve the problem, up to and including threats of termination. It's as though they might suffer emotional harm if you fix the actual underlying problems that caused the whole need to create the belief system in the first place. Those behaviors are that strong and that lizard-brained.
One example: A transaction issue (actually the lack of a proper transaction) in a database was causing duplicate order numbers to occur pretty frequently each day on a large, multi-user system. Instead of getting someone competent to come in to analyze and fix the issue, the people at the company developed all these elaborate procedures for how to enter orders so that they could avoid creating duplicates in the system. They had actual written procedures and even the work schedules were affected by this order entry bug. I swear they were twirling around three times and throwing salt over one shoulder before entering a new order, but only on odd-numbered Wednesdays and Fridays.
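For the curious, the class of bug being described is the classic read-then-write race on an unguarded "next order number". The sketch below is a hypothetical reconstruction (invented names, a thread lock standing in for a proper database transaction or sequence), not the actual system:

    import threading
    import time

    orders = []                  # stands in for the orders table
    lock = threading.Lock()

    def next_order_number():
        current_max = max(orders, default=0)   # like SELECT MAX(order_no) ...
        time.sleep(0.001)                      # window where another worker reads the same max
        return current_max + 1

    def enter_order_unsafe():
        orders.append(next_order_number())     # like INSERT ... with no transaction

    def enter_order_safe():
        with lock:                             # stands in for a transaction or DB sequence
            orders.append(next_order_number())

    def run(worker, n_threads=20):
        orders.clear()
        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        duplicates = len(orders) - len(set(orders))
        print(f"{worker.__name__}: {duplicates} duplicate order numbers out of {len(orders)}")

    run(enter_order_unsafe)   # typically reports several duplicates
    run(enter_order_safe)     # reports none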
I've seen similar problems involving thread local data and badly written multithreaded code, odd caching issues on networks relating to AJAX calls from certain browsers, network setup issues, you name it.
You can also consider that flight training (or any skill really) is not a situation where you can really top out the range. Cadets can always get better. When they do, the standards go up.
This is a good example of why it's important to recognize the difference between individual anecdotal experience and results gained from properly analyzing data.
It's rare for people to make enormous improvements or suffer enormous degradations in performance from one attempt to the next. It's common for natural variations due to random factors, especially when still learning, to crop up and impact performance from instance to instance. That's true whether one is shooting a basketball at a net or flying a jet aircraft. This means that quite often the experience of providing negative feedback will seem to result in improvement. But this would also be true if one simply wrote the negative feedback down, sealed it in a letter, and never showed it to the student, because of regression to the mean: students who do poorly in one round will often do better in the next, just as students who do well in one round will often do slightly worse in the next.
Only over a longer period of time, and by collecting data from multiple students, will you balance out these effects and be able to determine whether positive or negative reinforcement works better. And the data does seem to indicate that positive reinforcement is superior.
The question then is, what is the optimal latency for recognising improvement or decline? Could deferring judgement help counteract this effect?
And what magnitude of praise or negative feedback elicits the best results?
Essentially this could come down to finding the optimal training schedule for humans. (Like for neural nets or classifiers.) And this is similar to adversarial training: you have to consider both the trainee and the trainer (discriminator).
I suspect what truly works is (a) the instructor knows something that you don't know, (b) you want to learn it, (c) you establish communication. Praise and blame are an inherent part of that communication. In certain circumstances, especially where (b) is less true, then praise and blame get upgraded to reward and punishment. For example if you are a child at school.
We like to say that people (especially children) shouldn't be punished; they should only be rewarded. And then we deny that we are still punishing them, after all, for example by silence and withdrawal.
The reality is you aren't in a relationship if you only get praise and positivity. That feels meaningless or even creepy depending on the intensity. A genuine connection will feel positive/neutral most of the time and negative occasionally, since we are knowledge-creating entities. For example, in a computer game you're learning optimally if your win:lose ratio is somewhere around 80:20. I would guess this applies to our relationships too.
Neither reward nor punishment will 'work' in the absence of the other. This is why tyrants go over the top with punishment, because all blame and no praise is lack of relationship too.
I think context-specific norms are important, too. I was on a lot of Little League teams where the coach was a curmudgeonly guy who was always yelling, and the kids mostly didn't mind being yelled at. It was happening to everybody, so it didn't mean anything bad about them personally. Same thing in school; some teachers were just mean. But then there were teachers who would treat different kids differently. Some kids would get the acid tongue over and over again no matter how hard they tried; others just got sweetness and light. That's terrible. It definitely doesn't have the desired effect; at least, I hope no teacher would want to make kids feel sick about coming to school every day.
If you establish a norm that everybody gets yelled at, even the better performers, then getting yelled at comes to just mean "pay attention" or "you need to focus more on this." Otherwise getting yelled at means, "You are not living up to the standards of the group, and we resent you for it. You should worry about what's going to happen to you here, if you're even allowed to remain."
> Praise and blame are an inherent part of that communication.
Language acquisition presents what is perhaps the easiest falsification of your claim. Children don't learn verbal communication so well because of a system of instruction inescapably based on rewards and punishment from an instructor who can teach the lessons that the child wishes to learn (or thinks they should learn). They learn so well because their brain essentially builds them a nice filter chain based on the sounds they hear at an early age range (thus language exposure is an important factor in later language learning).
We foist praise and blame onto that process mainly because most parents have decades-old (or sometimes centuries-old) concepts of learning that aren't based on modern research. Still, I'd much prefer they err on the side of too much praise rather than risk abusing their children. We have plenty of research that tells us the clear risks when that happens.
> In certain circumstances, especially where (b) is less true, then praise and blame get upgraded to reward and punishment.
The example above is a case where (b) is less true. Infants don't desire to build a language filter based on the sounds they are hearing. It happens involuntarily. But your system would actually guide a parent in the wrong direction: escalating praise/blame to reward/punishment in a situation where neither is warranted.
> For example, in a computer game you're learning optimally if your win:lose ratio is somewhere around 80:20. I would guess this applies to our relationships too.
For that to be testable, your character in the game would have to stay dead once it gets killed. Or at least its injuries would need to follow it everywhere. Like a friend tries to show you a new move and hands you the controller, then the game says, "Hey, you're that guy with the broken leg," and doesn't allow you to do the move.
I think the win:lose ratio would change significantly in that case.
People implicitly praise and blame by their emotional responses. A negative response from a person you admire or seek to emulate is felt negatively whether it was intended as blame or not.
Reward and punishment are an amplification of a generally unwanted signal. I'm not advocating these; I'm not taking a moral stance. Rather I'm talking about what people already do irrespective of what they think or say they are doing.
Children learn language because it helps them to get what they want. If reward and punishment guaranteed results then adults would be able to reliably recite the multiplication tables (which they can't).
I agree with you except in interpersonal relationships people tend to confuse “punishment” with meanness or losing one’s temper. There’s never a good reason to not be patient or to lose your temper. Having room for negativity is important, but communicating negativity is much more difficult than praise on many levels.
I bet those who believe that anger is always and everywhere wrong are the ones who lose their temper. (They may go silent and direct the anger inwardly.) Those who use the anger will employ measured criticism and intensify their efforts.
I'm not sure how accurate it is, but I've read a book that claimed killer whale training only uses positive reinforcement: that you can't punish an orca and then expect to be able to get into the water with it.
"We like to say that people (especially children) shouldn't be punished; they should only be rewarded."
I think neither is correct. I think this is a tactical decision that needs to be made in the moment based on time and energy.
Which is to say, it is neither correct nor incorrect to slowly, agonizingly, peel off a band-aid. This can be a successful strategy. Sometimes, however, you just need to rip it off and get on with your life ...
Yes. I don't think one can escape from the fact that punishment and reward go hand-in-hand. By the contrast principle, absence of reward is logically equivalent to punishment. Even without explicit rewards and punishments, children will pick up their parents' emotions. And if there are no emotions to pick up, the child will choose another parent figure. Because children want to grow.
Ask yourself why they do double-blind experiments. I had read that, depending on the illness, a non-active treatment (sugar pill) given to the control group improved their reported condition by significant and substantial margins over a population that did not get the fake treatment.
"Placebo interventions are often claimed to substantially improve patient-reported and observer-reported outcomes in many clinical conditions, but most reports on effects of placebos are based on studies that have not randomised patients to placebo or no treatment. Two previous versions of this review from 2001 and 2004 found that placebo interventions in general did not have clinically important effects, but that there were possible beneficial effects on patient-reported outcomes, especially pain. Since then several relevant trials have been published."
...
"We did not find that placebo interventions have important clinical effects in general. However, in certain settings placebo interventions can influence patient-reported outcomes, especially pain and nausea, though it is difficult to distinguish patient-reported effects of placebo from biased reporting. The effect on pain varied, even among trials with low risk of bias, from negligible to clinically important. Variations in the effect of placebo were partly explained by variations in how trials were conducted and how patients were informed."
This suggests the double-blind may have more to do with stopping the researchers fudging the results, than convincing the patient they're getting real treatment.
I've heard that claim before, and it seems rather obvious when you think about it.
That raises the question, has the medical research community picked up on this? I would expect a burgeoning movement to correct this in medical trials.
As far as I'm aware, they have, and once you take that into account placebo only has a very limited effect on controlling self-reported pain and nausea; everything else was just bad statistics.
Though as that article explains, shysters, whether practicing some kind of standard woo, or working for big pharma, are keen to continue to pretend they don't understand this as long as it's paying their salary.
I read it once and have no idea what point they are trying to make. Are they saying that, be it reinforcement or punishment, either way it's an exercise in futility?
He's saying that rewarding success and punishing failure, combined with regression to the mean, leads to a perceptual bias in which we believe that the reward causally failed and the punishment causally succeeded, because of actual but spurious contingencies that are induced.
Let's say that you repeat something over and over again with the same individuals, with random Gaussian outcomes on each trial. What will happen is that an individual who is at the extreme on one trial (positive or negative) will be more likely to be in the middle on the next trial, because the extreme deviations are extremely unlikely, and the middle deviations are relatively likely. So the really good person on one trial is likely to look worse on a subsequent trial, and the really bad person is likely to look better.
If you praise the person who did well on the initial trial, then, you will see good outcome -> praise -> worse outcome. If you punish the person who did poorly, you will see poor outcome -> punish -> better outcome. But this is spurious, because the good -> worse transitions and bad -> better transitions would have happened regardless of whether you praised or punished.
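A sketch of that spurious contingency, assuming purely random Gaussian trials and arbitrary praise/punish thresholds of +1 and -1:

    import random

    random.seed(4)
    N = 50_000
    trial1 = [random.gauss(0, 1) for _ in range(N)]
    trial2 = [random.gauss(0, 1) for _ in range(N)]   # independent of any feedback

    change_after_praise = [b - a for a, b in zip(trial1, trial2) if a > 1.0]
    change_after_punish = [b - a for a, b in zip(trial1, trial2) if a < -1.0]

    def avg(xs):
        return sum(xs) / len(xs)

    print(f"Average change after praise:     {avg(change_after_praise):+.2f}")   # clearly negative
    print(f"Average change after punishment: {avg(change_after_punish):+.2f}")   # clearly positive
    # The good -> worse and bad -> better transitions appear whether or not any
    # praise or punishment was actually delivered.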
I think the thought experiment is imperfect for various reasons but does illustrate an important potential source of bias pretty well.
I think the lecturer and the objecting instructor are talking about two different things.
The lecturer's thesis is that positive reinforcement is more effective. I'm assuming that this was a measured, long-term result, e.g. students who were positively reinforced did better on final tests or other summarizing measurements.
The objecting instructor is talking about single instances "in the seat," not measured over time. "More often than not, the next maneuver ..."
I'm assuming that a student's performance will be somewhat more random than an experienced practitioner's performance. So there would be a tendency for regression to the mean in isolated instances, and a small enough sample of instances would not be enough to predict final measured proficiency.
Also because of the way the objecting instructor phrased his objection, I'm assuming that he hadn't checked his observations against the final summarizing measurements.
This is how I made sense of the conflict between the two people.
Nate Silver's book "The Signal and the Noise" is all about this kind of thing. The key question being "What is signal and what is noise?"
Is doing well on one maneuver mostly signal (the pilot's skill level) or noise (random chance)?
In the book Nate talks about his own career as a poker player, and how there's so much chance involved that he still didn't really know if he was a good player or not after doing it for a couple of years; that's how hard it is to tease the signal from the noise. Would beating himself up about every lost hand and celebrating every win be worthwhile, or would it just distract him from the real task of slightly beating the odds and grinding out a profit at the margin?
On the contrary: it suggests that a lot of psychological research ignores how the person training reacts to the feedback and performance of the trainee, thus rendering the recommendations hard to follow or moot.
No, they are saying that encouragement and reinforcement work better over time, however if you only look at a single example you will get the opposite impression due to regression towards the mean.
Judging from what I remember of Kahneman's Thinking Fast and Slow, this is probably about the base rate fallacy. So yes, the claim is that often there is no real learning involved.
Yes, it's an exercise in futility in high noise environments such as initial flight cadet performance (where mean performance only shifts over the long-term)
While I agree in general, you seem to be implying that because praise is "better", punishment has no value.
I am sure that isn't what you are saying, but for the sake of argument, here's a counter.
After raising a few kids, and watching many other parents do the same, I have found that both praise and punishment are an absolute requirement for success. But praise is good for personal bonding, and punishment for learning responsibility.
As adults we punish ourselves (if raised to self-evaluate), and on the extreme side we justify ourselves in error if only given praise. Self-evaluation cannot be done without self-punishment.
Before you go nuts on what "self punishment" means in this context, I will provide one. At the lowest level it means acknowledging that an error was "my fault". The most minimum amount of punishment possible is taking responsibility. It hurts, it's punishment. Blaming others is easy, and doesn't hurt at all, but will not help you avoid repeating a mistake.
Having to apologize for a mistake or hurting others is a form of self imposed (or parent imposed in some contexts) punishment. And it's extremely effective in resolving conflict.
Again, I am sure that I read into your point, but it seemed fitting to address this purely from an argumentative perspective.
The instructor thought that "50% chance of receiving a positive signal if you send out a negative signal" was better than "50% chance of receiving a negative signal if you send out a positive signal".
No, the instructor thought that their feedback had a clear causal connection to the cadet’s subsequent performance (scolding makes them perform better, praise makes them worse) when actually the effect (if any) could not be separated from random noise in performance and regression to the mean.
My bad, the choice of "thought" implied more deliberation than I meant. As I read it (with the coin experiment), because of bias reinforcement the instructor was more likely to focus on positive results when he gave a negative response, and on negative results when he gave a positive response, than on positive results after a positive response or negative results after a negative response.
It’s not really that either. The correlation the instructor saw was real—a cadet would, on average, actually perform better after scolding and worse after praise! But the instructor wrongly assumed that their feedback caused the change in results when in actuality the only clear causation was from the cadet’s earlier performance to the instructor’s feedback.
Because of regression to the mean, an exceptional result is more likely to be followed by a more average one, irrespective of the instructor’s reaction. What the instructor actually achieved was condition themselves to respond in a certain way to a student’s performance, rather than conditioning the student to respond to the instructor’s feedback as intended!
> I immediately arranged a demonstration in which each participant tossed two coins at a target behind his back, without any feedback.
I thought the demonstration was going to be to set people in pairs, have one toss a coin and the other punish / scold him every time the coin didn't come up heads ;-)
That would help make the case that the positive effect of punishment is entirely due to chance.
For those finding this piece inspiring, I would recommend "The Undoing Project: A Friendship That Changed Our Minds" by Michael Lewis [1].
The book is about Daniel Kahneman and Amos Tversky and their research. In my opinion a very good combination of telling the story of the main characters and also giving interesting points from their research.
Be wary of applying this to thought experiments. Ask yourself: what if the supposed reversal were not true, and I am only being told it is?
The 'gets better, gets worse' quality here is contestable because the underlying "skill" is a phantom. There is no perversity in testing whether a non-existent thing is a property of the system or not: that's a strong test in science.
The perversity is people's belief in the distinction of ability over something subject to random chance.
Most real-life situations are not like that. The contingency assumes that your underlying process is random with a normal distribution. In everyday life, especially when working in teams or projects, processes are not of that kind. Other people are neither random, nor is their behaviour normally-distributed. The relative effectiveness of praising and scolding is unclear in such situations. So it seems unwise to transfer this anecdote to other contexts.
It shows that when there is a Gaussian distribution of performance, then exceptionally good performance will, purely based on statistics, be followed by a decrease in performance, and exceptionally bad performance will be followed by better performance.
The point is that this effect will usually be much stronger than the effects of feedback, and if you naively analyse what kind of feedback works better, it will lead you to the wrong conclusion because praise is given for exceptionally good performance while scolding is given for exceptionally bad performance.
"...because there is regression to the mean, it is part of the human condition that we are statistically punished for rewarding others and rewarded for punishing them."
I was enlightened.
No, the point is that praise is better, but appears to be worse because you praise exceptional performance and people are unlikely to repeat or exceed that exceptional performance right away.
That doesn't help in deciding whether one should prefer praise or scolding, though, since he says both fall within placebo levels. If he is trying to say that there is more room for improvement from a bad performance, that's kind of obvious, but it still means most of the time you will need to scold instead of praise.
The evidence for praise being better is completely separate from this whole argument and comes from different sources. The article is about refuting a very misleading interpretation of common experience.
In the particular example he describes, both are placebos, because the performance is essentially random, and even in that case, it will appear that punishment is better than praise, because the administration of punishment/praise is linked to performance, which in this example, is purely random.
His thesis is that praise, in real-world situations, has a positive effect, and punishment has a negative effect, but on extremely short timeframes random effects dominate, leading to the reasonable empirical belief that praise is ineffective compared to punishment.
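A sketch of that thesis: give praise a genuine small positive effect and scolding a genuine small negative one, and the naive before/after comparison still points the wrong way (the effect sizes and cutoffs are invented for illustration):

    import random

    random.seed(5)
    N = 50_000
    PRAISE_EFFECT = +0.1     # assumed true causal effect of praise on the next attempt
    SCOLD_EFFECT = -0.1      # assumed true causal effect of scolding on the next attempt

    change_after_praise = []
    change_after_scold = []
    for _ in range(N):
        first = random.gauss(0, 1)
        base_second = random.gauss(0, 1)
        if first > 1.0:                      # exceptional: gets praised
            change_after_praise.append((base_second + PRAISE_EFFECT) - first)
        elif first < -1.0:                   # terrible: gets scolded
            change_after_scold.append((base_second + SCOLD_EFFECT) - first)

    def avg(xs):
        return sum(xs) / len(xs)

    print(f"Observed change after praise:   {avg(change_after_praise):+.2f}")   # negative
    print(f"Observed change after scolding: {avg(change_after_scold):+.2f}")    # positive
    # The regression-to-the-mean term (about 1.5 here) swamps the true 0.1
    # causal effect, so the short-timeframe impression is exactly backwards.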
[1] https://news.ycombinator.com/item?id=12707606