This is such a damning list. At some point, reporting on polls when they have *t...

systemvoltage · on Nov 7, 2020

One of the things that's annoying about 538 is that they don't admit they're wrong. The excuse they keep repeating is that "1 in 10 chance Trump can win. That's about the same chance as rain in LA, but it does rain in LA!". 538 seems to have forgotten in their condenscencion is that we are not look at you to explain the odds. We are looking at you to provide an accurate probability estimate. Their defense is completely ridiculous. I follow betting odds which had Biden/Trump at 60/40 split before the election day - what's the point of polling when you can look at people putting money on the line, let the incentives work for you.

scrooched_moose · on Nov 7, 2020

I'm not sure that's quite fair. They do write some excellent retrospectives and admit when they made mistakes, at least more so than almost any other news outlet I've seen:

https://fivethirtyeight.com/features/the-real-story-of-2016/

https://fivethirtyeight.com/features/how-i-acted-like-a-pund...

https://fivethirtyeight.com/features/what-i-got-wrong-in-201...

https://fivethirtyeight.com/videos/what-harry-got-wrong-in-2...

I have noticed them getting a bit more defensive recently which is irritating, but I think they are honest. There's only so much "garbage in garbage out" they can compensate for. If this ends up at +74 EV, which is where it looks to be heading, it'll be "Z=-.6" prediction (not actually normally distributed, but it's the easy calculation) prediction, meaning 27% of the outcomes were more Trump. It's not great, but I appreciate them giving a realistic model.

systemvoltage · on Nov 7, 2020

Thanks for these, I've been following it for last few months.

Nate Silver still makes fun of Betting odds: https://twitter.com/NateSilver538/status/1325186985988935680

I don't see him posting how the Bettfair was way more accurate than his polls.

I also get that Nate Silver doesnt poll himself, but he aggreates polling information, assign grades and mixes it into his formula. I just have zero trust in it. It's not like we get to test his predictions often. 1 sample size every 4 years.

robryan · on Nov 8, 2020

Biden got out to around $4-$5 during election night. I wouldn’t put much weight in the betting when it appeared to fail to account for the expected composition of postal votes. Trump should have been at best a slight favorite before the effect of the postals started showing in the counts.

whack · on Nov 8, 2020

And why do you think they failed to provide an accurate probability estimate?

If I tell you that the odds of your dice roll being a 6 is only 16%, and you roll a 6, does that mean I failed to provide an accurate probability estimate?

The only way to judge the accuracy of a probabilistic model, is by quantifying the weighted error across numerous predictions. Not by looking at two yes/no outcomes. This is the exact point they are trying to get across, which you wrote off as annoying.

You do raise an interesting point about the betting markets - I too am very curious about whether they have a better track record than people like Nate Silver. Looking forward to someone analyzing their relative accuracy over a large sample of independent elections.

klipt · on Nov 8, 2020

I saw predictit.org swing from pro Biden the morning of the election, to Trump in the afternoon, to Biden the next morning, so it seemed they were as confused as anyone.

They also didn't seem very aware that mail votes (leaning Biden) would likely be counted after in person votes (leaning Trump), which 538 had been predicting would cause a temporary pro-Trump lean for ages.

hackinthebochs · on Nov 8, 2020

If you roll that same die 10 times and you get a 6 each time, then we can say that your probability estimate was wrong. But that's what happened with the polling of these states. The fact that the estimates consistently overestimated the Biden vote points towards a bias in the model used to estimate likely voters. If this were just a matter of the margin of error at work, each pull at the lever (each state) should be randomly distributed around the estimate. But that wasn't the case.

whack · on Nov 10, 2020

> If you roll that same die 10 times and you get a 6 each time, then we can say that your probability estimate was wrong. But that's what happened with the polling of these states.

The 538 model is explicitly based on the assumption that the state-polling errors are correlated to one another to some extent. Ie, if Trump performs better-than-expected in OH, he will likely perform better-than-expected in FL as well. This is why they rated Trump's chances as being 0.1, not 0.1^8. Given how polling works, this is exactly the right assumption to make.

Hence my earlier comment that if you want to evaluate the accuracy of 538 or any other model, you need to evaluate it across numerous different elections/events, over an extended period of time. Not a single day of elections in a single country.

hackinthebochs · on Nov 11, 2020

This is kind of missing the point. The issue is the bias in the state-level polls. 538's model is to predict the winner of the election based on state polling data from other sources. The state level predictions are just averages of polls presumably weighted by quality. But this averaging cannot remove bias in the polls if that same bias is in many polls.

ashtonkem · on Nov 8, 2020

538 is a poll aggregator, not a pollster. If the polls are systemically wrong, which is what is being alleged above, there is <explicative> all that 538 can do to fix that.

systemvoltage · on Nov 8, 2020

Yes, I’ve been following 538 for a while now. They also grade each pollster.

The point is - 538’s input is a bunch of polls, their output is a prediction. Whether they do the polls or aggregate them is not relevant - they’re analysts whose job is to provide accurate estimates.

You can’t shift the blame on inaccurate underlying polls - Nate has time and again said, they look at many aspects in their estimates. Not just polls.

ashtonkem · on Nov 8, 2020

They look at many aspects at the beginning of the race. By the end these other aspects, the “fundamentals”, are purposefully dialed down to 0 and all that remains is an aggregate of polls. This is based on a theory that polls should be more accurate the closer you get to Election Day because voters have less time people have to change their minds. If those polls are systemically wrong, then there is nothing 538 can do to fix that; it’s a literal garbage in garbage out moment.

ece · on Nov 9, 2020

Looking at trends, which is easy to do on real clear politics, you could have seen the Senate was going to be extremely close, and the presidency, while not the blowout everyone expected thanks to 538, would still favor Biden. Here's one forecast that was arguably closer than 538: https://www.270towin.com/

The polls did tighten at the end, but you can't just look at a snapshot (even near the end) to account for the trend. In all of these states the polls narrowed in some, but the final results were within MoE:

FL: https://www.realclearpolitics.com/epolls/2020/president/fl/f...

PA: https://www.realclearpolitics.com/epolls/2020/president/pa/p...

NV: https://www.realclearpolitics.com/epolls/2020/president/nv/n...

AZ: https://www.realclearpolitics.com/epolls/2020/president/az/a...

GA: https://www.realclearpolitics.com/epolls/2020/president/ga/g...

NC: https://www.realclearpolitics.com/epolls/2020/president/nc/n...

In Iowa the Des Moines Register poll had Trump at +7, which is within MoE, the rest were garbage: https://www.realclearpolitics.com/epolls/2020/president/ia/i...

In Ohio, none of polls were within MoE, so all of the polls there were garbage: https://www.realclearpolitics.com/epolls/2020/president/oh/o...

For the Senate I think North Carolina was wrong everywhere I looked, but 75/25 for Democratic control at 538 was the most wrong I think: https://fivethirtyeight.com/features/final-2020-senate-forec... https://www.realclearpolitics.com/epolls/2020/senate/2020_el... https://www.270towin.com/2020-senate-election/consensus-2020...

You can go state by state[1] to see which polls were more/less accurate within MoE compared to the final result and the trends in each state. 538 had a better chance of Biden winning by 400+! EVs [2], than the 306 he's likely to win with. It's this distribution that could have been better at accounting for severe polling errors in some states.

538 did do CYA posts[3], but here while bringing forward the error from 2016 in Ohio seems right, it still projects a win of 335+ EVs. Optimistic for Biden is the kind way of saying what the final 538 projection were, severely more wrong than individual polls is more accurate. If their distribution in [2] was better, I would be more willing to give them a pass.

[1] click on the state: https://www.realclearpolitics.com/epolls/2020/president/2020...

[2] "Every outcome in our simulations" https://projects.fivethirtyeight.com/2020-election-forecast/

[3] "What if polls are as wrong as 2016? Biden still wins" https://fivethirtyeight.com/features/trump-can-still-win-but... https://fivethirtyeight.com/features/im-here-to-remind-you-t...

evgen · on Nov 7, 2020

They provided the probability distribution. The fact that you can’t handle math and need some sort of absolute certainty for a future event is not 538’s problem.

defen · on Nov 7, 2020

Here is a guy who is fairly good at math saying that the polls did in fact fail:

https://statmodeling.stat.columbia.edu/2020/11/04/dont-kid-y...

https://statmodeling.stat.columbia.edu/2020/11/07/what-would...

Karrot_Kream · on Nov 8, 2020

That's a bit strong for what Gelman said. I'm a big fan of Gelman (and learned from his books!), but he specifically mentioned that both Gelman et al's Model and 538's Model did indeed capture the outcomes in their probability distributions, but that to improve performance going forward it was much better to predict closer to the median than closer to the tails. (And funny enough, Gelman gave 538 some grief earlier on making a model with very wide tails.) This is a nuanced but very fair criticism, and taking a Twitter-style summary of it I think is overly reductionist.

evgen · on Nov 7, 2020

Ah yes. Mr 'let me tell you why Nate is wrong' Gelman, who is now Mr 'let me tell you why the fact that I missed bigger than Nate is not my fault and in fact is entirely the fault of these other people' Gelman. Forgive me if I find his excuses laughable, but I guess if it makes him feel better about himself we can humour him. He even manages to choke his first rant by missing once again on EV and vote percentages.

defen · on Nov 8, 2020

Whatever your complaint about Gelman is, I don't think "he can't handle math" is very convincing.

listenallyall · on Nov 7, 2020

it's not one single election -- it's consistent failure over multiple state elections, by large margins, all in the same direction -- which falls beyond any reasonable probability

roenxi · on Nov 7, 2020

I don't think much of evgen's unreasonable personal attack. But 538 isn't necessarily claiming that the per-state error will be normally distributed around their predictions.

I don't know the specifics of their model, but probably they are claiming "with these polls, the probability of this outcome is...". The polls being consistently biased doesn't tell us much about 538s model. They said Biden would almost surely win and despite a massive surprise in favour of Trump, Biden won.

And even if Biden lost, 10% upsets in presidential are expected to happen once every 10 elections like this one.

listenallyall · on Nov 7, 2020

> But 538 isn't necessarily claiming that the per-state error will be normally distributed around their predictions.

If per-state error isn't normally distributed, that's evidence of bias, or bad polling.

JamisonM · on Nov 8, 2020

> If per-state error isn't normally distributed, that's evidence of bias, or bad polling.

No!

Assuming the per-state error would be normally distributed in some neutral world is making huge assumptions about the nature of the electorate, polling, and the correlations of errors between states, you can't do that! You would specifically /not/ expect per-state error to be evenly distributed because the nature of the error would have similar impacts on similar populations and there are similar populations of people that live in different states in differing numbers.

You should review the literature about the nature of the (fairly small) polling misses that impacted the swing states and thus disproportionately the outcome in the 2016 election. You will probably find it interesting.

listenallyall · on Nov 8, 2020

Yes!

There are unavoidable, expected, sampling errors which are, by definition, random. That's why valid, trusted polls calculate a confidence interval instead of a single discrete result.

Other types of "errors" -- election results that repeatedly fall outside the confidence interval, or are consistently on only one side of the mean -- only arise when the poll is flawed for some reason. Maybe you relied on landlines only, maybe you spoke with too many men, or too many young people, asked bad questions, miscalculated "likely voter," whatever. Accurate, valid, trusted polls don't have these flaws, the ONLY errors are small, random, expected sampling errors.

dragonwriter · on Nov 8, 2020

> Accurate, valid, trusted polls don't have these flaw

Yes, they do. Because (among many other reasons) humans have a choice whether or not to respond, you can't do an ideal random sample subject to only sampling error for a poll. All polls have non-sampling error on top of sampling error, it is impossible not to.

listenallyall · on Nov 8, 2020

when polls don't match up with reality, as they didn't in 2016, the pollsters have a responsibility to re-calibrate the way they conduct the poll. Ask different questions, find new ways of obtaining respondents from all demographics, adjust raw data, etc. A professional pollster doesn't just get to say, hey, some people didn't want to talk to me ¯\_(ツ)_/¯

dragonwriter · on Nov 8, 2020

> when polls don't match up with reality, as they didn't in 2016, the pollsters have a responsibility to re-calibrate the way they conduct the poll.

Pollsters do that for continuously, and there were definite recalibrations in the wake of 2016.

OTOH, the conditions which produce non-sampling errors aren't static, and it's impossible to reliably even measure the aggregate of non-sampling error in any particular event (because sampling error exists, and while it's statistical distribution can be computed the actual error attributable to it in a by particular event can't be, so you never no how much actual error is due to non-sampling error much less any particular source of non-sampling error.)

JamisonM · on Nov 9, 2020

"There are unavoidable, expected, sampling errors which are, by definition, random."

This is false, if you think sampling errors are "by definition random" you don't understand polling.

Which is fine, just accept that you don't and dig into the literature or move on.

Karrot_Kream · on Nov 8, 2020

> That's why valid, trusted polls calculate a confidence interval instead of a single discrete result.

That is what each of these statistical models did, yes. And the actual outcomes fell into these confidence intervals.

> Other types of "errors" -- election results that repeatedly fall outside the confidence interval, or are consistently on only one side of the mean -- only arise when the poll is flawed for some reason.

Or the model was inaccurate. Perhaps the priors were too specific. Perhaps the data was missing, misrecorded, not tabulated properly, who knows. Again, the results fell within the CI of most models, the problem was simply that the result fell too close to the mean for most statisticians' comfort.

hackinthebochs · on Nov 8, 2020

>That is what each of these statistical models did, yes. And the actual outcomes fell into these confidence intervals.

The CI is due to sampling error, not model error. If the error of the estimate is due to sampling error, the estimate should be randomly distributed about true value. When the estimate is consistently biased in one direction, that's modelling error, which the CI does not capture.

Karrot_Kream · on Nov 11, 2020

> If the error of the estimate is due to sampling error

What does "estimate" mean here? Gelman's model is a Bayesian one, and 538 uses a Markov Chain model. In these instances, what would the "estimate" be? In a frequentist model, yes, you come up with an ML (or MAP or such) estimate, and if the ML estimate is incorrect, then there probably is an issue with the model, but neither of these models use a single estimate. Bayesian methods are all about modelling a posterior, and so the CI is "just" finding which parts of the posterior centered around the median contain the area of your CI.

I'm not saying that there isn't model error or sampling error or both. I'm just saying we don't know what caused it yet.

listenallyall · on Nov 8, 2020

Landed within the confidence interval? Are you kidding? CI is generally 2-4 points in these election polls.

- Texas: 538 said Trump +1.0, actually won by 6

- Ohio: Trump +0.4, won by 8

- Iowa: Trump +1.4, won by 8

- Florida: Trump -2.5, won by 3

- Penn.: Trump -4.6, lost by 0.5

- Nevada: Trump -4.9, lost by 2

- Wisconsin: Trump -7.9, lost by 0.6

https://en.wikipedia.org/wiki/Statewide_opinion_polling_for_...

https://www.nytimes.com/interactive/2020/11/03/us/elections/...

Karrot_Kream · on Nov 8, 2020

> Landed within the confidence interval? Are you kidding? CI is generally 2-4 points in these election polls.

The models and their data are public. The 538 model predicted an 80%CI of electoral votes for Biden as: 267-419, with the CI centered around 348.49 EVs. That means that Biden had an 80% chance of landing in the above confidence interval. Things seem to be shaking out to Biden winning with 297 EVs. Notice that this falls squarely within the CI of the model, but much further from the median of the CI than expected.

So yes, the results fell within the CI.

Drilling into Florida specifically (simply because I've been playing around with Florida's data), the 538 model predicts an 80%CI of Biden winning 47.55%-54.19% of the vote. Biden lost Florida, and received 47.8% of the vote. Again, note that this is on the left side of this CI but still within it. The 538 model was correct, the actual results just resided in its left tail.

listenallyall · on Nov 8, 2020

Dude, you're gaslighting by using the national results as evidence instead of the individual states, which is what this has always been about since my original comment. Nearly every consequential state fell at, or beyond, the tail end of 538's confidence interval (BTW, who uses 80%? and not 90-95%?), on the same side. A bit closer to the mean in AZ and GA but same side, over-estimating Biden's margin of victory. Deny it all you want, gaslight, cover your eyes, whatever -- but clear, convincing, overwhelming evidence of a systematic flaw or bias in the underlying polls is right there in front of you.

Many political handicappers had predicted that the Democrats would pick up three to 15 seats, growing their 232-to-197 majority

https://www.washingtonpost.com/politics/house-races/2020/11/...

Entering Election Day, forecasters projected Democrats would gain House seats and challenge for the Senate majority.

https://www.cnbc.com/2020/11/05/2020-election-results-democr...

Most nonpartisan handicappers had long since predicted that Democrats were very likely to win the majority on November 3. "Democrats remain the clear favorites to take back the Senate with just days to go until Election Day," wrote the Cook Political Report's Senate editor Jessica Taylor on October 29.

https://www.cnn.com/2020/11/04/politics/2020-election-senate...

Karrot_Kream · on Nov 8, 2020

> Nearly every consequential state fell at, or beyond, the tail end of 538's confidence interval

While I haven't checked each and every individual state, I'm pretty sure they all fell within the CI. Tail end yes, but within the CI.

> (BTW, who uses 80%? and not 90-95%?)

... The left edge of the 80% CI shows a Biden loss. The point was 538's model was not any more confident than that about a Biden win. So yeah, not the highest confidence.

> Deny it all you want, gaslight, cover your eyes, whatever -- but clear, convincing, overwhelming evidence of a systematic flaw or bias in the underlying polls is right there in front of you.

Posting a bunch of media articles doesn't prove anything. I'm not saying there isn't systemic bias here, but your argument is simply that you wanted the polls to be more accurate and you wanted the media to write better articles about uncertainty. There's no rigorous definition of "systemic bias" here that I can even try to prove through data, all you've done is post links. You seem to be more angry at the media coverage than the actual model, but that's not the same as the model being incorrect.

Anyway I think there's no more for us to gain here by talking. Personally, I never trust the media on anything even somewhat mathematical. They can't even get pop science right, how can they get something as important as an election statistical model correct.

Karrot_Kream · on Nov 8, 2020

Not necessarily. Errors, like outcomes, are not independently distributed in US elections. Politics are intertwined and expecting errors and votes to be independent on a state (or even county) basis is overly simplistic. This is also what makes modelling US elections so difficult.

listenallyall · on Nov 8, 2020

Sampling errors are random, and expected. Other types of misses are not simple "errors" but polling flaws, like sampling a non-representative group, ignoring non-responders or assuming they break the same as the responders, asking poorly-worded questions, etc.

Occasional flaws in polling is understandable and tolerated. But when those misses repeatedly line up the same way, and are rather sizeable, that's evidence of either systematic flaws, or outright bias.

Karrot_Kream · on Nov 8, 2020

I'm not sure what a "sampling error" is. To echo the sibling poster, per-state sentiment is not normally distributed. For example, we know Trump is more popular among white men than other demographics. This means that if we were to create a random variable that reflected the sentiment of white men throughout the US, we would (probably though I'd have to dig deeper into the data) presume to see a higher median vote count in this demographic. However, we cannot say that Trump's popularity in Massachusetts is independent from his popularity in New York, because his popularity in the white male demographic is the dependent variable between both random variables.

listenallyall · on Nov 8, 2020

> I'm not sure what a "sampling error" is.

Perhaps you shouldn't be commenting on polling then.

Karrot_Kream · on Nov 8, 2020

I was discussing in good faith, so I'm not sure why you chose to be snarky. Let's clarify here, I'm not sure what "sampling error" in this case would be, such that it is distinct from electoral trends at large. The random variables in question _are_ demographic groups. How is it meaningful to discuss sampling error if your assumption is that state and county data is independently distributed? The poll data that Gelman et al used is public data, I urge you to take a look and work with it.

roenxi · on Nov 7, 2020

Yep. But 538 does not run polls.

listenallyall · on Nov 8, 2020

The inputs it uses to spit out probabilities is known to be bad. Any scientist or researcher who claimed to get valid results from known bad inputs would be ridiculed.

Karrot_Kream · on Nov 8, 2020

To offer a concrete example here, survery respondents are often biased based on who actually sees and fills out a survey. A common technique used to overcome this non-representative sample is called post-stratification. There are, of course, limits to post stratification especially in instances of low sample sizes, but techniques to overcome issues with data are well known.

roenxi · on Nov 8, 2020

Science does not require an unbiased sample from a normal distribution to work. Bias is a technical term that the field of statistics is very comfortable working with. Scientists can also often get good results out of biased inputs.

538 has corrections for bias already. They seem to have worked in this instance - I repeat myself but: massive surprise, Biden still president.

alentist · on Nov 8, 2020

> 538 has corrections for bias already. They seem to have worked in this instance

See https://news.ycombinator.com/item?id=25017810

roenxi · on Nov 8, 2020

You are pointing at evidence that 538 correctly called 11/12 races using statistics, and their confident call on a Biden president withstood a 4-7% swing (!!).

The existence of bias doesn't invalidate their predictions. Everyone knows that polls can be badly off target in a biased way - that isn't a new phenomenon.

When they talk about X% chance of Y being president they should be optimising to the outcome, not the margins.

systemvoltage · on Nov 7, 2020

I am questioning - why even bother?

It's not like we do elections every month to test out their probability distribution against empirical data. The distribution collapses into a binary outcome at the end.

I have a dice. I claim the distribution is of equal outcome for each side. Well...we don't get to test the dice more than once. 1 sample size does not prove that the 538's predictions were right (or wrong).

Thanks for assuming I can't do math, no way to argue with someone but I am actually pretty bad at it. :-)

evgen · on Nov 7, 2020

Everyone is bad at probability and statistical distributions, not just you. The problem with modeling elections is that there are so few of them and the data is very noisy and until quite recently rather suspect. Let's not pretend that this was a normal election, either in the candidates running or in the manner in which the campaign and election was conducted.

As to the question of why bother, it is because bad polling is better than no polling at all. Campaigns are now multi billion dollar enterprises managing tens of thousands of temporary employees for the creation of a product that will only be sold once and in 18+ months from when they start the process. Any data is better than nothing.

The fact that the public has become obsessed with polls is probably due to the ongoing nationalization of politics.

paulddraper · on Nov 7, 2020

> I respect FiveThirtyEight and their work, and I do think they're intellectually honest and generally speaking good at their jobs.

But 538 chooses how to weight the polls. For example, they only gave Rasmussen (which was far closer on these) a C rating, preferring less accurate polls.

FWIW, their articles clearly lean left [1], IDK if that affects their analysis/forecasts.

[1] https://www.adfontesmedia.com/interactive-media-bias-chart-2... similar to WaPo

mewse · on Nov 8, 2020

Boy, that rear view mirror makes criticism a whole bunch easier, doesn’t it? :)

ianai · on Nov 8, 2020

I am reminded of all the times in the last four years (really my entire adult life) where during policy discussions of any magnitude the ultimate “kill” for an idea is “but it’s only popular with $small_number of the vote” or “but poll after poll shows it as a losing position.” It’s used as a cudgel to push out things that would actually make people feel their government is doing things with impunity. Donald Trump’s approval rating rarely if ever dipped below ~43% (with the usual and huge error bars) and consequently republicans reported they couldn’t go against him ever because of the polls. “Spending political capital” is a concept where a ruling party takes unpopular actions/enacts unpopular law because it’ll only push their approvals toward their long run averages.

I think this needs to be a serious topic of discussion. After this administration, “governing by polling firm” seems disingenuous at best and outright detrimental to all involved.

chkaloon · on Nov 8, 2020

538 does make adjustments based on what they think are biases in individual polls. So they can't blame it all on their sources. They've taken on some responsibility to evaluate those sources and adjust their model accordingly. The polls had significant bias, and 538 failed to fully adjust for it.