Was Nate Silver the Most Accurate 2012 Election Pundit? (appliedrationality.org)
97 points by kf on Nov 9, 2012 | 59 comments



Thanks. I've been trying to find a good way to understand the accuracy of Silver's predictions.

What's a fair benchmark? This article offers up a "coin flip" for each state, computing that such a coin flip would have a Brier score of 0.25. (The Brier score is the mean squared error between the outcome (1 or 0) and the predicted probability of that outcome. If the model is a coin flip, each state's 0.5 prediction misses the 1-or-0 result by 0.5, so the mean squared error is (1/51) * 51 * 0.25 = 0.25.)

But... that seems like too generous a benchmark. Take the simple model: "assume 100% likelihood that state X will vote for the same party as it did in 2008." That guarantees that deeply red or blue states will vote the same way, so it takes the non-battlegrounds out of the equation.

With this model, there would have been only 2 errors out of 51. This simple, lazy model achieves a Brier score of 0.039, handily beating Intrade and the poll average computed in this article.
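As a sanity check, the two benchmarks above can be sketched in a few lines of Python. (The outcome vectors here are illustrative stand-ins, assuming 51 contests and exactly 2 misses for the 2008-repeat model.)

```python
def brier(predictions, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    assert len(predictions) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

n = 51  # 50 states plus DC

# Benchmark 1: a 50/50 coin flip for every state.
# Against a 0.5 forecast, every 0/1 outcome contributes (0.5)^2 = 0.25.
coin_flip = [0.5] * n
print(brier(coin_flip, [1] * n))  # -> 0.25

# Benchmark 2: "each state repeats its 2008 result" with 100% certainty,
# which missed exactly 2 of the 51 contests in 2012.
repeat_2008 = [1.0] * n
outcomes = [1] * 49 + [0] * 2  # 49 correct, 2 wrong
print(round(brier(repeat_2008, outcomes), 3))  # -> 0.039
```

Lower is better for the Brier score, which is why a 0.039 from a lazy deterministic model is a much tougher benchmark than the coin flip's 0.25.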

After working through this, I'm still impressed by Silver and the other quant predictions. But I'm more concerned about media that rely too much on reporting a single poll's result as "news" rather than as part of a larger tapestry.

Then again, it's the maligned media polls that are the raw input to Silver and the other models. Unless the media keeps funding the polls, the quality of these more advanced models will suffer.


Thanks for the suggestion. I've added your benchmark suggestion (along with a bunch of other fixes and new data like a 2008 RMSE benchmark along the lines of your Brier suggestion) into the R document: https://docs.google.com/document/d/1Rnmx8UZAe25YdxkVQbIVwBI0...

(I don't know when the new numbers will go live on the blog; Luke handles that.)


Note how NPR is one of the most right-biased in this result. It's pretty evident from years of listening that the NPR staff are generally progressives and would lean left. So I think this result exemplifies how genuinely 'fair and balanced' NPR really is.


Throughout all the criticism from pundits of Nate Silver et al. as a liberal/commie/whatever for predicting that Obama would win, I couldn't help but wonder if there was even any merit to the idea that an Obama supporter would want to say that Obama is going to win the election. I would think predicting that Romney has a 10% chance of winning (making it tough but a distinct possibility he would win) as Nate Silver did would lead to some of the BEST possible voter turnout for Romney (making it the worst possible prediction strategically, if you want Obama elected). On the other hand, a prediction that Romney will win in a landslide would make many Romney supporters stay home.

Is there actually evidence that higher poll numbers in favor of X lead to higher voter turnout from supporters of X? It seems like everyone takes that for granted but I've never seen any evidence that it's true.


>> Is there actually evidence that higher poll numbers in favor of X lead to higher voter turnout from supporters of X? It seems like everyone takes that for granted but I've never seen any evidence that it's true.

I'd like to see some raw evidence, too.

But since we're forced to speculate at the moment, my bet would be that the relation between voter turnout and a candidate's chances of winning isn't linear. It's probably more of a parabola: the more extreme a candidate's chances of being elected (or defeated), the less likely anyone is to come out and vote, because it feels impossible to make a difference. On the flip side, I'd suspect that the closer and more contested a race is, the more likely people are to feel obligated to vote.


You're absolutely right - I agree that voter turnout "should" be highest at exactly "50% chance my guy wins" and diminish from there in either direction. I think I just got a bit too caught up in what I was arguing when I mentioned that a 10% Romney prediction should be the best one for Romney supporter turnout; a 90% prediction should probably be just as good as a 10% prediction in terms of making you likely to vote.

Under this theory, polls/predictions can't actually skew the result in either direction [1], they can only increase or decrease the total turnout (highest turnout when polls say 50%, lowest turnout when they say 0%/100%). In reality this probably isn't actually true, but we can't say for sure whether or not polls do affect election outcomes, and in which direction, unless there's empirical evidence. I don't even have a guess for which direction it would go (whether you want your supporters to be "concerned" or "optimistic") - I could see either being true.

[1] Edit: I should note that this assumes that each poll/prediction is listened to and taken seriously equally by both sides of the electorate, which almost certainly isn't true. Which actually brings up something kind of interesting - it could be that with all of its "Romney will win in a landslide" talk, Fox News actually hurt Romney's chances because most Fox News viewers are Romney supporters (but they could have done the same amount of damage to Romney's chances by saying "Obama will win in a landslide" - the best way for them to help Romney might be to say that the election is exactly tied). And I suppose it would also lead to a justification for the idea that Nate Silver has a liberal bias, using his relatively "wishy-washy" predictions to energize his mostly young and liberal audience to get out and vote. It seems that I've gone too far with this and started arguing against myself...


There is no guarantee that the effect of an almost certain victory/loss is identical for all parties. On the contrary, there is some data indicating that supporters of some parties are more motivated to vote. See for example http://www.ncbi.nlm.nih.gov/pubmed/22065127, which shows that weather conditions can affect election results. I cannot find data on it, but I think it is not inconceivable that this extends to 'going to the polling station even if it does not make a difference to the result'.

However, in a winner-takes-all election, the effect would have to be huge. Let's say all polls indicate a 51%-49% result. Then at least 4% of the winning party's voters would have to stay home 'because they already won' to change the result (and that assumes none of the other side's voters stay home 'because they already lost'). At a more realistic 60%-40% poll prediction, one in three voters would have to stay home.
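The back-of-envelope arithmetic above can be written out as a one-liner, assuming the trailing side's turnout is unchanged and `p` is the leader's share of the two-party vote:

```python
def stayhome_fraction_to_flip(p):
    """Fraction x of the leader's voters who must stay home so that
    p * (1 - x) drops to the trailing side's share (1 - p)."""
    return 1 - (1 - p) / p

# 51-49 race: about 4% of the leader's voters must stay home.
print(round(stayhome_fraction_to_flip(0.51), 3))  # -> 0.039

# 60-40 race: fully one in three must stay home.
print(round(stayhome_fraction_to_flip(0.60), 3))  # -> 0.333
```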


If you want to go full on conspiracy mode, you could argue that Fox news' owners wanted Obama to win due to the belief that 4 more years of being able to lament the horrors of a having a Communist Muslim in the White House would be much better for their ratings. And that is why they pushed the Romney by a landslide thing so hard.


Just blowing smoke rings here, but it also could be possible that "undecideds" might break towards a perceived winner. If they aren't committed, they may just pick the winning side.

I have no evidence for this whatsoever.


It's worth adding that Nate is on the record as saying he'd have voted for either Romney or Gary Johnson, if he had voted at all.

http://www.mentalfloss.com/blogs/archives/150042


Okay, I found the reference someone cited as "Nate Silver openly rooting for Obama". It's from March 2008, before he joined the NYTimes:

http://www.fivethirtyeight.com/2008_02_24_archive.html


I'm skeptical that that statement should be taken at face value. Nate Silver isn't the smoothest in in-person interviews.


That's weird - I thought I read that he was personally pulling for Obama somewhere on 538. I'll go look for that.


Here he says: "I would describe myself as being somewhere between a liberal and a Libertarian."

http://m.npr.org/news/Books/162594751


Not just for voter turnout. Higher poll numbers are important for raising $$. People (especially those with large bank balances) want to back a winner.


Well, Romney supporters were certainly predicting a Romney landslide, so it's plausible that Obama supporters would predict an Obama landslide. That's independent of whether or not it's a good idea in the game-theory sense, of course.


You're definitely right, and it really didn't make sense to me that they were doing that. I guess it's probably just a human nature thing more than anything else.

I think my point still stands though: this doesn't showcase NPR's neutrality - perhaps they are in fact biased toward Obama and just lie more strategically than most Republican pundits. (Certainly not accusing them of that, only saying that I don't think we can glean much from the fact that they gave Romney higher numbers than he deserves.)


I don't think a prediction of a Romney loss would significantly spur Romney voters. As one example of many in my life, a co-worker voted for Romney because Obama "tanked his 401K". These are not people who (on average) read real news, IMO. They wouldn't see the prediction.


It seems like most of the polling organizations that did the worst during this election did poorly because they were way off-base with their likely voter model. No one in the press seemed to notice the scale of the turn-out machine Obama constructed since 2010. Many organizations (including, apparently, Romney's internal pollsters) assumed a turnout somewhere between '08 and '10 levels, when the Democrats managed to match or beat 2008 turnout in critical areas.

I'm not sure it's really a question of ideological bias so much as filtering the raw data through a backward-looking model.


I think it is a huge mistake to equate "polls that tend to be biased towards Candidate A" with "this institution favors Candidate A's policies".

The former can occur simply because of the assumptions that your model makes. Bias in "statistical bias" does not mean the same thing as "political bias".


I would like to see them add in David Rothschild at Yahoo[1], who's an expert in scoring rules and prediction markets and whose February (!) predictions were almost exactly on the money.

[1] http://news.yahoo.com/blogs/signal/


He works at Microsoft[1]. I have been following his predictions on PredictWise.com for the last four months.

[1] http://www.linkedin.com/pub/david-rothschild/12/651/681


Just to clarify, he recently moved from Yahoo Research to Microsoft Research when the former imploded. Same with David Pennock and others. The Yahoo folks in New York mostly switched to MSR (founding MSR-NYC) and the folks in California mostly switched to Google.

(I used to be part of that research group but I left at the end of 2010 to start Beeminder.)


This compares top-line numbers. I think a comparison of turnout-model accuracy would be more informative. Most of the models that erred predicted that the 2012 turnout would lean less Democratic than the 2008 turnout, based on the 2010 mid-term turnout and a (mis)perceived dampening of enthusiasm among Democrats and increased enthusiasm among Republicans. Based on exit polling, there was a drop-off of 7 million white voters, and I don't think anyone predicted that.


The reasons for the low turnout will be interesting to hear. Nationally, Romney got nearly 3 million fewer votes in '12 than McCain did in '08, and Obama got around 10 million fewer. Polls showed the election to be fairly close, so neither "giving up" nor "overconfidence" would tend to explain why a person didn't vote.


Still a bit early to tell how many people voted in total. Probably need to wait a week or two for the final result.

Actually, that makes this whole exercise premature; the rankings may change once the final data is in.


Does anybody know more about YouGov's methodology? On the face of it, I'm suspicious of their very low margin of error which seems substantially better than any other poll out there, but you can't deny that their polling was accurate.

Another thing that looks odd on that graph: the given polling numbers from Washington Times/Politico/Monmouth/Newsmax/Gravis/Fox/CNN/ARG all look identical despite their differing margins of error (which suggests their source data is different). What's going on there?


Here’s a good article from the YouGov site that describes their methodology: http://today.yougov.com/news/2012/10/23/obama-stays-ahead-ju...


Thanks for the link. In short, it seems like they only poll people they can confirm are actually registered to vote, based on the fact that:

"According to US census data, just 71% of eligible Americans are registered to vote. In 2008, almost 90% of those who were registered did vote. So in any poll, it is vital to know which respondents are on the register."

But there are a few more interesting subtleties in there too, so it's worth a read.


YouGov does online polling, which allows them to get very large sample sizes for national polls. I believe they had on the order of 36,000 for their final poll. Compare that to others with just ~1000.
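The sample-size advantage translates directly into margin of error, since the standard error of a proportion shrinks like 1/sqrt(n). A rough sketch (this assumes a simple random sample, which an online panel isn't, so treat it as a lower bound):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p from a
    simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# In percentage points:
print(round(100 * margin_of_error(1000), 2))   # -> 3.1
print(round(100 * margin_of_error(36000), 2))  # -> 0.52
```

A 36x larger sample cuts the margin of error by a factor of 6, which is consistent with YouGov's unusually small reported error bars.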


> Another thing that looks odd on that graph: the given polling numbers from Washington Times/Politico/Monmouth/Newsmax/Gravis/Fox/CNN/ARG all look identical despite their differing margins of error (which suggests their source data is different). What's going on there?

I dunno. IIRC, Drew Linzer of Votamatic even worried on his blog that the polling numbers were too close and that pollsters might be fudging their numbers to be more similar to each other (which would lead to substantial overconfidence in estimates). Still, the final results seem pretty accurate, so...


The article doesn't mention Sam Wang, whose confidence level for an Obama win was 99%.

http://election.princeton.edu/


He mentions in the comments that Wang's website doesn't appear to give the raw numbers he needs to analyze its accuracy.


Right. Wang seems to now have released raw numbers but did so after I finished my numbers and gave OP the go-ahead. I plan to add Wang and some other stuff today.


Update: Wang's Presidential states & shares have been added, but not his Senate races.


Wang's Senate races are now incorporated; they make his performance substantially more impressive.


Oooh, I wonder what "he" might choose for a username on a social media site!


I can guarantee, with a near 100% confidence rating, that I am not Sam Wang. Merely coincidental that we have the same last name and first initial. I mean there has to be at least 100 million people in the world with the surname Wang so the odds that we are the same person are pretty slim!


"I mean there has to be at least 100 million people in the world with the surname Wang so the odds that we are the same person are pretty slim!"

Except this seems to ignore that P(comment about Sam Wang | I am Sam Wang) is much higher than P(comment about Sam Wang | I am not Sam Wang) :-)
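The Bayes-rule point above can be made concrete with made-up numbers (all three probabilities below are illustrative assumptions, not estimates): even a large likelihood ratio barely moves a tiny prior.

```python
# Assumed, purely illustrative numbers:
prior = 1e-6                    # P(a random commenter here is Sam Wang)
p_comment_given_wang = 0.5      # P(comments about Sam Wang | is Sam Wang)
p_comment_given_not = 0.001     # P(comments about Sam Wang | is not Sam Wang)

# Bayes' rule: P(Wang | comment)
posterior = (p_comment_given_wang * prior) / (
    p_comment_given_wang * prior + p_comment_given_not * (1 - prior)
)
print(posterior)  # ~5e-4: still very unlikely, but ~500x the prior
```

So the joke is sound: the evidence genuinely favors "is Sam Wang", it just starts from a prior small enough that the posterior stays small.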


A reply below says that the username is a coincidence. I'm inclined to believe it. Why use your real name to troll HN over a pretty obscure topic that's not even related to the site's userbase?


Are you related?


Nope.


Actually the most accurate 2012 election pundit was Drew Linzer (http://votamatic.org/). Provided Florida goes Obama's way, he correctly predicted the electoral college - Obama 332, Romney 206.


Did you read the article? Drew Linzer is one of the specific people it rates Nate Silver against.


Didn't Nate Silver say that was the most likely split? There was a post about it on HN for like the past 2 days.

Edit: The confusion is probably that Silver was reporting the average electoral split as his prediction, when the mode is more relevant to what you're talking about. His average was almost never a number that was actually possible, since he was quoting it to the nearest 1/10th of a vote, so it's kind of unfair to punish him for not getting it exactly right.


You're right, I didn't realize Nate Silver was using an average. Also, after reading the article, it looks like not just the 2012 presidential electoral vote predictions are being compared, but other contests such as the Senate races as well.


On his Electoral Vote histogram, he'd had 332 as a big spike for weeks. The day before the election it was over 20% for that one outcome.


It's worth noting that if you used the 2008 results as your 2012 prediction, you would have gotten 49/51 on this scale.


Yeah, but if you simply predicted a repeat of 2008, you'd get a mediocre Brier score on your state victory predictions (because it would punish you for getting 49 compared to everyone who got 50 or 51), and the RMSE is even worse: the margins were different, and the electoral vote and popular vote were very different from 2008.


Slate Magazine found two other pundits who were as accurate as Silver.

http://www.slate.com/articles/news_and_politics/politics/201...


Josh Putnam is considered in the article, which looks at a finer level of detail than just correctly predicting the states.


Great article, but I disagree with the colouring on the first graph: if reality was within the poll's margin of error, I don't think it should be coloured, because that implies a bias that (probably) isn't actually there.


Isn't YouGov the XBox pollster? They're in the top group.


It's also interesting that their margin of error is significantly lower than any of the other polling organizations'. From what I gather on Wikipedia, they are internet-centric, although I'm not sure if that includes XBox.


Is it possible to be more accurate than nailing all fifty states?


Well, maybe yes. One strategy for predicting the outcomes would be to paint a quarter red on one side and blue on the other, flip it for each state, and assign that state's predicted outcome accordingly. That probably wouldn't be a very effective strategy, but it could still nail all 50 states (actually, 51 in this case). Silver's method is much, much more accurate than the coin-toss method.
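Just how unlikely is the coin-toss strategy to go 51-for-51? Each contest is an independent fair flip, so:

```python
# Probability that 51 independent fair coin flips all land on the
# correct outcome.
p_all_correct = 0.5 ** 51
print(p_all_correct)  # ~4.4e-16, roughly 1 in 2.25 quadrillion
```

So "possible" is doing a lot of work in the sentence above: the coin would nail the map about once in 2.25 quadrillion tries.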

Silver takes all the polls, even the crappy ones, and includes them in his calculation. If you're a bayesian you'd find this comforting because all of the evidence is included in the belief. There might be some handwringing about how important each poll really is. If you're not a bayesian, then you have some other weird strategy that might or might not work.

So really, the predictions are nice, but what we're after is a system that produces good predictions. It's not clear that Mr. Silver's is the best. Perhaps there is some horrible flaw an evil agent could exploit that just didn't get tickled this election. It's tough to say.


> If you're a bayesian you'd find this comforting because all of the evidence is included in the belief.

Off-topic, but this is my beef with the "Bayesians" - it's not all of the evidence; it's completely absurd to believe that all evidence can ever be accounted for, when considering anything.


"More of the evidence" seems like pretty much the same thing...


YOU DIDN'T READ THE ARTICLE DID YOU



