Hacker News
How One 19-Year-Old Illinois Man Is Distorting National Polling Averages (nytimes.com)
241 points by ptrkrlsrd on Oct 12, 2016 | 107 comments



Nate Silver at 538 talks about the whole dissecting polls deal, or "unskewing" them. All polls must make methodological choices, and all of those choices have advantages and disadvantages. Spending a lot of time trying to dissect those choices and passing judgment on them is not as productive as:

1. Looking at aggregates of lots of polls.

2. Looking at the variance that a poll captures from one iteration of it to another.

Or at least, so he claims. Obviously, he runs a poll aggregator, using a model that heavily weights the trendline of individual polls, so he has a dog in this fight.
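A minimal Python sketch of the two practices listed above, with made-up numbers. The pollster names and readings are hypothetical, and a plain average of each pollster's latest reading stands in for 538's actual model, which weights polls in ways not described here:

```python
from statistics import mean, pstdev

# Hypothetical successive readings of one candidate's lead (in points) from three pollsters.
polls = {
    "pollster_a": [4.0, 5.0, 3.0, 4.0],
    "pollster_b": [6.0, 7.0, 6.0, 5.0],
    "pollster_c": [-2.0, -3.0, -1.0, -2.0],  # consistently off from the others (a "house effect")
}


def aggregate(polls):
    """1. Average each pollster's latest reading so no single poll dominates."""
    return mean(readings[-1] for readings in polls.values())


def trend(readings):
    """2a. Change from a pollster's first to its latest reading.

    Comparing a poll with its own earlier iterations cancels out a stable house effect.
    """
    return readings[-1] - readings[0]


def spread(readings):
    """2b. Iteration-to-iteration variability of a single pollster (population std dev)."""
    return pstdev(readings)


print(round(aggregate(polls), 2))  # 2.33
for name, readings in polls.items():
    print(name, trend(readings), round(spread(readings), 2))
```

Note that pollster_c disagrees with the others on the level but shows the same (flat) trend, which is the kind of information looking at a poll's own history preserves.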


> so he has a dog in this fight.

I don't understand this criticism. Usually when people bring this sort of thing up, they have a specific criticism in mind (e.g., Exxon Mobil on global warming): they have identified an opinion that is demonstrably wrong, and then they go looking for a reason someone might have gotten it wrong, and one possible answer is "having a dog in the fight". But the logical chain here goes one way: first you demonstrate that they are wrong, then you ask why.

But what's wrong with someone working with polls in a professional sense, expressing an opinion on best practices in polling and statistics? After all, stuff that Silver writes about is quite consistent with stuff other people write about.

It should, at least in principle, be possible to address the relative merits of polling methodologies objectively, and people like Silver, and Andrew Gelman too (see his blog, in particular the Xbox poststratification paper and the more recent differential nonresponse paper discussions), seem to be trying to do just that. But anyone professionally studying polls is going to have some kind of a dog in the fight, in the end. It seems hard to conclude anything based only on that.


The disclaimer is that they make money if people believe they are right, so may be inclined to cherrypick facts. So when we repeat Exxon Mobil's stance on global warming, we say that their stance might not represent all the facts. It's a disclaimer that this isn't my own opinion, and that I don't 100% trust it to be fair.


Nate Silver's reputation rests on being accurate, so anything short of a perfect result isn't great for him. This is different from Exxon Mobil's stance on global warming, where they made money when the truth about global warming was obfuscated.


Nate Silver is a recognized expert in an obviously highly complex area, so the grandparent comment is making an argument from authority. Given that, the disclaimer about that authority's possible conflict of interest is warranted. It's not a criticism but it is relevant in evaluating an authority's opinion.


You just criticized somebody for specialising in what they're talking about. Next you're going to tell me Hawking's biased about physics theories because he knows them and that's somehow a bad thing.


No, but if Hawking tells me that there are two theories, "a" and "b," that compete to explain a given phenomenon, and "a" is right and "b" is wrong, and "a" is a theory that Hawking himself came up with, then I'll note that.

Even if, as in this case, I think that despite the bias, Silver is right.


I think this can lead to the Wikipedia fallacy. If you reduce the credibility of information when you can perceive a means whereby the information could be compromised, you end up placing undue weight upon non-transparent information sources.

People don't trust Wikipedia because anyone could have written it yet place more trust in books that may just have a veneer of authenticity.


And the fix for either is peer review. Obviously, a forefront-expert in a given field may not have the luxury of peers, making peer review more difficult, but as sub-peers attempt to dissect the work, they learn, and the more that attempt to dissect, the more sub-peers are elevated to an extent that they can either debunk or give approval.


Isn't that just a perception problem? If a source isn't transparent about their methods and biases then the standard should be even higher for demonstrating truth because we have no choice but to assume the worst forms of bias.


Let's keep it simple; it wasn't Silver and 538 who did the piece on USC/Dornsife and why it was an outlier, it was Nate Cohn and the NYT. I think Silver and his org know more than 95% of the others about polling, but you'd think that this would be an obvious one for them to break since it is ostensibly their specialty.

Not that there's anything wrong with getting scooped. But it is an indication that his nose is too far in the meta-model and not in the real world. That's a class of mistake that can lead you to drive off a cliff.


> but you'd think that this would be an obvious one for them to break since it is ostensibly their specialty.

Silver's already touched on this poll and its unusual methodology:

http://fivethirtyeight.com/features/election-update-leave-th...

He didn't get scooped; explaining the technical details isn't his ostensible specialty, adjusting for them is.


Good point. Although I think Cohn's speciality is more valuable to me here. I came away from the Silver piece when I first read it with the problem merely being a fixed population of voters; the Cohn piece was much more illuminating since it showed the specifics of how that pre-condition has been affecting all kinds of aggregation methods. So Silver was the early warning, but Cohn had the nitty-gritty.

I'm just used to Silver providing that--they seem spread pretty thin. As you point out, you have to specialize in something, though...


And so what if he were scooped? No one's all-knowing.


Cohn is the Times's data journalist geek. He's, frankly, just as good at this stuff as Silver is. They seem to like each other, even.

It's a good article. And yeah, 538 got scooped on this one. But it's not because they weren't paying attention, they just had other stories to write.


Well, technically, yeah. The old saying goes "an old theory isn't overthrown by convincing people, but because its proponents die of old age" for a reason.


I like Nate Silver. He broke new ground. But I've grown less enamored this most recent cycle because his predictions seem to be a lot more of the Nostradamus type--pre-analyze outcomes on three possible resolutions, then get credit for being right. Most of his wins come because polls have a tendency to tighten up to the actual numbers the closer they are to the election, as any latent bias gets washed out. He gets graded on more accurate information, not when uncertainty is high. (This isn't inherently a problem, but it does expose why he 100% missed the Trump primary phenomenon...he's trying to address it with "fundamentals" and other alternate models, but this is just more hedging so that at least one story makes sense.)

He's not deceptive about this; I will give him that. It's all in the open. But it seems to amplify a kind of narrative fallacy if these mistakes aren't revisited anew (rather than just saying "as predicted, since outcome C actually happened, it was due to voter bloc X doing it as we said.")

But I can't complain about a pundit class not giving actionable data--that's not their job. Silver's job is to add data to the discussion, and to stimulate the discussion. And I give him a "B" for that.

That said, this cycle, I've rediscovered Sam Wang at http://election.princeton.edu/ and think he maybe gives a purer approach to analysis, one not so much driven by clicks. He doesn't seem to hedge as much as Silver.


Nate Silver's writing quality this cycle has definitely been worse than last cycle. IMO, it has quite a bit to do with running his own website, instead of working as a blog under somebody else. There is much more pressure for fivethirtyeight.com to produce frequent updates than there was for fivethirtyeight.blogs.nytimes.com. I get that a major justification for branching out was to provide more sports coverage and some sparser statistical coverage of other topics. But there just isn't enough daily news on the presidential polls to justify the article publication rate he's running this cycle.


He's written about this poll specifically: http://fivethirtyeight.com/features/election-update-leave-th...


His dog won pretty handily in the last election, though. It's not an exact science, but his methodology seemed to eliminate a fair amount of doubt from the equation.


Have you seen the comparisons to the Princeton Election Consortium, which did something similar but without the "special sauce"? He did a little worse, which is a bit of evidence that Sam's more straightforward numbers are better than Nate's judgment.


or random error.


Definitely possible, that's why I said "a bit"


I think it will be really interesting to see what happens in election forecasting if Trump wins this election -- doubly so if there's no major scandal for Clinton between now and election day.


What happens with election forecasting will be the least of our collective problems.


If the polls stay as they are and the election goes the other way, it means that polling as practiced is somehow fundamentally broken, which would be very surprising.

If there's a gradual shift in the polls, I'm not sure what that would mean. Which scenario did you have in mind?


Brexit will likely want to have a word with you.

I have a feeling that polling is now used as a political weapon, not a measure of reality.


Nate Silver arguably lost a good deal of "prediction calibration points" by placing a very low chance on Trump getting the nomination in the first place.


Not at all. A low probability does not make the prediction invalid just because the unlikely outcome happened.


And wouldn't the same be true if Trump wins the election? After all, Nate Silver isn't predicting a 0% chance for Trump to win, I think last time I checked it was 10-15%.


Well, that's not what I said either.


>doubly so if there's no major scandal for Clinton between now and election day.

Unlikely, considering there's one every other day


Repeating the same thing every other day doesn't count.


That wasn't the feat people make it out to be. In US elections there are only a handful of states that could go either way.


It adds up pretty quickly. For 2008 and 2012 combined, he only got one state wrong. Say there are 10 swing states, each close to a 50/50 chance, and all the others are fixed. The chance of getting them all right, twice, is 0.5^(2*10) = 0.000000953674316, or about one in a million.

(that's actually not the best method to evaluate him, because he provides estimates of chances and could be better evaluated with something like the https://en.wikipedia.org/wiki/Brier_score that checks for calibration as well, but it's more intuitive).
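A small Python sketch of both calculations. The 0.5^(2*10) figure assumes, as the comment does, ten independent 50/50 swing states per election; the forecasts and outcomes fed to the Brier score are invented for illustration:

```python
# Chance of calling 10 independent coin-flip swing states right in each of two elections.
p_all_right = 0.5 ** (2 * 10)
print(p_all_right)  # 9.5367431640625e-07


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0 is a perfect score; always forecasting 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical state-level win probabilities and what actually happened (1 = win).
forecasts = [0.9, 0.7, 0.55, 0.2]
outcomes = [1, 1, 0, 0]
print(brier_score(forecasts, outcomes))
```

Unlike counting states called correctly, the Brier score penalizes overconfidence: a 0.55 forecast that misses costs far less than a 0.99 forecast that misses.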


There aren't ten swing states. It's probably closer to half that. And they aren't 50/50 either. I got 49 right the last election just by looking at the public polls in the days leading up to the vote.


[flagged]


His name is Nate Silver. Don't be obnoxious.

538 didn't fail to read data properly re Trump's nomination. Trump was a new phenomenon with no history to build a model from. 538's mistake was guessing instead of admitting they had insufficient data -- which is bad behavior, but not because their preferred methodology is bad.


Bull. Their polling based model performed okay (although, at least through Super Tuesday when I looked at it, his polls-plus model didn't outperform his polls-only model, and neither significantly outperformed the RCP weighted average of polls). What failed during the primaries was Nate's "Party Decides" punditry.


LTCM failed too, because they did not consider confidence intervals on their models.


The Princeton Election Consortium's model [1] has a better track record than 538. This year, it has been much less volatile than Silver's. It only measures state polls, and does so in a very transparent way.

[1] http://election.princeton.edu


Actually Nate Silver did criticize polls like this on his podcast. At least the choice of weighting based on who they voted for in the past election. Which is known to be very biased. 538 does grade the quality of polls and excludes/weights polls that have lots of problems.


Nate Silver has been criticized a lot by Mish (http://mishtalk.com/) and Mish was more often correct with his predictions than Nate. (What was Nate's prediction of Trump becoming the candidate? "Trump's chance of becoming the GOP nominee at 2 percent.")

Nate is strongly biased in favor of Hillary. That is fine but that is not scientific. Nassim Taleb said about Nate:

"55% "probability" for Trump then 20% 10 d later, not realizing their "probability" is too stochastic to be probability."

and

"So @FiveThirtyEight is showing us a textbook case on how to be totally clueless about probability yet make a business in it."

Proceed with caution.


538 didn't give any predictions on the primaries. Yes, they wrote articles saying that he had no chance, but those are separate from their statistical predictions.

And, regarding the volatility: the probabilities reported by 538 are closely linked to betting/prediction markets. Anyone with a better model could parlay into money quite easily.


"538 didn't give any predictions on the primaries."

They gave predictions for every state during the primaries. Also see: https://mishtalk.com/2016/05/19/nate-silvers-self-serving-co...


Something happening after being given a statistically low chance of happening doesn't automatically mean the statistical model was wrong.

If we are standing at a roulette table and I tell you there is a 2.7% chance the ball will land on 4 and it lands on 4, it doesn't mean my low % was incorrect.
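A quick Python check of the roulette example, assuming a single-zero (European) wheel with 37 pockets, which is where 2.7% comes from (an American double-zero wheel would give 1/38, about 2.6%). The simulation is only an illustration:

```python
import random

# Single-zero wheel: 37 equally likely pockets, so P(ball lands on 4) = 1/37.
P_FOUR = 1 / 37
print(f"{P_FOUR:.1%}")  # 2.7%


def simulate(spins, seed=0):
    """Fraction of simulated spins that land on 4 (seeded for reproducibility)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(spins) if rng.randrange(37) == 4)
    return hits / spins


print(simulate(100_000))
```

Over many spins the 4 comes up roughly 2.7% of the time: each individual hit is a low-probability event, yet the 1/37 forecast is exactly right.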


One thing I noted is that the NY Times says that most polls have four categories of age and five categories of education, except that these aren't unordered categories; they are ordinal variables.

Age and level of education are slightly correlated (you don't get many 18-year-olds who have a PhD). Because the age classification and education levels are ordinal, you should use an ordinal smoothing [0] function to turn them into pseudo-continuous variables. Given continuous, correlated independent variables (as well as other categorical independent variables) and a categorical dependent variable, the best analysis is probably a quadratic discriminant analysis (QDA).

[0] http://epub.ub.uni-muenchen.de/2100/1/tr015.pdf and http://cran.r-project.org/web/packages/ordPens/ordPens.pdf
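A bare-bones sketch of plain QDA in Python, to show what the suggested analysis computes. It assumes two features that are already pseudo-continuous scores (for example, age band and education level after some ordinal smoothing); the smoothing step itself (the ordPens approach in [0]) is not shown, and the respondents and class labels are invented. With samples this small QDA would overfit badly; the point is only the mechanics: one mean, covariance matrix and prior per class, and a quadratic score for each class.

```python
import math


def fit_qda(X, y):
    """Estimate a mean vector, 2x2 covariance matrix and prior for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, c in zip(X, y) if c == label]
        n = len(rows)
        mx = sum(r[0] for r in rows) / n
        my = sum(r[1] for r in rows) / n
        # Sample covariance with an n - 1 denominator.
        sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
        syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
        sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
        model[label] = ((mx, my), (sxx, sxy, syy), n / len(y))
    return model


def discriminant(x, mean, cov, prior):
    """log(prior) - 0.5*log|S| - 0.5*(x - mu)^T S^-1 (x - mu) for one class."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form 2x2 inverse [[syy, -sxy], [-sxy, sxx]] / det.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def predict(model, x):
    """Assign x to the class with the largest quadratic discriminant."""
    return max(model, key=lambda label: discriminant(x, *model[label]))


# Invented respondents: (smoothed age score, smoothed education score) and a response class.
X = [(1.0, 1.2), (1.4, 2.1), (2.0, 1.8), (1.6, 2.6), (1.2, 1.5),
     (3.2, 3.9), (3.8, 4.4), (3.5, 3.1), (4.0, 4.8), (3.1, 4.2)]
y = ["A"] * 5 + ["B"] * 5

model = fit_qda(X, y)
print(predict(model, (1.5, 2.0)), predict(model, (3.6, 4.2)))
```

Giving each class its own covariance matrix is what makes the decision boundary quadratic; sharing one pooled covariance would reduce this to linear discriminant analysis.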


Side rant: what the heck did they implement on NYTimes.com?! When I quickly click the text (which I compulsively do to select a paragraph, etc.), it changes the font size (sigh).


I am a compulsive selector and I absolutely hate websites that try to add functionality to selections. No, I do not want to share this selection, website, especially when your JS code hangs for a moment as it loads a stupid twitter image.


Ugh, what were they thinking? I can't even imagine what this feature would be for!


I think it's some sort of idiotic tap-to-zoom, like for mobile/touch browsers? Since browser zooming is hit and miss, they probably thought they were oh-so-clever. As Raymond Chen says, someone probably got a raise for this.


Running NoScript (https://noscript.net/) in default deny all javascript I can click to select paragraphs without the text doing anything beyond becoming 'selected' text. (Note, I do not normally do this, I had to test to verify.)


This is really fascinating. I get why the poll creators made these decisions, but the results of the weighting lead to a ridiculous result compared to other polls. Supposedly this poll was extremely accurate in 2012, so who knows?


Then the poll creators made bad decisions. Even with transparency, there's no reason a poll can't be constructed specifically to heavily favor a particular candidate, which seems to be the case here.

It's very humorous when Trump cites this poll considering what an outlier of civilized humanity he is. Matches nicely.


Interesting use of the word 'civilized' that I wasn't previously aware of.


I'm not sure I get why they bracketed the age group so small as 18–21 vs the broader census definition.


This is the first presidential election for those voters.


I don't get why they bracketed the age group at all. Surely you can just estimate the distribution?


You don't really need to estimate the distribution of ages; the electoral roll and the census will give you a distribution with an error that is several orders of magnitude smaller than the sampling error from the poll. For all intents and purposes, the electoral roll and census data are so close to reality that you might as well declare them as such for calculations.


Well if the idea is to correct for the biased sampling of your poll you actually do need to estimate the distribution of ages in the people you've polled.

Dividing them in small groups and calculating the proportion of people in each isn't the best approach though.


You don't need to estimate the distribution of the ages of the people you have polled unless of course you don't know the ages of the people you have polled.

You have two distributions of ages here; both are known (within reasonable comparatively minute errors), no need to estimate. The distribution of ages of the electorate, strictly not known, but so so close to known compared to all the other estimates you have to deal with. And the distribution of the ages of your pollees, you simply ask them their age (estimating whether they are lying to you about their age is an infinitely deep philosophical hole you'll never get out of).
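Cell weighting, the simplest form of post-stratification, combines those two known distributions directly: each respondent's weight is their cell's population share divided by that cell's share of the sample. A minimal Python sketch with hypothetical age bands, shares and respondents:

```python
from collections import Counter


def cell_weights(sample_cells, population_shares):
    """Post-stratification weight per cell: population share / sample share."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_shares[cell] / (counts[cell] / n) for cell in counts}


# Hypothetical population shares (e.g. from census or electoral-roll data)
# and the age bands reported by 10 polled respondents.
population = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.35, "65+": 0.20}
sample = ["18-29", "30-44", "30-44", "45-64", "45-64", "45-64",
          "65+", "65+", "65+", "65+"]

weights = cell_weights(sample, population)
print(weights)
```

Here the lone 18-29 respondent counts double, while each 65+ respondent counts half. The smaller a cell is in the sample, the larger and noisier its weight, which is the mechanism the article is about.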


I'm not sure if he doesn't know his role, and I'm curious how tracking polls like this try to account for the large media attention paid to the poll and its methodology. This guy is known to stats nerds, and they've been tracking his moves and (rather mean-spiritedly) calling him "Carlton" for a while now.


Will the era of click-bait titles come to an end?


As soon as it stops working, it will stop being employed.


Sadly, when I first clicked on this, it had a much better headline. Something like "LA Times poll, how sample sizes make it an outlier".

Maybe clickbait++, or maybe the NY Times felt better about fingering a young black guy than a competing paper.


Can you come up with a better one? Because it's as fair a summary as I could write within that space, and the article was pretty much what I expected before I clicked.

I see no harm in headlines that create interest as long as they are not misleading.


The article has almost nothing to say about the "19-year-old Illinois man" but everything to do with the "U.S.C./LAT poll" and its characteristics. The poll chooses weights, and as such this random person's decisions are given higher precedence.

The title should reflect the _poll_, not the man. HE himself is doing nothing. The title is misleading as I assumed the article would discuss grassroots activities by this man and how he's making great strides to influence polls. I would learn who he is, what he is doing, and how his actions have been effective. But, this is not the case. He has no idea and in fact this man is entirely irrelevant as an individual.


TL;DR: 19-year-old black men are a very small demographic and so have a small sample size. Apparently, the sample for the LA Times poll includes an outlier who favors Trump, and he then gets weighted disproportionately, leading to the conclusion that Trump is favored by young black voters.


It's not that the cell is small. It's that it has a single respondent representing it, so his weight is increased enough to be the equivalent of 30 respondents.
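The arithmetic is easy to sketch with invented numbers. Only the 30-respondent weight comes from the comment above; the subgroup size and support levels are hypothetical, and the real poll's weighting is more involved:

```python
def weighted_share(respondents):
    """Weighted fraction supporting the candidate; each respondent is (supports, weight)."""
    total = sum(w for _, w in respondents)
    return sum(w for supports, w in respondents if supports) / total


# Hypothetical subgroup: 200 respondents of weight 1.0, 20 of whom support the candidate.
subgroup = [(True, 1.0)] * 20 + [(False, 1.0)] * 180
print(f"{weighted_share(subgroup):.1%}")  # 10.0%

# Add a single supporter whose weight equals 30 ordinary respondents.
print(f"{weighted_share(subgroup + [(True, 30.0)]):.1%}")  # 21.7%
```

One person more than doubles the subgroup estimate, and because the tracking poll keeps re-interviewing the same panel, that distortion repeats in every release.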


This makes me feel so much less confused about that poll


I'm posting under a throwaway account just to say I agree with you; if that says anything.


We detached this subthread from https://news.ycombinator.com/item?id=12696142 and marked it off-topic.


I'm posting under my own account to say I disagree. We're talking about data here. Whether or not the Times has any motivation behind publishing this article, the data and methods still speak for themselves.


why?


[flagged]


Please comment civilly and substantively on HN or not at all.


Let me see you implement observer, async io or something similar without it ...


Let's go find the really heavily weighted members of _all_ the polls and dox them too. This way we can screw them all up. Not just one that is influenced by a potential Trump voter.


This isn't doxxing.


'Spose not. I think it is an attempt to subvert the authenticity of the poll though.


The article does not question the authenticity of the poll. It compliments the pollsters on their documentation and sharing of data:

It’s worth noting that this analysis is possible only because the poll is extremely and admirably transparent: It has published a data set and the documentation necessary to replicate the survey.

It does point out an aspect of the poll that may undermine its utility in accurately predicting the result of the election, but it's a pretty dry fact, the repeated inclusion of a heavily weighted voter.

If Trump takes 10-20% of the black vote, then the poll did a good job predicting the results. I don't expect that to happen.


The whole reason for being transparent about the polling methodologies is so we can analyze and critique them.


What exactly is wrong with questioning the "authenticity" of a poll based on its methodology?


A drastically skewed poll isn't very "authentic" in the first place.


Regarding the Times' decision to run this article, I wonder how much of it was based upon "hey, polling is kind of goofy" and how much of it was "look! Here's another way we can show that Trump isn't really resonating with voters!".


Does it matter? The point is that they published an article that is factually accurate. If they were publishing an opinion piece based on such motivations there might be an argument here. But they aren't.


You're completely missing the point. Publications show bias all the time based on what factual information they report (or don't report). Look at CNN.com right now. Literally nothing about the new WikiLeaks release that came out today, even in a factual manner. Plenty of opinionated anti-Trump garbage.

This is what your parent comment was referring to.


Either that, or you are the one with bias, and the news outlets you go to and/or the people you associate with feed into that bias. You can see just as much bias in the opposite direction on any number of news sites. They're all biased in different ways, some knowingly and some unknowingly.

Individuals are usually even more imperfect than most news outlets when it comes to having limited sources of information and unexamined preconceived notions about things. No one is off the hook for being personally responsible for trying to understand all sides of a debate or being as educated as possible on all sides of an issue. I only say this because I see huge correlations between people complaining about bias in news outlets (regardless of political leanings) and people who don't see just as much bias in their own preferred media inputs.


"They're all biased in different ways, some knowingly and some unknowingly."

All are biased knowingly. We don't live in utopia, newspapers/media are paid enterprises.


The interesting thing here, of course, is how many big names in journalism are starting to question whether current (as opposed to, say, 50-60 years ago) standards of "neutral" reporting are really all that neutral, since they've begun to realize that the push to just dispassionately report every side of something led to mainstream media outlets becoming the largest provider of free advertising -- to none other than Trump -- in American political history.

To put it in context: there's a story I've heard once or twice about Walter Cronkite, during the JFK administration. JFK supported a bill to provide funding to the state of Alaska to construct mental health facilities, of which the state had (at the time) none, and a few of the more out-there Republicans who opposed him spun this into a conspiracy theory that JFK was secretly building Soviet-style gulags in Alaska, and would deport all his political enemies there once construction was finished (a sort of precursor to today's allegations that FEMA is a front for (insert Democratic president here)'s secret labor camps for his/her enemies). Cronkite supposedly refused to dignify the allegations with any kind of airtime, knowing that the mere act of mentioning the JFK-labor-camp stuff on a national news broadcast would implicitly legitimize it.


Considering the universe of facts is endless, it seems that it's impossible to report without "bias" then – you necessarily have to make decisions and there is no objective standard to judge stories by.

(and, btw, the leaks have three stories in the "top news" block on the top left, and an above-the-fold story in the politics section right now)


Less than 8 hours later and I don't see it. And 8 hours between my comment and yours. They show it less than a day, but Kim getting robbed has been front page for a week


It does matter. Popularity bias affects how people vote. NYTimes is a fairly liberal media source, and it's incredibly suspicious that this small case has been used to show "hey this bad guy is getting support due to this!" ... why didn't they publish this before?


NYT is not liberal. It is just left of the extreme right wing. NYT was one of the leading cheerleaders for Bush's Iraq War.


It's probably roughly in line with the centrist Democrat view (which is also generally pro-war), if you don't look too hard at the prominent columns given to people like David Brooks and Ross Douthat which would drag their aggregate editorializing to the right.

So in common American usage, not liberal. If you're a socialist though, "liberal" would accurately describe the whole Democratic party, who favor somewhat regulated capitalism and mild social liberalism.


Are you asking if at this point, what difference does it make?


I'm Canadian, but if I were in the US I would be voting for Hillary. That said, I actually agree with you. I think most members of the media are understandably anti-Trump, but that bias is definitely coming through in the reporting. In my opinion, that does their cause a disservice. Simply factually reporting on both candidates should be sufficiently anti-Trump given his many transgressions. But when you regularly see blown-up stories like the one about kicking a mother and baby out of a rally, told in a clearly biased manner, it leads people on the fence to discount everything the media says about Trump, which results in him getting away with far more blatant lying than he otherwise might.

There's nothing at all wrong with this article in a vacuum, but there does appear to be a bias in article selection. (Although I suppose that some of that must also be simply because 'Trump' sells newspapers (or clicks or what have you.))


I don't know if you've followed polling closely, but this particular poll (LAT/USC) is outside the envelope and it has attracted considerable attention because of this. See the lead sentence in: http://www.latimes.com/politics/la-na-pol-poll-faq-20161006-... .

Anyway, the story turns out to be a great example about survey design tradeoffs.


Oh, I'm certainly not disputing the NYT article itself. But as with all political coverage this year, I'm highly suspicious of the timing and motivation.

This post had an additional benefit: I've been able to observe the votes for my comments go up and down like a roller coaster for the past hour, as the readers of HN apply their own biases. Rather interesting, especially as the number of actual responses is relatively small.

What can we learn about HN's user base from the ultimate comment score, I wonder?


Probably that they dislike politically motivated conspiracy theories when there is an obvious, non conspiracy based explanation, e.g. the one you just replied to.


Isn't this the point of having a discussion site, however: to discuss issues rather than simply downvote them to oblivion?

Note my first post. I intentionally left it open-ended to spark discussion. In an election cycle where online propaganda and sockpuppetry have been confirmed, I don't think my question is in the same league as those posed by flat-earthers or 9/11 conspiracy theorists.


You wasted my time reading a comment that is meaningless and highly politically slanted. I want to save that time for future readers.


Trump had a really bad couple of weeks and all the polls showed a big Clinton lead except this one. It just tipped over. The timing seems perfectly reasonable.

As for the motivation, as a purveyor of facts I have to imagine that the NYT is frustrated at Trump's complete disregard for them, and his disregard of polling is a big part of that.


If the NYT wanted to swing the election, they might have published this article before Trump imploded at the debates.


Hacker news has been left wing hack news for a long time. I'm still waiting for them to downvote me out of existence. It's taking too long. :)


If there were a similar situation for Clinton, I could see Breitbart or Fox talking about it as proof of some Democrat conspiracy. There definitely are biases, but I think this story would get traction either way.


[flagged]


Please don't comment like this on Hacker News.


What makes Hacker News a bastion of professionalism and gravity?


Karma, culture, governance.



