SCOT goes POP!: From Brexit to Trump: can we trust the polls anymore?

Tuesday, November 8, 2016

I think it's probably fair to say that it's been a great many decades since US media coverage of an American election has referenced British politics so frequently. That's a sign of how Brexit matters internationally in a way that routine changes of government don't, but it's also a sign of Trump supporters casting around for glimmers of hope wherever they can be found. The theory is that support for Trump is vaguely similar in character to support for Brexit, and that if the latter was underestimated by the polls, there's no reason to think the former isn't being underestimated as well.

From this side of the Atlantic, there have been two reasons commonly cited for why that is probably wishful thinking. The first is that we knew in the days prior to the EU referendum that the postal votes that had already been cast were painting a different picture from the opinion polls, and that Remain had a significant deficit to overcome on polling day itself. On the whole, the opposite seems to be happening in the US at the moment, with the early voting data tending to look more promising for Clinton than for Trump. The second reason is that, supposedly, the opinion polls in the EU referendum were nowhere near as inaccurate as portrayed.

The first reason makes perfect sense to me, but I have to say I think the second one is pushing it a bit. This goes back to the meat of a rather unpleasant (and now largely deleted) argument I had with a New Statesman journalist and a few others on Twitter in August, on the topic of "can we ever trust the polls again?". Immediately after that exchange, I had been planning to write a blogpost setting out my thoughts, but I decided against it because the whole thing had become too heated. However, this may be a good moment to make some of the points I had been planning to make that night (but leaving aside the personalities involved, obviously).

* First of all, it really must be understood that the standard 3% margin of error in individual opinion polls does not provide any sort of alibi for the polling failure in June. If the methodology used across the industry is basically correct, the error on the polling average should be considerably lower than 3%. For example, if one campaign is actually on 44%, you would expect just as many polls to have that campaign one, two or three points below 44% as have them one, two or three points above 44%. The underestimates would balance out the overestimates, and you would end up with an average that is pretty close to being bang on the money. So it's a form of sophistry to look at the string of late polls that overestimated the Remain vote, and claim that the ones that fell within the margin of error (or came close to doing so) were all technically "accurate".
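As a rough illustration of why averaging should shrink the sampling error, here is a minimal Python sketch. It assumes simple random samples of 1,000 respondents per poll and fully independent polls; neither figure comes from the actual referendum polling, and the function names are purely illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% sampling margin of error for a share p in a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def margin_of_error_of_average(p, n, k, z=1.96):
    """Margin of error on the mean of k independent polls, each of size n.

    Assumption: the polls share no common methodological bias, so their
    sampling errors partly cancel out when averaged.
    """
    return margin_of_error(p, n, z) / math.sqrt(k)

if __name__ == "__main__":
    # Hypothetical inputs: a campaign truly on 44%, polls of 1,000 people each.
    single = margin_of_error(0.44, 1000)
    averaged = margin_of_error_of_average(0.44, 1000, 9)
    print(f"one poll: +/-{single * 100:.1f} points")
    print(f"average of 9 polls: +/-{averaged * 100:.1f} points")
```

Under those assumptions a single poll carries a margin of roughly three points, while the average of nine such polls carries roughly one point. A shared methodological error would not shrink in this way, which is the whole problem.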

Regular readers of this blog will remember that I had been completely open to the strong possibility of a Leave victory throughout the referendum campaign, but when the last polling numbers came in on 23rd June, I finally threw my hands up in the air, and said that if the polls were right, there was clearly going to be a Remain victory of some sort. My exact words were -

"Leave can only really win now if there's been some kind of systemic problem with the public polls - although that's scarcely unheard of."

I entirely stand by that summary, and exactly the same is true of the situation in the US right now. Donald Trump still has a chance, but that categorically isn't because he's "within the margin of error". He may be within the margin of error in individual polls, but if he were really tied with Clinton, and if the polls were getting it right to within the margin of error, there ought to be as many polls putting him three or four points ahead as there are putting him three or four points behind. Self-evidently, that isn't the case. The reason he still has a chance is that it's fairly common - as our own referendum demonstrates - for polls to be misleading due to factors that are not taken into account by the standard margin of error. That 3% wiggle-room only allows for normal sampling variation, and basically assumes that the underlying methodology is otherwise going to be perfect - which is pretty optimistic in this day and age.
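To put a rough number on that, here is a small Python sketch. It assumes each poll is independent and that, in a genuinely tied race, each poll is equally likely to show either candidate ahead (exact ties are ignored). The figure of ten polls is purely hypothetical, not a count of the actual polls.

```python
from math import comb

def prob_at_most_leads(k, m, p_lead=0.5):
    """Probability that a candidate leads in at most m of k independent polls,
    when each poll has probability p_lead of showing that candidate ahead."""
    return sum(
        comb(k, i) * p_lead**i * (1 - p_lead) ** (k - i)
        for i in range(m + 1)
    )

if __name__ == "__main__":
    # Hypothetical: ten polls of a truly tied race, candidate ahead in at most one.
    print(f"{prob_at_most_leads(10, 1):.4f}")
```

Under those assumptions, the chance of a candidate leading in no more than one poll out of ten is 11 in 1,024, or about 1%. A lopsided run of polls is therefore very hard to square with a true tie unless something other than sampling variation is at work.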

If Trump wins, or if Clinton wins much bigger than we expect her to, it'll be because the polls were wrong, just as they were on Brexit. Not necessarily wrong by all that much, or by a historically unprecedented amount, but certainly wrong in a way that the margin of error can't account for. (Although polling firms will doubtless attempt to make that excuse by cherry-picking individual polls.)

* It's been suggested a number of times that the EU referendum polls were much more accurate than supposed, because people tend to only look at the last batch of polls, and ignore the ones earlier in June that were more favourable for Leave. That's plainly a load of nonsense, because the reason why the later polls moved towards Remain is remarkably simple - there was almost certainly a genuine swing towards Remain as polling day approached.

The word "accurate" is a bit slippery when used in relation to opinion polls, because strictly speaking, and with the obvious exception of exit polls, all polls are snapshots of public opinion rather than predictions of election results. A poll can be an accurate snapshot even if it differs markedly from the final outcome. Nevertheless, if "accurate" is used to mean closeness to the final result, it's perfectly reasonable to say that later polls should be more "accurate" than earlier ones, because the closer you get to election day, the more people have made up their minds. Therefore, the fact that the EU polls got progressively less "accurate" towards the death of the campaign makes it worse for the polling industry, not better. It strongly implies that there was a significant in-built error all along. When Leave appeared to be slightly behind, they were actually slightly ahead. When Leave appeared to be slightly ahead, they actually had a decent cushion. And so on.

* One of the apparent saving graces for the polling industry in June was that, against all expectations, online polls proved to be somewhat more accurate than telephone polls. Nevertheless, the performance of the online polls was significantly tarnished by a Populus poll published on referendum day that was absolutely miles out from reality - it gave Remain a 55% to 45% lead. It was suggested to me that somehow that poll doesn't really count, because it was the only published Populus voting intention poll of the entire campaign, and is therefore difficult to put into proper context. I must say I can't make head nor tail of that line of argument. We know that Populus had been conducting extensive private polls throughout the campaign, meaning they'd had as much opportunity as any other firm to hone their techniques. It may well be that a 55%-45% lead was an outlier from their normal results, but it shouldn't have happened at all if their methodology had been essentially sound. (Even the occasional 'rogue poll' that statistically will happen one time in every twenty shouldn't really be out by as much as 7%.)

So, yes, that Populus poll does deserve to be treated as an online poll like any other, and the fact that it was one of the final polls of the campaign (when it should have been more accurate, not less) does detract from the notion that online polls in general performed tolerably well.
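The point about a 'rogue poll' can be sketched numerically. This is only an illustration: the sample size of 1,000 is a hypothetical value rather than the real Populus sample, the true Remain share is rounded to 48%, and the calculation assumes simple random sampling with a normal approximation.

```python
import math

def miss_in_standard_errors(poll_share, true_share, n):
    """How many standard errors separate a poll's figure from the true share,
    for a simple random sample of size n."""
    se = math.sqrt(true_share * (1 - true_share) / n)
    return abs(poll_share - true_share) / se

def two_sided_tail_probability(z):
    """P(|Z| >= z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))

if __name__ == "__main__":
    # Hypothetical n=1000; 55% polled versus roughly 48% in the actual result.
    z = miss_in_standard_errors(0.55, 0.48, 1000)
    print(f"miss of {z:.1f} standard errors")
    print(f"chance from sampling alone: {two_sided_tail_probability(z):.2e}")
```

With these assumed inputs the miss works out at more than four standard errors, which is vastly rarer than the one-in-twenty chance of simply landing outside the margin of error.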

* * *

To return to the original question, I think the simplest way of putting it is this. If you want polls to be as accurate as the industry claim them to be, then you can't and shouldn't trust them, because recent history suggests you'll often (but not always) be disappointed. If, however, you just want a ball-park sense of public opinion that is more reliable than, say, Neil Lovatt's beloved betting and financial markets, then yes, polls are still a very useful tool, and the outcome in June bears that truth out. It really just depends on how demanding your own expectations are.

22 comments:

BillfromBoston November 8, 2016 at 8:00 AM
OK. Polling, which is why I came here long ago, before the Eurovision coverage got me snagged. Nice post, by the way! There are a lot of problems now with polling in the U.S.

First is the staggering tolerance of undecided voters. Saying 45 to 41 two days out is ridiculous, and such polls abound.

State polls lag national polls, yet they refuse to perform state polls in concert so they can be compared. The tracking polls suck and almost no one does them, so we have to piece together trends from non-tracking ones. I have never seen a scientific justification for this and I have never seen it work.

Then there is the proliferation of pollsters that no one even knows, that have no track record or a poor one, or that used to poll in one region and now poll another (like the colleges: a Boston school polling Alaska?), plus the pollsters that just publish trash to affect the race and then mysteriously publish a last poll the day before, right in the middle of the polling average (can you say Rasmussen?), so they can then say they are accurate.

To be fair, polling systems were set up, and were more accurate, when we had real ethnic, religious and class boundaries and people actually voted the way the newspapers and endorsing politicians said. Predicting the vote of a fourth-generation American whose father was from Boston and mother from Pennsylvania, but who grew up in New Jersey, went to college in North Carolina and lives in Florida, is a lot harder. I think you have the same problem in the UK, but a bit less.

There is also a lack of common sense: do you really think 120 thousand Latinos registered for the first time in Nevada and voted early, waiting in line for 7 hours, and 28% of them did so to vote for Trump? I mean, really?

Which gets us to oversampling, which is now almost never done and has a bad name, but was always done before: extra calls to a subset to confirm a result. I remember calling 120 extra people in a race because we had an Italian getting only 45% of the Italian vote. That overall poll didn't walk out the door until we were sure! Turns out they knew him and didn't like him! Somebody should be oversampling former Labour voters going Tory now to find out how and why they are doing it.

Finally, the big ones:

1) The tolerance of crap by uninformed commentators, the news media etc., which, as James has pointed out many times, allows bad form to persist. One major US firm found that changing ONE word that was now interpreted differently than it used to be was causing havoc.

2) The fact that you can make 5 million to 10 million polling for one campaign, so the best pollsters and polls go there, and the public polls are paid for by people with an agenda or a need to make a splash; thus they are timed not scientifically but by a media outlet.

What day you call makes a difference: if you call the Midwest on a weeknight you miss all the working-class bowlers (this used to be huge). Call on Friday night, when every town plays high school football, and you will have NO parents in the 35-50 age group. On Thursday night they are all home making sure their kids have their homework done, but the millennial singles are all out partying already.

One last thing: polling accuracy was predicated on people in a household (large families) answering the phone and being happy to answer lots of questions. Now people answer trying to make a point, and often to game the system. One recent poll had 28% saying they weren't white, black, African, Asian or Latino/Hispanic. Really?
Marcia November 8, 2016 at 9:56 AM
Most of the final national polls are saying the same thing: to use the American term, 'Leans Clinton'.
bjsalba November 8, 2016 at 10:37 AM
I have to say that the amount of absolute drivel that comes out in the media on the Presidential election and the polling is mind-boggling.

Their hyper-excited discussion bores me stiff, and that includes the ecstatic glee as they slaver over the most minuscule blip in the polls.

I don't believe any poll unless I know who did it and for whom it was done. I credit James and SCOT goes POP for my education in this field.
Alan Weir November 8, 2016 at 2:31 PM
James: I think it is a bigger puzzle why polls are accurate at all, than why they are inaccurate. And that talk of 'margins of error' is seriously misleading, as is the use of confidence intervals, though the polling companies sometimes issue a vague caution that confidence intervals are not probabilities.

To explain: it's clear that you can't go from 'if someone is Latvian it's 95% likely they are white' to 'if someone is white they are 95% likely to be Latvian.' It seems to me that a similar fallacy is at play in some of the interpretations people put on opinion poll methodology. The reason the polls nonetheless work is just ordinary trial and error: they fine tune their weighting and sampling procedures to try to get a mix of parameters they think are relevant to voting intentions in the light of past experience. In which case, you should place weight on polling organisations using methods which have worked well in the past but you should not be fooled into believing they yield a 95% probability that the result is x% within + or - whatever margin of error.

To explain further: if, say, 45% of the actual vote in the total population was for Yes, then it's a fact of pure maths that if you looked at all the samples of a given size, 2000 say (you couldn't really look at them all, as the total number of samples of that size is huge), the Yes proportions in those samples would be approximately normally distributed around 45%, with 95% of them within 1.96 standard deviations of 45%. And then if you assume an actual sampling (say from the actual ballot boxes at the end of the poll) is as likely to produce one 2000-sized subset of total votes as any other such subset, you can reasonably conclude it's 95% likely that the Yes share in your sample from the final result will be within the margin of error of 45%.

But there is no scientific or mathematical justification for going in the other direction; doing so is like going from '95% of Latvians are white' to '95% of whites are Latvian'. If you don't know the actual result but have a small sample within which the Yes vote is 45%, almost any prediction for the result for the whole population from just above 0% Yes to just below 100% Yes can be given 100% probability without violating the axioms of probability theory.

So James, I think you should drop the talk about margins of error entirely: it gives a spurious air of exactitude to predictions which, as you say, do have a track record of some success but for which there is no scientific justification for assigning numerical 'confidence' levels (which folk will interpret as probabilities) within numerically precise margins of error.
keaton November 8, 2016 at 3:20 PM
Interesting. Stephen Bush, who correctly predicted GE 2015 and Brexit against the media consensus (he's like the anti-Aldo), reckons it'll be Clinton today.
Anonymous November 9, 2016 at 5:36 PM
So, to summarise, then: only dafties believe in polls?