A guest post by Scott Hamilton
In the referendum campaign we are being presented with polling data on an almost weekly basis. When a poll is released (often in the wee small hours) there is almost immediately a scrum to understand what the numbers mean for each side, often with a Twitter race to churn out a pleasing graphic triumphantly cherry-picking the most striking result.
Whilst the evidence that people are directly affected by polling data - in terms of how they vote, at least - is scarce at best, it is easier to conclude that voting intention could be influenced by the media, who are demonstrably affected by polling data. Often it is the media establishments that commission the polls who are most vocal about the results (understandable, given that their money often pays for the analysis). Polling generates fairly cheap copy and makes for good, dramatic headlines in which each side of the campaign is said to be “winning” or “losing”, though more often than not the narrative is much more dramatic - “blow for Salmond” appears to be something of a favourite.
Error, errors and more errors
Question - when have you ever seen a newspaper headline that expressed a poll result with ANY discussion of error front and centre? Never, right? The “Blow for Salmond” headline with the “60% No” strapline isn’t quite as sexy when you add “this value is subject to at best plus or minus 3% error, maybe much more - please interpret these results with caution”.
We (I’m looking at you, MSM!) should remember that, like any observationally driven procedure, polling is subject to error. Most people with a passing interest in polls will be aware of the oft-quoted “plus or minus 3%” figure that the polling companies and the media use as something of a “quality guarantee”. This is far too simplistic a metric: in truth, the total error associated with any single political poll is unknowable, and here’s why.
The “plus or minus 3%” error value is the absolute best case a polling company can achieve, because the figure represents only the sampling error, not the total error in the poll. Sampling error is the potential variation from the true value that comes from trying to represent a large population with a much smaller sample. For example, if you wanted to know how many left-handed people there were in Scotland, you could ask 1000 people and be pretty sure you were within about 3% of the correct answer when all’s said and done.
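That oft-quoted figure falls straight out of the standard formula for sampling error. Here’s a minimal sketch (the function name and the 1000-person example are mine, purely for illustration, not from any pollster’s published methodology):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% sampling margin of error for an estimated proportion p
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1000-person sample, worst case p = 0.5:
print(round(margin_of_error(1000) * 100, 1))  # ~3.1 percentage points
```

This is where “plus or minus 3%” comes from - and note that it assumes a perfect simple random sample, which no real poll achieves.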
Now, leaving sampling error to one side, there are other potential sources of error in a polling survey. Commonly these are called the coverage error, measurement error, and the non-response error.
Coverage error arises from not being able to contact portions of the population - perhaps the pollster only uses people with an internet connection, or a landline telephone number. This can introduce bias into the sample for a whole host of reasons.
Measurement error comes about when the survey itself is flawed - question wording, question order, interviewer error, poorly trained researchers, and so on. This is perhaps the most difficult source of error to understand: it is unlikely that even the polling companies could put a % figure on how much error these methodological aspects contribute to the total for the survey! Taking the left hand/right hand example above, this is a completely non-partisan question that people should have no qualms about answering honestly. They also won’t have forgotten which hand they use, unlike, say, how they voted in an election three years ago - a figure sometimes used to adjust poll results. I’m also confident there’s little potential for vested interest in such a question and how it’s worded and framed - perhaps not the case for an issue like the upcoming independence referendum!
Non-response error results from the surveyor not being able to access the required demographic for simple reasons like people not answering the phone, or ignoring the door when knocked - unavoidable really.
Polling companies try to account for all of this uncertainty by using weighting procedures, whereby the sample (or sub-groups within the sample) is adjusted to align more closely with the demographics of the population being surveyed. For instance, if the sample contained 10% too many women compared with the population, women’s voting preference would be down-weighted by a factor of roughly 0.9 in the final analysis to account for this.
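As a sketch of how that adjustment works in practice - the 55%/50% split and the Yes shares below are invented for illustration, and strictly speaking a sample with “10% too many women” gets a weight of population share ÷ sample share, about 0.91 rather than exactly 0.9:

```python
def demographic_weight(population_share, sample_share):
    """Weight applied to every respondent in a subgroup so that the
    weighted sample matches the population's demographics."""
    return population_share / sample_share

# Hypothetical sample: 55% women, 45% men; population: 50/50.
w_women = demographic_weight(0.50, 0.55)  # ~0.909 - women down-weighted
w_men = demographic_weight(0.50, 0.45)    # ~1.111 - men up-weighted

# Invented subgroup results: 60% of women and 40% of men say Yes.
weighted_yes = 0.60 * 0.55 * w_women + 0.40 * 0.45 * w_men
weighted_total = 0.55 * w_women + 0.45 * w_men
print(round(weighted_yes / weighted_total, 2))  # 0.5 after weighting
```

The weighting corrects the demographic imbalance, but it can do nothing about respondents who answer dishonestly or were never reached in the first place.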
But bear in mind, if we accept the premise that the absolute best case for a question as straightforward as “do you use your left or right hand to write?” is plus or minus 3% error after weighting, how much error do you think may still exist in a survey on something as societally complex as the Scottish indyref? We simply do not know, and no polling company can tell you either. There’s simply no way for them to fully account for the total error in their results - not that they or their sponsors tell you that. And perhaps just as bad, when they get it right, they can’t say with confidence how that came to be so - good sample? Good coverage? Nice researcher getting honest answers?
Still feel confident about those results in the Daily Record?
OK, so we know a bit about error - but what difference does all this make? Well, quite a bit actually! Thankfully there are some recent Scottish elections we can use as test cases for further analysis. Bear in mind that in these cases the data is the final, adjusted, polished, weighted, dressed-up-for-the-school-disco data: the polling companies’ best estimates, which we can use to see how well they reflected the outcome of a real election.
Let’s pause here, however - polling companies will always say “you can’t compare a poll done a month before the election with the final result! Opinion must have changed”. Perhaps uniquely in what is, after all, supposed to be a scientifically driven pursuit, there is no penalty for a polling company being entirely wrong, all of the time! They always have the fallback position of “we were right at the time”. But then, I’m sure the polling company would ask, “how do you know we’re wrong?”, and we’re left going round in circles in the context of a compliant media blindly accepting and promoting results which it itself commissioned. Anything wrong with this picture? Any space for vested interest? Media commissions poll, media shouts about poll it commissioned and perhaps even designed...
My central problem with all of this is that the media uses error-strewn polling from at least several weeks (and months!) before a major voting event to strongly suggest, or even predict, the outcome of that event. Even if they don’t come out and say it, my theory is that part of what they are trying to achieve is acceptance among the voting public of a preordained outcome, conveniently backed up by their numbers. Not exactly playing fair.
As I write we’re about five weeks from the referendum so I thought this a good time to look at how accurate a few of the polling companies were in the run up to the 2011 Scottish Parliament election - but not in % terms as that can be kinda obtuse. Let’s turn it into votes!
To establish the predictive power of the polling companies at various points in the weeks leading up to the vote, I’ve turned the difference between the outcome and their poll on a given date into actual votes cast. After all, we know how many people voted (thank you, Wikipedia), and we know how many people voted for each party, so we can work out how many people the pollsters thought would vote for each party on a given date.
2011 Scottish Parliament Election
In 2011 the total votes cast amounted to about 1,990,000 on a turnout of about 50.4%. The SNP ended up with 902,915 votes (45.39% of the total); Labour got 630,461 (31.69%). The other parties were less significant so I’ll stick to these two.
YouGov: on the 15th of April 2011 this polling company put the SNP vote share at 40% and Labour’s at 37%. These values don’t sound too dramatic compared with the outcome, but I estimate this represents (with errors for other parties included) about 140,000 Scots who didn’t vote as per the polling percentages just three weeks before the election. Most of the error lies in overestimating Labour’s share and underestimating the SNP’s. This, after all the weighting procedures that are supposed to reduce error... the data dressed up for the school disco.
Remember, this is for a turnout of 50%, so if we scale the same error to 70% and 80% turnouts (both plausible for the referendum) we end up with quite staggering numbers - 189,000 and 216,000 voters in the “wrong box” less than a month from the vote. Repeat after me: “plus or minus 3%”. Could over 200,000 people influence the outcome of the indyref?
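The arithmetic behind these vote counts can be sketched as follows. The exact way the per-party gaps are combined isn’t spelled out above, so this sketch assumes each misplaced voter is counted once (a vote the poll placed with Labour that actually went to the SNP shows up in both parties’ gaps, hence the division by two); using only the SNP and Labour gaps it lands near 106,000, with the smaller parties’ errors taking it toward the 140,000 quoted above:

```python
def misplaced_voters(poll, result, total_votes):
    """Voters 'in the wrong box': sum the absolute poll-vs-result gaps
    across parties and halve, so each misplaced vote is counted once."""
    gap = sum(abs(poll[p] - result[p]) for p in result)
    return gap / 2 * total_votes

def scale_to_turnout(votes, actual_turnout, assumed_turnout):
    """Rescale a vote-count error to a different turnout level."""
    return votes * assumed_turnout / actual_turnout

# YouGov, 15 April 2011, vs the actual result (SNP and Labour only).
poll = {"SNP": 0.40, "Labour": 0.37}
result = {"SNP": 0.4539, "Labour": 0.3169}
err = misplaced_voters(poll, result, 1_990_000)
print(round(err))                                  # ~106,000 voters
print(round(scale_to_turnout(err, 0.504, 0.80)))   # ~169,000 at 80% turnout
```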
Granted, by the day before the vote YouGov’s polls better reflected the outcome - but they still missed it by some 100,000 voters (or 134,000 and 153,000 when scaled to reasonably expected indyref turnouts). “Plus or minus 3%.......”
Just so it doesn’t look like I’m picking on selected polls - YouGov’s polls, on average from Feb to May 2011, differed from the eventual outcome by something like 140,000 people (this amount of error could mean as much as 230,000 people assuming high indyref turnout). Could 230,000 people swing an indyref?
TNS: this polling company conducted fewer polls in the run-up to the 2011 election, but their polling at five weeks out (27th March) represented about 166,000 voters in the “wrong box” - that is to say, they did not vote as polled. Scaled to 70%/80% turnout, that is about a quarter of a million people.
By a few days before the vote, TNS’ numbers better reflected what happened on the day, but there were still 82,000 people who didn’t vote as expected. Had the turnout been 80%, this would mean 153,000 voters. Could that many people swing an indyref?
Hopefully this piece has helped shine a light on how uncertain polls are, how they can carry quite serious errors corresponding to hundreds of thousands of voters (sometimes even the day before the vote!), and why you should be utterly sceptical about any news outlet’s representation of them.
So, when you’re reading the paper on Sunday and there’s a poll in it - remember that the results could have an error amounting to a couple of hundred thousand Scottish voters. How they’ll swing on the day no-one knows, least of all the pollsters. The only 100% certainty is that 100% of the polls are wrong 100% of the time, worth bearing in mind as we enter the final weeks!