I’m firmly of the opinion that horse race coverage is counter-productive. Nevertheless, in analyzing the Democratic pre-primary process, I find that I need to refer to horse race polling in order to discuss some of the problems unfolding. In a certain sense, I’m writing this as an advance appendix to a forthcoming piece on media bias.
The first thing to keep in mind is that horse race polling is extremely volatile with this many candidates. The candidates with the third through sixth highest polling averages have all traded places. Only recently has Elizabeth Warren, buoyed by consistently positive coverage in blue media, crept into second place; in December 2018, she was polling behind Beto O’Rourke.
In spite of numerous stories describing Elizabeth Warren as the front-runner or co-front-runner, her overall polling numbers are no closer to Biden’s than Sanders’s numbers were at the start of 2019. The scattered polls (mostly with small sample sizes) that show Warren tied with Biden or in the lead are not the bulk of the polling evidence, most of which continues to show Biden ahead.
Below the fold, the most significant polling movements are the ones that aren’t being covered much by traditionally Democrat-friendly media outlets. Andrew Yang has been consistently gaining ground in spite of a conspicuous lack of coverage.
Andrew Yang has been polling above Beto O’Rourke and Cory Booker recently. There’s also been a notable climb in Tulsi Gabbard’s numbers, in spite of (or perhaps because of) recent allegations that she is a Russian asset; her polling position is comparable to Cory Booker’s and Amy Klobuchar’s, and above Julian Castro’s.
It’s worth noting that at the level of 1% and below, it’s very hard to be sure what a polling average looks like. Rounding effects in how pollsters report polls become a very significant factor, as do effects related to open-ended questions versus candidate lists. The “back field” of candidates whose polling averages have almost always stayed below 1% includes several potentially serious contenders who simply have not had a break-out moment.
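To make the rounding problem concrete, here is an illustrative sketch (not modeling any specific pollster’s practice) of a candidate whose true support is 0.6% in every poll. Depending purely on whether a pollster rounds to the nearest integer percentage or truncates, the resulting average lands at 1% or 0%:

```python
# Illustrative only: one candidate, true support 0.6% in every poll.
true_share = 0.6

# Ten polls reported under two different integer-percentage conventions:
reports_round = [round(true_share)] * 10  # nearest-integer rounding -> 1 each
reports_floor = [int(true_share)] * 10    # truncation -> 0 each

print(sum(reports_round) / 10)  # 1.0
print(sum(reports_floor) / 10)  # 0.0
```

The same true support produces averages that differ by a full point, which at this level swamps any real movement in the numbers.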
While there is one widely circulated polling average (RCP), the methodology behind it is unclear to me and appears flawed. Many people also discuss horse race polls selectively, and that selective attention leads to compounding errors that worsen the problems of horse race coverage.
So, I see a need to systematically aggregate the polls in order to discuss issues like media bias. In order to do so, I will be referring to FiveThirtyEight.com’s polling data, as the data collection and curation problem is too large for me to tackle on my own time, and I do think I can rely on their data to be accurate.
Bigger samples are better
The first and most important factor to understand about polling is that the margin of error is strongly related to the sample size: it scales with the inverse square root of the sample size. A poll with 100 respondents will typically have three times as much error as a poll with 900 respondents.
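As a quick sketch (assuming a simple random sample and support near 50%, the worst case for sampling error), the standard 95% margin-of-error formula makes the inverse-square-root relationship explicit:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

small = margin_of_error(100)  # about 0.098, i.e. roughly +/- 9.8 points
large = margin_of_error(900)  # about 0.033, i.e. roughly +/- 3.3 points

print(small / large)  # exactly 3.0: error scales as 1/sqrt(n)
```

Real polls have additional, non-sampling sources of error, so this is a lower bound on the uncertainty, not the whole story.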
Not all pollsters are equally reliable
The second important factor is that different pollsters have different methodology. Some pollsters are less reliable than others; FiveThirtyEight helpfully supplies a letter rating (C- through A+) for the pollsters they track. The methods used by some pollsters are more favorable to some candidates than others, and this is more difficult to quantify. For example, Elizabeth Warren tends to do better in YouGov’s polling than in other pollsters’ surveys.
I have not tried to build in a “house effect” for pollsters; I’ve simply weighted polls based on the square root of sample size and by the standard numeric equivalent of FiveThirtyEight’s letter grades (1.7 to 4.3).
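A minimal sketch of that weighting scheme follows. The grade-to-number mapping shown is my reconstruction of a standard GPA-style scale running from C- (1.7) to A+ (4.3), not a table published by FiveThirtyEight, and the poll tuples are hypothetical:

```python
import math

# GPA-style numeric equivalents for pollster letter grades (illustrative).
GRADE_VALUE = {"C-": 1.7, "C": 2.0, "C+": 2.3, "B-": 2.7, "B": 3.0,
               "B+": 3.3, "A-": 3.7, "A": 4.0, "A+": 4.3}

def poll_weight(sample_size, grade):
    """Weight a poll by sqrt(sample size) times its pollster's grade value."""
    return math.sqrt(sample_size) * GRADE_VALUE[grade]

def weighted_average(polls):
    """polls: list of (candidate_share, sample_size, grade) tuples."""
    total = sum(poll_weight(n, g) for _, n, g in polls)
    return sum(share * poll_weight(n, g) for share, n, g in polls) / total
```

For example, a 900-respondent poll from an A-rated pollster gets weight sqrt(900) × 4.0 = 120, so it counts six times as heavily as a 100-respondent poll from a C-rated pollster (sqrt(100) × 2.0 = 20).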
Rolling window for time series
For the time series, I’ve used a two-week window (7 days before the target date through 6 days after), pooled the sample or samples taken by each pollster within that window, and then used the pooled / adjusted samples. Pooling by pollster helps limit the impact of house effects and pollsters’ publication schedules, and it deals with overlapping samples in the dataset (particularly for HarrisX).
I chose a two-week window after noticing that one day and one week windows both showed cyclic artifacts, likely due to regularly scheduled / conducted tracking polls with differing house effects.
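My reading of the pooling step can be sketched like this (the field names and weights are hypothetical, and the real FiveThirtyEight data has a different column layout):

```python
from collections import defaultdict
from datetime import date, timedelta

def windowed_average(polls, target, days_before=7, days_after=6):
    """polls: list of dicts with 'pollster', 'date', 'share', 'weight' keys.

    Each pollster's polls inside the window are first pooled into a single
    estimate, so a daily tracking poll contributes one pooled sample rather
    than fourteen separate entries; the pooled estimates are then combined
    into one weighted average for the target date.
    """
    lo = target - timedelta(days=days_before)
    hi = target + timedelta(days=days_after)

    by_pollster = defaultdict(list)
    for p in polls:
        if lo <= p["date"] <= hi:
            by_pollster[p["pollster"]].append(p)

    pooled = []
    for group in by_pollster.values():
        w = sum(p["weight"] for p in group)
        share = sum(p["share"] * p["weight"] for p in group) / w
        pooled.append((share, w))

    total = sum(w for _, w in pooled)
    return sum(s * w for s, w in pooled) / total
```

Sliding the target date forward one day at a time yields the time series, with each point drawing on the fourteen days around it.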