The Democratic horse race

I’m firmly of the opinion that horse race coverage is counterproductive. Nevertheless, in analyzing the Democratic pre-primary process, I find that I need to refer to horse race polling to discuss some of the problems now unfolding. In a certain sense, I’m writing this as an advanced appendix for a forthcoming piece on media bias.

Weighted average of polling by date, candidates with highest cumulative polling averages over time, updated 10/29/2019.
The next five candidates with the highest polling averages over time, plus Tom Steyer, who qualified for the DNC debates recently. Last updated 10/30/2019.
Last updated 10/30/2019
Cumulative weighted average of all polling in the 538 dataset. Updated 10/29/2019.

Methodology notes

While there is one widely circulated polling average (RealClearPolitics, or RCP), its methodology is unclear to me and appears flawed. Many people also discuss horse race polls selectively, and that selective attention produces errors that compound the existing problems with horse race coverage.

Bigger samples are better

The first and most important thing to understand about polling is that the margin of error depends strongly on sample size: it scales with the inverse square root of the sample size. A poll with 100 respondents will typically have three times as much error as a poll with 900 respondents, because √(900/100) = 3.
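To make the square-root relationship concrete, here is a minimal sketch of the textbook margin of error for a single proportion. It assumes a simple random sample and a 95% normal approximation (z = 1.96), and uses p = 0.5, the worst case. Real polls are weighted, which usually inflates the effective error, so treat these numbers as a lower bound rather than as any pollster's published figure.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p from a simple random sample of size n.

    Assumes the normal approximation; z = 1.96 corresponds to a 95% interval.
    """
    return z * math.sqrt(p * (1 - p) / n)

small = margin_of_error(0.5, 100)  # about 0.098, i.e. roughly +/- 9.8 points
large = margin_of_error(0.5, 900)  # about 0.033, i.e. roughly +/- 3.3 points
print(round(small / large, 6))     # → 3.0
```

Quadrupling the sample only halves the error, which is why pooling several modest polls is often more informative than leaning on any single one.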

Not all pollsters are equally reliable

The second important factor is that pollsters use different methodologies, and some are less reliable than others; FiveThirtyEight helpfully supplies a letter grade (C- through A+) for the polls it tracks. Some pollsters’ methods also favor certain candidates over others (a “house effect”), which is harder to quantify. For example, Elizabeth Warren tends to do better in YouGov’s polling than in other pollsters’ results.
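One crude way to put a number on a house effect is to compare each pollster’s results for a candidate against the sample-size-weighted average of all polls. The sketch below does exactly that. It is an illustration only, not FiveThirtyEight’s method and not necessarily the one used for the charts here; the function name `house_effects`, the tuple layout, and the pollster names in the example are all hypothetical. It also ignores time trends, so in practice it should only be applied to polls from the same period.

```python
from collections import defaultdict

def house_effects(polls):
    """Crude house-effect estimate for a single candidate (hypothetical helper).

    polls: list of (pollster, sample_size, share) tuples, share as a fraction.
    Returns {pollster: mean of (poll share - overall weighted average share)}.
    """
    total_n = sum(n for _, n, _ in polls)
    # Overall average, weighting each poll by its sample size.
    overall = sum(n * share for _, n, share in polls) / total_n

    diffs = defaultdict(list)
    for pollster, _, share in polls:
        diffs[pollster].append(share - overall)
    return {p: sum(d) / len(d) for p, d in diffs.items()}

# Illustrative toy numbers only, not real polling data.
effects = house_effects([("A", 1000, 0.24), ("A", 1000, 0.26),
                         ("B", 500, 0.20), ("B", 500, 0.20)])
print({p: round(e, 4) for p, e in effects.items()})  # → {'A': 0.0167, 'B': -0.0333}
```

A positive value means the pollster tends to show the candidate above the pooled average; a negative value means below it.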

Rolling window for time series

For the time series, I’ve used a two-week window (the 7 days before the target date, the target date itself, and the 6 days after). Within that window, I pooled all of the samples taken by each pollster, took each candidate’s share of that pooled sample, and then used the pooled (adjusted) sample. Pooling by pollster helps limit the impact of house effects and of pollsters’ publication schedules, and it also deals with overlapping samples in the dataset (particularly HarrisX’s).
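The windowing and per-pollster pooling described above can be sketched as follows. This is a minimal sketch under stated assumptions: each poll is reduced to a single hypothetical `poll_date` (real polls have field start, field end, and release dates, and the choice matters), the window runs from 7 days before to 6 days after the target date inclusive, and the function name and tuple layout are illustrative. It stops at the per-pollster pooled samples; how those are combined across pollsters, and how overlapping tracking samples are adjusted, is not modeled here.

```python
from collections import defaultdict
from datetime import date, timedelta

def pooled_by_pollster(polls, target, before=7, after=6):
    """Pool each pollster's samples inside [target - before, target + after].

    polls: list of (pollster, poll_date, sample_size, share) tuples (hypothetical layout).
    Returns {pollster: (pooled_sample_size, pooled_share)}.
    """
    start = target - timedelta(days=before)
    end = target + timedelta(days=after)
    n_total = defaultdict(float)
    supporters = defaultdict(float)
    for pollster, poll_date, n, share in polls:
        if start <= poll_date <= end:
            n_total[pollster] += n
            # Implied number of respondents backing the candidate in this sample.
            supporters[pollster] += n * share
    return {p: (n_total[p], supporters[p] / n_total[p]) for p in n_total}

# Illustrative toy data: pollster Y's September poll falls outside the window.
polls = [
    ("X", date(2019, 10, 1), 1000, 0.20),
    ("X", date(2019, 10, 5), 500, 0.26),
    ("Y", date(2019, 10, 3), 800, 0.25),
    ("Y", date(2019, 9, 1), 800, 0.40),
]
window = pooled_by_pollster(polls, date(2019, 10, 7))
```

Here pollster X’s two polls collapse into one 1,500-person sample with a 22% share, so a pollster that publishes frequently contributes one pooled sample per window rather than several separate data points.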

Dr. Tomas McIntee is a mathematician and occasional social scientist with stray degrees in physics and philosophy.
