Illustration by wan wei/Shutterstock

How should we think about the polls in light of the 2020 presidential election?

Following the presidential election, Rob Santos, Urban Institute vice president, chief methodologist, and president-elect of the American Statistical Association, sat down with Graham MacDonald, Urban’s chief data scientist, to discuss the state of polling and related issues in light of the unfolding election results. The following is a lightly edited transcript of their conversation.

Graham: Early indicators from last week’s 2020 general election show polling has been somewhat off in some places and relatively good in other areas. Some are asking the question, “Why do we even need polls?” In this atmosphere, I’d like to first get your thoughts about the state of polling generally, given these recent events.

Rob: Well, there’s a known saying that the best predictor of future behavior is past behavior, and it turns out it applied to the polls in this case. The polls showed a remarkably consistent Biden lead of 5 to 7, even to 12 points, throughout the campaign. Lo and behold, election day comes, and the lead almost totally vanished. So it’s natural to question what exactly happened, especially in light of four years ago, when Clinton was virtually crowned the president before the election, and then suddenly, that lead in the battleground states evaporated on election night. A few factors are worth noting, most of them tied to COVID-19, which I’ll discuss later.

After the 2016 election, in regard to polling, the American Association for Public Opinion Research conducted a postmortem analysis showing that the polls had actually done pretty well. The apparent culprit was battleground state polling, some of which was not necessarily conducted in the most rigorous way. Nationally, things worked out in the ways in which the polls predicted, but unfortunately, they got it wrong in the battleground states, which was where Trump won the election thanks to the electoral college system.

This time around, it looked like much the same was on the verge of happening until the mail-in ballots were counted. After the postmortem analysis in 2016, a number of polling organizations and many researchers doubled down and did their homework. Presumably this time around, we all figured that wouldn’t happen again. But then came COVID-19. That’s where the factors I mentioned earlier came into play.

The first is that, if you take a step back and look at polling performance, my guess is that it’s not as bad as it seems right now, but that’s yet to be determined. The second factor is COVID-19, the elephant in the room, and how the pandemic is impacting public health and the economy. I believe these two things had a profound effect on likely voter models. So, I’m hoping there’s a way to analyze this postelection.

In political polling, what typically occurs is that you take a poll of registered voters and apply a model that predicts who among them is most likely to vote, and with those weights, create an estimate. So, there’s two types of weighting going on. The first is because we have these abysmally low response rates of around 6 percent, 8 percent, even 1 percent for some. You must align the polling results to the universe of registered voters to develop a valid generalized estimate of the full population, which is usually easy to do. But then the second is that pollsters have to predict who is going to vote. But all they have to go on is previous behavior, referring to 2016, 2018, and other previous elections. So necessarily, the modeling was created in an environment that didn’t exist concurrently with our COVID-19 reality. The models were built pre-COVID-19, so no one really had an idea of what was supposed to be happening once the pandemic hit.
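The two weighting stages Rob describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the respondents, the group shares among registered voters, and the turnout probabilities are all invented, and real pollsters use many more weighting variables and their own likely voter models.

```python
from collections import defaultdict

# Hypothetical respondents: (age_group, candidate, modeled_turnout_probability).
# Every value here is invented for illustration.
respondents = [
    ("18-44", "A", 0.4),
    ("18-44", "A", 0.6),
    ("18-44", "B", 0.5),
    ("45+", "B", 0.9),
    ("45+", "B", 0.8),
    ("45+", "A", 0.9),
    ("45+", "B", 0.7),
    ("45+", "A", 0.8),
]

# Assumed share of each group among all registered voters.
population_share = {"18-44": 0.5, "45+": 0.5}


def poststratification_weights(sample, pop_share):
    """Weight each respondent so group shares match the registered-voter shares."""
    counts = defaultdict(int)
    for group, _, _ in sample:
        counts[group] += 1
    n = len(sample)
    return [pop_share[group] / (counts[group] / n) for group, _, _ in sample]


def candidate_share(sample, weights, candidate):
    """Weighted share of respondents who support `candidate`."""
    total = sum(weights)
    support = sum(w for (_, c, _), w in zip(sample, weights) if c == candidate)
    return support / total


# Stage 1: align the sample to the universe of registered voters.
base_weights = poststratification_weights(respondents, population_share)

# Stage 2: scale each weight by the modeled probability of voting.
likely_weights = [w * p for w, (_, _, p) in zip(base_weights, respondents)]

registered_estimate = candidate_share(respondents, base_weights, "A")
likely_estimate = candidate_share(respondents, likely_weights, "A")

print(f"Registered voters: {registered_estimate:.1%} for A")
print(f"Likely voters:     {likely_estimate:.1%} for A")
```

In this toy data, the second stage lowers candidate A's share because A's supporters skew toward the group with lower modeled turnout. If the turnout probabilities are wrong, the final estimate inherits that error even when the first stage is done well.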

The easiest example I can give of this is Google Flu Trends. About 10 years ago, Google found that if they harvested people’s searches on influenza-like symptoms, they could then geographically identify local influenza outbreaks. They could do it weeks before the Centers for Disease Control and Prevention (CDC) could, because the CDC relied on slower reporting from doctors and hospitals. Historically, there’s a surveillance program at the CDC where doctors report influenza, and the CDC relied on that to determine hot spots. Google said, all we have to do is perform these searches and we can tell you way in advance where outbreaks are happening and that can help to address public health issues.

It worked marvelously for years and even went international. And then a couple of things happened that made it less useful. The first was the emergence of the H1N1 virus. Google Flu Trends was showing that nothing of the sort was going on, but the CDC started getting reports about outbreaks. The problem was Google had been searching on terms like “high fever,” “cough,” and “flu,” but hadn’t been searching on “H1N1 symptoms” or related search terms. Their underlying model did not align with the current reality.

A year or two later, they were still muddling along, and suddenly, in the month of August, they detected hot spots all over the US and reported them to the CDC. The CDC was wondering why they were not seeing anything from their surveillance system. It turned out there was a slow weekend in August, and a major news outlet featured a spot on taking flu shots early that year, which made folks “Google” the flu and where they could get their shots, producing a false signal of flu spread in the Google model. Once again, the underlying model failed to align with reality.

So there were two instances, one in which they missed something that actually was happening, and then the other, where they found something that didn’t exist because of a news spot.

That scenario has an analogue to the current situation in political polling. You rely on past behavior, build your likely voter models, then something unpredictable and unprecedented like the pandemic happens, with hundreds of thousands of people getting infected and almost a quarter of a million dead. That changes people’s psyches and motivations about lots of things including voting. We became even more polarized, possibly to the point that people who were going to vote for Trump decided not to participate in a survey because of their dislike of the media. People who a likely voter model would never have imagined would vote suddenly got motivated and cast votes. Whatever it was, I believe that the model didn’t work, and that’s going to be a major lesson from this particular election. The models changed and there was no empirical mechanism to capture the impact of external environmental factors on the voting behavior of people.

Having said all that, there are a couple of additional thoughts. One of which is that the classic defense, that polls are just a snapshot of a point in time, is true, but you shouldn’t take that to the bank in predicting the final outcome. That would be true if the leads were smaller. But then we had such a large discrepancy suddenly whittled down to a small difference in a few battleground states, some of which weren’t even projected to be close. That makes me think we can’t use that as an excuse.

I will just say, consumer beware. You should take polling for what it’s worth. It’s a best guess at what’s going on out there, and there are many underlying assumptions, some of which we researchers have talked about, like the chain of different links in the research process, with individual links representing the quality of the sample, field operation, questions asked, and the likely voter model. All of these links are essential in getting a good projection at a point in time, but the final result is only as strong as the weakest link.

Graham: Great points. No matter the reason some polls weren’t as accurate, one of the things we are going to have to contend with as a result of this is a potential decrease in trust in polls. I’ve heard many smart people recently saying that we should use something other than polls, for instance. And then there is this related question of effectively communicating uncertainty, argued by folks like Nate Silver, who say maybe the polls and forecasts were right within a margin of error, but we weren’t reading them critically enough to take that into account. What are your thoughts on these big issues?

Rob: It’s amazing to me that people freak out over the uncertainty of polls. Yes, there are margins of error, and one maybe could argue that everything that we actually envisioned in the final election was within a margin of error. But it is really about consumers being aware of the uncertainty associated with polls in a way that transcends margins of error and gets to the root that these things have underlying assumptions, and people need to understand them. And if you don’t have time, just recognize that just because a poll says something, doesn’t mean it’s true all the time. It depends on a lot of things, like how the sample was drawn and how big it was, how the questions were asked, who participated, how the data were collected and processed, and so on. There is a method to this madness.

Now, we live with uncertainty in pretty much every aspect of our lives. Some go to the doctor and the doctor will give a grim diagnosis and they’re forced to ask, “Well, how long have I got, doc?” The doctor says if you go through treatment, you have an 80 percent chance of survival, but what about the 20 percent? That’s the type of uncertainty that is much more meaningful than whether you’re able to call a presidential election. People look at Yelp for takeout recommendations, and sometimes they pick the five-star rated restaurant and the food is lousy. You look at hurricane models with all of the squiggly tracks of where the hurricane is projected to hit on land, they can go all over the place, and then it just turns around and goes back the other way as it did with Harvey. People deal with that environmental uncertainty, and that has profound effects in terms of damage, evacuation, and lives. I say we have uncertainty all around us, so let’s simply accept the uncertainty that’s inherent in these political polls. Let’s not take them so seriously.

Graham: Whew, I see your point, but that might give people more anxiety. I’ve recently been reflecting on how some sites like FiveThirtyEight point these out, rate the pollsters and say these pollsters tend to do better than others, but then at the same time, they’re out there trying to simplify all of that into a number or set of high-level numbers. It does seem like these aggregators are, on one hand, doing something good, but on the other hand, being attacked for putting out that high-level number where people really aren’t digging under the hood for that answer and fully accepting that uncertainty, as you say. What are your thoughts on the role of the aggregators? Do you think they are doing good, evil, making things more complicated, or something in between?

Rob: A lot of these aggregators put different weights on different polls in the aggregation process, depending on the perceived quality or rigor of a particular poll. And that’s great, but it screams of an underlying subjective model, and the assumptions of that model may or may not be true. Any given poll on any given week can be dead on and can be way off another week just because of the uncertainty associated with that. Aggregating is supposed to dampen the uncertainty of a specific poll by combining the results of several, much in a meta-analysis kind of way, and that’s fine, but it doesn’t remove the unmeasured uncertainty. The unmeasured uncertainty gets back to this notion that likely voter models are there, and to my knowledge, there is no mechanism for building in the variability associated with that, much less any systematic error that might occur.

For this season of polling, I happen to think there was systematic error rather than just wobble, because if it were just wobble in all of these voter models, then aggregating would have increased accuracy, but that didn’t necessarily happen in battleground states. Instead, you have this convergence of everyone tooting the same horn, except for Ann Seltzer in polling Iowa, who was dead on with her call of a 7 percent lead for Trump just before the election, which I thought was really interesting. I’d like to find out more about what she did.
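The distinction between wobble and systematic error can be illustrated with a small simulation. This is a sketch under invented assumptions, not an analysis of any real polls: the true margin, the noise level, and the size of the shared bias are made up, and `aggregate_rmse` is a hypothetical helper that averages polls with equal weights.

```python
import random
import statistics


def aggregate_rmse(n_polls, noise_sd, shared_bias, true_margin=5.0,
                   trials=2000, seed=0):
    """Root-mean-square error of a simple average of `n_polls` polls.

    Each simulated poll equals the true margin, plus a bias shared by every
    poll (systematic error), plus independent random noise (wobble).
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        polls = [true_margin + shared_bias + rng.gauss(0.0, noise_sd)
                 for _ in range(n_polls)]
        average = statistics.fmean(polls)
        squared_errors.append((average - true_margin) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


# Invented settings: 3 points of random noise per poll, 4 points of shared bias.
single_poll = aggregate_rmse(n_polls=1, noise_sd=3.0, shared_bias=0.0)
noise_only = aggregate_rmse(n_polls=20, noise_sd=3.0, shared_bias=0.0)
with_shared_bias = aggregate_rmse(n_polls=20, noise_sd=3.0, shared_bias=4.0)

print(f"One poll, wobble only:       {single_poll:.2f} points")
print(f"20 polls, wobble only:       {noise_only:.2f} points")
print(f"20 polls, shared 4-pt bias:  {with_shared_bias:.2f} points")
```

Averaging shrinks the independent noise roughly in proportion to the square root of the number of polls, but a bias that every poll shares passes straight through to the aggregate.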

Graham: Agreed, and that’s a great transition into my next question. Often, we’re building these models based on a handful of past presidential elections, assuming that there is some commonality between them. And what you’re arguing is that it’s just very different from time to time, and we have these likely voter models and systemic error, with what look to be highly correlated errors between both polls and aggregators in a single direction. Are we not building enough uncertainty into this entire process? Maybe we’re not being humble enough about the level of uncertainty we’re putting into these models and about how much we know based on limited samples of past elections and recognizing that this election, or any election for that matter, is totally different in some ways about who is likely to turn out that we just don’t know at the time. What are your thoughts on small sample size and humility?

Rob: I agree with much of what you said and note that polling sample sizes are pretty standard, like in the 1,000 and the 500 range, but there’s a reason for that. The reason is the sponsors are not willing to lay out the type of resources it takes to get a much better job done. So you do the best with the limited resources you have, do more interviews rather than fewer, and maybe that means that other sources of error like measurement, response, and noncoverage creep in, and those can be systemic. I’m surprised that there isn’t more discussion of the validity of likely voter models than there is, and I’d like to see someone really tackle that.
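For a sense of what those standard sample sizes buy, the textbook margin of error for a proportion can be computed directly. This sketch assumes simple random sampling and a 95 percent confidence level, which real polls with weighting and low response rates do not strictly satisfy, and the function name is just illustrative. It covers sampling error only, not the measurement, response, or noncoverage errors mentioned above.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a proportion.

    Assumes a simple random sample of size n; p=0.5 gives the widest margin.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (500, 1000):
    print(f"n = {n}: about +/- {margin_of_error(n):.1f} points")
```

Doubling the sample from 500 to 1,000 narrows the margin from roughly 4.4 to roughly 3.1 points, which is one reason larger samples alone are an expensive way to gain accuracy.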

It seems to me that the only way to do that would be with some sort of follow-up or in-depth interviews where you go back to people and first look to see if they voted, and only then discuss what was going on in their heads. You also have to encourage folks to be honest, which may be difficult in our divided, hyperpolitical environment, especially with those who voted for whoever loses, because they may be bitter. Now, right after an election, might be a good time to do it if we can get them to be honest.

Graham: So, as we know, some pollsters select a random sample from a sampling frame, and reach different folks every time they poll. But other times, pollsters will recruit a sample of folks and say, instead of random outreach, we’re going to recruit and make sure the group is representative and keep calling them for their opinions over time. I’m not sure if we have a postmortem on that, but it’s interesting to think about whether panels are better and where we go from here. Do we keep the panels and add in that qualitative component you mentioned? Are we trying to validate these voter models before we get a vote as our next step? Do we need to have fewer polls, but with much higher quality? Should we do another postmortem, like 2016, and find a way before the next election to do some of the validation you mentioned to get an idea of where the systematic error is coming from?

Rob: Starting with reality, it would take so much effort and money that this brings about the question, “Does society really need a hyperaccurate prediction of who’s going to win the election?” My answer is we don’t. One can show, and it may have been the case in 2016, that by projecting Clinton to be the winner, or projecting Biden to be the winner with an 80 or 90 percent chance, we alter behavior; some folks refrain from going out and voting, while others who otherwise skip voting decide to vote after all. I would almost rather say, “Here’s a lot of uncertainty with how the voters are looking” rather than finding the scientific method and resources to be hyperaccurate. You may end up creating your own uncertainty because people will see the results and alter their voting behavior accordingly.

Secondly, I don’t think it’s all that necessary. Society doesn’t need that type of accuracy for this type of thing. If the estimate is about doses of vaccines needed in areas, yes, I want a lot of accuracy, but if it’s about the number of points someone running for office is ahead or behind, then it’s not that big of a deal.

Graham: That makes a lot of sense to me. Would you say then, that a potential next step is that we need to better communicate uncertainty and get a better handle on just how accurate or inaccurate these likely voter models are and maybe better incorporate that uncertainty into our thinking next time?

Rob: I like that and am struck by the hurricane models because their starting point is, “We really don’t know.” Many different hurricane models exist, and you see all of these different lines all over the map and I wonder, why can’t we have something like that for political polling? We don’t know the likely voter model, but here are three or four different scenarios for what will happen if, say, young voters don’t come out, or Black and Latinx voters are suppressed. To me, that gives a better message and communicates what’s at stake better than just trying to have this one number at the end of the day.

Graham: Right, so you could imagine instead of running tens of thousands of simulations, you might just say pick 10 scenarios that are likely and show how that could play out in a more descriptive way.
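A scenario-based presentation of this kind could be as simple as recomputing the projected margin under a handful of turnout assumptions. The sketch below is purely hypothetical: the groups, their shares of registered voters, their candidate support, and the turnout rates in each scenario are invented, and it assumes a two-candidate race.

```python
# Hypothetical groups: share of registered voters and support for candidate A.
groups = {
    "young": {"share": 0.30, "support_a": 0.60},
    "middle": {"share": 0.40, "support_a": 0.50},
    "older": {"share": 0.30, "support_a": 0.45},
}

# Hypothetical turnout rates by group under each scenario.
scenarios = {
    "baseline": {"young": 0.50, "middle": 0.65, "older": 0.75},
    "low youth turnout": {"young": 0.35, "middle": 0.65, "older": 0.75},
    "high youth turnout": {"young": 0.65, "middle": 0.65, "older": 0.75},
}


def projected_margin(groups, turnout):
    """Candidate A's margin over B, in points, for a two-candidate race."""
    votes_a = 0.0
    votes_total = 0.0
    for name, group in groups.items():
        voters = group["share"] * turnout[name]
        votes_total += voters
        votes_a += voters * group["support_a"]
    share_a = votes_a / votes_total
    return 100 * (2 * share_a - 1)


margins = {name: projected_margin(groups, turnout)
           for name, turnout in scenarios.items()}

for name, margin in margins.items():
    print(f"{name:>20}: A {margin:+.1f} points")
```

With these made-up inputs, the same poll responses produce a narrow lead for A at baseline, a wider lead if young voters turn out heavily, and a narrow deficit if they stay home, which is the kind of spread a single headline number hides.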

Rob: That gives different people the evidence they need to decide if they should really go out and vote. For example, there was a lot of talk about white suburban women. My guess is that discussion ended up engaging that subpopulation a lot more. The same with Black and Latinx voters. It’s really interesting what went on in Miami Dade, where the Latinx population cut both ways. We should study that and find out what’s going on. It didn’t seem to happen in Arizona. There it was much more of a vote toward Biden, so we’ll see.

Graham: I think we’ve come to a pretty good set of issues and recommendations for the industry. I really like your comment about needing to know the accuracy of polls ahead of time. There’s been such a big focus on knowing what’s going to happen before it happens, and I wonder what you think about what the public and pollsters four years from now will look like? Or what you want it to look like?

Rob: I’m pretty sure it’s going to look exactly the way it always has. There’s not going to be a sudden discovery of a new method that solves all of these issues. People are going to work on the margins of the methods to make improvements, whatever they may be. But there is not going to be a magic potion that suddenly reveals the true way that polls need to be conducted in order to call the election. I don’t think it’s necessary and I wouldn’t want it to be that way either. I don’t think sponsors are willing to dish out the money it takes to do it more accurately. I think something like a survey to get an idea of what the public is experiencing is all the public really needs.

Let’s just get a sense of what’s going on and not take it so seriously that we’re slamming the polling industry and telling them they got it wrong again. Instead, let’s find out what different types of people are thinking and ask: What does that mean for us? What does that mean for me? Maybe that will help motivate voters. Bottom line, I don’t think much is going to change. Having said that, less is better in this game.

Graham: I have to throw a research-related question in here, given our roles in a research institution. How does reflecting on what’s been going on in the polls change the way you think about our work as researchers, or as people who conduct surveys at the Urban Institute? Is there work that we’ve done that informs how you think about the polls?

Rob: It’s all connected. The notion that we always have underlying assumptions, many of which we cannot validate, needs to be kept in mind. As soon as COVID-19 hit, one of the first things I did was issue a memo with recommendations on how one needs to rethink the underlying conceptual frameworks that drive programs in program evaluation. People have logic models and conceptual models that indicate how different populations react to their own motivations, environmental factors, systems, and policy.

When something like COVID-19 occurs, then the economy falls, the resulting impact almost certainly alters the underlying conceptual frameworks. That can impact how a program functions, the mechanisms for creating positive impact, and even the client population to be served. So, you can’t pretend that you live in the same environment that you did pre-COVID-19 and try to evaluate your program as originally planned.

Secondly, at best, you’re looking at a marginal impact. There’s no such thing as a perfect counterfactual to your program. If it’s a jobs program and someone is trying to get a job, the counterfactual will be someone who is not participating in that program. While that’s true, it doesn’t mean that person is sitting around, doing nothing and trying to get a job. They’re going to go out and find other resources to help themselves, and if they are not being helped by the particular program that you’re evaluating, they will find some other way to find help to get a job. So, you’re looking at a marginal difference between the program you are evaluating and other unknown sources of help that are out there. It’s not a matter of program versus no program. That means the differences that you will be able to detect are much smaller. It’s sort of like comparing two medicines for the health of the patient rather than comparing the use of medicine versus no medicine. I’d say it’s all connected. What’s going on in the polling industry, we face every day in our policy research. There are underlying assumptions and implicit models, and it’s up to us to explore those and to do our best to articulate them, note them in our limitations, and when possible, try to strengthen them with empirical research. Often, it involves qualitative research to strengthen the conceptual research and logic models.

Graham: I was going to wrap this up with that last question, but you mention qualitative research, so I have to get one more question in. Can you speak to the importance of qualitative research and how relevant it is to the polls and our research? And then any other final thoughts you might have.

Rob: In terms of qualitative research in the polls, there’s actually quite a bit of discussion going on in the American Association for Public Opinion Research community. They are discussing how to validate voter models and what’s really going on in the voters’ heads. Are voters deciding not to participate in a survey? This gets at the self-selection and nonresponse bias piece. Or if they are participating, are they being truthful? Having them discuss what motivates them to go out to vote, especially in the world of COVID-19, and asking them why they chose that over mail-in are relevant questions.

Qualitative research addresses very different research questions than quantitative research, of course. Quantitative measures the polls and headcount, but qualitative digs underneath that to answer research questions about the whys. Why did these people vote the way they did? What were the underlying factors for candidate preference? What does that have to do with racial injustice or current events, like the economic downturn? It all connects, and the most valuable insights are ones motivated by a combination of both the qualitative and the quantitative research questions, merged together in a way that increases knowledge of the particular policy question.

Graham: An excellent note to end on. Thanks for your time and wisdom, as always, Rob.

Rob: Thanks, Graham.

An earlier version of this blog post incorrectly said that Ann Seltzer polled Ohio instead of Iowa (corrected 11/16/20).

- Jamila Patterson

- Graham MacDonald
