Illustration by Rhiannon Newman for the Urban Institute

Analyzing the Quality of the 2020 Census: Understanding the US Population Puzzle

Data@Urban
Dec 16, 2021

Every 10 years, the US Census Bureau creates a picture of the US population. With this information, Congress apportions the 435 congressional seats, allocates the $1.5 trillion federal budget, redraws voting lines, and more. Although the Census Bureau’s goal is to take a snapshot of the population on April 1, the process to count every resident is closer to carefully gathering and organizing puzzle pieces for an unknown picture.

The US Census Bureau finds those pieces by collecting census questionnaires, counting residents in remote areas of Alaska, and going door-to-door to households that have not filled out the questionnaire. The Census Bureau then carefully places each piece to determine how many people live in each area of the country, completing the population puzzle.

But the 2020 Census couldn’t follow its usual process. The COVID-19 pandemic and related stay-at-home orders, new household compositions, possible changes to the survey itself, displacement of college students, and natural disasters are some of the many events that potentially altered the pieces.

With these changes, many may wonder: “How much can we trust the 2020 Census picture? Are we missing any pieces?”

2020 Census County Assessment Tool

To help state and local decisionmakers answer these questions, the Massive Data Institute at Georgetown University’s McCourt School of Public Policy and the Urban Institute created the 2020 Census County Assessment Tool. This Tableau dashboard compares the data we created from public and administrative sources with 2020 Census data. With this comparison, we report where the 2020 Census estimates are close to expectation, slightly divergent, or highly divergent for 12 different populations.

We created our dataset, which we call the Vintage 2020 population estimate data, by merging the following data sources:

● Population estimates and decennial census data (e.g., American Community Survey 2015–2019 5-year and 2010 Census)

● Federal Emergency Management Agency Disaster Declarations Summaries

● 2020 COVID-19 weekly averages from the New York Times

● Decennial self-response rates from the US Census Bureau

● Unemployment data from the Bureau of Labor Statistics
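As a rough illustration of what merging sources like these can look like, here is a minimal Python sketch that joins several county-level tables on a shared county identifier. It is not the code from our repository. The function name `merge_by_county`, the county IDs, the field names, and every value are hypothetical placeholders chosen only to show the idea.

```python
def merge_by_county(*sources):
    """Combine several {county_id: {field: value}} tables into one.

    A county missing from a source simply lacks that source's fields,
    which behaves like an outer join on the county identifier.
    """
    merged = {}
    for source in sources:
        for county_id, fields in source.items():
            merged.setdefault(county_id, {}).update(fields)
    return merged


# Hypothetical inputs: IDs, field names, and values are made up.
population_estimates = {
    "99001": {"est_total_pop": 1000},
    "99003": {"est_total_pop": 2500},
}
self_response = {"99001": {"self_response_rate": 0.65}}
disasters = {"99003": {"disaster_declarations": 2}}

county_table = merge_by_county(population_estimates, self_response, disasters)
for county_id, fields in sorted(county_table.items()):
    print(county_id, fields)
```

Keeping every county, even when a source has no record for it, makes gaps visible later instead of silently dropping counties from the comparison.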

For our measures of divergence, we also created Vintage 2010 population estimate data. We used these data to calculate the proportional difference between the 2010 Census counts and the Vintage 2010 values. We classified each county by population size: small (fewer than 40,000), medium (40,000 to 100,000), or large (more than 100,000). We then computed the mean and standard deviation of those proportional differences for each county size and each of the 12 population estimates. Finally, we compared the proportional differences between the 2020 Census and Vintage 2020 with those means and standard deviations. This process allowed us to determine whether the 2020 differences are close to expectations (less than one standard deviation from the mean), slightly divergent (between one and two standard deviations), or highly divergent (more than two standard deviations).

Simply put, our divergence measure compared the differences of the 2010 estimates and the 2010 Census against the differences of the 2020 estimates and the 2020 Census.
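The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation in our repository: the function names are hypothetical, the proportional difference is assumed to be (census minus estimate) divided by the estimate, the sample standard deviation is assumed, and the 2010 baseline values are invented toy numbers.

```python
from statistics import mean, stdev


def size_class(population):
    """Bucket a county by population using the thresholds in the text."""
    if population < 40_000:
        return "small"
    if population <= 100_000:
        return "medium"
    return "large"


def proportional_difference(census_count, vintage_estimate):
    """Assumed formula: (census - estimate) / estimate."""
    return (census_count - vintage_estimate) / vintage_estimate


def classify_divergence(diff_2020, baseline_mean, baseline_sd):
    """Place a 2020 difference relative to the 2010 baseline distribution."""
    distance = abs(diff_2020 - baseline_mean) / baseline_sd
    if distance < 1:
        return "close to expectation"
    if distance <= 2:
        return "slightly divergent"
    return "highly divergent"


# Toy 2010 proportional differences for one size class and one measure.
# These numbers are invented for illustration only.
baseline_2010 = [0.01, -0.02, 0.03, 0.00, -0.01]
baseline_mean = mean(baseline_2010)
baseline_sd = stdev(baseline_2010)  # sample standard deviation (assumption)

# A hypothetical county whose 2020 count is 31.4 percent above its estimate.
diff_2020 = proportional_difference(131_400, 100_000)
print(classify_divergence(diff_2020, baseline_mean, baseline_sd))
```

In practice the baseline would be computed separately for each county size and each of the 12 population estimates, so a small county is judged only against how far other small counties missed in 2010.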

Our technical document provides more details on how we aggregated our data across the various data sources and developed our measures of divergence. Our GitHub repository contains the raw data, the code to clean and join the data, the processed data, and the Tableau workbook of our tool.

How policymakers can use this tool

We hope state and local decisionmakers will use this tool to investigate the 2020 Census data product releases for data equity challenges and the impact of collection during the COVID-19 pandemic. If a user sees that a county is highly divergent, they may want to dig deeper to verify the counts.

For example, I used the tool to see which counties had highly divergent total population counts in my home state of Idaho. I found Madison County has the only highly divergent total population across the 44 counties in Idaho.

Filtering the map by “Total Population” for Measure and “Idaho” for State. Madison County is the only county that is indicated as “Highly Divergent.”

The Review Impacts tab provides data visualizations and information on potential sources of errors for the 2020 Census, whereas the Measure Divergence tab shows the 12 population estimates and how much these estimates diverged. The tool indicated Madison County’s total population as 31.4 percent higher than expected, so I dug deeper.

Review Impacts tab for Madison County, Idaho.
Measures of divergence for various populations and list of additional resources.

Interpreting the results from the tool

Under the Review Impacts tab, we can see that unemployment rates fell below the national average throughout 2020, self-response rates were higher than the national average despite lower broadband access, and there were no declared disasters. I also searched for external articles about Idaho’s total population. I discovered that the Census Bureau ranked Idaho as the fastest-growing state, or one of the fastest, for several years before the 2020 Census (2017, 2018, and 2019). Madison County also has a university, Brigham Young University–Idaho, where enrollment did not drop during the pandemic.

The university also explains why Group Quarters increased by more than 2,000 percent. Most students at Brigham Young University–Idaho will pause their education and become Latter-day Saint missionaries. On March 20, 2020, the Church of Jesus Christ of Latter-day Saints leaders had missionaries with a few months left to serve return to their home countries and suspended in-person training for new missions based on travel restrictions, COVID-19 conditions, and other concerns. This decision resulted in thousands of missionaries returning to the US and being reassigned to other areas, which could contribute to the increase in total population count.

Another possible explanation for the Group Quarters increase is that universities counted both on- and off-campus housing. The US Census Bureau contacted universities to report on- and off-campus students to ensure an accurate count of college students who might have returned home early.

Although the tool indicated the total population as highly divergent for Madison County, the 2020 Census population counts make sense.

This example is one of many ways users could use the tool. We provide more examples in our vignettes that highlight highly divergent estimates for Hispanic populations in Harris County, Texas, and Black and non-Hispanic populations in West Feliciana Parish, Louisiana.

More tools to come in 2022

Over the next year, our team will continue to update the 2020 Census County Assessment Tool and create new tools as the US Census Bureau releases data products, such as the 2016–2020 American Community Survey. We encourage users to email mdiresearch@georgetown.edu to let us know how the data align with their expectations and what improvements they would like to see.

-Claire Bowen

