Using R Markdown to Track and Publish State Data
The State and Local Finance Initiative (SLFI) at the Urban Institute works with state policymakers across the country on tax and budget issues. However, understanding how a state brings in revenue and funds expenditures is impossible without considering a state’s demographics, its economy, and its leaders’ priorities. This context comprises countless data points spanning numerous sources, as well as qualitative analysis of a state’s politics and history, repeated for all 50 states (and the District of Columbia).
To meet this challenge, SLFI used R Markdown to publish its State Fiscal Briefs in January 2020. We built upon best practices established at the Urban Institute for creating iterated fact sheets (as PDFs) and fact pages (in HTML). But we also explored new ways to leverage R Markdown’s free, open-source framework for programmatically generating documents: combining quantitative and qualitative analysis in one parametrized template and creating HTML pages.
When the COVID-19 pandemic hit a few months later, we were prepared to use our R Markdown process to illustrate how and why states’ experiences with the pandemic differed and how their budgets transformed as a result. Given the ongoing nature of the COVID-19 pandemic and the constant shifts in budgets and other policies, the state briefs and COVID-19 project page benefit from an automated process that can quickly deliver the latest data to policymakers and other state actors. In this blog post, we describe how we use R Markdown to efficiently update the pages with data from a variety of sources, how we’ve accommodated key differences among states, and how we get the fact pages onto the web.
The Ever-Changing Data Landscape
As the landscape and burden of the pandemic rapidly evolve, R Markdown allows us to easily keep our data up to date and to document changes as they occur. Our workflow has three main steps: collecting the data we need, combining them into one data frame where each state is an observation, and using that data frame to populate an R Markdown template. We use a variety of data sources and data types to convey the latest information about the economy and the pandemic.
We download COVID-19 case and vaccination data, for example, from the Centers for Disease Control and Prevention’s (CDC’s) COVID Data Tracker and save the publicly available CSVs in our project directory. Below is an example of how we pull the CDC case information from a CSV into its own data frame in the project. With a quick download and a single code chunk, we can update all the case data that feed into our template. We also calculate per capita caseloads on the fly so we can compare them across states.
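A minimal sketch of this kind of step in base R follows. It is not the production script: the file, the column names (`state`, `tot_cases`), and the population figures are assumptions made for illustration, and the example writes a tiny stand-in CSV to a temporary file so it runs on its own.

```r
# Hypothetical stand-in for a CSV downloaded from the CDC COVID Data Tracker;
# column names and numbers are illustrative, not real CDC data.
cdc_path <- tempfile(fileext = ".csv")
writeLines(c(
  "state,tot_cases",
  "AL,2000",
  "AK,100",
  "DC,300"
), cdc_path)

# Read the CSV into its own data frame
cases <- read.csv(cdc_path, stringsAsFactors = FALSE)

# Hypothetical population table (in practice, e.g., Census estimates)
population <- data.frame(
  state = c("AL", "AK", "DC"),
  pop   = c(50000, 10000, 20000)
)

# Join on the state abbreviation and compute cases per 100,000 residents,
# which makes caseloads comparable across states of different sizes
cases <- merge(cases, population, by = "state")
cases$cases_per_100k <- cases$tot_cases / cases$pop * 1e5

# Rank states from highest to lowest per capita caseload
cases[order(-cases$cases_per_100k), c("state", "cases_per_100k")]
```

Because the per capita column is recomputed every time the script runs, refreshing the downloaded CSV is enough to update every comparison downstream.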
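The built-in R `state` datasets (the `state.name` and `state.abb` vectors) provide a ready-made list of state names and abbreviations, although the District of Columbia must be added manually. This is a quick addition:
`states <- rbind(data.frame(state = state.name, abb = state.abb), data.frame(state = "District of Columbia", abb = "DC"))`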
The state dataset ensures we have consistent state names and abbreviations among all data frames.
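As a sketch of how such a crosswalk keeps keys consistent, the example below builds the 51-row lookup table from R's built-in `state.name` and `state.abb` vectors and uses it to attach full state names to a data frame that is keyed only by abbreviation. The `unemployment` data frame and its values are hypothetical.

```r
# Build the state crosswalk from R's built-in vectors and append DC
states <- rbind(
  data.frame(state = state.name, abb = state.abb, stringsAsFactors = FALSE),
  data.frame(state = "District of Columbia", abb = "DC", stringsAsFactors = FALSE)
)

# Hypothetical data keyed only by abbreviation (illustrative values)
unemployment <- data.frame(
  abb  = c("TX", "DC", "OH"),
  rate = c(4.2, 5.1, 3.9),
  stringsAsFactors = FALSE
)

# Attach the standardized full name so every data frame shares the same keys;
# merge() returns rows sorted by the join key (abb)
unemployment <- merge(states, unemployment, by = "abb")
unemployment
```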
Downloading CSVs is not the only way to get quantitative data quickly. For our data on industry employment and unemployment trends, we use an application programming interface (API) to pull data directly from the Bureau of Labor Statistics (BLS). This lets us highlight the industries and sectors most affected by the pandemic in each state without manually downloading data and making comparisons.
After calculating the percentage change in employment for each industry over the course of the pandemic, we use the filter function in the above code to identify the industry (called seriesID in the script) in each state with the greatest decline. The variables in the bls_pct data frame are then referenced in our R Markdown template. The result is the sentence shown below, in this case for Louisiana’s COVID-19 page.
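The calculation can be sketched in base R as follows. The employment levels, industry labels, and the two time points (`emp_start`, `emp_latest`) are made-up assumptions, not BLS figures. The post's script works with BLS series IDs and `filter()`; here `ave()` plus subsetting plays the role of a grouped filter.

```r
# Illustrative employment levels (thousands) for two states and three
# industries; these are made-up numbers, not BLS data.
emp <- data.frame(
  state      = rep(c("Louisiana", "Ohio"), each = 3),
  industry   = rep(c("Leisure and hospitality", "Manufacturing", "Construction"), 2),
  emp_start  = c(230, 130, 140, 560, 700, 230),
  emp_latest = c(180, 125, 138, 480, 560, 225),
  stringsAsFactors = FALSE
)

# Percentage change in employment for each industry in each state
emp$pct_change <- (emp$emp_latest - emp$emp_start) / emp$emp_start * 100

# Keep, for each state, the industry with the most negative change
# (base R equivalent of group_by(state) %>% filter(pct_change == min(pct_change)))
state_min <- ave(emp$pct_change, emp$state, FUN = min)
bls_pct <- emp[emp$pct_change == state_min, c("state", "industry", "pct_change")]

# A sentence of the kind a template could fill in (wording is illustrative)
sentences <- sprintf("In %s, employment in %s fell %.1f percent.",
                     bls_pct$state, tolower(bls_pct$industry),
                     abs(bls_pct$pct_change))
sentences
```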
For qualitative data, such as states’ reopening statuses and budget actions, we collect information by state, draft analyses, and store the text in a spreadsheet. This allows us to customize entire paragraphs for each state. We use R Markdown syntax to format our text, and we even embed HTML to add features to our HTML output. Below, we store a unique paragraph for each state describing its historical unemployment trends and linking to our data source in a new tab.
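A rough sketch of this pattern: each row holds one state's paragraph, with an inline HTML link whose `target="_blank"` attribute opens the source in a new tab once the page is rendered to HTML. The column name, paragraph text, URL, and the helper `get_state_text()` are hypothetical, and we assume the spreadsheet has already been read into R.

```r
# Placeholder version of the qualitative spreadsheet: one paragraph per state.
# Text and URL are hypothetical stand-ins for the drafted analyses.
link <- '<a href="https://example.org/source" target="_blank">our data source</a>'
qual <- data.frame(
  state = c("Louisiana", "Ohio"),
  unemployment_text = c(
    paste0("Placeholder paragraph on Louisiana's unemployment history; see ", link, "."),
    paste0("Placeholder paragraph on Ohio's unemployment history; see ", link, ".")
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the paragraph for the state being rendered.
# In an R Markdown template this could be called inline, e.g.
# `r get_state_text(params$state)`
get_state_text <- function(state, data = qual) {
  data$unemployment_text[data$state == state]
}

get_state_text("Ohio")
```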
Addressing Variation among States
After our data have been compiled, we combine them in an R script and feed the resulting data frame into a template, where each piece of information that varies by state is parametrized (read here to learn more about how this process works).
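A minimal sketch of the iteration step, assuming a parameterized template named `state_template.Rmd` (a hypothetical file name) whose YAML header declares a `state` parameter. It uses `rmarkdown::render()` and its `params` argument; the output naming scheme is an assumption, and rendering only runs if the template and the rmarkdown package are present.

```r
# Hypothetical template; assumes its YAML header contains:
# params:
#   state: "Alabama"
template <- "state_template.Rmd"

states <- c(state.name, "District of Columbia")

# One output file per state, e.g. "alabama.html", "district-of-columbia.html"
output_files <- paste0(gsub(" ", "-", tolower(states)), ".html")

render_all <- function(template, states, output_files) {
  for (i in seq_along(states)) {
    rmarkdown::render(
      input       = template,
      params      = list(state = states[i]),  # fills params$state in the template
      output_file = output_files[i],
      envir       = new.env()  # fresh environment so objects don't leak between states
    )
  }
}

# Only render when the template and rmarkdown are actually available
if (file.exists(template) && requireNamespace("rmarkdown", quietly = TRUE)) {
  render_all(template, states, output_files)
}
```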
One of the biggest challenges we faced was creating a template that is consistent across states, but also allows room for customization. There are large variations in how states function, and we had to ensure that the boilerplate language in our template didn’t obscure important differences across the states.
With R Markdown, we have the flexibility to create a general template and provide cases for “exceptions.” One frequent source of variation is the District of Columbia, which uniquely operates as a state and a locality. In the examples below you can see two ways we tackle this special case in the COVID-19 pages.
1. Create a general sentence for all states and an if clause for the exception. Below, we replace our general sentence about the American Rescue Plan with a specific sentence describing DC’s unique funding under the Plan. We also adjust references to allocations based on how large they are.
In this example, since there is only one exception, the sentence for the District of Columbia is hard-coded in HTML instead of using parameters (e.g., `state = params$state`).
2. Use if_else statements within a function
dplyr’s `if_else()` function takes the form:
`if_else(condition, true, false, missing = NULL)`
For this example, it is simpler to use an if_else statement because only one word is being changed (Mayor vs. Governor) instead of an entire sentence.
Given the number of different state pages to create, it is ideal to have as much boilerplate language as possible, and only make adjustments or exceptions where necessary. Not only does this speed up the process, but it also makes it easier for readers to compare data across states. The two examples shown above are not the only ways to make exceptions within an R Markdown template, but we have found they are a simple and efficient way to address state variation.
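Both kinds of exception can be sketched as small R helpers that a template would call inline (for example, `r executive_title(params$state)`). The sentence text below is placeholder wording, not the published language, and `format_allocation()` is a hypothetical reading of "adjust references to allocations based on how large they are" as switching between millions and billions. Base R's `ifelse()` stands in for dplyr's `if_else()` so the sketch runs without packages; `if_else()` behaves the same here but is stricter about types.

```r
# 1. Whole-sentence exception: an if/else clause swaps in a DC-specific sentence
arp_sentence <- function(state) {
  if (state == "District of Columbia") {
    "[DC-specific sentence about its American Rescue Plan funding]"
  } else {
    paste0("[General sentence about ", state, "'s American Rescue Plan allocation]")
  }
}

# 2. One-word exception: a vectorized if-else picks "Mayor" or "Governor"
executive_title <- function(state) {
  ifelse(state == "District of Columbia", "Mayor", "Governor")
}

# Hypothetical: describe an allocation in millions or billions depending on size
format_allocation <- function(amount) {
  if (amount >= 1e9) {
    sprintf("$%.1f billion", amount / 1e9)
  } else {
    sprintf("$%.0f million", amount / 1e6)
  }
}

executive_title(c("Ohio", "District of Columbia"))
format_allocation(2.3e9)
format_allocation(4.5e8)
```

The whole-sentence version keeps each variant readable as a complete sentence, while the one-word version avoids duplicating boilerplate when only a single term changes.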
Publishing the Pages
The Urban Institute has used R Markdown to iterate fact sheets as PDFs for several years. PDFs rely on LaTeX, require more formatting work because of page breaks, and are great for printing and distributing by hand. The pandemic highlighted some of the advantages of iterated fact pages in HTML, which are better suited to reading on the web and inherit the design of urban.org through the site’s cascading style sheets (CSS).
After generating 51 web pages, we need to add the content to the Urban Institute’s content management system (CMS). For the State Fiscal Briefs, we manually add the HTML 51 times. For the COVID-19 feature, we created a less time-consuming and tedious process by hosting the pages elsewhere and embedding each page in urban.org with an iframe. With each product, we learn more about how to address manual roadblocks and what can (and should) be automated in future projects.
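A minimal sketch of generating the embed code for externally hosted pages. The host URL, file naming scheme, and iframe attributes are hypothetical; the post does not say where the pages are hosted or how the iframes are configured.

```r
# Hypothetical host for the rendered pages; not the real location
base_url <- "https://example.org/state-covid"

states <- c(state.name, "District of Columbia")
slugs  <- gsub(" ", "-", tolower(states))

# One <iframe> snippet per state ("%%" in sprintf produces a literal "%")
iframes <- sprintf(
  '<iframe src="%s/%s.html" title="%s COVID-19 page" width="100%%" height="2000" style="border:0"></iframe>',
  base_url, slugs, states
)
names(iframes) <- states

writeLines(iframes[["Ohio"]])
```

Each snippet still has to be placed in the CMS once, but afterward the hosted pages can be re-rendered and updated without editing the CMS again.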
Pulling It Together
Creating the State Fiscal Briefs and their COVID-19 feature was no small feat. Luckily, RStudio and R Markdown make it easier to pull in data from numerous sources and to generate templates that handle key differences between states automatically. R Markdown also ensures that all pages have consistent layouts and features. During such an uncertain time for our country, we hope to provide readers with specific and timely information and to equip researchers with tools for efficient analysis across the states.