Repurposing NCCS’s Code: State Summary Tables
The Center for Nonprofits and Philanthropy recently launched the new and improved website for the National Center for Charitable Statistics (NCCS). The new site is meant to be a one-stop resource for anyone interested in nonprofit data research. Using IRS 990 datasets, NCCS has created tools that provide insight into the nonprofit sector for the public to use and re-create. The website is the next step in NCCS’s efforts to usher in a more transparent, reproducible, and accessible era in nonprofit research.
The new NCCS website is unique because the raw R code used for analysis by NCCS’s in-house researchers will be embedded in every NCCS publication. So anyone wanting to dive deeper into the data will now be able to replicate NCCS research results using the code displayed with each publication.
In this blog post, we show how you can use code from the new website to create personalized data products. Even if you are unfamiliar with R, this tutorial should get you started.
Before you run the code
Let’s use the State Summary Table template as an example. The table on the new website shows a set of summary statistics for Alabama as totals of reporting, supporting, operating, and nonreporting nonprofits. But by making small changes in the code, this resource can be re-created for any US state.
All the code published by NCCS to the new website is written in R, so anyone wishing to replicate it will need access to R and RStudio (both are free and open source). Users will also need to download the source datasets to their computer. Not every NCCS source dataset is needed to reproduce any given product on the website. The easiest way to tell what source file is required to reproduce a data product is to look at the code itself. As shown in the final two lines of the code below (lines 18–41 in the first code box of the “Example Template: State Summary Table for Alabama” publication), the source files used to create the state summary table are the August 2016 (“bmf.bm1608.csv”) and January 2006 (“bmf.bm0601.csv”) Business Master Files (BMFs).*
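One way to check which source files a script needs is to scan it for quoted CSV file names. The following is a minimal sketch in base R, not part of the NCCS template: the two script lines are hypothetical stand-ins for the end of the template's first code box, and only the file names themselves come from this post.

```r
# Hypothetical stand-ins for the last two lines of the template's first code box.
# Only the file names are taken from the post; the calls around them are assumed.
script_lines <- c(
  'bmf_recent <- read.csv("bmf.bm1608.csv")',
  'bmf_earlier <- read.csv("bmf.bm0601.csv")'
)

# Find every double-quoted string that ends in .csv, then drop the quotes
quoted <- regmatches(script_lines, gregexpr('"[^"]+\\.csv"', script_lines))
source_files <- unique(gsub('"', "", unlist(quoted)))

print(source_files)
```

To scan a real script, the lines could be read from a saved copy of the code with readLines() instead of being typed in by hand.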
A comprehensive guide on downloading these files to a local machine can be found in the NCCS Data: R Users Guide. All data files and data dictionaries are available for download in CSV file format at the NCCS Data Archive.
The last preparatory step is to save these data files in a place where R can easily find them. When a new project is created in RStudio, the software prompts you to choose a folder on your local machine’s hard drive where the project’s files will be saved. Users may also choose not to create a project and just work in R scripts. The R code shared above assumes that the data files have been saved inside that project folder. But the user can direct R to a data file by changing the file path in the code. For example, if the dataset was saved in a folder called “Data” within the larger project folder, the code could be changed to look like the following:
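As a rough sketch of that path change (not the template's own code), the file names from this post can be combined with a “Data” subfolder using base R's file.path(). The object names bmf_recent and bmf_earlier are hypothetical, and read.csv() is used here only as a generic reader; the template may load the files differently.

```r
# Assumed layout: the BMF extracts sit in a "Data" folder inside the project folder
data_dir  <- "Data"
bmf_files <- c("bmf.bm1608.csv", "bmf.bm0601.csv")  # file names given in this post
bmf_paths <- file.path(data_dir, bmf_files)
print(bmf_paths)

# Read the files only if they are where we expect them; otherwise say what is missing
not_found <- bmf_paths[!file.exists(bmf_paths)]
if (length(not_found) > 0) {
  message("Not found (check the path or download the file): ",
          paste(not_found, collapse = ", "))
} else {
  bmf_recent  <- read.csv(bmf_paths[1], stringsAsFactors = FALSE)  # August 2016 BMF
  bmf_earlier <- read.csv(bmf_paths[2], stringsAsFactors = FALSE)  # January 2006 BMF
}
```

Paths written this way are relative to R's working directory, which is the project folder when the script is run from inside an RStudio project.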
The user is now ready to start manipulating the code.
Personalizing the code
Once a user has worked through the steps above, they have done 90 percent of the work needed to create personalized state summary tables. The code to create these tables enables users to programmatically define a parameter once, instead of needing to change it in multiple places throughout the code (we described the same functionality for our 2018 Nonprofit Sector in Brief in a previous post). Take the following block of code from the state summary table template:
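The define-once pattern can be illustrated with a minimal sketch. This is not the published template, so any line numbers cited in this post refer to the template rather than to this block. The toy data frame, its column names, and the value 2015 for “yearparam” (inferred from the 2005–15 window discussed in this post) are assumptions made for illustration.

```r
stateparam <- "AL"   # two-letter state abbreviation, defined once
yearparam  <- 2015   # assumed end year of the 10-year window (2005-15)

# Toy stand-in for a BMF extract; these column names are hypothetical
toy_bmf <- data.frame(
  STATE = c("AL", "AL", "TX", "DC"),
  ORG   = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Everything downstream reads the parameters instead of hard-coding "AL" or 2015
state_rows  <- toy_bmf[toy_bmf$STATE == stateparam, ]
table_title <- paste0("Nonprofits in ", stateparam, ", ",
                      yearparam - 10, "-", yearparam)

print(nrow(state_rows))
print(table_title)
```

Because the filter and the title both read from the two parameters, changing either value at the top updates every later step without further edits.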
The values for “stateparam” and “yearparam” are defined at the top and are then drawn upon throughout the rest of the code, as in lines 30 and 33. So if a user is interested in any other state, they only need to change the definition of the “stateparam” value once. Say a researcher wanted summary statistics on the nonprofit sector of Texas in 2005–15, instead of Alabama. That user only needs to change the “AL” in line 1 of the above code block to “TX,” as follows:
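In sketch form, the edit is a single changed value. The guard that follows is an optional addition of ours, not something the template is known to include; it uses R's built-in state.abb vector plus “DC” to catch a mistyped abbreviation before the rest of the code runs. The value 2015 for “yearparam” is again an assumption based on the 2005–15 window.

```r
stateparam <- "TX"   # was "AL"; this is the only value that changes
yearparam  <- 2015   # unchanged (assumed value for the 2005-15 window)

# Optional guard: the 50 state abbreviations built into R, plus Washington, DC
valid_states <- c(state.abb, "DC")
if (!stateparam %in% valid_states) {
  stop("Unrecognized state abbreviation: ", stateparam)
}
```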
Then, after rerunning the code, they will have a table identical in format to the one shown on our website, but with Texas as a focal point. This process can be iterated with any of the 50 states and Washington, DC. More advanced R users can refer to Urban’s blog post on iterated fact sheets with R Markdown to learn how this process can be replicated for all 50 states at once.
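For readers who want a feel for the iteration before turning to R Markdown, here is a small base R sketch that loops one summary over every state and Washington, DC. It is only an illustration: the toy data, the STATE column name, and the count_for_state() helper are hypothetical, and a real run would build the full summary table where this sketch simply counts rows.

```r
# Toy stand-in for a BMF extract; the STATE column name is hypothetical
toy_bmf <- data.frame(
  STATE = c("AL", "AL", "TX", "DC", "TX", "TX"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary statistic for one state
count_for_state <- function(bmf, state) {
  sum(bmf$STATE == state)
}

# Loop the helper over the 50 states plus Washington, DC
states <- c(state.abb, "DC")
counts <- vapply(states, function(s) count_for_state(toy_bmf, s), integer(1))

print(counts[c("AL", "TX", "DC")])
```

The result is a named vector with one entry per state, so a single state's value can be looked up by its abbreviation.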
A user can change the definition of the “yearparam” value to look at a different 10-year period. Changing the period of interest also changes the source datasets necessary, which would require the user to download the corresponding BMFs.
As noted, all data source files are available for download on the NCCS Data Archive. If a user wants to observe nonprofit trends in Texas from 2003 to 2013, they should navigate to the data archive and download one version of the BMF for 2003 and one for 2013. We typically recommend using BMFs produced later in a year or very early in the next year, as they contain a more complete picture of the year in question. The best available snapshot of 2013 would be our December 2013 version of the BMF. Code users will then need to change the “yearparam” value and the call-in code for the newly downloaded BMF files:
Notice the following:
· The values of the “stateparam” and “yearparam” variables have been changed to reflect our geography and time period of interest in lines 1 and 2.
· The code in lines 26 and 27 has been altered to retrieve the newly downloaded BMF files from inside of the project folder.
· We’ve also changed the names of the output folders from “bmf2011” and “bmf2005” to “bmf2013” and “bmf2003,” in line with previous changes. This is not required for the code to function correctly, but it does make it more readable and descriptive.
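The changes listed above can be sketched as follows. This is an illustration under stated assumptions, not the template's code: startyear and bmf_filename() are hypothetical names, and the helper simply follows the two-digit-year, two-digit-month pattern visible in the two file names this post mentions. File names for other BMF vintages should be confirmed against the NCCS Data Archive listing rather than assumed.

```r
stateparam <- "TX"
yearparam  <- 2013
startyear  <- yearparam - 10   # hypothetical name for the start of the window

# Hypothetical helper that mirrors the pattern in "bmf.bm1608.csv" (August 2016)
# and "bmf.bm0601.csv" (January 2006): two-digit year, then two-digit month
bmf_filename <- function(year, month) {
  sprintf("bmf.bm%02d%02d.csv", year %% 100, month)
}

# The helper reproduces the two file names given in this post
print(bmf_filename(2016, 8))
print(bmf_filename(2006, 1))

# Candidate name for the December 2013 BMF; confirm it on the Data Archive
candidate_2013 <- bmf_filename(yearparam, 12)
print(candidate_2013)
```

The post does not say which 2003 vintage to use, so that month is left for the reader to choose from what the archive offers.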
Hopefully, this walk-through has shed some light on how to repurpose materials found on the new NCCS site and its publications. Similar steps can be used on almost any other coded material on the site. Although we at NCCS plan on creating more of these code repurposing walk-throughs, we encourage users to continue familiarizing themselves with the R code behind the materials on the website. The Public Charity Summary Tables, the Private Foundation Summary Tables, and the Nonprofit Sector in Brief are all great places to start. The more a user understands our data and how we manipulate them, the more effective they will be at creating their own personalized tools in R that use NCCS data. As users begin to explore our data, we look forward to hearing their feedback and questions through the feedback form in the bottom right corner of our site.
*We changed this sentence on May 16, 2019, to correct the date of the Business Master File from June 2001 to January 2006. The file name remains unchanged.