Redesigning the Nonprofit Sector in Brief in R
In the annual Nonprofit Sector in Brief from the Urban Institute’s Center on Nonprofits and Philanthropy, we discuss trends in the number and finances of 501(c)(3) public charities and provide updates on private charitable contributions and volunteering. Data are drawn from Urban’s own National Center for Charitable Statistics (NCCS) data and other premier sources of nonprofit data, including the Giving USA Foundation, the Foundation Center, and the Bureau of Labor Statistics. The Nonprofit Sector in Brief is one of our center’s most cited publications and is widely used by educators, policymakers, and practitioners alike.
Historically, it has been a bit of a pain to write.
Merging data across such disparate sources isn’t easy. Creating the 2015 brief involved writing SQL queries to aggregate and analyze NCCS relational databases, exporting the results to a series of linked Excel workbooks, combining them with results from related analyses in Stata, creating figures and tables in Excel, and writing the final document in Word before exporting it as a PDF and publishing it on urban.org.
The process, and the final product itself, changed little from year to year. Apart from the numbers and statistics at the heart of the brief, the formatting, the figures, and even the text were almost identical every year; we were repeating the same work to achieve similar results.
The process was inefficient. But if that’s the way it’s done, that’s the way it’s done, right?
A new era for NCCS
Wrong (obviously, to anyone reading a data science blog). Recent changes in the NCCS have jump-started a redesign of our Nonprofit Sector in Brief as part of a larger initiative to make NCCS data more accessible, transparent, and reproducible.
In 2017, we unveiled the National Center for Charitable Statistics Data Archive as the first step in this initiative. Historically, NCCS data were distributed through a registration- and fee-based system: you could find some aggregate numbers on our website (and in the Nonprofit Sector in Brief), but full access was restricted to those willing to pay for it. The new data archive, for the first time ever, released all our data (almost 30 years’ worth on all nonprofit organizations registered in the US) in CSV format, for free and without registration. NCCS data are now more accessible than ever.
This month, we’ll unveil the next step: our new website. Designed with our new initiative’s core tenets in mind, the new website allows users to find not only NCCS publications but also the data and code used to create those publications.
And a new Nonprofit Sector in Brief
When considering how existing NCCS products might migrate to our new website, the Nonprofit Sector in Brief was an obvious candidate for a focused redesign as a reproducible, iterative document using R.
First, as mentioned, the brief’s creation was inefficient. Moving from the suite of systems described above to a single system would streamline the process and reduce possible translation errors between analysis platforms.
Second, we knew that any additional effort spent on this iteration of our annual report would pay off — time spent this year could be recouped with faster turnaround in the future.
Third, the brief’s final format does not change much from year to year; we usually know what we want to say and how we are going to say it. Because we had a clear end goal, the main challenge was to reproduce that format as closely as possible.
Fourth, the input does not change much either. Barring any changes to the Form 990 (the primary tax document for nonprofit organizations), we can assume the format for our data will stay relatively consistent from year to year, giving us a solid foundation.
With all this in mind, we decided to build the new Nonprofit Sector in Brief in R Markdown, leveraging tools other Urban researchers were already developing to build iterated, data-driven documents in Urban style. R’s flexibility enables us to house all the project’s needs within one system, from data input through the final HTML output.
How we did it
To see the final result, head to our Nonprofit Sector in Brief project page, where, for the first time ever, you can see all the code used to generate every table and figure. We built a few features into this iteration to make it easier for Urban staff to replicate in future editions and more useful for external audiences. The next section highlights the main functionality we used to create an easily replicable version of the brief in R Markdown, but it does not explain all of the R code we used. If you are interested in guides on how to use and adapt NCCS’s code, don’t fret: NCCS R code walk-throughs will be posted to the Data@Urban blog in the coming months.
Parameterized output
We built the new brief to be easily updated when new NCCS data are released. At the moment, the most current NCCS core file is the 2015 file. But rather than hardcode the 2015 file, we used R Markdown’s parameterization functionality to set the year of interest as a parameter called NCCSDataYr in the document’s YAML header:
params:
  NCCSDataYr: 2015
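Because the year lives in the YAML header, rmarkdown also lets it be overridden at render time through the params argument of rmarkdown::render(), without editing the file at all. The sketch below assumes a file named "NSiB.Rmd" and the helper names brief_params() and render_brief(); all three are hypothetical, not the project’s actual names.

```r
# Hypothetical helpers for rendering the brief for any data year.
# "NSiB.Rmd" is an assumed file name, not necessarily the project's.

# Validate the year before it reaches the document's `params:` block
brief_params <- function(year) {
  if (!is.numeric(year) || length(year) != 1 || is.na(year) || year %% 1 != 0) {
    stop("year must be a single whole-number year, e.g. 2016")
  }
  list(NCCSDataYr = as.integer(year))
}

# Override NCCSDataYr at render time; the YAML default (2015) is left untouched
render_brief <- function(year, input = "NSiB.Rmd") {
  rmarkdown::render(
    input,
    params = brief_params(year),
    output_file = sprintf("NSiB_%d.html", as.integer(year))
  )
}

# Usage (requires rmarkdown, pandoc, and the .Rmd file):
# render_brief(2016)
```

Validating the year up front is a design choice: a typo such as "2o16" fails immediately instead of partway through a long knit.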
We reference this parameter throughout our analysis, ensuring that all relevant code uses the correct year of data. Take the function that creates the table behind figure 1 of the brief:
Rather than just writing the code as
Figure1_2015 <- Fig1Table(2015)
(where Fig1Table is the function we wrote to build figure 1’s table), we pass in the parameter defined at the top of the document:
Figure1_2015 <- Fig1Table(params$NCCSDataYr)
This means that when we update with 2016 data, we only need to change that one parameter, and figure 1 will automatically update.
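The post does not show the inside of Fig1Table, so the following is only a hypothetical stand-in illustrating the pattern: every year the function needs is derived from the single year argument. It assumes a data frame of yearly counts with columns year and n_orgs, and the counts used here are dummy values, not NCCS figures.

```r
# Hypothetical stand-in for Fig1Table(); the real function is not shown in
# this post. Assumes `counts` has one row per year with columns `year` and
# `n_orgs` (number of registered organizations).
Fig1Table_sketch <- function(data_year, counts, span = 10) {
  start_year <- data_year - span                 # e.g. 2015 -> 2005
  start_n <- counts$n_orgs[counts$year == start_year]
  end_n   <- counts$n_orgs[counts$year == data_year]
  if (length(start_n) != 1 || length(end_n) != 1) {
    stop("counts needs exactly one row for ", start_year, " and for ", data_year)
  }
  data.frame(
    start_year = start_year,
    end_year   = data_year,
    start_n    = start_n,
    end_n      = end_n,
    pct_change = round(100 * (end_n - start_n) / start_n, 1)
  )
}

# Dummy counts for illustration only (not NCCS figures)
counts <- data.frame(year = c(2005, 2006, 2015, 2016),
                     n_orgs = c(100, 110, 150, 165))
data_year <- 2015   # in the brief, this would be params$NCCSDataYr
Fig1Table_sketch(data_year, counts)
```

Changing data_year to 2016 shifts the comparison window to 2006 through 2016 with no other edits, which is the behavior the parameter is meant to guarantee.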
We also use the parameter beyond the data-wrangling side of the analysis: we use it in the text, too. Take the first line of the Size and Scope of the Nonprofit Sector section of the brief:
From `r params$NCCSDataYr-10` to `r params$NCCSDataYr`, the number of nonprofit organizations registered with the IRS rose from `r round(Table1_2015[1,2]/1000000,2)` million to `r round(Table1_2015[1,6]/1000000,2)` million, an increase of `r Table1_2015[1,7]` percent.
Rather than writing the text explicitly (“From 2005 to 2015…”), we use the year parameter to ensure that the text always reflects the year of our data. The numbers in the text are also generated by the analysis, in this case the numbers given in Figure1_2015.
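To check what those inline expressions will produce without knitting, the same sentence can be assembled in plain R. This is a hypothetical helper, size_scope_sentence(), that assumes the table follows the column layout used in the inline code above (column 2 holds the start-year count, column 6 the end-year count, and column 7 the percent change).

```r
# Hypothetical plain-R equivalent of the inline `r ...` expressions, useful for
# checking the sentence in the console before knitting. Assumes `tbl` uses the
# same layout as the inline code: column 2 = start-year count, column 6 =
# end-year count, column 7 = percent change.
size_scope_sentence <- function(data_year, tbl) {
  sprintf(
    paste("From %d to %d, the number of nonprofit organizations registered",
          "with the IRS rose from %s million to %s million, an increase of %s percent."),
    data_year - 10, data_year,
    round(tbl[1, 2] / 1e6, 2),   # start-year count in millions
    round(tbl[1, 6] / 1e6, 2),   # end-year count in millions
    tbl[1, 7]                    # percent change, as stored in the table
  )
}

# Usage: size_scope_sentence(params$NCCSDataYr, Table1_2015)
```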
Using R to download the most current data…
We can also use R to pull data directly from the NCCS Data Archive, rather than manually navigating to the website in a browser and downloading the data ourselves. First, using the GET function from the httr package, we can request a file from the archive with a single call:
library(httr)
core2015_pc <- GET("https://nccs-data.urban.org/data/core/2015/nccs.core2015pc.csv")
Because the NCCS Data Archive files and related URLs all follow a standardized structure, we can go one step further and write a function that builds the proper URL from the year and type of file (public charities, private foundations, or all other types of tax-exempt organizations), as we’ve done in this code, which creates a function named getcorefile. Finally, we pass in the NCCSDataYr parameter to make sure we are using the right data source:
core2015_pc <- getcorefile(params$NCCSDataYr, "pc")
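The linked getcorefile code is not reproduced here, so what follows is a minimal sketch of the same idea under stated assumptions: every core file is assumed to follow the URL pattern of the 2015 public charity file shown above, and the type codes "pf" and "co" are guesses for illustration. Note that GET() returns an httr response object rather than a data frame, so the sketch also parses the body into a data frame.

```r
# Minimal sketch of a getcorefile()-style helper. Assumes all core files share
# the URL pattern of the 2015 public charity file; the "pf" and "co" type codes
# are assumptions, not confirmed archive names.
core_file_url <- function(year, type = c("pc", "pf", "co")) {
  type <- match.arg(type)   # reject unknown file types early
  sprintf("https://nccs-data.urban.org/data/core/%d/nccs.core%d%s.csv",
          as.integer(year), as.integer(year), type)
}

getcorefile_sketch <- function(year, type = "pc") {
  resp <- httr::GET(core_file_url(year, type))
  httr::stop_for_status(resp)   # fail loudly on 404 and similar errors
  # GET() returns a response object; parse its body into a data frame
  utils::read.csv(text = httr::content(resp, as = "text", encoding = "UTF-8"),
                  stringsAsFactors = FALSE)
}

# Usage (downloads a large file):
# core_pc <- getcorefile_sketch(params$NCCSDataYr, "pc")
```

Keeping URL construction in its own function means the naming logic can be checked without a network connection.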
…and to save the results of analysis
On the other end, we make sure our code saves the output of every analysis. For example, the figure 1 table mentioned above is saved immediately after the function runs:
write.csv(Figure1_2015, "Figures/NSiB_Figure1_Table.csv")
This makes it easy to publish all intermediate outputs as part of the final publication. In fact, interested users can find any of the raw tables or data used in the Nonprofit Sector in Brief on the project page.
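One way to keep saved outputs consistent across many figures is to route every write through a small helper. The function below, save_figure_table(), is hypothetical; it simply follows the "Figures/NSiB_Figure1_Table.csv" naming pattern shown above.

```r
# Hypothetical helper that standardizes how intermediate tables are saved,
# following the "Figures/NSiB_Figure1_Table.csv" naming pattern.
save_figure_table <- function(tbl, figure_number, dir = "Figures") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)  # no-op if it exists
  path <- file.path(dir, sprintf("NSiB_Figure%d_Table.csv", as.integer(figure_number)))
  utils::write.csv(tbl, path, row.names = FALSE)  # omit R's row-index column
  invisible(path)
}

# Usage: save_figure_table(Figure1_2015, 1)
```

Unlike the single write.csv call above, this sketch drops row names, which keeps the published CSVs free of an unlabeled index column; that is a choice, not a requirement.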
Create all graphs using ggplot2
Access to ggplot2 is another benefit of using R for writing the Nonprofit Sector in Brief. This flexible package enables us to make the figures and graphics in the brief reproducible, scalable, and consistently branded. Previously, all graphs and figures were constructed in Excel, but ggplot2 lets us get close to Urban’s style standards without leaving R.
For example, compare this graphic from the 2015 brief:
With this version from the redesigned 2018 brief:
The 2018 graph was constructed using ggplot2 (and aligned with Urban styles using our urbnrthemes package). Because the figure is made in R, it will be easy to update with new data while keeping the style and formatting correct, and we can share the code so our audience can make similar figures themselves.
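As a rough illustration of why a ggplot2 figure is easy to refresh, the chart can be wrapped in a function that takes the data as input, so a new year of data produces an updated figure with identical styling. This is a hypothetical sketch: plot_trend() and its year and value columns are assumptions, and theme_minimal() stands in for the Urban styling that urbnrthemes applies in the actual brief.

```r
library(ggplot2)

# Hypothetical trend-chart helper. Assumes `df` has a `year` column and a
# numeric `value` column; theme_minimal() is a stand-in for the Urban theme.
plot_trend <- function(df, title, y_label) {
  ggplot(df, aes(x = year, y = value)) +
    geom_line() +
    geom_point() +
    labs(title = title, x = NULL, y = y_label) +
    theme_minimal()
}

# Usage: plot_trend(my_yearly_counts, "Registered nonprofits", "Organizations")
```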
The finished product
All this functionality streamlines the creation of the Nonprofit Sector in Brief and makes it more useful to our audience. For the first time, we can share not only the fruits of our labor (the summary analysis widely used to describe the sector) but also the data and code used to create the finished product. The work this redesign required would not be appropriate for all Urban projects: one-off projects, or those that expect substantial changes in either the data input or the finished output, might not be worth the additional labor. In this particular case, we feel that the value of a more accessible, transparent, and reproducible brief was worth the effort, and we hope you do, too.
-Brice McKeever