Building an R community at the Urban Institute
R is one of the premier software environments for data science and one of the fastest-growing programming languages. Created by researchers for researchers, R has intuitive tools that make it perfect for traditional social and economic policy research. R also contains tools for contemporary methods like web scraping, geospatial analysis, and massive data analysis.
The Urban Institute R Users Group is committed to exposing researchers to the joy and power of R; developing beginner, intermediate, and advanced R skills; encouraging and supporting novel applications of R to public policy research; and building a diverse and supportive community of R users.
In this post, I’ll talk about how we built an R community at Urban and how we measure our success and growth. The R Users Group pursues its mission with R Lunch Labs, on-call support, limited seed funding, and resources and tools.
R Lunch Labs
Every Friday, we host casual, hands-on training sessions for R users of all abilities called R Lunch Labs. Each session begins with about 10 minutes of instruction to tackle either a problem encountered through on-call support that week or a recent blog post from the R community.
Next, attendees break into small groups. New users get instruction about the tidyverse or other functionality in R, and more experienced users work together on projects ranging from the exercises in Garrett Grolemund and Hadley Wickham’s book, R for Data Science, to strategies for cluster analysis and distributed-computing with Spark.
Traditionally, programming training at the Urban Institute has been dominated by five to six sessions during the summer, when most new junior hires and interns join the organization. While this gives new hires the fast, crash-course introduction they need to get started, it has two problems. First, programming is hard. We set up new users to fail through frustration if we cover a job’s worth of skills in a handful of hours. It too often feels like the draw-an-owl meme, where step one is to draw a couple of circles and step two is to draw the rest of the owl.
Second, the more standard approach of a one-hour lunchtime seminar on what R can do and how to use it usually fails both the audience and the speaker. I have given several of these talks, but the typical lecture style (slides, screenshots, and code snippets) and the short time frame force the audience to meet me where I thought they should be instead of where they are. Longer trainings, spread over longer periods so people can apply what they learn to ongoing work, are a better way to teach these tools. By comparison, the R Lunch Lab strips down a training session to its most useful components and repeats them over and over, stressing mastery instead of exposure.
R Lunch Labs took a few repetitions to feel natural. Now, regular attendees, who have become experienced R users, volunteer to answer questions just as frequently as the group leaders, and we’ve developed a supportive community across Urban. Additionally, by finding common ground with R, researchers have found new collaborators across centers, which is often a struggle in research and analytics work.
On-call support
The biggest hurdle to R adoption is deadlines. It is difficult to innovate or change practices when trying to finish deadline-driven work. It’s like trying to replace a train engine while the train is trying to get to the next town on time.
The R Users Group offers on-call support so researchers can get same-day help when they encounter a problem. Having access to a partner for troubleshooting last-second issues reduces the pressure of deadlines and encourages researchers to step out of their comfort zones. In our experience, 80 percent of effort and stress comes from 20 percent of the work. Our goal is to ease the burden of this 20 percent with technical assistance and a little positive reinforcement.
For example, a coworker was recently analyzing survey data on a deadline. She knew how to analyze each question, but her R script was nearing 2,000 lines, she hadn’t finished, and time was running out. We met for 45 minutes, and I helped her write a function to generalize the process, save lines of code, and save time. This type of help saves time and money for Urban and gives researchers the confidence to try something they did not know how to do in advance.
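To make the idea concrete, here is a minimal sketch in base R of the kind of helper that can replace many near-identical blocks of code. The data, the column names, and the function summarize_question() are hypothetical and chosen only for illustration; this is not the coworker's actual script.

```r
# Hypothetical survey responses: one row per respondent, one column per question.
survey <- data.frame(
  q1 = c("Yes", "No", "Yes", "Yes", NA),
  q2 = c("No", "No", "Yes", NA, "No"),
  q3 = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Summarize one question: count and share of each non-missing response.
summarize_question <- function(data, question) {
  counts <- table(data[[question]], useNA = "no")
  data.frame(
    question = question,
    response = names(counts),
    n = as.integer(counts),
    share = as.numeric(counts) / sum(counts),
    stringsAsFactors = FALSE
  )
}

# Apply the same summary to every question and stack the results,
# instead of copying and editing the same block once per question.
question_names <- c("q1", "q2", "q3")
results <- do.call(
  rbind,
  lapply(question_names, summarize_question, data = survey)
)
print(results)
```

Adding a question then means adding one name to question_names rather than another copied block, which is where the savings in lines and time come from.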
Seed funding
The second biggest hurdle to R adoption is legacy code. The Urban Institute recently turned 50 years old, and the code bases for some projects, in languages such as FORTRAN, extend all the way back to Urban’s founding. Many projects have histories that stretch back more than a decade. Researchers on these projects are understandably more interested in using their models to find new insights about social and economic policy than rewriting the code, complicating their code bases, or adding new tools.
But projects that have been developed for more than 10 years are some of the best opportunities for innovation. R can be used to augment, not replace, the existing capabilities of these models. For example, R can build on the output of an existing model by adding improved data visualizations or creating fact sheets.
The R Users Group offers limited seed funding, mostly to junior staff, to create new tools, methods, or libraries that augment the capabilities of legacy code or that solve problems from legacy methods. As one example, Urban’s library(urbnmapr) R package started with seed funding through the R Users Group when a researcher shared her frustration with moving clunky shapefiles around and using ArcGIS to make simple choropleth maps. Now, library(urbnmapr) is a tool researchers both inside and outside Urban use.
Resources and tools
R Lunch Labs, on-call support, seed funding, and daily programming require a lot of time from the users group leaders and don’t scale well. But these efforts create a lot of code and ideas as by-products, and we are deliberate about capturing the value of these by-products for others.
We scale our efforts and help more people inside and outside Urban by putting as much code as possible on GitHub, explaining popular methods on the R Users Group website, and creating R packages. Because researchers and their research assistants often create their code from scratch, these resources help save them time and energy.
The Urban Institute has internal and external GitHub accounts. On our internal GitHub account, we have an R Programming at the Urban Institute website with an intro to R guide, data visualization guide, mapping and geospatial analysis guide, tables guide, and optimization guide. The website was built with R Markdown and is hosted for free on GitHub Pages. It is organized by outcomes so researchers can search for what they want, whether it is code for a bar plot or methods for geocoding addresses, and has collapsible code to encourage copying and pasting. We’ve found that providing code that users can copy, paste, and tweak is the best way to help early R adopters, and it allows for the subtle communication of organizational standards and best practices.
Finally, we’ve spent a lot of 2018 writing R packages:
• library(urbnthemes) is a ggplot2 theme that styles data visualizations to meet Urban Institute standards
• library(urbnmapr) makes it easy to create state and county choropleths in R
• library(urbntemplates) simplifies the process of sharing code that will be frequently used with tools like R Markdown and R Shiny
R packages allow us to scale our efforts, subtly propagate best practices, and create standards around styles or methods.
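As a rough illustration of how a package can carry a style standard, the sketch below wraps a handful of ggplot2 settings in a single theme function. The function theme_example_org() and its specific choices are hypothetical; this is not the code or the API of library(urbnthemes), only the general pattern a theme package follows.

```r
library(ggplot2)

# Hypothetical house style: one function holds every styling decision,
# so analysts apply the standard with a single call.
theme_example_org <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank(),
      legend.position = "top"
    )
}

# Usage with a built-in data set.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight", x = "Weight", y = "Miles per gallon") +
  theme_example_org()
```

Because the styling lives in one function, a change to the standard is made once in the package instead of in every script.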
You make what you measure
We track several metrics to see if our efforts are worthwhile: tickets for on-call support, attendance at R Lunch Labs and other events, R Users Group website views, and the Urban Institute publications that use R.
Most importantly, we get a weekly report of two measures of use of all statistical packages at the Urban Institute. The first measure captures the number of unique R users in the previous 90 days. The second is the number of unique users who have opened a given statistical package in at least 4 of the past 10 weeks. These measures help us understand whether our approaches are working and, if not, how we might enhance or restructure our efforts to better encourage R adoption.
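The two measures can be sketched in base R as follows. The usage log, its column names, and the reference date are made up for illustration, and the week boundaries (consecutive 7-day windows counted back from the reference date) are an assumption; the actual report may define them differently.

```r
# Hypothetical usage log: one row each time a user opens a statistical package.
today <- as.Date("2018-12-31")
usage <- data.frame(
  user = c("ana", "ana", "ana", "ana", "ben", "ben", "cy", "dee"),
  package = "R",
  date = today - c(1, 8, 15, 22, 3, 40, 85, 120),
  stringsAsFactors = FALSE
)

# Measure 1: unique users in the previous 90 days.
recent <- usage[usage$date > today - 90 & usage$date <= today, ]
unique_users_90_days <- length(unique(recent$user))

# Measure 2: unique users active in at least 4 of the past 10 weeks.
last_10_weeks <- usage[usage$date > today - 70 & usage$date <= today, ]
# Week 0 is the most recent 7 days, week 1 the 7 days before that, and so on.
days_ago <- as.integer(difftime(today, last_10_weeks$date, units = "days"))
last_10_weeks$week <- days_ago %/% 7
# Keep one row per user per week so repeat opens in a week count once.
user_weeks <- unique(last_10_weeks[, c("user", "week")])
weeks_per_user <- table(user_weeks$user)
regular_users <- names(weeks_per_user)[as.vector(weeks_per_user) >= 4]

print(unique_users_90_days)
print(regular_users)
```

The first measure counts anyone who has touched the tool recently, while the second filters for habitual use, so comparing the two gives a sense of how many occasional users become regular ones.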
All these metrics agree — R is the fastest-growing programming language at the Urban Institute and is quickly becoming a primary tool for social and economic policy research.