Learning R: A Conversation
Jon Schwabish and Aaron Williams
Learning a new programming language can be difficult. Even if you have coding experience, a new language introduces a new environment, styles, and syntax and makes you feel like a beginner again.
So how do you go about learning a new programming language? In this post, we share a collaborative effort in which we tried a learning “sprint” for R programming.
We were encouraged to publish more details about our experience after this podcast discussion and Twitter thread seemed to resonate with people. How people learn new skills — programming languages in particular — is something we are both interested in, and we hope this post — a conversation between the two of us — will help you consider different ways you might learn or teach a new skill.
Let’s first set the stage.
Jon Schwabish (senior fellow in the Income and Benefits Policy Center): I am an economist and senior fellow at Urban. I spent nearly a decade at the Congressional Budget Office prior to moving to Urban. I can code to varying degrees of proficiency in Fortran, SAS, and Stata. As the R language has gained traction, and because I spend a lot of time in the data visualization field, I’ve found it increasingly important to learn the language. Programming languages like R have certain advantages over click-based or drag-and-drop tools like Excel and Tableau, and I wanted to expand my skill set to capitalize on those advantages.
Aaron Williams (data scientist in the Income and Benefits Policy Center): I am a data scientist at Urban. In addition to my research efforts, I also lead Urban’s R Users Group and assist researchers across Urban with projects that use R for statistical analysis, data visualization, mapping, and automation.
Jon: I’ve tried to learn R in the past, without much success. I’ve tried several massive open online courses (MOOCs) and other online courses, the most recent being the excellent course offered by Andrew Tran at the Washington Post through the Knight Foundation. But two or three weeks into the program, something would come up or life would get in the way, and the class would go to the back burner, never to be revisited.
Last November, we had an email exchange that, in abbreviated form, went like this:
Jon: Hey Aaron, do you know of any in-person two- or three-day courses in R (in DC)? Any thoughts?
Aaron: Jon, I don’t, but I can look around. Honestly, buy me a Chipotle burrito, and I’ll hang out with you for a couple of days.
Deciding what to learn
Jon: And that’s where it began. A few follow-up emails set up the two-day learning “sprint.” To help set the agenda, I sent an email with a set of topics:
- set up the basic R workspace (e.g., R script versus R project)
- read in data from CSV, Excel, and Stata
- write data (CSV, Excel)
- how to reshape
- run simple tabs (means, variances — equivalent of Stata’s ‘summarize’ and ‘tabulate’ commands)
- how to switch between different datasets that are loaded into R
- working with strings and number formats
- dataviz:
  - standard lines, bars, scatterplot, bubbleplot
  - coloring different data values (e.g., assigning two different colors to two different groups in a scatterplot)
  - small multiples (e.g., trellis display)
  - maps (a better way than we did it before, which was more manual)
  - more advanced charts: box and whisker, beeswarm, raincloud
  - tile grid map (I don’t really want to make this but just want to understand how to read and use other user-created libraries)
- save graphs
Aaron: It was an ambitious set of material for two days. I was concerned about fatigue. To his credit, Jon turned off his phone and email when we were working. We also took an extended taco break on the first day and a long burrito break on the second day.
Jon: For two days we worked through R, from setting up scripts to projects and, eventually, to R Markdown. Though I am familiar with the basic workings of R, I thought it was helpful to imagine I was starting from scratch. With R and RStudio installed, we talked about the basic differences between scripts and projects and the overall RStudio environment.
We started with some of the built-in R datasets and began working our way through my list. We worked through the basics, such as filtering, arranging, selecting, and subsetting the data, and we talked about the pipe syntax, simple summary statistics, grouping, and joining. Once the basics were covered, we turned to a simple dataset I had on my computer with hockey data that we could use for fun. We cleaned and explored the data, created new variables, merged datasets, and visualized different metrics.
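As a rough illustration of the basics described above (not the code from the sprint, which this post does not reproduce), the sketch below uses the built-in mtcars dataset and base R only. It assumes R 4.1 or later for the native |> pipe, and the cyl_labels lookup table is hypothetical, included only to demonstrate a join.

```r
# Illustrative sketch only: base R on the built-in mtcars data (assumes R >= 4.1).

# Pipe syntax: pass the result of one step into the next.
four_cyl_count <- mtcars |>
  subset(cyl == 4) |>
  nrow()

# Simple summary statistics for one column.
mpg_summary <- summary(mtcars$mpg)

# Grouping: mean miles per gallon for each cylinder count.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Joining: attach labels from a hypothetical lookup table.
cyl_labels <- data.frame(
  cyl = c(4, 6, 8),
  label = c("four", "six", "eight")
)
labelled <- merge(mpg_by_cyl, cyl_labels, by = "cyl")

# Arranging: sort the groups from most to least efficient.
labelled <- labelled[order(-labelled$mpg), ]
print(labelled)
```

The dplyr package from the tidyverse offers verbs such as filter(), select(), arrange(), group_by(), summarise(), and left_join() for the same tasks. Base R is used here only so the sketch runs without installing anything.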
Aaron: My approach is hands-on and rhetorical. Each skill or concept is taught as the direct result of a motivating question. For example, “The data are in a rectangular format. What if we only want a subset of columns from this rectangle? What if we only want a subset of rows?”
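Each of those motivating questions maps onto a line or two of code. A minimal sketch in base R, using the built-in mtcars data as a stand-in for the “rectangle” (the dataset and column choices are assumptions for illustration, not what we used in the sprint):

```r
# Illustrative sketch only: mtcars stands in for any rectangular dataset.

# What if we only want a subset of columns from this rectangle?
cols_only <- mtcars[, c("mpg", "cyl")]

# What if we only want a subset of rows?
rows_only <- mtcars[mtcars$cyl == 4, ]

# What if we want both at once?
both <- subset(mtcars, cyl == 4, select = c(mpg, cyl))

# Check the shape of each result: rows, then columns.
print(dim(cols_only))
print(dim(rows_only))
print(dim(both))
```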
Jon: We worked our way through almost everything on the list. At first, the questions were simple and usually ended up as one function or line of code. As we progressed, the questions were less about programming and more about analysis. Finally, we were stringing together many lines of code for questions about hockey to which neither of us knew the answers.
Aaron: Together, programming and analysis compose the process of asking a question and then finding an answer. At the beginning, I ask the questions and offer the answers. As time progresses, the responsibilities shift. By the end, the goal is to have the student ask the questions and find the answers. At that point, my job is to troubleshoot, limit frustration, offer advice on coding style, and learn from the “student.” Jon knows a lot about analysis and data visualization. I learned a lot from working with him for two days.
Jon: The thing I valued most was being shown how to do something, then trying to do it on my own, but then being immediately helped when I got stuck. This may sound like a minor thing, but it’s not. Struggling for a few minutes is one thing, but struggling for hours is another. Aaron’s approach of letting me try for a bit and then jumping in when it was clear I was stuck was immensely helpful.
Aaron: Learning programming is challenging. A certain level of frustration is helpful, but beyond that level you can get turned off by the task. I find that when the frustration means you’re not having fun, it’s time for a break or to seek help.
Jon: This part of our sprint really shouldn’t be underestimated. The MOOCs and online classes often have forums or discussion boards, and I have found that instructors and other classmates will generally try to weigh in with advice and suggestions. But I’m not very patient, and in those settings, if I get stuck on something, I have to post a question to the forum and then wait for an answer. With someone sitting next to me who can answer my questions immediately, or who at least knows how to search for the answer, I can get immediate feedback to keep moving.
Aaron: Asking for help is a big part of learning. People often ask me for answers I don’t have, but then they can see how I go about finding the answers (and vice versa). For example, Google can return a lot of old Stack Overflow posts. I always set the Google search window to the past year to get more recent responses. This is small but makes a big difference because more recent examples tend to be more efficient and easier to implement. People can also learn how I phrase questions for search and how to post reproducible examples. Finally, creating little experiments to test my predictions about the functionality of code versus what the code actually does is a valuable way to learn and to solve problems.
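A “little experiment” of this kind can be only a few lines long. The sketch below is a hypothetical example, not one from the sprint: it writes down a prediction about how mean() treats missing values, runs the code, and compares the two.

```r
# Illustrative sketch only: test a prediction against what the code actually does.

# Prediction: mean() skips missing values on its own.
x <- c(1, 2, NA, 4)

predicted <- 7 / 3   # the average of 1, 2, and 4 if the NA is ignored
actual <- mean(x)    # what R actually returns

# The prediction fails: a single NA makes the default result NA.
print(actual)

# Revised understanding: missing values must be dropped explicitly.
revised <- mean(x, na.rm = TRUE)
print(isTRUE(all.equal(revised, predicted)))
```

Because the example builds its own tiny input, it can also serve as a reproducible example when posting a question.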
Jon: Aaron hadn’t seen the hockey data before we started, so there were a variety of data issues and specific tasks and commands that Aaron either didn’t remember or didn’t know how to handle. Going through this process together is a big advantage because he is not only teaching me how to solve the problem directly but also showing me how to find the answers.
Aaron: Programmers are often portrayed as omniscient speed-typers. Going through this process together also enabled Jon to see how much I look at documentation and the frequency with which I Google. I think it’s important to demonstrate a little struggle to put others at ease.
Jon: Aaron makes a really interesting point here, that when you’re learning a new skill, you may feel like your instructor has all the answers. But that’s rarely true. And seeing how an expert searches for answers and struggles with an unknown solution can be especially valuable. You don’t always get this from a class (online or in person), but this kind of one-on-one environment can be really helpful in that respect.
Aaron: One-on-one training isn’t available to everyone, but it’s important to find a community and mentors when learning a new programming language because the little details are best learned through proximity to experienced programmers. It’s easy to fall into the trap of spending all day alone behind the screen. Engaging with the larger R community is one of the biggest selling points of R, and it makes learning R easier and more enjoyable.
Summary
We recognize that this one-on-one training isn’t available or possible for everyone, and it’s not a scalable solution. But for individuals or small groups, it may be an approach worth considering and exploring. If you’re the expert in an organization in some skill set that you think others should learn, consider reaching out to them and encouraging them to engage with you and learn the skill for themselves. Perhaps offer regular individual or group training sessions, provide additional resources, and be a sounding board for ideas and attempts. In this way, you not only build capabilities throughout your organization, but you can also help build a collaborative and supportive culture that goes beyond skills training.
If you’re interested in learning R, we can recommend these resources: