
Iterated PDFs with R Markdown

Data@Urban
Aug 21, 2018

Getting evidence into decisionmakers’ hands elevates the debate. As I outlined in Iterated fact sheets with R Markdown, R Markdown can iterate fact sheets: it distills large amounts of research into many smaller documents, split by geography, time period, or organization, each tailored to an individual decisionmaker’s interests.
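This post does not show the iteration step itself. A minimal sketch of the common pattern calls rmarkdown::render() once per geography on a single parameterized template. It assumes the template is named simple-factsheet.Rmd and declares a hypothetical geography parameter in its YAML header; the helper name render_factsheets() is also illustrative.

```r
library(rmarkdown)

# Hypothetical helper: render one fact sheet per geography from a single
# parameterized template. Assumes the template's YAML header declares
#   params:
#     geography: "United States"
# and that its R chunks read the value as params$geography.
render_factsheets <- function(geographies,
                              input = "simple-factsheet.Rmd",
                              output_dir = "factsheets",
                              dry_run = FALSE) {
  # File-system-friendly names, e.g. "New York" -> "new-york.pdf"
  slugs <- gsub("[^a-z0-9]+", "-", tolower(geographies))
  output_files <- paste0(slugs, ".pdf")

  if (!dry_run) {
    dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
    for (i in seq_along(geographies)) {
      render(
        input,
        params = list(geography = geographies[[i]]),
        output_file = output_files[[i]],
        output_dir = output_dir,
        envir = new.env()  # fresh environment so objects never leak between fact sheets
      )
    }
  }

  # Paths of the rendered (or, with dry_run = TRUE, planned) PDFs
  file.path(output_dir, output_files)
}
```

With dry_run = TRUE the function only returns the planned file paths. This makes it easy to check the naming scheme before spending time on a full batch of LaTeX builds.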

The defaults and design of R Markdown are best suited to styled HTML documents, but audiences are often better served by PDFs. Unfortunately, heavily styled PDFs are harder to make in R Markdown. At the Urban Institute, we created tools that make branded PDF fact sheets as easy to produce in R Markdown as HTML documents.

HTML vs. PDF

R Markdown is best for creating HTML documents because it is based on Markdown, an easy-to-learn markup language for generating HTML. Standardized CSS can be used to style documents so they match an organization’s branding standards with little effort. HTML is more flexible than a PDF and can include floating tables of contents, tabbed sections that save space by layering many images, and interactive elements like HTML widgets. HTML is also responsive, adapting its layout to different devices.

But not everyone is comfortable with HTML documents. Even people who effortlessly browse the internet are sometimes uncomfortable managing and navigating HTML files outside a web browser. For this audience, a PDF’s inflexibility is a strength: these files have a single, set layout that does not vary when opened by different software or on different devices, and they are easy to print.

This is a major benefit for researchers and policymakers, so we built tools for creating iterated, branded PDFs in R Markdown that are as simple as the existing tools for creating iterated HTML documents.

The tools needed to achieve a few goals:

  • The tools needed to be simple enough that authors could focus on analytic R code and narrative instead of getting bogged down in complicated LaTeX.
  • The output needed to match Urban Institute publication standards consistently, project after project. That meant matching simple styles, such as the Lato font and appropriate margins, and meeting tougher demands, such as precisely positioned Urban Institute contact information, logo images, and boilerplate.
  • The documents needed to be reproducible. The point of iterating fact sheets is to save time: changes often have to be made against tight deadlines, and fact-checkers need to be confident in the output.

The existing process for creating fact sheets uses Microsoft Word documents exported as PDFs. To build a template that matches the existing Word templates but can be automated, we combined an .Rmd template, a large LaTeX preamble, and custom LaTeX macros and environments. All of the resources are available in a public GitHub repository.

New tools

Template

We wanted to use as much Markdown and YAML (YAML Ain’t Markup Language) as possible because R Markdown users are already comfortable with both. We also wanted authors to be able to switch easily between HTML and PDF output. The file simple-factsheet.Rmd picks the font, sets the font size, assigns the margin sizes, and picks a URL color. It also contains calls to all the custom macros and environments, so users can copy and paste them and replace the filler text with their own content.
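The template’s YAML header is not reproduced here. As a hedged sketch, the same kinds of settings can be expressed from R with rmarkdown::pdf_document(), which is what an output: pdf_document: entry in a YAML header maps to. The values below (11-point text, 0.75-inch margins, blue links) are illustrative assumptions, not Urban’s actual settings. XeLaTeX is used because it can load an installed system font such as Lato by name.

```r
library(rmarkdown)

# Illustrative values only; they are assumptions, not the template's settings.
# In an .Rmd YAML header these would be top-level fields:
#   mainfont: Lato
#   fontsize: 11pt
#   geometry: margin=0.75in
#   urlcolor: blue
factsheet_format <- pdf_document(
  latex_engine = "xelatex",  # XeLaTeX can use installed system fonts by name
  pandoc_args = c(
    pandoc_variable_arg("mainfont", "Lato"),
    pandoc_variable_arg("fontsize", "11pt"),
    pandoc_variable_arg("geometry", "margin=0.75in"),
    pandoc_variable_arg("urlcolor", "blue")
  )
)

# Rendering needs a LaTeX distribution and the Lato font installed:
# render("simple-factsheet.Rmd", output_format = factsheet_format)
```

Keeping these choices in the YAML header rather than in LaTeX means authors can adjust them without opening the preamble.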

Preamble

The file preamble.tex handles what can’t be done in the YAML header. LaTeX preambles, the closest LaTeX equivalent of CSS, load packages, update document defaults, and define macros and environments.

Some settings in our preamble are simple. \definecolor{urbanblue}{HTML}{1696D2} defines the color urbanblue as the hex color code #1696d2, and \pagenumbering{gobble} disables page numbering. Other settings are more subtle: \usepackage[hang,flushmargin]{footmisc} removes the indentation from footnotes.

All of this code is included automatically when Urban Institute researchers use the template, because the template’s YAML header tells R Markdown to insert preamble.tex into the document’s LaTeX header.
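The YAML snippet itself is not shown in this post. In R Markdown, the standard way to pull a LaTeX file into the document header is the includes: in_header: option, which rmarkdown::includes() mirrors from R. The sketch below assumes that mechanism. It writes a stand-in preamble.tex containing only the three settings quoted above (the real preamble is much larger) and points a PDF output format at it.

```r
library(rmarkdown)

# Stand-in for preamble.tex with only the settings quoted above.
# Raw strings r"(...)" need R >= 4.0 and avoid doubling every backslash.
preamble_path <- file.path(tempdir(), "preamble.tex")
writeLines(c(
  r"(\usepackage{xcolor})",                    # provides \definecolor and \color
  r"(\definecolor{urbanblue}{HTML}{1696D2})",  # Urban blue, #1696d2
  r"(\pagenumbering{gobble})",                 # no page numbers
  r"(\usepackage[hang,flushmargin]{footmisc})" # footnotes without indentation
), preamble_path)

# R equivalent of this YAML header entry:
#   output:
#     pdf_document:
#       includes:
#         in_header: preamble.tex
preamble_format <- pdf_document(includes = includes(in_header = preamble_path))
```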

Macros and environments

CSS is powerful because of HTML IDs and classes, which let programmers define styles once and apply them to specific elements in a document. LaTeX doesn’t have a Document Object Model like HTML, so we created LaTeX macros for the common elements of styled PDF documents, which play a role somewhat like CSS classes. For example, if a researcher wants to add a black, centered, Lato 14-point title, she need only wrap the text in \urbantitle{}. Under the hood, the macro applies the font size, color, and centering for her.

If a researcher wants to add an Urban blue, centered, Lato 12-point subtitle, she need only wrap the text in \urbansubtitle{}.
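The macro definitions are not shown in the post. The following is a hypothetical reconstruction of the two styles described above, not the Urban Institute’s actual code. It is returned from R so the lines can be appended to preamble.tex. It assumes Lato is already the main font (set in the YAML header) and that urbanblue is defined in the preamble. The 17pt and 15pt line spacings are illustrative.

```r
# Hypothetical LaTeX definitions for the title and subtitle styles described
# above; a sketch, not the Urban Institute's actual macros.
# Assumes Lato is the main font and urbanblue is defined in the preamble.
urban_title_macros <- function() {
  c(
    # \urbantitle{...}: black, centered, 14-point text
    r"(\newcommand{\urbantitle}[1]{%)",
    r"(  \begin{center}{\fontsize{14}{17}\selectfont\color{black} #1}\end{center}})",
    # \urbansubtitle{...}: Urban blue, centered, 12-point text
    r"(\newcommand{\urbansubtitle}[1]{%)",
    r"(  \begin{center}{\fontsize{12}{15}\selectfont\color{urbanblue} #1}\end{center}})"
  )
}

# To add the definitions to the template's preamble:
# write(urban_title_macros(), file = "preamble.tex", append = TRUE)
```

An author then writes \urbantitle{My fact sheet} in the body of the .Rmd, and Pandoc passes the raw LaTeX command through to the PDF.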

We provide macros for contact information, title, subtitle, authors, two types of headers, figure numbers, figure titles, figure sources, figure notes, and the boilerplate that appears on every Urban Institute publication.

LaTeX environments, which are similar to macros but wrap a whole block of content, are used to style bulleted and numbered lists. For example, one of our environments turns list bullets Urban blue.
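The post gives neither the environment’s name nor its definition. The sketch below uses a hypothetical environment called urbanbullets that wraps itemize and sets its label with the enumitem package. This is one common way to color bullets, not necessarily the one Urban uses. It assumes urbanblue is already defined in the preamble.

```r
# Hypothetical LaTeX environment for blue bullet points, returned as lines
# that could be appended to preamble.tex. Assumes urbanblue is defined.
urban_bullet_environment <- function() {
  c(
    r"(\usepackage{enumitem})",  # lets itemize accept a label= option
    r"(\newenvironment{urbanbullets})",
    r"(  {\begin{itemize}[label=\textcolor{urbanblue}{\textbullet}]})",
    r"(  {\end{itemize}})"
  )
}

# write(urban_bullet_environment(), file = "preamble.tex", append = TRUE)
```

In the .Rmd, an author would then write \begin{urbanbullets}, one \item per point, and \end{urbanbullets}.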

TinyTeX

Creating PDFs from LaTeX requires installing a LaTeX distribution like MiKTeX, MacTeX, or TeX Live. These distributions are large and clunky, and they often consume IT resources to maintain. Worse, they don’t always work smoothly.

Yihui Xie’s new tinytex R package changes this. It is lightweight and low maintenance: the R package installs like any other with install.packages(), and a single function call then installs TinyTeX, a small LaTeX distribution based on TeX Live. Furthermore, Xie is a gifted technical writer and prolific debugger, so the package is clearly documented and works well. If you don’t have a LaTeX distribution, run install.packages("tinytex") followed by tinytex::install_tinytex(), and you won’t have to think about your LaTeX distribution again.

cairo_pdf

ggplot2 creates the images that are embedded in output documents. Embedded images in R Markdown HTML documents are usually .png files, which handle custom fonts easily. Custom fonts are harder to get right in PDF output, but R’s cairo_pdf graphics device, which is built on the Cairo graphics library, solves this problem by embedding custom fonts in the images ggplot2 creates. Include knitr::opts_chunk$set(dev = "cairo_pdf") in an R chunk at the top of the .Rmd file to resolve this issue for every figure in the document.
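A minimal setup chunk following that advice might look like this. The Lato family name assumes the font is installed on the machine that knits the document. The plot is only a placeholder built from R’s bundled mtcars data.

```r
library(knitr)
library(ggplot2)

# Setup chunk: draw every figure with the Cairo-based PDF device so that
# custom fonts are embedded in the PDF images
opts_chunk$set(dev = "cairo_pdf")

# Placeholder figure that requests Lato (assumed to be installed)
scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "#1696d2") +
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon") +
  theme_minimal(base_family = "Lato")
```

Outside R Markdown, the same device can be passed directly to ggsave(), for example ggsave("figure.pdf", scatter, device = cairo_pdf).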

ggplot2 theme

Data visualization is a major motivation for using R to create fact sheets. Our clear style guide and custom ggplot2 theme are fundamental to creating consistent styles and saving editing time. The process outlined above is useful because of the work done to set visual expectations with the style guide and standardize those expectations with the theme.
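Urban’s own theme is not shown in this post. As an illustrative sketch only, a house theme is usually a function that starts from a built-in ggplot2 theme and overrides fonts, gridlines, and text styles. The name theme_factsheet() and every setting below are assumptions, except the Urban blue hex code quoted earlier.

```r
library(ggplot2)

# Hypothetical house theme; the name and all values are illustrative assumptions.
theme_factsheet <- function(base_size = 9, base_family = "Lato") {
  theme_minimal(base_size = base_size, base_family = base_family) +
    theme(
      plot.title = element_text(face = "bold", hjust = 0),
      panel.grid.minor = element_blank(),    # keep only major gridlines
      panel.grid.major.x = element_blank(),  # horizontal gridlines only
      axis.ticks = element_blank(),
      legend.position = "top"
    )
}

# Apply the theme to every later plot in the session
theme_set(theme_factsheet())

bar_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(fill = "#1696d2") +  # Urban blue, as defined in the preamble
  labs(title = "Cars by number of cylinders", x = "Cylinders", y = "Count")
```

Because theme_set() changes the session default, a single call in the setup chunk styles every figure in the fact sheet, which is where most of the editing time is saved.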

PDF fact sheets connect the broad, numerous insights of Urban Institute researchers to the narrower needs of decisionmakers.

Under the hood, these tools are complex. In practice, users can stay pleasantly unaware of most of that complexity: a researcher can copy the repository, edit the template, and style her fact sheet with macros that read much like R function calls.

Researchers can focus on their insights instead of LaTeX, the product closely mirrors the needs and desires of our communications team, and the output clearly communicates Urban Institute brands and styles while remaining reproducible and easy to scale.

-Aaron Williams
