A screenshot from our recent feature on “Exploring Alternatives to the Tax Cuts and Jobs Act”

Bringing Cloud Computing Power to Microsimulation

Data@Urban
6 min read · Mar 26, 2019

The Urban Institute maintains several microsimulation models to help researchers and decisionmakers better understand issues such as the Social Security system, tax policy, and health insurance policy. Microsimulation is a technique that takes a person-level (or other small unit-level) file as input, applies the rules of a policy or government program to simulate behavior, and generates both output at the unit level and an overall summary of the results. Because of the level of detail in both the units and in the program applying the rules, microsimulation is a powerful tool that lets researchers explore a myriad of “what-if” scenarios.
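To make that pattern concrete, here is a toy sketch of the microsimulation loop just described: apply a program rule to each unit-level record, keep the unit-level output, and compute a summary. The records and the flat-tax rule are entirely made up for illustration and bear no relation to the actual TPC model.

```python
# Toy illustration of the microsimulation pattern: rule applied per record,
# unit-level output retained, plus an overall summary. All values are made up.
records = [
    {"id": 1, "wages": 30_000, "kids": 2},
    {"id": 2, "wages": 90_000, "kids": 0},
    {"id": 3, "wages": 55_000, "kids": 1},
]

def simulate(rec):
    """Hypothetical rule: flat 20% tax minus a $1,000-per-child credit."""
    tax = max(0.20 * rec["wages"] - 1_000 * rec["kids"], 0)
    return {**rec, "tax": tax}

output = [simulate(r) for r in records]          # unit-level output
total_revenue = sum(r["tax"] for r in output)    # overall summary
print(f"total simulated revenue: ${total_revenue:,.0f}")
```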

The Urban-Brookings Tax Policy Center’s (TPC) microsimulation model uses an input file of representative tax filers and contains detailed code to simulate revenue and distribution estimates of the US federal tax system. It can simulate both current law and alternative policies: users can modify the tax schedule to add tax brackets, modify tax rates, or change the way the child tax credit is phased out for married units. Running the model with these alternative parameters enables a detailed comparison between an alternative scenario and the current-law baseline. Researchers can investigate who would be affected by such changes, what the revenue effects would be, and so on.

Until recently, researchers did much of this work manually. They would translate alternative policy proposals into values and rules specified in an input file, called a parameter file, that defines the various policy options. With the parameter file prepared, the research team could then run the model and analyze the output. The model has been used for more than a decade, but over the past year, Urban’s Technology and Data Science team has joined forces with TPC researchers to move the model into the cloud, helping the model run faster, longer, and with more data.
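For a sense of what this looks like in practice, here is a hypothetical parameter file written out from Python. The parameter names, values, and JSON format are assumptions for illustration only; the real model defines its own parameter file format.

```python
# A hypothetical parameter file for one alternative policy scenario.
# Names, values, and format are illustrative, not the model's actual inputs.
import json

alternative_policy = {
    "tax_brackets": [0.10, 0.12, 0.22, 0.24, 0.32, 0.35, 0.37],  # marginal rates
    "ctc_max": 2_000,                      # per-child credit amount
    "ctc_phaseout_start_married": 400_000, # income where the credit phases out
}

with open("params_run_0001.json", "w") as f:
    json.dump(alternative_policy, f, indent=2)
```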

But what if, instead of just running a few alternative scenarios, we ran thousands? What if we looked not just at the set of options proposed but also at other, similar options? Are there other scenarios that produce similar summary results but affect the population differently?

That is exactly what we’ve built a cloud-based architecture to do.

With this new capacity, we open up new opportunities to better inform researchers, policymakers, and the public. Instead of thinking about a single model result, we can now explore the output of thousands of different proposed policies. This means we can look at the space of results and discover which policy or types of policies produced a particular set of outcomes. We could eventually cull this field automatically for decisionmakers to produce a small set of the most impactful policies, based on their desired outcomes.

In one such example, researchers used the TPC microsimulation model to explore the workings of the Tax Cuts and Jobs Act (TCJA) with 9,216 model runs. Each model run specified a variation on some of the TCJA’s major provisions to help people understand the current law and the potential effect of alternative provisions and details. This capability allowed the research team to compare the model runs against each other and against the TCJA, surfacing key insights that would otherwise have been difficult to see. For example, they found that for low-income families with children, the child tax credit has the largest effect on their change in after-tax income. For this group, the other tax parameters we changed have a dramatically smaller effect on income (although they do affect the total change in federal revenue). You can read more about this analysis in TPC’s feature, “The TCJA, What Might Have Been,” and the corresponding data visualization feature.
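Generating thousands of runs like this comes down to producing one parameter file per combination of policy options. Below is a minimal sketch of that idea using a Cartesian product; the provisions and candidate values are hypothetical (this small grid yields 81 files, whereas the TCJA analysis varied more provisions and values to reach 9,216 combinations).

```python
# A minimal sketch of programmatically generating a grid of parameter files.
# The provision names and candidate values are hypothetical.
import itertools
import json

options = {
    "top_rate":           [0.35, 0.37, 0.396],
    "standard_deduction": [12_000, 18_000, 24_000],
    "ctc_max":            [1_000, 1_500, 2_000],
    "ctc_phaseout_start": [75_000, 200_000, 400_000],
}

# One parameter file per combination: 3 * 3 * 3 * 3 = 81 runs here.
for run_id, values in enumerate(itertools.product(*options.values())):
    params = dict(zip(options.keys(), values))
    with open(f"params_run_{run_id:04d}.json", "w") as f:
        json.dump(params, f, indent=2)
```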

With this understanding of the microsimulation model, let me now walk you through how the Tech and Data team built the cloud-based architecture.

Modeling in the cloud saves time and money

We began this journey confident that we could decrease the model’s run time in a cost-effective manner. In its new state, the architecture reduces total run time by at least a factor of 65, and operating costs are under $0.02 per model run.

We can set up and run the model over 2,500 times in about four and a half hours. This time includes not just the model run itself but also the setup required to define the parameter file and the synthesis of those model run results — keeping results linked to their parameter options and generating estimates of revenues for each.

Without the ability to run in the cloud, we estimate it would have taken 260.4 hours — almost two weeks — to just set up and run the model 2,500 times. This time still does not account for organizing model output and constructing an analysis file, a task that is hard to estimate or imagine without bringing together many of the components that the cloud provides. Such long run times restrict the research team’s flexibility to respond to different policy proposals and the quick turnaround demands of a rapidly changing policy environment.
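As a back-of-the-envelope check, the snippet below simply restates the arithmetic behind the figures quoted above.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
runs = 2_500
serial_hours = 260.4   # estimated serial setup-and-run time
cloud_hours = 4.5      # wall-clock time under the cloud architecture
cost_per_run = 0.02    # dollars

print(f"serial time per run:  {serial_hours / runs * 60:.1f} minutes")       # ~6.2
print(f"effective cloud rate: {cloud_hours * 3600 / runs:.1f} seconds/run")  # ~6.5
print(f"total compute cost:   ${cost_per_run * runs:.2f}")                   # $50.00
```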

Cloud computing isn’t new, but creating a powerful new system was challenging

Using cloud computing to boost and supplement resources is not a new idea; it is, in fact, one of the main draws of running calculations in the cloud. Nor is the idea that we could discover something interesting or important from running a model several hundred or thousand times. High-performance computing has been used for years in government and across academia for physics, astronomy, and engineering. In many of these subject areas, data have always been quite large. In the social sciences, by contrast, data have historically been smaller, and personal computers and small servers have generally been well suited to the task. Now that resources are more easily accessible in the cloud, we can replace personal computers with more powerful cloud-based compute resources.

As we thought about running a microsimulation model thousands of times, we kept several goals in mind:

· We wanted to make very few code changes to the actual model. The TPC team has a well-established model that is fully functional and produces reliable and accurate results, and we wanted to keep the code easily readable and understandable to those who use it daily.

· We wanted the capacity to handle bigger datasets (scale vertically) and the capacity to handle large numbers of runs (scale horizontally).

· We wanted the process to be cost-effective.

· We wanted to allow each team to focus on its expertise, letting the technical team focus on the technical aspects and the analyst team focus on analyzing the results.

Our challenge was to meet each of these objectives while working through an entirely new architecture. Each of the components in our new infrastructure — creating the parameter file that defines run time options, doing the model run, and creating the analysis dataset — was its own unique challenge requiring a technical solution. In a forthcoming post, I’ll walk you through how we resolved these challenges.
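As a preview, the overall shape of the pipeline can be sketched in a few lines: generate parameter files, fan the model runs out in parallel, and gather the results into one analysis file. The function and file names below are stand-ins for illustration, not the actual implementation, which the forthcoming post will describe.

```python
# A minimal sketch of the three-stage pattern: parameter files in, parallel
# model runs, one linked analysis file out. run_model() is a hypothetical
# stand-in for the real model invocation.
import concurrent.futures as cf
import csv
import glob

def run_model(param_path: str) -> dict:
    """Hypothetical stand-in for a single model run."""
    # ... the real model would read param_path and run here; we return a
    # placeholder estimate so the sketch is runnable end to end
    return {"params_file": param_path, "revenue_change": 0.0}

# Stage 1: collect the parameter files generated earlier.
param_files = sorted(glob.glob("params_run_*.json"))

# Stage 2: fan out the model runs (in the cloud, these would be many
# machines rather than local threads).
with cf.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_model, param_files))

# Stage 3: synthesize one analysis file, keeping each result linked to the
# parameter file that produced it.
with open("analysis_file.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["params_file", "revenue_change"])
    writer.writeheader()
    writer.writerows(results)
```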

Moving forward

With these challenges behind us, we take the first step toward creating a full space of results to explore for optimal solutions. Instead of being reactive to the policy environment, we can proactively test policy levers, assess demographic and economic changes, and explore other options to find mechanisms that may reach similar results via a different path.

In the near future, we imagine using this microsimulation modeling at scale for other models, for alignments, and for creating other spaces to explore. Many models use an alignment process to fine-tune estimates and to ensure that they hit known targets, such as making sure total income in the model matches total income in the actual economy. In many cases, this is a manual process that takes time to set up and run. In the same way we were able to programmatically generate parameter files with small variations for the TPC microsimulation model, we could also begin to programmatically perform alignments and hit targets. We envision beginning to apply optimization and learning techniques to both explore and create result spaces.
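For readers unfamiliar with alignment, here is a minimal sketch of one common variant, assuming a simple single-factor ratio adjustment: scale record weights so that a weighted total hits a known target. The numbers are made up, and real alignment routines are considerably more involved.

```python
# Minimal sketch of ratio-adjustment alignment: scale weights so the
# weighted income total matches a known external target. Values are made up.
incomes = [35_000, 52_000, 78_000, 120_000]   # simulated unit-level incomes
weights = [1.0, 1.0, 1.0, 1.0]                # initial record weights

target_total = 300_000                        # known total from outside data
current_total = sum(i * w for i, w in zip(incomes, weights))   # 285,000

factor = target_total / current_total         # single ratio adjustment
weights = [w * factor for w in weights]

# The aligned weighted total now hits the target (up to float precision).
assert abs(sum(i * w for i, w in zip(incomes, weights)) - target_total) < 1e-6
```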

-Jessica Kelly

Want to learn more? Sign up for the Data@Urban newsletter.
