Illustration by Alysheia ShawDansby

Designing a New Access Measure for the Spatial Equity Data Tool API

Data@Urban

The Urban Institute’s Spatial Equity Data Tool (SEDT) is a free, powerful software tool that allows users to upload point spatial data (such as the locations of parks, child care centers, public Wi-Fi hot spots, and electric vehicle charging stations) and analyze whether these resources are equitably distributed across geographies and demographic groups. Last year, we expanded the tool’s functionality by launching a public application programming interface (API), which Los Angeles, California, has already used to center equity in its budget process.

But the tool still has room to grow. Until now, the SEDT has assumed that each point in an uploaded dataset is accessible only to the residents of the census tract where the point is located. This simplifying assumption is useful because it is clear and applies to a wide range of datasets. However, it does not always reflect how people actually access certain resources, which limits the SEDT’s suitability for some datasets.

As an example, consider a hypothetical city with three census tracts and five resources, shown as stars. The traditional SEDT approach would deem resources R1 and R2 as accessible to tract 1. It would deem all other resources as inaccessible to tract 1 even though R3 and R5 are extremely close to the border of tract 1 and, in practice, may very well provide services to the residents of tract 1.

In this blog post, we describe a prototype expansion of the SEDT’s API, which is currently available for city- and county-scale analyses in Maryland, Virginia, and the District of Columbia and which we plan to launch at a national scale in the future. This expansion incorporates a new measure to better capture access to resources and service provision across census tract boundaries. We summarize how we collaborated with SEDT users to develop this approach and provide a technical summary.

Working with users to design a new access measure

To measure spatial access across a wide range of datasets and geographies, we began by reviewing existing literature. Data science and urban planning researchers have developed many approaches to quantify spatial access, but there wasn’t a clear consensus in the literature: most researchers selected a method based on the use case.

To understand how a new access measure could help users, we hosted five user-design sessions with SEDT users from various institutions (i.e., academic, government, and nonprofit) and focus areas (i.e., climate, housing, small business, and workforce development). We walked them through three possible access measures we identified in our research and asked which measure was most user-friendly and actionable.

We presented them with these methodologies:

1. Euclidean distance (“as-the-crow-flies” distances): Create a buffer either around the center of a census tract or the tract itself. Deem all points within the buffered area as accessible to the residents of the census tract.

2. Travel sheds (drive or walk times): For each tract, determine the population-weighted tract centroid. Calculate the travel sheds, meaning all locations a person can walk to (the walk shed) or drive to (the drive shed) from the centroid in a given amount of time. Deem all points within that walk or drive shed as accessible to residents of the census tract.

3. Travel time: Calculate the travel time between every pair of tract centroids. For a given tract, deem accessible all points in every census tract whose centroid is within a user-set travel time threshold.
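The two alternatives that users later rejected are simple enough to sketch in a few lines. The Python below is a minimal illustration, not SEDT or r5r code: the tract centroids, resource coordinates, and centroid-to-centroid travel-time matrix are all hypothetical, coordinates are assumed to be in a projected system measured in meters, and method 1 uses the centroid-buffer variant.

```python
import math

# Hypothetical inputs: tract centroids and resource points in projected
# coordinates (meters), which tract each resource falls in, and a
# symmetric tract-to-tract travel-time matrix in minutes.
CENTROIDS = {"tract1": (0.0, 0.0), "tract2": (2000.0, 0.0), "tract3": (0.0, 2000.0)}
POINTS = {"R1": (300.0, 200.0), "R2": (1900.0, 100.0), "R3": (100.0, 2100.0)}
POINT_TRACT = {"R1": "tract1", "R2": "tract2", "R3": "tract3"}
TRAVEL_MIN = {
    ("tract1", "tract2"): 12.0,
    ("tract1", "tract3"): 25.0,
    ("tract2", "tract3"): 30.0,
}


def travel_time(a, b):
    """Symmetric lookup; a tract is 0 minutes from itself, unknown pairs are unreachable."""
    if a == b:
        return 0.0
    return TRAVEL_MIN.get((a, b), TRAVEL_MIN.get((b, a), math.inf))


def euclidean_access(tract, radius_m):
    """Method 1: points within a straight-line buffer around the tract centroid."""
    cx, cy = CENTROIDS[tract]
    return {p for p, (x, y) in POINTS.items() if math.hypot(x - cx, y - cy) <= radius_m}


def centroid_time_access(tract, threshold_min):
    """Method 3: every point in any tract whose centroid is within the time threshold."""
    return {p for p, t in POINT_TRACT.items() if travel_time(tract, t) <= threshold_min}
```

With a 15-minute threshold, method 3 deems R2 accessible to tract 1 only because R2's tract centroid is 12 minutes away, regardless of where R2 actually sits inside that tract. This is the centroid-as-destination limitation participants objected to.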

To demonstrate these methodologies, we mocked up interactive maps showing the three access measures applied to data on mental health and substance abuse treatment centers in Fulton County, Georgia. These maps illustrated step-by-step how each access measure was calculated and what insights each revealed in the data.

Users advocated against Euclidean buffers because they felt they could create similar buffers themselves and because Euclidean distance does not model true travel behavior. Users also viewed the third methodology, travel time, unfavorably because they didn’t want to define access by treating tract centroids as the only destinations.

Participants unanimously agreed that using travel sheds to calculate access added the most value to the API. Users liked that they could choose among travel modes and travel times and thought this alternative had the fewest and most reasonable assumptions. Still, there wasn’t a clear consensus on which travel modes to include. Participants whose work focused on rural areas emphasized the importance of having driving as an option.

Centering users in the design process helped us identify the methodology that would accommodate the widest range of users and applications.

How the new spatial access measure works

Based on this user feedback and our literature review, we selected the travel shed approach. We used walking and driving as travel modes, reflecting previous research on spatial access in urban and rural settings. We selected 10-, 15-, and 20-minute walk times and 15- and 30-minute drive times, which are commonly used in existing research. We also added a 60-minute drive time as an upper bound for extremely rural areas.

We used r5r, a popular R package for routing on transportation networks, to calculate travel sheds. In the steps and images below, we show how we developed travel sheds with r5r in the hypothetical three-tract city described earlier.

1. First, we determined a central point for each tract, shown in light blue.

2. We used an r5r function to “snap” those centroids to the nearest road from a road network dataset to obtain the final origin points, shown in dark blue below.

3. For each tract, we calculated the travel time from the origin point to each destination point. The image below uses tract 1 as an example; the triangles represent destination points. We then removed all destination points that could not be reached within the specified travel time (shown as red triangles).

4. Next, we connected the accessible destinations (shown as green triangles) to create a polygon representing the area a person can reach from the origin point in the given time using the chosen travel mode.

5. Finally, we built the new travel sheds into the SEDT.
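Steps 2 through 4 can be sketched end to end in Python. This is a toy stand-in under stated assumptions, not the r5r workflow: a small hand-built graph with Dijkstra's algorithm replaces r5r's routing on a real street network, "snapping" picks the nearest graph node, and a convex hull (Andrew's monotone chain) stands in for step 4, because the post does not say exactly how the accessible destinations were joined into a polygon. All node names, coordinates, and edge times are hypothetical.

```python
import heapq
import math

# Hypothetical toy road network: node -> (x, y) in meters, plus undirected
# edges with travel times in minutes. A stand-in for r5r's street network.
NODES = {
    "a": (0, 0), "b": (400, 0), "c": (800, 0),
    "d": (0, 400), "e": (400, 400), "f": (1600, 1600),
}
EDGES = [("a", "b", 5), ("b", "c", 5), ("a", "d", 5), ("d", "e", 5),
         ("b", "e", 5), ("c", "f", 20)]


def snap(point):
    """Step 2: snap an origin point to the nearest network node."""
    return min(NODES, key=lambda n: math.dist(point, NODES[n]))


def travel_times(origin):
    """Step 3: shortest travel time (minutes) from origin to every reachable node (Dijkstra)."""
    graph = {n: [] for n in NODES}
    for u, v, w in EDGES:
        graph[u].append((v, w))
        graph[v].append((u, w))
    best = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            if t + w < best.get(v, math.inf):
                best[v] = t + w
                heapq.heappush(heap, (t + w, v))
    return best


def convex_hull(points):
    """Step 4 stand-in: Andrew's monotone chain, hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # too few points to enclose an area

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def travel_shed(centroid, max_minutes):
    """Snap the centroid, route to all nodes, keep those within budget, and outline them."""
    origin = snap(centroid)
    times = travel_times(origin)
    reachable = [NODES[n] for n, t in times.items() if t <= max_minutes]
    return convex_hull(reachable)
```

Calling travel_shed((50, 30), 10) snaps the origin to node a and outlines the five nodes reachable within 10 minutes; node f, 30 minutes away, is left out. Note that a convex hull can overstate a shed wherever reachable points surround areas the network does not actually reach.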

With this measure, the tool can deem all resources that fall within a tract’s travel shed as accessible to that tract. Returning to the five resources from the start of this post, a travel shed analysis deems resources R2, R3, and R5 accessible to tract 1. Note that this methodology can also deem some resources inside a tract inaccessible to that tract, as R1 illustrates.
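The difference between the standard rule and the shed rule comes down to which polygon each resource is tested against. The Python sketch below uses a hypothetical square tract and a hypothetical square travel shed, with coordinates chosen so the outcome mirrors the five-resource example; it is not SEDT code, and a production version would more likely use a GIS library's point-in-polygon test than this hand-written ray-casting function.

```python
# Hypothetical geometry in projected meters, chosen to mirror the example.
TRACT1 = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]          # tract 1 boundary
SHED1 = [(400, 200), (1400, 200), (1400, 1400), (400, 1400)]   # tract 1's travel shed
RESOURCES = {
    "R1": (100, 100), "R2": (600, 600), "R3": (1100, 500),
    "R4": (3000, 3000), "R5": (500, 1100),
}


def contains(polygon, point):
    """Even-odd ray-casting test; points exactly on an edge are not handled consistently."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def accessible(polygon):
    """Names of resources that fall inside the given polygon."""
    return sorted(name for name, pt in RESOURCES.items() if contains(polygon, pt))


standard = accessible(TRACT1)  # standard SEDT rule: resources inside tract 1
shed = accessible(SHED1)       # shed rule: resources inside tract 1's shed
```

Here standard is ['R1', 'R2'] and shed is ['R2', 'R3', 'R5']: R1 drops out even though it lies inside tract 1, while R3 and R5 become accessible from across the tract boundary.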

We overcame two technical hurdles to implement this approach.

First, r5r calculates travel sheds for points, but the SEDT conducts analyses at the census tract level. Consequently, we needed to identify a point to represent each census tract. Where possible, we used the tract’s population-weighted centroid. The US Census Bureau describes a population-weighted centroid as the “balance point” where a tract would balance if it were perfectly flat and an equal weight were placed at the residence of every person living in the tract. When a population-weighted centroid could not be snapped to the road network, we tried snapping a different point, calculated with the PointOnSurface algorithm, which is guaranteed to fall within the tract. If that also failed, we used a point drawn at random from within the census tract.
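The origin-point fallback chain can be sketched as follows. The post does not publish this code, so everything here is an assumption: the road nodes, the 300-meter snapping tolerance, and the helper names are hypothetical; the population-weighted centroid is a planar simplification of the Census Bureau's method (a population-weighted mean of block locations); and the point-on-surface and random candidate points are passed in precomputed rather than generated from a tract polygon.

```python
import math

# Hypothetical road nodes (projected meters) and snapping tolerance.
ROAD_NODES = [(0.0, 0.0), (500.0, 0.0), (500.0, 500.0)]
MAX_SNAP_M = 300.0


def population_weighted_centroid(blocks):
    """Planar approximation: mean of block points weighted by block population.

    blocks is a list of ((x, y), population) pairs.
    """
    total = sum(pop for _, pop in blocks)
    x = sum(pt[0] * pop for pt, pop in blocks) / total
    y = sum(pt[1] * pop for pt, pop in blocks) / total
    return (x, y)


def snap(point):
    """Nearest road node within MAX_SNAP_M, or None if no node is close enough."""
    node = min(ROAD_NODES, key=lambda n: math.dist(point, n))
    return node if math.dist(point, node) <= MAX_SNAP_M else None


def choose_origin(blocks, point_on_surface, random_points):
    """Try the population-weighted centroid, then a point on surface, then random points."""
    candidates = []
    if sum(pop for _, pop in blocks) > 0:  # centroid is undefined for empty tracts
        candidates.append(population_weighted_centroid(blocks))
    candidates.append(point_on_surface)
    candidates.extend(random_points)
    for candidate in candidates:
        snapped = snap(candidate)
        if snapped is not None:
            return snapped
    return None  # no candidate reached the network
```

For blocks [((100, 100), 300), ((400, 100), 100)], the centroid is (175.0, 100.0), about 202 meters from the node at the origin, so the first candidate succeeds and no fallback is needed.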

Second, sometimes origin points have very few destination points within the allotted travel time. When this happens, r5r is unable to generate a shed, so we modified the r5r code. Between steps 3 and 4 above, we determined if an origin point had sufficient destination points nearby. If it didn’t, we added new, random destination points surrounding those origin points and then snapped those new destination points to the road network.
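The padding step can be sketched as a small helper. The post does not say how many random points were added, how far from the origin they were placed, or what counted as too few, so min_points, radius_m, and n_new are hypothetical defaults; three points is simply the minimum needed to enclose an area. Snapping the new points to the road network and recomputing their travel times are left out of this sketch.

```python
import math
import random


def pad_destinations(origin, reachable, min_points=3, radius_m=200.0, n_new=8, seed=0):
    """Hypothetical helper: if too few destinations are reachable, add random points nearby.

    Points are drawn uniformly over a disk of radius_m around the origin.
    They would still need snapping and routing before building a shed (not shown).
    """
    if len(reachable) >= min_points:
        return list(reachable)
    rng = random.Random(seed)  # seeded so results are reproducible
    padded = list(reachable)
    for _ in range(n_new):
        angle = rng.uniform(0, 2 * math.pi)
        dist = radius_m * math.sqrt(rng.random())  # sqrt gives uniform density over the disk
        padded.append((origin[0] + dist * math.cos(angle),
                       origin[1] + dist * math.sin(angle)))
    return padded
```

An origin with a single reachable destination comes back with nine candidate points, enough to outline a polygon once they are snapped and routed.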

Creating hundreds of thousands of travel sheds is computationally intensive, so we limited the scope of our pilot in several ways. We currently support travel shed analyses only at the city and county scales within Maryland, Virginia, and the District of Columbia.

Though users expressed interest in public transportation travel sheds, we were unable to produce transit sheds because collecting transit data for the entire US would have been infeasible. Additionally, we did not try to capture traffic in our drive sheds, as this functionality isn’t incorporated into r5r.

Why it’s important to choose the right travel shed

The shed a user chooses has important implications. If a user selects a larger shed (e.g., 30- or 60-minute drive shed) or analyzes data from dense, urban areas with small census tracts, it is likely that a shed analysis will deem most points as accessible to many tracts. When this occurs, many demographic groups and census tracts will show positive disparity scores. Conversely, using a small travel shed (e.g., 10-minute walk shed), particularly in rural areas, may result in the SEDT deeming some points as inaccessible to all tracts. When this occurs, the SEDT will report many census tracts and demographic groups as having negative disparity scores.

As such, it is crucial to select the appropriate shed for a given analysis. We recommend that users first consider whether a resource serves an area that can be approximated by the census tract in which it is located. If so, users should choose the standard SEDT approach. If not, they should consider whether most residents in the area would reach those resources by walking or driving and how long residents would be willing to travel to reach them.

To make this concrete, consider the following cases. The example 311 data in our web tool reflects requests for services like filling potholes and removing trash. These services mainly improve areas immediately surrounding the location of the request. This makes the standard (non-shed) SEDT approach the most suitable choice for this data. Now consider hospitals in a rural county. Hospitals serve areas well outside of the census tracts where they are located, and research indicates that most rural Americans live more than 10 miles from the nearest hospital. Consequently, drive sheds are the correct choice for an SEDT analysis of hospitals in a rural county.

For more details on how sheds change SEDT results and how to select the appropriate shed for analysis, see our travel sheds documentation.

Looking ahead

We’re excited to expand what insights the new travel shed access measure can offer SEDT users, and we’re grateful to the users who helped shape our new methodology.

We hope to expand the shed functionality to the entire US and add other travel modes. We’re eager to hear users’ thoughts on the new methodology and to partner with users to fundraise for further improvements. If you have feedback or questions, or if you are interested in partnering with us, reach out to sedt@urban.org.

-Gabe Morrison

-Sonia Torres Rodríguez

-Alena Stern

Want to learn more? Sign up for the Data@Urban newsletter.
