Joining ACAG Annual Estimates of PM2.5 with Social Determinants of Health (SDOH) data
Link to code
Authors/Creators/Team Members: Zoé Haskell-Craig and Priyanka deSouza
Specific purpose of code: This code aggregates gridded annual average PM2.5 concentration estimates produced by the Atmospheric Composition Analysis Group (ACAG) to the census tract level, producing a variable containing the average PM2.5 exposure for each tract. This variable is then combined with tract-level socioeconomic and demographic data from the Social Determinants of Health (SDOH) database produced by the Agency for Healthcare Research and Quality (AHRQ).
General Application: This code takes advantage of the `tigris` package, which provides Census administrative boundaries, to aggregate high-resolution (fine spatial scale) modelled estimates of PM2.5 pollution to the administrative units at which demographic and SDOH data are available. As an example, we demonstrate computing the annual average PM2.5 concentrations in 2020 for census tracts and combining them with SDOH data on race/ethnicity and income. With minor changes, this code can be used for other years and temporal resolutions (e.g., monthly estimates of PM2.5) and for other administrative units (ZCTAs, block groups, counties, etc.).
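As an illustration of the aggregation step described above, the following is a minimal sketch, not the authors' actual code. It assumes the aggregation is an area-weighted mean computed with `exactextractr::exact_extract()` (the package appears in the tools list, but the exact call used is an assumption). To stay self-contained it builds a synthetic 4 x 4 raster and two hypothetical square "tracts" in place of the ACAG NetCDF file and the boundaries returned by `tigris`; the names `pm_grid`, `tracts`, `make_square`, and `pm25_mean` are illustrative.

```r
library(sf)
library(raster)
library(exactextractr)

# Synthetic 4 x 4 grid standing in for an ACAG PM2.5 raster.
# Cell values are filled row by row, starting at the top-left cell.
pm_grid <- raster(nrows = 4, ncols = 4, xmn = 0, xmx = 4, ymn = 0, ymx = 4,
                  crs = "+proj=longlat +datum=WGS84")
values(pm_grid) <- 1:16

# Helper (hypothetical) that builds an axis-aligned rectangular polygon.
make_square <- function(xmin, ymin, xmax, ymax) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmax, ymin), c(xmax, ymax), c(xmin, ymax), c(xmin, ymin)
  )))
}

# Two made-up "tracts": the left and right halves of the grid.
tracts <- st_sf(
  GEOID = c("A", "B"),
  geometry = st_sfc(make_square(0, 0, 2, 4), make_square(2, 0, 4, 4),
                    crs = "+proj=longlat +datum=WGS84")
)

# Coverage-weighted mean of the raster cells that each polygon overlaps.
tracts$pm25_mean <- exact_extract(pm_grid, tracts, fun = "mean", progress = FALSE)

print(st_drop_geometry(tracts))
```

With real data, `pm_grid` would instead be read from the ACAG file (for example with `raster::raster()`) and `tracts` would come from `tigris::tracts()`. The resulting `pm25_mean` column is then the per-tract exposure variable.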
How does or could this code allow researchers to assess research questions related to aging or life course?: Although the dataset output by this code does not contain variables on age, the raw SDOH dataset contains census information on age, which could be included with minor edits to the code. In addition, aggregating PM2.5 exposure to the census tract (or another administrative unit) allows researchers to combine this exposure with other information available from the census, such as income-by-age breakdowns.
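The kind of minor edit mentioned above could look like the sketch below, which joins tract-level PM2.5 to SDOH columns and adds an age variable to the selection. This is an assumption about how the join might be written, not the authors' code. The column names `TRACTFIPS`, `ACS_MEDIAN_HH_INC`, and `ACS_PCT_AGE_ABOVE65` are assumed SDOH variable names and should be checked against the AHRQ codebook, and all values shown are made-up placeholders.

```r
library(dplyr)

# Placeholder per-tract PM2.5 values (made up for illustration).
pm_by_tract <- tibble(
  GEOID     = c("01001020100", "01001020200"),
  pm25_mean = c(8.1, 7.9)
)

# Placeholder SDOH rows; column names are assumptions, values are made up.
# Tract identifiers are kept as character so leading zeros are preserved.
sdoh <- tibble(
  TRACTFIPS           = c("01001020100", "01001020200"),
  ACS_MEDIAN_HH_INC   = c(60000, 45000),
  ACS_PCT_AGE_ABOVE65 = c(14.2, 17.5)
)

# Age-related columns to carry into the output; extend this vector as needed.
age_vars <- c("ACS_PCT_AGE_ABOVE65")

combined <- pm_by_tract %>%
  left_join(
    select(sdoh, TRACTFIPS, ACS_MEDIAN_HH_INC, all_of(age_vars)),
    by = c("GEOID" = "TRACTFIPS")
  )

print(combined)
```

Using `left_join()` keeps every tract that has a PM2.5 estimate, even when no SDOH row matches, so missing SDOH values show up as `NA` rather than silently dropping tracts.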
Data sets used:
- Social Determinants of Health dataset from AHRQ. Publicly available.
- Modelled PM2.5 concentration estimates from ACAG (version V5.GL.05.02). Publicly available.
Are all the data publicly available or are some restricted-access? Publicly available
Coding Language: R
Tools and Packages used: tidyverse, readxl, sf, raster, ncdf4, exactextractr, tigris, viridis
Output(s): Dataset with census tracts as unit of analysis and a map of the average PM2.5 per census tract in 2020.
Spatial extent: Continental US
Temporal extent: Single year, 2020 (code can be modified to produce data for any year from 1998 to 2023, or monthly for any month in that period).
Additional Comments: Journal article using this code is forthcoming.
Published papers that use this code: Zoé Haskell-Craig, Kevin P. Josey, Patrick L. Kinney, and Priyanka deSouza. (2025). Equity in the Distribution of Regulatory PM2.5 Monitors. Environmental Science & Technology. DOI: 10.1021/acs.est.4c12915