Contextual Data Resource for Aging Surveys

Written by Julia Shipman on July 6, 2026. Posted in Uncategorized.

Link to data

USC/UCLA Center on Biodemography and Population Health (CBPH)

Click here

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate

Date: July 2026

Principal Investigator: Dr. Jennifer Ailshire, University of Southern California

About the data:

(Paraphrased from CDR website) The Contextual Data Resource (CDR) is a collection of user-friendly datasets that integrates contextual data with several extensive datasets and surveys on health and aging. The CDR enables researchers to study the impact of place on health and well-being among older adults within the structure of existing and well-used data sets. Available contextual data within the CDR include measures on socioeconomic and demographic structures, economic conditions, social stressors, health care, physical hazards, amenities, and the built environment. Depending on the underlying data source, measures are available at multiple spatial scales over several decades.

The potential to link this contextual data to aging surveys will increase opportunities to analyze prospective effects of environmental conditions on health and aging, the effects of residential mobility on aging-related outcomes, and the ways environments change around older adults as they age in place.

Data are available on:

The CDR datasets can be accessed for use with several restricted-level datasets, including the Health and Retirement Study, Panel Study of Income Dynamics, and National Health and Aging Trends Study, via the MiCDA Enclave Geographic Linkages Repository. The data are also available for use with the Understanding America Study via a Tier 3 Data User agreement and with the Hispanic EPESE (Established Population for the Epidemiological Study of the Elderly) via the Hispanic EPESE study team. For access for use with other datasets, contact CDR administration at cdradmin@usc.edu.

The Contextual Data Resource gathers and processes data from multiple sources into user-friendly data that can be easily integrated into aging surveys. These datasets include the following:

USDA Food Environment Atlas and Food Access Atlas

Uniform Crime Report from the FBI and NACJD

US Census Bureau decennial counts and American Community Survey

Dartmouth Atlas of Healthcare

Air Pollution (O3 and PM2.5) from the Fused Air Quality Surface Using Downscaling Files (FAQSD)

Air Pollution (ATMOS) from the Atmospheric Composition Analysis Group (ACAG)

Street Connectivity from the US Census Bureau TIGER/Line shapefiles

Noise data from the National Park Service Geospatial Sound Monitoring Files

Area Health Resource Files from the US Health Resources and Services Administration (HRSA)

Weather and environmental data from gridMET

What am I reading? Mapping Social Vulnerability and Uncertainty: An example for older adults at risk in the wildfire context

Written by Julia Shipman on June 29, 2026. Posted in Uncategorized.

Link to article

If you’re interested in contributing a short What Am I Reading post, we’d love to hear from you! Email us at cache@colorado.edu.

Written by Sophia Arabadjis, MA, MSc, PhD. Institute for Implementation Science in Population Health, City University of New York Graduate School of Public Health and Health Policy

Have you ever considered linking population data to information from a thematic map on, for example, coastal flooding, social mobility, wildfire hazard, or social vulnerability? Maps are powerful communication tools, especially when interactive, allowing users to explore and visualize finely resolved geographies like ZIP code tabulation areas, census tracts, or block groups. However, these maps and measures, and the data underlying them, are not always precisely measured, which can distort the apparent spatial patterning of local risk and vulnerability.

These issues of measurement have very real implications for researchers and decision-makers who often use maps like these to provide environmental context or guide policy and resource allocation, respectively. Ignoring measurement issues can yield inaccurate insights.

This “What am I Reading?” entry provides an example of the implications of measurement error. Using material from our recent Annals of the American Association of Geographers paper, we explore how the sampling uncertainty in different data sources impacts wildfire vulnerability index construction. We describe various methods to construct vulnerability indices; discuss the sources of data used to create measures; and show how ignoring the sampling uncertainty can lead to false conclusions about vulnerability and a misallocation of resources[1].

Creating Composite Measures

In both academic literature and policy applications, it is common to summarize an area’s vulnerability using a single score or number based on a scale or rubric, called a composite index. Composite indices are functions of several different measures recorded at each location or geographic unit across a landscape. Measures generally reflect population (social vulnerability components) and environmental context (locational vulnerability components), depending on the application. For example, the Centers for Disease Control and Prevention’s Social Vulnerability Index (SVI) combines national survey measures of socioeconomic status, household composition, racial and ethnic minority status, and local housing types/transportation to estimate overall vulnerability across a landscape[2]. For each location, or areal unit, the measures are summed to create a single SVI score; scores between 0.75 and 1 indicate high vulnerability. The SVI and other similar composite indices (e.g. SoVI, SEDAC [3-6]) are increasingly used to prepare and direct resources towards areas in the top ~10% of scores, a.k.a. “highly vulnerable” areas[7-11].

Considering Measurement Error

While environmental measures may come from a variety of sources, the population measures often come from a single source: the American Community Survey (ACS). The ACS is an ongoing survey, fielded monthly, that samples roughly 3.5 million people in the United States each year. The survey asks detailed questions about economic, social, and demographic characteristics, which are then aggregated (across areas and years) to different census geographies, from census block groups (600 to 3,000 people) all the way up to national estimates. Though 3.5 million people may seem like a large sample, this equates to roughly 14 people per census block group annually[12]. These small sample sizes mean that the error in a given measurement may be quite large[12], which in turn means that the true value could plausibly lie anywhere within a wide range, simply because of the sampling design.

Fortunately, along with the population measure point estimates (for example, the population of older adults in a given area), the ACS also provides an estimate of the uncertainty due to the sample design[12-13]. The ratio of the sampling uncertainty value to the point estimate, called the coefficient of variation (CV), gives quick insight into the estimate’s precision. For example, suppose the ACS suggests that 15% of the population in a given census tract is above the age of 65 with a standard deviation of 1.8%; the ratio would be 0.018/0.15 = 0.12, which is a relatively low value, so we have relatively high confidence that ~15% of the population in that area is actually older adults. However, if we have the same estimate with a standard deviation closer to 8%, then the ratio is over 0.5 (0.08/0.15 ≈ 0.53) and suddenly we have much less confidence that 15% is accurate. Figure 1 maps the ACS coefficient of variation for a subset of urban census tracts and block groups in Santa Barbara County. A CV of 0.12 or less is a relatively good value; a CV of 0.5 or greater suggests much less confidence and more uncertainty. The table at right gives the percent of Census block groups that fall in each category for this urban subset. The maps show higher coefficient of variation values for the smaller census block group geographies, which indicates more error within smaller geographic units.
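The two scenarios in the paragraph above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 0.12 and 0.5 cutoffs are the illustrative values mentioned in the text rather than official ACS thresholds.

```python
def coefficient_of_variation(estimate, uncertainty):
    """Ratio of the sampling uncertainty to the point estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined when the point estimate is zero")
    return uncertainty / estimate


def describe_precision(cv, good=0.12, poor=0.5):
    """Label a CV using the illustrative cutoffs mentioned in the text."""
    cv = round(cv, 3)  # avoid floating-point noise right at a cutoff
    if cv <= good:
        return "relatively precise"
    if cv >= poor:
        return "highly uncertain"
    return "intermediate"


# The two scenarios from the text: a 15% share of older adults with an
# uncertainty of 1.8 percentage points versus 8 percentage points.
cv_low = coefficient_of_variation(0.15, 0.018)
cv_high = coefficient_of_variation(0.15, 0.08)

print(round(cv_low, 2), describe_precision(cv_low))
print(round(cv_high, 2), describe_precision(cv_high))
```

The same estimate of 15% lands in very different precision categories depending only on the reported uncertainty, which is the point of mapping the CV rather than the estimate alone.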

[Caption: Figure 1 maps the coefficient of variation (standard deviation/estimate) for a subset of urban Census tracts and Census block groups.]

Environmental measures also have uncertainties and measurement issues, though these generally take a different form. The error or noise in physical measurements may come from mechanistic constraints of instruments (for example, specificity and tolerance in thermometers), from spatial interpolation (e.g. creating a rain surface from point measurements across a landscape), or from model-based variation (e.g. gridded outputs of complex physical or statistical models). Some of these uncertainties may be knowable, such as the tolerance of an instrument, but others are not so easy to quantify.

Simulation is one way to propagate measurement error and assess its effects on the stability of findings. In a simulation framework, samples of each variable are repeatedly drawn from a range of acceptable values with an assumed shape (e.g. a normal distribution), and then summarized across those values. For example, one could take the pixel values of an environmental variable and the tolerance of that instrument and treat those as the mean and variance of normal distributions across a landscape (see Figure 2). Then, over the course of several samples (perhaps 1000), trends in values can be summarized. In our case, we are interested in how many times a specific pixel appears in the top 10% of values across each simulation. This is the recurrence rate and provides us insight into the stability of the top 10% of the distribution (“highly vulnerable”) areas.
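The recurrence rate described above can be sketched in plain Python. Everything specific here is an assumption made for illustration: the function name, the 20 hypothetical pixel values, and the use of a standard deviation (rather than a variance) to describe each pixel's tolerance. It is not the code used in the paper.

```python
import random


def recurrence_rates(means, sds, n_sims=1000, top_share=0.10, seed=0):
    """Share of simulations in which each unit lands in the top `top_share`."""
    rng = random.Random(seed)
    n_units = len(means)
    n_top = max(1, round(n_units * top_share))
    counts = [0] * n_units
    for _ in range(n_sims):
        # One simulated landscape: an independent normal draw per unit.
        draws = [rng.gauss(m, s) for m, s in zip(means, sds)]
        ranked = sorted(range(n_units), key=lambda i: draws[i], reverse=True)
        for i in ranked[:n_top]:
            counts[i] += 1
    return [c / n_sims for c in counts]


# Hypothetical values for 20 pixels: measured value and instrument tolerance.
means = [0.05 * i for i in range(20)]   # 0.00, 0.05, ..., 0.95
sds = [0.02] * 18 + [0.30, 0.30]        # the two highest pixels are noisy

rates = recurrence_rates(means, sds)
for i in (17, 18, 19):
    print(f"pixel {i}: in the top 10% in {rates[i]:.0%} of simulations")
```

Fixing the seed makes the summary reproducible. A pixel with a high measured value but a wide tolerance can fall out of the top 10% in many simulations, while a slightly lower but precisely measured pixel takes its place.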

[Caption: Figure 2 Visible Atmospherically Resistant Index (VARI) is a satellite-derived measure that is closely correlated with live fuel moisture. Displayed are the VARI values per pixel in an urban area of Santa Barbara County in the period just before the Thomas Fire of 2017.[18]]

An example of the implications of sampling error using a wildfire risk index

To ground our investigation of wildfire risk, we chose the 2017 Thomas Fire as a case study. The Thomas Fire began December 4th, 2017 in Ventura County, California and quickly spread into neighboring Santa Barbara County. This region of California is characterized by chaparral vegetation (burn-adapted shrublands), steep canyons, mesas, and unique wind patterns that make wildfires both fast-moving and highly destructive[14-16]. By December 6th, 2017, the Thomas Fire had engulfed more than 100,000 acres; it would go on to consume 281,000 acres, destroy 1000 structures, and force more than 100,000 people to evacuate (see Figure 3)[17].

[Caption: Figure 3 The Thomas Fire final fire perimeter (red, hexagonal pattern) burned into densely populated coastal regions of Santa Barbara and Ventura counties (a-d). The estimated population aged 65 years or more per US Census block group (left column) and tract (right column) shows the proportion of older adults varies spatially across counties. Source: Arabadjis et al., 2025[1]]

With the Thomas Fire as our backdrop, we construct a simplified wildfire risk index with two measures common to the literature: a measure of the share of the population of older adults from the ACS (aged 65 years or older) and a satellite-based measure of vegetation moisture called the Visible Atmospherically Resistant Index (VARI). The VARI is derived from the green-to-red band signals and is strongly correlated with live fuel moisture, which in turn is strongly associated with wildfire ignition, spread, and intensity in chaparral environments[18]. We take several steps.

First, we use a simulation framework within which samples are repeatedly drawn from the range of values (proportion of older adults) dictated by the sampling uncertainty. In mathematical terms, we take a draw from a normal distribution with mean equal to the logit-transformed point estimate ($\hat{\mu}$) and variance ($\sigma^{2}$) specified from the variance replicate estimate tables for each Census geography ($s$). (See equation 1.)
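A minimal Python sketch of this kind of draw for a single geography is shown below. It assumes the logit-scale standard deviation is already in hand; how that value is derived from the variance replicate tables is described in the paper and is not reproduced here. The function names and the example numbers are hypothetical.

```python
import math
import random


def logit(p):
    """Map a share in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a real number back to a share in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_shares(p_hat, sigma, n_sims=1000, seed=0):
    """Draw plausible shares for one geography on the logit scale.

    p_hat : point estimate of the share (strictly between 0 and 1)
    sigma : standard deviation on the logit scale (assumed given here)
    """
    rng = random.Random(seed)
    mu_hat = logit(p_hat)
    return [inv_logit(rng.gauss(mu_hat, sigma)) for _ in range(n_sims)]


# Hypothetical block group: 15% older adults, logit-scale sd of 0.5.
draws = sorted(simulate_shares(0.15, 0.5))
print("median share:", round(draws[len(draws) // 2], 3))
print("central 90% range:", round(draws[50], 3), "to", round(draws[949], 3))
```

Drawing on the logit scale and transforming back keeps every simulated share strictly between 0 and 1, which a normal draw on the raw proportion would not guarantee.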

We then summarize across the simulations to assess the impact of the ACS sampling uncertainty. We show that the selection of the top 10% most vulnerable areas (the areas that would likely be identified to receive resources) is sensitive to sampling uncertainty. We find that at smaller analytic scales (i.e. census block groups), the top 10% most vulnerable areas selected are not necessarily the same as if we took the proportion or share of older adults at face value.

Figure 4 provides a visual explanation of this finding, comparing the top 10% most vulnerable Census block groups (left column) and Census tracts (right column). Panels (a) and (b) show the top 10% areas according to the raw point estimates of the share of older adults in each area. Panels (c) and (d) show the recurrence rates of each Census block group or tract over the course of 1000 simulations. Areas shaded in brown-to-gold consistently appear in the top 10% most vulnerable (in 90+% of simulations); areas shaded in green-to-tan appear with less frequency in the top 10%, but any draw from the distribution is equally likely under the distributional assumptions. Panels (e) and (f) show the difference in the top 10% using the raw point estimates versus the simulation. Areas shaded in blue were ranked in the top 10% using both the raw point estimates and the simulation. Areas in yellow were only ranked in the top 10% using the raw estimates, and areas in pink were only selected using the simulation method. Given the different sizes of Census tracts and block groups in Ventura and Santa Barbara counties, these are difficult to see. In the Census block group map (e), 2 areas are shaded pink; in the Census tract map (f), 1 area is shaded pink. These differences suggest that resources may be misallocated (or more resources may be needed). Areas in red are excluded from analysis.

[Caption: Figure 4 displays the Census tracts (right) and Census block groups (left) of Santa Barbara and Ventura counties. In the top panels (a-b), geographies with a top 10% share of the older population are highlighted in yellow. The middle panels (c-d) show the recurrence rates for the top 10% from the simulations. The lowest panels (e-f) map differences in which geographies are in the top 10% of older adult population share by simulation versus raw point estimate. Source: Arabadjis et al., 2025[1]]

Second, we propose a statistical model and simulation procedure that combines the older adult population measure and the vegetation measure to create an example composite wildfire risk index. (See equation 2.) Importantly, our statistical model and procedure account for the sampling uncertainty of both the population and environmental measures and have a closed-form expression of the variance[1]. (See paper for details.)

As in Figure 4, each point in panels (c) and (d) of Figure 5 is colored to represent the proportion of simulations in which the index value at that point appeared in the top 10% of each sample (i.e. vulnerable areas). Areas in green almost never appeared in the top 10%; areas in tan appeared in the top 10% in roughly ⅔ or more of samples; and areas in brown-to-gold appeared in the top 10% in ninety percent or more of the simulations. These brown-to-gold areas are consistently highly vulnerable per our index.

Ultimately, our simulations show that accounting for the sampling design makes a difference in which areas are designated as “highly vulnerable to wildfire,” and that the uncertainty in the older adult measure likely dwarfs any uncertainty in the vegetation measure.

Wildfire risk assessment, in particular, would benefit from more finely resolved person-level data, such as parcel-level indicators from county tax assessor data and specialized survey data measuring risk, mitigation, and specific demographic indicators. This is especially important for older adults, who may live in wildfire-prone areas but have different risk profiles in terms of knowledge, capacity, and underlying health[19].

Our results extend beyond our simplified wildfire example. We show that ignoring the sampling uncertainty in any ACS population estimates used in an index may misidentify risk across a landscape. We also note that the more complex the index (more measures), the more complicated the statistical model needs to be to incorporate the uncertainty, and the more troublesome the resulting risk distribution may be (multimodal, for instance).

However, simulation is a powerful tool that can help practitioners overcome sampling design constraints. Maps of simulation summaries (such as Figures 4 and 5) are potentially just as interpretable as maps of raw point estimates, but truer to the underlying unknowns in the data.

[Caption: Figure 5 maps the recurrence rates for each point (s) in the top 10% of wildfire risk index values. Subfigure (c) shows a Census block group-based-index and subfigure (d) shows a Census tract-based index. Source: Arabadjis et al., 2025[1]]

The article contributes to a robust and growing literature on vulnerability to environmental hazards. We show that uncertainty in the underlying data can distort vulnerability indices, especially as areal units get smaller. However, simulation is a pragmatic tool to help practitioners rigorously identify vulnerable areas and improve resource targeting across a landscape.

The real take-home message is that the next time you click on a map that highlights certain geographies as ‘highly vulnerable’, interpret with care!

References:

[1] Arabadjis, S. D., Zheng, Z., Strange, L. P., Murray, A. T., & Sweeney, S. H. (2026). Social Vulnerability, Locational Vulnerability, and Uncertainty in Wildfire Risk Index Construction. Annals of the American Association of Geographers, 116(5), 1211–1234. https://doi.org/10.1080/24694452.2025.2604851

[2] Flanagan, Barry E. et al. (2011). A Social Vulnerability Index for Disaster Management. Journal of Homeland Security and Emergency Management, 8(1).

[3] Cutter, S.L., B.J. Boruff, and W.L. Shirley. (2003). Social Vulnerability to Environmental Hazards. Social Science Quarterly 84(2). DOI: https://doi.org/10.1111/1540-6237.8402002

[4] Cutter, S.L. (1996) Vulnerability to Environmental Hazards. Progress in Human Geography 20(4). https://doi.org/10.1177/030913259602000407

[5] Cutter, S.L. (2024) The Origin and Diffusion of the Social Vulnerability Index (SoVI). International Journal of Disaster Risk Reduction. 109(104576). https://doi.org/10.1016/j.ijdrr.2024.104576

[6] NASA SEDAC. 2023. Center for International Earth Science Information Network (CIESIN) Documentation for the U.S. Social Vulnerability Index Grids: NASA Socioeconomic Data and Applications Center (SEDAC). New York. Columbia University.

[7] South Carolina Office of Resilience. (2026). Retrieved June 11, 2026, from https://scor.sc.gov/

[8] State of California Governor’s Office of Land Use and Climate Innovation (2026). Retrieved June 11, 2026, from https://vcp.lci.ca.gov/

[9] Federal Emergency Management Agency, FEMA (2026). Retrieved June 11, 2026, from https://www.fema.gov/emergency-managers/practitioners/recovery-resource-library/social-vulnerability-environmental

[10] U.S. Department of Agriculture and Rural Development, USDA (pre-2024). Retrieved 2024 from https://www.rd.usda.gov/priority-points/equity-search (No longer available.)

[11] Maine Infrastructure and Adaptation Fund MIAF (2026). Retrieved June 11, 2026 from https://www.maine.gov/future/climate/community-resilience-partnership and https://www.maine.gov/future/sites/maine.gov.future/files/inline-files/CAG2026-7-ProgramStatement.pdf

[12] Spielman, S. E., Folch, D., & Nagle, N. (2014). Patterns and causes of uncertainty in the American Community Survey. Applied Geography, 46, 147–157. https://doi.org/10.1016/j.apgeog.2013.11.002

[13] U.S. Census Bureau. (2017). Documentation for the 2013-2017 variance replicate estimates tables. https://www2.census.gov/programs-surveys/acs/replicate_estimates/2017/documentation/5-year/2013-2017_Variance_Replicate_Tables_Documentation.pdf

[14] Murray, A. T., Carvalho, L., Church, R. L., Jones, C., Roberts, D., Xu, J., Zigner, K., & Nash, D. (2021). Coastal Vulnerability under Extreme Weather. Appl. Spat. Anal. Policy, 14(3), 497–523. https://doi.org/10.1007/s12061-020-09357-0

[15] Park, I., Fauss, K., & Moritz, M. A. (2022). Forecasting Live Fuel Moisture of Adenostema fasciculatum and Its Relationship to Regional Wildfire Dynamics across Southern California Shrublands. Fire, 5(4), Article 4. https://doi.org/10.3390/fire5040110

[16] Storey, E. A., Stow, D. A., Roberts, D. A., O’Leary, J. F., & Davis, F. W. (2021). Evaluating Drought Impact on Postfire Recovery of Chaparral Across Southern California. Ecosystems, 24(4), 806–824. https://doi.org/10.1007/s10021-020-00551-2

[17] CAL FIRE. 2025. Statistics–CAL FIRE. https://www.fire.ca.gov/our-impact/statistics

[18] Peterson, S., D. Roberts, and P. Dennison. 2008. Mapping live fuel moisture with MODIS data: A multiple regression approach. Remote Sensing of Environment 112 (12):4272–84. doi: 10.1016/j.rse.2008.07.012.

[19] De Fries, C., C. Melton, R. Smith, L. Reyes Mason. (2022). The Impacts of Wildfires on Older Adults: A Scoping Review. Innovation in Aging, 6:Supplement 1. https://doi.org/10.1093/geroni/igac059.2307

[20] National Interagency Fire Center. 2025. National Interagency Fire Center. https://data-nifc.opendata.arcgis.com

What Am I Watching? Understanding the Health Impacts of Wildfire Smoke Exposure

Written by Julia Shipman on June 29, 2026. Posted in Uncategorized, What Am I Reading?.

Link to video

If you’re interested in contributing a short What Am I Reading post, we’d love to hear from you! Email us at cache@colorado.edu

Written by Elizabeth Sorensen Montoya, Ph.D. University of Colorado Boulder www.elizabethsorensenmontoya.com.

If you live in the Eastern U.S. or the Midwest, you’ve probably spent the last few days breathing in that now-familiar sign of summer: Canadian wildfire smoke. But this isn’t just a North American problem. In recent years, wildfires have become more frequent, more intense, and harder to suppress. Because wildfire smoke can travel long distances, the health impacts often reach far beyond the burn zone.

So, what does all this smoke actually mean for our health?

As part of the Climate and Health Research Coordinating Center’s (CAFÉ RCC) State of the Science webinar series, Dr. Michael Brauer, professor at the School of Population and Public Health at the University of British Columbia, delivered an excellent talk exploring just that. You can watch the full seminar here.

Below is a quick, high-level overview of some key takeaways from the presentation:

The “new normal”: Wildfires are becoming more frequent, larger, and harder to suppress. Not only that, but they’ve begun to extend beyond what we have traditionally thought of as “fire season”, with smoke events occurring well outside traditional summer months.

Health impacts: The talk covered a wide range of health outcomes linked to wildfire smoke exposure, from respiratory and cardiovascular impacts to emerging evidence on effects like dementia, reduced cognitive performance, and ambulance dispatches. A particularly interesting piece of the talk focused on recent research into the delayed impacts of wildfire smoke. For example, one study by Landguth and colleagues shows that smoke exposure during the summer can increase the risk of flu during the following winter.

Looking ahead: Dr. Brauer talked about how wildfire smoke could change in the years to come, not only as a result of climate change but also our response to it.

What can be done? Dr. Brauer ended the talk by outlining several approaches for reducing exposure, from individual-level interventions to community-level planning and preemptive actions.

The seminar is well worth watching in full. Dr. Brauer does a fantastic job of weaving together scientific evidence, real-world case studies, and forward-looking perspectives.

As wildfires continue to affect communities around the world, it’s increasingly important to understand the health risks and how we might reduce them. Dr. Brauer’s talk is a great starting point for those curious about wildfire smoke and health and a valuable resource for those already working in that field.

References:

Brauer, M. (2024) Understanding the health impacts of wildfire smoke exposure. Presented as part of the CAFÉ RCC State of the Science webinar series, 15 May. Available at: https://www.youtube.com/watch?v=2CViMQ-Xjuo

Landguth, E.L., Holden, Z.A., Graham, J., Stark, B., Mokhtari, E.B., Kaleczyc, E., Anderson, S., Urbanski, S., Jolly, M., Semmens, E.O. and Warren, D.A., 2020. The delayed effect of wildfire season particulate matter on subsequent influenza season in a mountain west region of the USA. Environment international, 139, p.105668.

The North Carolina Flood Extent Archive (NC-FLDEX)

Written by Julia Shipman on June 8, 2026. Posted in Code & Data, Uncategorized.

Link to data

Click here

Prepared by: Helena M. Garcia, University of North Carolina at Chapel Hill and Kathryn Foster, Cornell University

Date: June 2026

Original Authors: Helena M. Garcia, Antonia Sebastian, Kieran P. Fitzmaurice, Miyuki Hino, Elyssa L. Collins, Gregory W. Characklis

About the data:

The North Carolina Flood Extent Archive (NC-FLDEX) is a dataset that includes flood extent rasters for 78 flood events in North Carolina. The data are created using address-level NFIP Claims, NFIP Redacted Claims and Policies, USGS 30m Elevation, NLCD Fractional Impervious Surface, ERA5 Hourly Precipitation, NHD Coastline, NC OneMap Major Hydrography, Height Above Nearest Drainage (HAND), Soil Hydraulic Conductivity (ksat), and FEMA Special Flood Hazard Area (SFHA).

Data are available on:

The North Carolina Flood Extent Archive (NC-FLDEX) is a dataset that includes 30-meter-resolution flood extent rasters for 78 flood events that occurred in the eastern three-quarters of North Carolina between 1996 and 2020. The rasters represent binary flood extents (1= likely flooded) and are derived from address-scale National Flood Insurance Program (NFIP) claims and policy data. NFIP data were used to create flood presence and flood absence points to train machine learning models to estimate flood probabilities at every 30-meter cell in the study area. The NC-FLDEX archive includes event-specific rasters, contextual information (e.g., event name, date), and a cumulative exposure raster summarizing the flood frequency in each cell across all 78 events.

NC-FLDEX can be combined with other spatial datasets to quantify flood exposure at multiple spatial scales, examine historical flood patterns and frequencies, and compare exposure across specific events. The 30m NC-FLDEX rasters can be aggregated to census, ZCTA, watershed, or county boundaries (common options, though others are possible) or used directly with building footprint or individual-level location data (e.g., residential address histories, mobile phone location data).
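The aggregation described above can be illustrated with a toy example in plain Python. The tiny grids, the zone labels, and the summary names are all hypothetical; a real workflow would read the 30m rasters and boundary files with a geospatial library (the archive's own example code is written in R).

```python
from collections import defaultdict

# Hypothetical binary flood rasters for three events on a 3 x 4 grid
# (1 = likely flooded), plus a raster of zone IDs (e.g. census tracts).
events = [
    [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]],
    [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]],
    [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]],
]
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "C"],
]

n_rows, n_cols = len(zones), len(zones[0])

# Cumulative exposure: number of events in which each cell flooded.
cumulative = [
    [sum(event[r][c] for event in events) for c in range(n_cols)]
    for r in range(n_rows)
]

# Group the cumulative counts by zone.
cells = defaultdict(list)
for r in range(n_rows):
    for c in range(n_cols):
        cells[zones[r][c]].append(cumulative[r][c])

# Zonal summary: share of cells flooded at least once, and the mean
# number of flood events per cell.
summary = {
    zone: {
        "share_ever_flooded": sum(v > 0 for v in vals) / len(vals),
        "mean_events": sum(vals) / len(vals),
    }
    for zone, vals in cells.items()
}

for zone in sorted(summary):
    print(zone, summary[zone])
```

Zone-level summaries like these can then be joined to survey or administrative records by the zone identifier.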

Citation:

Garcia, Helena M.; Sebastian, Antonia; Fitzmaurice, Kieran P.; Hino, Miyuki; Collins, Elyssa L.; Characklis, Gregory W., 2024, “Flood Extent Rasters (30m) for 78 NC-FLDEX Events 1996-2020”, https://doi.org/10.15139/S3/DOKK16, UNC Dataverse, V7, UNF:6:MbKrcmsVu0yVRbJ1Fl2IjA== [fileUNF]

Garcia, H. M., Sebastian, A., Fitzmaurice, K. P., Hino, M., Collins, E. L., & Characklis, G. W. (2025). Reconstructing repetitive flood exposure across 78 events from 1996 to 2020 in North Carolina, USA. Earth’s Future, 13, e2025EF006026. https://doi.org/10.1029/2025EF006026

The North Carolina Flood Extent Archive (NC-FLDEX) Example Code

Written by Julia Shipman on June 8, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Prepared by: Helena M. Garcia, University of North Carolina at Chapel Hill and Kathryn Foster, Cornell University

Date: June 2026

Original Authors: Helena M. Garcia, Antonia Sebastian, Kieran P. Fitzmaurice, Miyuki Hino, Elyssa L. Collins, Gregory W. Characklis

Specific purpose of code:

The code provides an example of how to generate flood extent data, modeled after the process used to create the North Carolina Flood Extent Archive dataset. Because National Flood Insurance Program (NFIP) privacy restrictions apply to the address-level records used to create NC-FLDEX, the full code cannot be shared. Instead, example code is shared that uses randomly generated NFIP claims and policy locations drawn from the NC Building Footprint 2010 data. The event dates for the code align with Hurricane Florence (2018). The purpose of the example code is 1) to represent how NC-FLDEX was developed, and 2) to provide a guide for replicating similar products for different geographies.

The North Carolina Flood Extent Archive (NC-FLDEX) example code creates flood extent raster data. The rasters represent binary flood extents (1 = likely flooded) and are created using random forest models trained on high-resolution geospatial predictors and address-level National Flood Insurance Program (NFIP) claims and policy data. NFIP claims locations are labeled as flood presence points and policy locations without claims are labeled as flood absence points. The flood presence and absence points are then used to estimate flood probabilities at every 30-meter cell in the study area. The NC-FLDEX example code includes a comparison of model outputs with publicly available physics-based and remote sensing-based model outputs for Hurricane Florence. The archive example code also includes North Carolina building footprint data to support building-level exposure summaries.
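The labeling step described above can be sketched in a few lines of Python. The record layout and the `policy_id` key are hypothetical stand-ins for matching claims to policy locations, and this sketch stops before model training. It is not the archive's R code.

```python
# Hypothetical records for one event: policies in force and claims filed.
policies = [
    {"policy_id": "P1", "x": 10.0, "y": 5.0},
    {"policy_id": "P2", "x": 12.5, "y": 5.5},
    {"policy_id": "P3", "x": 30.0, "y": 9.0},
    {"policy_id": "P4", "x": 11.0, "y": 4.0},
]
claims = [{"policy_id": "P2"}, {"policy_id": "P4"}]

claim_ids = {claim["policy_id"] for claim in claims}

# Policies with a claim become flood presence points (1); policies
# without a claim become flood absence points (0).
training_points = [
    {**policy, "flooded": 1 if policy["policy_id"] in claim_ids else 0}
    for policy in policies
]

n_presence = sum(point["flooded"] for point in training_points)
n_absence = len(training_points) - n_presence
print(f"{n_presence} presence points, {n_absence} absence points")
```

Labeled points like these, joined to predictor values at each location, would form the training table for a classifier such as a random forest.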

General Application:

This code can be adapted to estimate flood extent in other locations and time frames that are relevant to aging populations. It can also be combined with other relevant spatial datasets to examine historical flooding events and compare exposure across events.

How does or could this code allow researchers to assess research questions related to aging or life course?:

The 30m NC-FLDEX raster data can be spatially aggregated to demographic units (e.g., census tracts, block groups) and linked with datasets containing age or health-related variables.

Data sets used:

Population, socioeconomic, or health data: US Census Bureau Primary and Secondary Roads, US Census Bureau Census Tracts 2010, US Zip Code Tabulation Areas (ZCTAs) 2000, and North Carolina Building Footprints
Climate, weather, disaster or environment data: Address-level NFIP Claims, NFIP Redacted Claims and Policies, USGS 30m Elevation, NLCD Fractional Impervious Surface, ERA5 Hourly Precipitation, NHD Coastline, NC OneMap Major Hydrography, Height Above Nearest Drainage (HAND), Soil Hydraulic Conductivity (ksat), and FEMA Special Flood Hazard Area (SFHA)

Are all the data publicly available or are some restricted-access?

The address-level NFIP claims are restricted-access data.

Links to data:

US Census Bureau Primary and Secondary Roads:

NFIP Redacted Policies: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-policies-v2

USGS 30m Elevation: https://data.usgs.gov/datacatalog/data/USGS:35f9c4d4-b113-4c8d-8691-47c428c29a5b

NLCD Fractional Impervious Surface: https://www.sciencebase.gov/catalog/item/655ceb8ad34ee4b6e05cc51a

ERA5 Hourly Precipitation: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview

NHD Coastline: https://www.usgs.gov/national-hydrography/national-hydrography-dataset

NC OneMap Major Hydrography: https://www.nconemap.gov/datasets/nconemap::major-hydrography-streams-rivers/about

Height Above Nearest Drainage (HAND): https://www.hydroshare.org/resource/73aaa3efcda2465ba6227f535400f36b/

Soil Hydraulic Conductivity (ksat): https://websoilsurvey.nrcs.usda.gov/app/

FEMA Special Flood Hazard Area (SFHA): https://www.nconemap.gov/maps/a178aae74ee347d786e853e5a442eea2/explore?location=35.121157%2C-79.918650%2C7.86

Coding Language: R was used to create these data. More information and example code are available in the Dataverse repository.

Tools and Packages used: dplyr, sp, raster, tigris, ggplot2, sf, readxl, writexl, nngeo, lubridate, stringr, tidyr, reshape2, data.table, randomForest, stats, ranger, caret, tuneRanger, mlr, pROC, ROCR, dismo

Output(s): Dataset and mapping

Spatial extent: Flood extent rasters span the coastal-draining USGS HUC-6 watersheds within North Carolina (030101, 030102, 030201, 030202, 030203, 030300, 030401, 030402). The study area overlaps with 78 of North Carolina’s 100 counties. A shapefile of the study area is included in the repository.

Temporal extent: The 78 flood events included in NC-FLDEX are based on National Flood Insurance Program damage records from 1/1/1996 to 9/30/2020.

Published papers that use this code:

Joyce Pak, Bradford E. Jackson, Shabbar I. Ranapurwala, Miyuki Hino, Lawrence S. Engel, Katherine E. Reeder-Hayes, Jillian L. Evans-Strong, Jennifer L. Lund; Impacts of Hurricane-Related Flooding on Time to Initial Cancer Directed Treatment in North Carolina. Cancer Epidemiol Biomarkers Prev 2026; https://doi.org/10.1158/1055-9965.EPI-25-1664

Graphic reproduced from Garcia et al (2025) published in Earth’s Future.

Introducing the Temperature Extremes in Europe (TEE) Datasets

Written by Julia Shipman on May 13, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Date: November 2025

Authors/Creators/Team Members: Sara R. Ronnkvist, Zoe Haskell-Craig, Risto Conte Keivabu, Abbie Robinson, Mathew E. Hauer, Domenico Bovienzo, Emilio Zagheni

Specific purpose of code: This GitHub repository contains a collection of scripts that generate datasets quantifying extreme temperature exposure in Europe using a variety of metrics at two sub-national spatial scales (NUTS 2 and NUTS 3) and three temporal scales (daily, extreme temperature wave, and yearly) from 1980 to 2024. These datasets capture the breadth of temperature metrics used in the epidemiology, demography, and environmental literature with 67 different metrics, including regionally unusual temperature events (defined as temperatures above/below the 95th/5th percentile of historical temperatures) and periods of sustained (consecutive-day) exposure to extreme temperatures. Additionally, these scripts can be adapted to construct temperature extremes for other geographic regions or scales with a few minor revisions.
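To make the two ideas in this description concrete, here is a minimal Python sketch of a percentile-based threshold and a search for runs of consecutive days above it. It is not taken from the TEE repository: the three-day minimum run length, the interpolation rule for the percentile, and the sample values are all assumptions chosen for illustration, and the published paper defines the actual metrics.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def find_waves(daily_temps, threshold, min_length=3):
    """Return (start_index, length) for runs of at least min_length days above threshold."""
    waves, run_start = [], None
    # A sentinel value below any threshold closes a run that reaches the last day.
    for i, temp in enumerate(list(daily_temps) + [float("-inf")]):
        if temp > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                waves.append((run_start, i - run_start))
            run_start = None
    return waves


# Made-up historical record and daily series for one region.
historical = list(range(21))            # 0, 1, ..., 20
threshold = percentile(historical, 95)  # 95th percentile of the historical values
daily = [18, 20, 21, 22, 15, 20, 20, 19, 25, 26, 27, 28]
waves = find_waves(daily, threshold)
print(threshold, waves)
```

A cold-wave version would use the 5th percentile and flip the comparison.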

General Application: The TEE datasets can be linked to any data that contains NUTS identifiers (e.g., Eurostat) using a simple merge to study the impacts of extreme temperatures on populations.
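A merge of this kind might look like the following Python sketch. The column names (`nuts_id`, `year`, `heat_days`, `deaths`) and all values are hypothetical and are not the TEE files' actual variable names; check the dataset documentation for those. In practice a data-frame merge in R or pandas does the same job.

```python
def left_merge(left_rows, right_rows, keys):
    """Left-join two lists of dicts on the given key columns.

    Rows from left_rows with no match keep only their own columns.
    """
    index = {tuple(r[k] for k in keys): r for r in right_rows}
    merged = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys), {})
        merged.append({**match, **row})
    return merged


# Made-up outcome data keyed by NUTS region and year.
health = [
    {"nuts_id": "DE21", "year": 2003, "deaths": 100},
    {"nuts_id": "FR10", "year": 2003, "deaths": 120},
]
# Made-up temperature metrics with the same keys.
temperature = [{"nuts_id": "DE21", "year": 2003, "heat_days": 12}]

merged = left_merge(health, temperature, keys=("nuts_id", "year"))
print(merged)
```

Rows without a temperature match are retained so that gaps in coverage stay visible rather than silently dropping observations.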

How does or could this code allow researchers to assess research questions related to aging or life course?: The TEE datasets provide temperature data in a user-friendly format that can easily be linked to Eurostat or other datasets with NUTS identifiers. Additionally, our code is reproducible and easily adaptable to other geographic regions and/or time frames. Researchers who wish to study other regions may adapt our code to construct extreme temperature measures.

Data sets used:

Population, socioeconomic, or health data:
- Earth Science Data Systems, NASA. Gridded Population of the World, Version 4 (GPWv4): Population Count, Revision 11. NASA Earthdata (2024). https://www.earthdata.nasa.gov/data/catalog/sedac-ciesin-sedac-gpwv4-popcount-r11-4.11
Climate, weather, disaster or environment data:
- Muñoz-Sabater, J. et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data 13, 4349–4383, https://doi.org/10.5194/essd-13-4349-2021 (2021).
- Di Napoli, C., Barnard, C., Prudhomme, C., Cloke, H. & Pappenberger, F. ERA5-HEAT: a global gridded historical dataset of human thermal comfort indices from climate reanalysis. Geoscience Data Journal 8, 2–10, https://doi.org/10.1002/gdj3.102 (2021).
- NUTS shapefiles from Eurostat: https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics

Are all the data publicly available or are some restricted-access? Publicly available.

Links to data: The TEE datasets are available on Figshare: https://springernature.figshare.com/articles/dataset/Temperature_Extremes_in_Europe_TEE_/28063226

Coding Language: We use Python to download the extreme temperature data from Copernicus and aggregate the hourly data to daily temperature measures. We use R for everything else.

Tools and Packages used:

R: terra, exactextractr, sf, tidyverse, foreign, zoo

Python: os, glob, shutil, zipfile, cdsapi, xarray, numpy

Output(s):

16 datasets:

Annual measures of extreme temperature

tee_yearly_nuts2.csv

tee_yearly_nuts3.csv

Extreme temperature waves (consecutive days of temperature extremes)

tee_wave_nuts2.csv

tee_wave_nuts3.csv

Daily temperature measure datasets split into 10-year files to reduce file size

tee_daily_nuts2_[START YEAR]_[END YEAR].csv

tee_daily_nuts3_[START YEAR]_[END YEAR].csv

Spatial extent: European Union countries

Temporal extent: 1980-2024

Comments: Ronnkvist et al (2025) contains detailed information on how the datasets were constructed and the included metrics. We provide replication instructions in the TEE-dataset GitHub repository (https://github.com/haskellcraigz/TEE-dataset/tree/main).

Citation:

Ronnkvist, S.R., Haskell-Craig, Z., Robinson, A., Conte Keivabu, R., Hauer, M.E., Bovienzo, D., & Zagheni, E. (2025) What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024). Scientific Data. https://doi.org/10.1038/s41597-025-05352-7

Published papers that use this code:

Scientific Data publication associated with the TEE datasets: Ronnkvist et al. (2025), https://doi.org/10.1038/s41597-025-05352-7

Aligning the Exposome and Health Equity with Brain Aging amidst a Changing Climate

Written by Julia Shipman on May 8, 2026. Posted in Uncategorized, What Am I Reading?.

What am I reading? Aligning the Exposome and Health Equity with Brain Aging amidst a Changing Climate

Link to article

If you’re interested in contributing a short What Am I Reading post, we’d love to hear from you! Email us at cache@colorado.edu.

Written by Kelly Perry and Jenna Merenstein

When people think about brain health and aging, they might first think about the role of genetics or lifestyle factors. Both factors are important, though a growing body of research shows something else is also playing a major role in shaping how our brains age: the physical, social, and structural exposures we experience across our lives—from before birth through older age. Scientists refer to this as the exposome, and understanding it is key to addressing cognitive decline in a changing climate (Li et al 2025). In our recent Alzheimer’s & Dementia Perspective, we argued that understanding both healthy aging and Alzheimer’s disease and related dementias (ADRDs) requires an exposome- and equity-centered lens that considers the environmental and social conditions (including systemic inequities and systems-level resilience strategies) that shape brain health. We include the framework introduced in our paper in Figure 1.

Figure 1. An equity- and justice-centered framework for linking the exposome and neurocognitive health across the life course (from Perry KE & Merenstein J, 2026)

A recent Nature Medicine paper by Legaz et al., titled “The Exposome of Brain Aging across 34 Countries,” provides a compelling example of how this framework can be applied in practice. The authors examined 73 physical and social exposomal factors (e.g., air pollution, temperature, greenspace, gender equality, democracy, and access to drinking water) and their relation to neuroimaging measures of brain structure and brain function for 18,701 participants who varied in cognitive status (cognitively normal, mild cognitive impairment, Alzheimer’s disease, and frontotemporal lobar degeneration). Participants were recruited from 34 countries across Latin America, North America, Europe, Asia, Africa, and Oceania, thereby helping address the systemic underrepresentation of low- and middle-income countries (LMICs) in neuroimaging research.

Legaz et al. demonstrated that aggregated exposome models explained significantly more variance in brain aging than any single exposure alone—up to 15.5 times more than individual factors. Furthermore, their results highlight the impact that structural inequities embedded within neighborhoods, institutions, and political systems have on brain aging: physical exposures (e.g., higher air pollution, reduced greenspace, and extreme temperatures) were strongly associated with accelerated structural brain aging. In contrast, social exposures (e.g., poverty, weaker rule of law, and decreased civic participation) were associated with accelerated functional brain aging. The latter was shown to be especially true for female participants in the study, where reduced rights-related factors and poor soil and water quality predicted accelerated aging in females more than males. Their findings underscore a central argument of our Perspective: that environmental neuroscience needs to overcome the siloed approaches that are traditional hallmarks of the discipline, e.g., focusing on a single exposure or a single high-income country cohort.

Tools such as the Area Deprivation Index (ADI) and the Neighborhood Atlas are helpful in elucidating these structural, systemic inequities that adversely impact brain aging and increase the risk of ADRDs. While Legaz et al. used country-level indicators, neighborhood-level metrics such as ADI can provide finer resolution for studying how disadvantage shapes brain health within countries and across communities. ADI captures social dimensions such as income, education, employment, and housing quality, allowing researchers to better characterize cumulative disadvantage at the local scale. Similarly, the Neighborhood Atlas (which hosts the ADI) provides standardized geospatial measures of neighborhood deprivation that can be linked to neuroimaging, cognition, and dementia outcomes (Hunt et al 2020; Kim et al 2024). These tools help operationalize the “social exposome” by connecting place-based inequities to measurable differences in brain structure and function.

Building better data in this field also means improving access to neuroimaging itself. As we discussed in our Perspective, one major barrier to exposome- and equity-informed neuroscience is that advanced MRI remains inaccessible in many under-resourced and rural settings. Emerging low-field MRI initiatives, including recent work at Cardiff University, are helping address this gap by developing lower-cost, portable systems capable of whole-brain imaging. These efforts are particularly important for LMICs, where the burden of environmental exposures is often highest but access to imaging infrastructure is lowest. Legaz et al.’s multi-country study underscores why this matters: without broader imaging access, we risk building evidence only from the least exposed populations.

Protecting brain health equitably requires shifting toward prevention and systems change. Policies that promote cleaner air and water, safer housing, and equitable urban design can reduce harmful exposures across the life course. Investments in expanding access to green spaces, pollution control, and community infrastructure benefit physical health and cognitive resilience. Legaz et al.’s study provides strong empirical support for the notion that environmental and social conditions fundamentally shape brain aging outcomes. As climate change intensifies these exposures globally, building inclusive datasets (e.g., The Neighborhood Atlas), improving access to neuroimaging (e.g., investing in low-field scanners), and centering health equity in brain health research and policy will be essential for ensuring healthy cognitive aging for communities worldwide.

Mapping Flooding and Population Exposure: The Global Flood Database

Written by Julia Shipman on April 22, 2026. Posted in Code & Data, Uncategorized.

Link to data

Click here

Prepared by: Kathryn Foster, Cornell University

Date: April 2026

Original Authors: B. Tellman, J.A. Sullivan, C.S. Doyle, C. Kuhn, A.J. Kettner, G.R. Brakenridge, T.A. Erickson, D.A. Slayback

About: The data combine satellite flood and human settlement data to identify population exposure to flooding and flood risk from 2000 onward. The data were created using NASA satellite imagery to identify and map flood events recorded by the Dartmouth Flood Observatory (DFO), which were then intersected with global watershed data (HydroSHEDS) and daily precipitation estimates (PERSIANN-CDR). The maps were then overlaid with population data from the Global Human Settlement Layer (GHSL) to derive population exposure estimates.

Data are available on: Flood events, population exposure per flooding event, population exposed per country-event, and population displaced per event. The data include 913 flood events from 169 countries from 2000 to 2018. Utilizing satellite data, this database covers 2.23 million square kilometers of inundated land and helps track flood exposure, accounting for population change in flood-prone regions.

Spatial data (GeoTIFF files) by country and flooding event can be downloaded from the interactive maps section at the bottom of the Global Flood Database website. These maps outline the flooded areas and the duration of the flood for each event, as well as impacts on displacement and casualties. The DFO estimates both the cause of the flood event and the casualties. The casualty figures are those reported by the media or the government and could be much higher or lower than the satellite-based estimate of the number of people exposed.

Population exposure per flooding event is calculated by intersecting the observed inundated flood data with the population data. The population exposed per event is reported using GHSL population estimates for 2000 and 2015. Tabular data in CSV format can be found in the “About the data” link at the top of the Global Flood Database website.
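The intersection described above amounts to summing population over inundated cells. The Python sketch below shows that arithmetic on tiny made-up grids. It assumes the flood mask and the population grids have already been resampled to the same cells, which is a simplification of the database's actual processing.

```python
def exposed_population(flood_grid, population_grid):
    """Sum population over cells flagged as inundated (flood value 1)."""
    return sum(
        pop
        for flood_row, pop_row in zip(flood_grid, population_grid)
        for is_flooded, pop in zip(flood_row, pop_row)
        if is_flooded == 1
    )


# Hypothetical flood mask for one event and population counts for two reference years.
flood_grid = [[0, 1, 1],
              [0, 0, 1]]
pop_2000 = [[10, 20, 5],
            [40, 0, 15]]
pop_2015 = [[12, 30, 5],
            [45, 0, 25]]

exposure = {
    "2000": exposed_population(flood_grid, pop_2000),
    "2015": exposed_population(flood_grid, pop_2015),
}
print(exposure)
```

Comparing the two totals for the same flood footprint isolates the part of the change in exposure that comes from population change in the flooded area.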

Although the database does not separate population exposure by age for each flooding event, researchers can still study older adults using the data. For example, researchers may combine the Global Flood Database with Census or other population data of interest to assess the relationship between flood exposure and older adults (example found here).

Detailed GIS data descriptions and methodological notes can be found here: https://storage.googleapis.com/gfd_metadata/README_GFD.pdf

Citation: Tellman, B. et al. Satellite imaging reveals increased proportion of population exposed to floods. Nature 596, 80–86 (2021).

Creating “bins” for extreme temperature data and adjusting for known bias

Written by Julia Shipman on April 22, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate

Date: April 2026

Original Authors:

Benjamin Jones, Northwestern University

Jacob Moscona, Massachusetts Institute of Technology

Benjamin A. Olken, Massachusetts Institute of Technology

Cristine von Dessauer, Massachusetts Institute of Technology

Specific purpose of code: This Stata code and program take temperature data at fine temporal and spatial resolution (i.e., tract-day) and transform it into an aggregated panel dataset with temperature binned to year-place or month-place specificity. The program calculates both the realized and the expected number of days in each temperature bin. The data are then ready for use in regression and similar analyses using standard temperature bin specifications. The data better capture the role of extreme temperatures by identifying extreme temperature days beyond what would be expected in a given area, and they are useful for assessing the causal role of increasing extreme temperature exposure.

General Application: Extreme temperature exposure is often operationalized in research using a binning procedure, wherein a researcher aggregates the number of extreme temperature days into binned temperature ranges. A common and serious bias can occur when using binned temperature data over time if the outcome variable of interest is associated with the baseline temperature, producing what are often called “U-shaped” results. When studying the impacts of climate change and increasing extreme temperature exposure, common binning procedures neglect the baseline temperature of a given area.
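The basic binning step can be sketched in a few lines of Python. This is only an illustration of counting observed days per bin, not the Stata program: the bin edges and the sample records are hypothetical, and the calculation of expected days performed by the program is not reproduced here.

```python
from bisect import bisect_right
from collections import Counter

BIN_EDGES = [32, 50, 70, 90]  # hypothetical bin edges in degrees F; yields 5 bins


def bin_index(temp):
    """Index of the bin containing temp: 0 is below 32, 4 is 90 and above."""
    return bisect_right(BIN_EDGES, temp)


def observed_bin_days(records):
    """Count days per (place, year, bin) from daily (place, year, temp) records."""
    return Counter((place, year, bin_index(temp)) for place, year, temp in records)


# Made-up daily records for two places in one year.
records = [
    ("A", 2000, 95.0), ("A", 2000, 60.0), ("A", 2000, 91.5),
    ("B", 2000, 40.0), ("B", 2000, 31.0),
]
bin_days = observed_bin_days(records)
print(sorted(bin_days.items()))
```

Each place-year ends up with one count per bin, which is the panel structure that standard temperature-bin regressions use.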

Let’s say that over a 20-year span, the average temperature increase is uniform across space. A place like Phoenix will see a large increase in extreme heat days (say, 90-degree+ days), while a place like Boston will see a smaller increase. This is because the baseline temperature for Phoenix 20 years ago was much warmer than the baseline temperature in Boston.
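A quick back-of-the-envelope calculation shows the same pattern. The Python sketch below treats daily temperatures as normally distributed, which is a simplifying assumption, and uses invented means, standard deviations, and a 3-degree warming rather than actual climate data for any city.

```python
from statistics import NormalDist

THRESHOLD_F = 90.0   # "extreme heat" cutoff used in the example above
WARMING_F = 3.0      # hypothetical uniform warming applied to both places
DAYS_PER_YEAR = 365


def expected_hot_days(mean_temp, sd_temp):
    """Expected days per year at or above THRESHOLD_F under a normal approximation."""
    share_hot = 1 - NormalDist(mean_temp, sd_temp).cdf(THRESHOLD_F)
    return DAYS_PER_YEAR * share_hot


# Hypothetical baseline climates: (mean, standard deviation) of daily temperature in degrees F.
places = {"warm baseline": (85.0, 10.0), "cool baseline": (60.0, 10.0)}

increase = {}
for name, (mean_temp, sd_temp) in places.items():
    before = expected_hot_days(mean_temp, sd_temp)
    after = expected_hot_days(mean_temp + WARMING_F, sd_temp)
    increase[name] = after - before
    print(f"{name}: {before:.1f} -> {after:.1f} days (+{increase[name]:.1f})")
```

Under these assumptions the warm-baseline place gains roughly 40 extreme heat days per year while the cool-baseline place gains less than one, even though both warmed by the same amount.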

The bias arises when a given outcome is associated with both extreme heat days and the baseline temperature of a given area. If an area’s baseline temperature is closer to the bin thresholds for “extreme heat”, the outcome may be associated with the baseline temperature as well as the increase in extreme heat days, introducing statistical bias into an analysis.

This code addresses the issue by providing the “expected” number of days in each temperature bin, as well as the observed number of days in each temperature bin. These data can then be used to control for different baseline temperatures and trends in warming for different areas. The expected and observed temperature bins allow the researcher to avoid regressing “trends on trends”, which produces the biased U-shaped results. The estimated area-year and area-month temperature bin data can be used in a wide variety of extreme temperature exposure studies.

How does or could this code allow researchers to assess research questions related to aging or life course?: This code can be applied to spatially and temporally specific temperature datasets to bin temperature exposure data while also accounting for varying baseline temperatures and varying trends in extreme temperature over time. Health and aging scholars can then integrate these data as a weather exposure variable and more accurately predict the impact of extreme temperature on age and health related outcomes.

Data sets used:

Population, socioeconomic, or health data: code generates binned temperature data that can be integrated with any data that has spatial and temporal specificity (such as lat/long, geographic administrative identifiers, panel data, or data with observation dates).
Climate, weather, disaster or environment data: code is applicable to any temperature data with detailed place/day resolution.

Are all the data publicly available or are some restricted-access? NA

Links to data: NA

Coding Language: Stata

Tools and Packages used: cftemp, a custom Stata command in an .ado file.

Output(s): Dataset of observed and counterfactual data of binned counts of extreme temperature days at the place/year or place/month level. Output comes from any user supplied, hyper-specific daily temperature datasets.

Spatial extent: Flexible. Depends on user supplied data.

Temporal extent: Flexible. Depends on user supplied data.

Published papers that use this code: Benjamin Jones, Jacob Moscona, Benjamin A. Olken, and Cristine von Dessauer, “With or Without U? Binning Bias and the Causal Effects of Temperature Extremes,” NBER Working Paper 34671 (2026), https://doi.org/10.3386/w34671.

Code linking NCHS mortality data with GFD flood event data

Written by Julia Shipman on April 14, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Date: April 2026

Authors/Creators/Team Members: Victoria D. Lynch, Jonathan A. Sullivan, Aaron B. Flores, Xicheng Xie, Sarika Aggarwal, Rachel C. Nethery, Marianthi-Anna Kioumourtzoglou, Anne E. Nigra, and Robbie M. Parks

Specific purpose of code: This code links National Center for Health Statistics (NCHS) mortality data with Global Flood Database (GFD) flood event data from 2001-2018 by US county. We used the NCHS data to identify monthly total and cause-specific deaths by age group, sex, and county and used Global Human Settlement Layer (GHSL) population data to calculate county-level flood exposure and mortality rates. We used a Bayesian formulation of the conditional quasi-Poisson model to analyze the county-level association between the number of flood events per month and monthly death rates, accounting for overdispersion in the mortality data. The conditional approach examines differences within matched strata (here, county-months) like a case-crossover study design, which removes confounding bias due to factors that vary across strata. Bayesian inference enables the ‘borrowing of information’ across county units and the full distributional estimation of the parameters of interest.
All-cause and cause-specific mortality associated with flood events is likely differential by flood cause and severity; therefore, we conducted analyses separately by all-cause and cause-specific mortality (cancers, cardiovascular diseases, infectious and parasitic diseases, injuries, neuropsychiatric conditions, and respiratory diseases), flood cause (all floods, heavy rain, tropical cyclone, snowmelt, and ice jam or dam break), and flood severity (mild, moderate, severe, and very severe). Because very severe flood events were most strongly associated with increased mortality across all flood causes and mortality groups, we further analyzed associations stratified by age group (0-64 and 65+ years) and sex (female and male) for very severe floods only.

General Application: This code links county-level flood exposure, categorized by flood type, with county-level mortality rates for the six primary causes of death in the US: cancers, cardiovascular disease, infectious and parasitic diseases, injuries, neuropsychiatric conditions, and respiratory diseases. The code could be used with any county-level health outcome and, with modification, with other county-level environmental exposures. The code specifically categorized exposure by flood type and severity, which would not apply to other exposures.

How does or could this code allow researchers to assess research questions related to aging or life course?: The code is written to assess the association between flood exposure and mortality by age category; in our paper, we specifically stratified by age category (0-64 years old and 65+ years old) to examine flood exposure-related mortality among older adults. The NCHS data include individual-level age at death and would enable analyses with any subset of age groups.
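The age stratification can be illustrated with a short Python sketch that tallies individual death records into county-month-age-group counts and converts them to rates. It is not the repository's R code. The record layout, the county label, the population figures, and the per-100,000 scaling are all assumptions made for the example, and the modelling step is not shown.

```python
from collections import Counter


def age_group(age_at_death):
    """Two-group split described above: 0-64 and 65+ years."""
    return "65+" if age_at_death >= 65 else "0-64"


def monthly_death_rates(deaths, population, per=100_000):
    """Deaths per `per` people by (county, month, age group).

    deaths: iterable of (county, 'YYYY-MM', age_at_death) records.
    population: dict mapping (county, age group) to population.
    """
    counts = Counter((county, month, age_group(age)) for county, month, age in deaths)
    return {
        (county, month, group): per * n / population[(county, group)]
        for (county, month, group), n in counts.items()
    }


# Made-up records and populations for a single hypothetical county.
deaths = [
    ("county_A", "2018-09", 82),
    ("county_A", "2018-09", 70),
    ("county_A", "2018-09", 45),
]
population = {("county_A", "0-64"): 180_000, ("county_A", "65+"): 40_000}
rates = monthly_death_rates(deaths, population)
print(rates)
```

Changing `age_group` to return finer categories is enough to produce rates for any other set of age groups, provided matching population denominators are available.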

Data sets used:

Population, socioeconomic, or health data:
National Center for Health Statistics (NCHS) mortality data; Global Human Settlement Layer (GHSL) population exposure.
Climate, weather, disaster or environment data:
Global Flood Database (GFD) flood event data; Dartmouth Flood Observatory (DFO) flood classification data; Parameter-elevation Regression on Independent Slopes Model (PRISM) temperature data.

Are all the data publicly available or are some restricted-access? Data on flood exposure are available without restrictions for individual flooding events. Temperature data and population data are also publicly available.

NCHS mortality data are restricted. To access the NCHS mortality data, applicants must submit a project review form (https://www.cdc.gov/nchs/data/nvss/nchs-research-review-application.pdf) to nvssrestricteddata@cdc.gov and allow four to six weeks for processing.

Links to data:

Coding Language: R

Tools and Packages used:

R: acs, BiocManager, dlnm, dplyr, ecm, Epi, fiftystater, foreign, fst, ggpubr, ggplot2, graph, graticule, haven, here, janitor, lubridate, mapproj, maptools, mapview, MetBrewer, pipeR, raster, RColorBrewer, readxl, rgdal, rgeos, rnaturalearth, rnaturalearthdata, scales, sf, sp, sqldf, survival, splines, table1, tidycensus, tidyverse, totalcensus, usmap, zipcodeR, zoo, INLA, Rgraphviz, fmesher

Output(s): Exploratory data analysis of flood and mortality data (maps, figures, tables), and output of statistical analysis (figures, tables)

Spatial extent: United States

Temporal extent: 2001-2018

Published papers that use this code: Lynch, Victoria D., et al. “Large floods drive changes in cause-specific mortality in the United States.” Nature Medicine 31.2 (2025): 663-671. doi: https://doi.org/10.1038/s41591-024-03358-z