The North Carolina Flood Extent Archive (NC-FLDEX)

Written by Julia Shipman on June 8, 2026. Posted in Code & Data, Uncategorized.

Link to data

Prepared by: Helena M. Garcia, University of North Carolina at Chapel Hill and Kathryn Foster, Cornell University

Date: June 2026

Original Authors: Helena M. Garcia, Antonia Sebastian, Kieran P. Fitzmaurice, Miyuki Hino, Elyssa L. Collins, Gregory W. Characklis

About the data:

The North Carolina Flood Extent Archive (NC-FLDEX) is a dataset that includes flood extent rasters for 78 flood events in North Carolina. The data are created using address-level NFIP Claims, NFIP Redacted Claims and Policies, USGS 30m Elevation, NLCD Fractional Impervious Surface, ERA5 Hourly Precipitation, NHD Coastline, NC OneMap Major Hydrography, Height Above Nearest Drainage (HAND), Soil Hydraulic Conductivity (ksat), and FEMA Special Flood Hazard Area (SFHA).

Data are available on:

The North Carolina Flood Extent Archive (NC-FLDEX) is a dataset that includes 30-meter-resolution flood extent rasters for 78 flood events that occurred in the eastern three-quarters of North Carolina between 1996 and 2020. The rasters represent binary flood extents (1 = likely flooded) and are derived from address-scale National Flood Insurance Program (NFIP) claims and policy data. NFIP data were used to create flood presence and flood absence points to train machine learning models to estimate flood probabilities at every 30-meter cell in the study area. The NC-FLDEX archive includes event-specific rasters, contextual information (e.g., event name, date), and a cumulative exposure raster summarizing the flood frequency in each cell across all 78 events.
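The idea of a cumulative exposure raster can be illustrated with a minimal Python sketch. It assumes the cumulative raster is a cell-by-cell sum of the binary event rasters; the event names and the 3x3 grids are invented for illustration and are not taken from the archive.

```python
# Toy 3x3 binary flood extent rasters (1 = likely flooded), one per event.
# Event names and cell values are made up for illustration only.
event_rasters = {
    "event_a": [[1, 0, 0], [1, 1, 0], [0, 0, 0]],
    "event_b": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "event_c": [[0, 0, 0], [0, 1, 0], [0, 0, 1]],
}


def cumulative_exposure(rasters):
    """Sum binary rasters cell by cell to count flood events per cell."""
    grids = list(rasters.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [sum(grid[r][c] for grid in grids) for c in range(n_cols)]
        for r in range(n_rows)
    ]


flood_frequency = cumulative_exposure(event_rasters)
for row in flood_frequency:
    print(row)
```

Each cell of `flood_frequency` holds the number of events in which that cell was marked as likely flooded.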

NC-FLDEX can be combined with other spatial datasets to quantify flood exposure at multiple spatial scales, examine historical flood patterns and frequencies, and compare exposure across specific events. The 30m NC-FLDEX rasters can be aggregated to common boundaries such as census units, ZCTAs, watersheds, or counties, or used directly with building footprint or individual-level location data (e.g., residential address histories, mobile phone location data).
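As a rough sketch of what aggregating a flood raster to boundaries involves, the Python below computes the share of flooded cells per zone. It assumes the boundaries have already been rasterized to the same grid as the flood raster; the zone labels and grid values are hypothetical.

```python
from collections import defaultdict

# Toy binary flood raster (1 = likely flooded) and a zone raster on the same
# grid. Zone labels "A", "B", "C" stand in for tracts, ZCTAs, or counties.
flood = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 1]]
zones = [["A", "A", "B", "B"],
         ["A", "A", "B", "B"],
         ["C", "C", "C", "C"]]


def share_flooded_by_zone(flood_grid, zone_grid):
    """Return the fraction of cells marked flooded within each zone."""
    flooded = defaultdict(int)
    total = defaultdict(int)
    for flood_row, zone_row in zip(flood_grid, zone_grid):
        for value, zone in zip(flood_row, zone_row):
            total[zone] += 1
            flooded[zone] += value
    return {zone: flooded[zone] / total[zone] for zone in total}


shares = share_flooded_by_zone(flood, zones)
print(shares)
```

In practice a GIS library would handle the overlay of polygons and raster cells; the sketch only shows the counting step.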

Citation:

Garcia, Helena M.; Sebastian, Antonia; Fitzmaurice, Kieran P.; Hino, Miyuki; Collins, Elyssa L.; Characklis, Gregory W., 2024, “Flood Extent Rasters (30m) for 78 NC-FLDEX Events 1996-2020”, https://doi.org/10.15139/S3/DOKK16, UNC Dataverse, V7, UNF:6:MbKrcmsVu0yVRbJ1Fl2IjA== [fileUNF]

Garcia, H. M., Sebastian, A., Fitzmaurice, K. P., Hino, M., Collins, E. L., & Characklis, G. W. (2025). Reconstructing repetitive flood exposure across 78 events from 1996 to 2020 in North Carolina, USA. Earth’s Future, 13, e2025EF006026. https://doi.org/10.1029/2025EF006026

The North Carolina Flood Extent Archive (NC-FLDEX) Example Code

Written by Julia Shipman on June 8, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Prepared by: Helena M. Garcia, University of North Carolina at Chapel Hill and Kathryn Foster, Cornell University

Date: June 2026

Original Authors: Helena M. Garcia, Antonia Sebastian, Kieran P. Fitzmaurice, Miyuki Hino, Elyssa L. Collins, Gregory W. Characklis

Specific purpose of code:

The code provides an example of how to generate flood extent data, modeled after the process used to create the North Carolina Flood Extent Archive dataset. Because National Flood Insurance Program (NFIP) privacy restrictions apply to the address-level records used to create NC-FLDEX, the full code cannot be shared. Instead, example code is shared that uses randomly generated NFIP claims and policy locations drawn from the NC Building Footprint 2010 data. The event dates in the code align with Hurricane Florence (2018). The purpose of the example code is 1) to represent how NC-FLDEX was developed, and 2) to provide a guide for replicating similar products for different geographies.

The North Carolina Flood Extent Archive (NC-FLDEX) example code creates flood extent raster data. The rasters represent binary flood extents (1 = likely flooded) and are created using random forest models trained on high-resolution geospatial predictors and address-level National Flood Insurance Program (NFIP) claims and policy data. NFIP claims locations are labeled as flood presence points, and policy locations without claims are labeled as flood absence points. The flood presence and absence points are then used to estimate flood probabilities at every 30-meter cell in the study area. The NC-FLDEX example code includes a comparison of model outputs with publicly available physics-based and remote sensing-based model outputs for Hurricane Florence. The example code also includes North Carolina building footprint data to support building-level exposure summaries.
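The labeling rule and the final conversion to a binary extent can be sketched in a few lines of Python. This is not the authors' R code and it omits the random forest step entirely. The policy identifiers, the predicted probabilities, and the 0.5 cutoff are all assumptions made for illustration; the source does not state which probability threshold was used.

```python
# Hypothetical policy records for one event and the subset that filed claims.
policies = ["p1", "p2", "p3", "p4", "p5"]
claims = {"p2", "p5"}

# Claims become flood presence points (1); policies without claims become
# flood absence points (0).
labels = {pid: int(pid in claims) for pid in policies}


def to_binary_extent(probabilities, threshold=0.5):
    """Convert per-cell flood probabilities to 1 (likely flooded) or 0.

    The 0.5 default is an assumed cutoff, not a value from the source.
    """
    return [[int(p >= threshold) for p in row] for row in probabilities]


# Made-up model output for a 2x2 grid of cells.
predicted = [[0.91, 0.40], [0.55, 0.08]]
extent = to_binary_extent(predicted)
print(labels)
print(extent)
```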

General Application:

This code can be adapted to estimate flood extent in other locations and time frames that are relevant to aging populations. It can also be combined with other relevant spatial datasets to examine historical flooding events and compare exposure across events.

How does or could this code allow researchers to assess research questions related to aging or life course?:

The 30m NC-FLDEX raster data can be spatially aggregated to demographic units (e.g., census tracts, block groups) and linked with datasets containing age or health-related variables.

Data sets used:

Population, socioeconomic, or health data: US Census Bureau Primary and Secondary Roads, US Census Bureau Census Tracts 2010, US Zip Code Tabulation Areas (ZCTAs) 2000, and North Carolina Building Footprints
Climate, weather, disaster or environment data: Address-level NFIP Claims, NFIP Redacted Claims and Policies, USGS 30m Elevation, NLCD Fractional Impervious Surface, ERA5 Hourly Precipitation, NHD Coastline, NC OneMap Major Hydrography, Height Above Nearest Drainage (HAND), Soil Hydraulic Conductivity (ksat), and FEMA Special Flood Hazard Area (SFHA)

Are all the data publicly available or are some restricted-access?

The address-level NFIP claims are restricted-access data.

Links to data:

US Census Bureau Primary and Secondary Roads:

NFIP Redacted Policies: https://www.fema.gov/openfema-data-page/fima-nfip-redacted-policies-v2

USGS 30m Elevation: https://data.usgs.gov/datacatalog/data/USGS:35f9c4d4-b113-4c8d-8691-47c428c29a5b

NLCD Fractional Impervious Surface: https://www.sciencebase.gov/catalog/item/655ceb8ad34ee4b6e05cc51a

ERA5 Hourly Precipitation: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview

NHD Coastline: https://www.usgs.gov/national-hydrography/national-hydrography-dataset

NC OneMap Major Hydrography: https://www.nconemap.gov/datasets/nconemap::major-hydrography-streams-rivers/about

Height Above Nearest Drainage (HAND): https://www.hydroshare.org/resource/73aaa3efcda2465ba6227f535400f36b/

Soil Hydraulic Conductivity (ksat): https://websoilsurvey.nrcs.usda.gov/app/

FEMA Special Flood Hazard Area (SFHA): https://www.nconemap.gov/maps/a178aae74ee347d786e853e5a442eea2/explore?location=35.121157%2C-79.918650%2C7.86

Coding Language: R was used to create these data. More information and example code are available in the Dataverse repository.

Tools and Packages used: dplyr, sp, raster, tigris, ggplot2, sf, readxl, writexl, nngeo, lubridate, stringr, tidyr, reshape2, data.table, randomForest, stats, ranger, caret, tuneRanger, mlr, pROC, ROCR, dismo

Output(s): Dataset and mapping

Spatial extent: Flood extent rasters span the coastal-draining USGS HUC-6 watersheds within North Carolina (030101, 030102, 030201, 030202, 030203, 030300, 030401, 030402). The study area overlaps with 78 of North Carolina’s 100 counties. A shapefile of the study area is included in the repository.

Temporal extent: The 78 flood events included in NC-FLDEX are based on National Flood Insurance Program damage records from 1/1/1996 to 9/30/2020.

Published papers that use this code:

Joyce Pak, Bradford E. Jackson, Shabbar I. Ranapurwala, Miyuki Hino, Lawrence S. Engel, Katherine E. Reeder-Hayes, Jillian L. Evans-Strong, Jennifer L. Lund; Impacts of Hurricane-Related Flooding on Time to Initial Cancer Directed Treatment in North Carolina. Cancer Epidemiol Biomarkers Prev 2026; https://doi.org/10.1158/1055-9965.EPI-25-1664

Graphic reproduced from Garcia et al. (2025), published in Earth’s Future.

Introducing the Temperature Extremes in Europe (TEE) Datasets

Written by Julia Shipman on May 13, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Date: November 2025

Authors/Creators/Team Members: Sara R. Ronnkvist, Zoe Haskell-Craig, Risto Conte Keivabu, Abbie Robinson, Mathew E. Hauer, Domenico Bovienzo, Emilio Zagheni

Specific purpose of code: This GitHub repository contains a collection of scripts that generate datasets quantifying extreme temperature exposure in Europe using a variety of metrics at two sub-national spatial scales (NUTS 2 and NUTS 3) and three temporal scales (daily, extreme temperature wave, and yearly) from 1980 to 2024. These datasets capture the breadth of temperature metrics used in the epidemiology, demography, and environmental literature with 67 different metrics, including regionally unusual temperature events (defined as temperatures above/below the 95th/5th percentile of historical temperatures) and periods of sustained (consecutive-day) exposure to extreme temperatures. Additionally, these scripts can be adapted to construct temperature extremes for other geographic regions or scales with a few minor revisions.
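To make the two ideas in this description concrete (percentile-based thresholds and runs of consecutive extreme days), here is a minimal Python sketch. It is not the TEE code. The nearest-rank percentile method, the three-day minimum run length, and all temperature values are assumptions for illustration; the exact metric definitions are given in the TEE paper and repository.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def find_waves(daily_temps, threshold, min_days=3):
    """Return (start_index, length) for runs of days above the threshold.

    min_days=3 is an assumed minimum run length, not a TEE definition.
    """
    waves, run_start = [], None
    # A sentinel value below any threshold closes a run that reaches the end.
    for i, temp in enumerate(list(daily_temps) + [float("-inf")]):
        if temp > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_days:
                waves.append((run_start, i - run_start))
            run_start = None
    return waves


# Toy "historical" record for one region: the values 1 through 100.
historical = list(range(1, 101))
hot_threshold = percentile(historical, 95)
cold_threshold = percentile(historical, 5)

# Toy recent daily temperatures for the same region.
recent = [90, 96, 97, 98, 94, 99, 96, 97, 99, 80]
waves = find_waves(recent, hot_threshold)
print(hot_threshold, cold_threshold, waves)
```

Cold waves follow the same pattern with the comparison reversed and the 5th-percentile threshold.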

General Application: The TEE datasets can be linked to any data that contains NUTS identifiers (e.g., Eurostat) using a simple merge to study the impacts of extreme temperatures on populations.
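A merge of this kind might look like the following Python sketch, which joins two small tables on a NUTS identifier and year. The column names (`nuts_id`, `year`, `hot_days`, `deaths`) and all values are hypothetical; the actual TEE column names should be checked against the dataset documentation.

```python
# Hypothetical yearly temperature rows keyed by NUTS region and year.
tee_yearly = [
    {"nuts_id": "DE11", "year": 2020, "hot_days": 12},
    {"nuts_id": "DE11", "year": 2021, "hot_days": 7},
    {"nuts_id": "FR10", "year": 2020, "hot_days": 15},
]

# Hypothetical outcome rows from another source with the same identifiers.
outcomes = [
    {"nuts_id": "DE11", "year": 2020, "deaths": 1040},
    {"nuts_id": "FR10", "year": 2020, "deaths": 2210},
    {"nuts_id": "FR10", "year": 2021, "deaths": 2150},
]


def inner_merge(left, right, keys):
    """Keep rows present in both tables, matched on the given key columns."""
    index = {tuple(row[k] for k in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


linked = inner_merge(tee_yearly, outcomes, keys=("nuts_id", "year"))
for row in linked:
    print(row)
```

An inner join drops region-years that appear in only one table, so it is worth checking row counts before and after the merge.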

How does or could this code allow researchers to assess research questions related to aging or life course?: The TEE datasets provide temperature data in a user-friendly format that can easily be linked to Eurostat or other datasets with NUTS identifiers. Additionally, our code is reproducible and easily adaptable to other geographic regions and/or time frames. Researchers who wish to study other regions may adapt our code to construct extreme temperature measures.

Data sets used:

Population, socioeconomic, or health data:
- Earth Science Data Systems, N. Gridded Population of the World, Version 4 (GPWv4): Population Count, Revision 11 | NASA Earthdata. (Earth Science Data Systems, NASA,2024,5), https://www.earthdata.nasa.gov/data/catalog/sedac-ciesin-sedac-gpwv4-popcount-r11-4.11
Climate, weather, disaster or environment data:
- Muñoz-Sabater, J. et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth System Science Data 13, 4349–4383, https://doi.org/10.5194/essd-13-4349-2021 (2021).
- Di Napoli, C., Barnard, C., Prudhomme, C., Cloke, H. & Pappenberger, F. ERA5-HEAT: a global gridded historical dataset of human thermal comfort indices from climate reanalysis. Geoscience Data Journal 8, 2–10, https://doi.org/10.1002/gdj3.102 (2021).
- NUTS shapefiles from Eurostat: https://ec.europa.eu/eurostat/web/gisco/geodata/statistical-units/territorial-units-statistics

Are all the data publicly available or are some restricted-access? Publicly available.

Links to data: The TEE datasets are available on FigShare: https://springernature.figshare.com/articles/dataset/Temperature_Extremes_in_Europe_TEE_/28063226

Coding Language: We use Python to download the extreme temperature data from Copernicus and aggregate the hourly data to daily temperature measures. We use R for everything else.
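The hourly-to-daily aggregation step can be sketched with the Python standard library alone. The repository lists xarray for this work; the version below is only a simplified illustration with invented timestamps and temperatures, and the choice of mean, minimum, and maximum as the daily measures is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Invented hourly records: (ISO timestamp, temperature in degrees Celsius).
hourly = [
    ("2024-07-01T00:00", 18.0),
    ("2024-07-01T12:00", 30.0),
    ("2024-07-01T18:00", 24.0),
    ("2024-07-02T06:00", 20.0),
    ("2024-07-02T15:00", 34.0),
]


def daily_summary(records):
    """Group hourly values by calendar day and summarize each day."""
    by_day = defaultdict(list)
    for timestamp, temp in records:
        by_day[timestamp[:10]].append(temp)  # "YYYY-MM-DD" prefix
    return {
        day: {"mean": mean(temps), "min": min(temps), "max": max(temps)}
        for day, temps in by_day.items()
    }


daily = daily_summary(hourly)
print(daily)
```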

Tools and Packages used:

R: terra, exactextractr, sf, tidyverse, foreign, zoo

Python: os, glob, shutil, zipfile, cdsapi, xarray, numpy

Output(s):

16 datasets:

Annual measures of extreme temperature:
- tee_yearly_nuts2.csv
- tee_yearly_nuts3.csv

Extreme temperature waves (consecutive days of temperature extremes):
- tee_wave_nuts2.csv
- tee_wave_nuts3.csv

Daily temperature measure datasets, split into 10-year files to reduce file size:
- tee_daily_nuts2_[START YEAR]_[END YEAR].csv
- tee_daily_nuts3_[START YEAR]_[END YEAR].csv

Spatial extent: European Union countries

Temporal extent: 1980-2024

Comments: Ronnkvist et al. (2025) contains detailed information on how the datasets were constructed and the included metrics. We provide replication instructions in the TEE-dataset GitHub repository (https://github.com/haskellcraigz/TEE-dataset/tree/main).

Citation:

Ronnkvist, S.R., Haskell-Craig, Z., Robinson, A., Conte Keivabu, R., Hauer, M.E., Bovienzo, D., & Zagheni, E. (2025) What’s the TEE: Metrics of Temperature Extremes in Europe NUTS Regions (1980-2024). Scientific Data. https://doi.org/10.1038/s41597-025-05352-7

Published papers that use this code:

Scientific Data publication associated with the TEE datasets: Ronnkvist et al. (2025), cited above.

Mapping Flooding and Population Exposure: The Global Flood Database

Written by Julia Shipman on April 22, 2026. Posted in Code & Data, Uncategorized.

Link to data

Click here

Prepared by: Kathryn Foster, Cornell University

Date: April 2026

Original Authors: B. Tellman, J.A. Sullivan, C.S. Doyle, C. Kuhn, A.J. Kettner, G.R. Brakenridge, T.A. Erickson, D.A. Slayback

About: The data combine satellite flood and human settlement data to identify population exposure to flooding and flood risk from 2000 onward. The data were created using NASA satellite imagery to identify and map flood events recorded by the Dartmouth Flood Observatory (DFO), which were then intersected with global watershed data (HydroSHEDS) and daily precipitation estimates (PERSIANN-CDR). The maps were then overlaid with population data from the Global Human Settlement Layer (GHSL) to derive population exposure estimates.

Data are available on: Flood events, population exposure per flooding event, population exposed per country-event, and population displaced per event. The data include 913 flood events from 169 countries from 2000 to 2018. Utilizing satellite data, this database covers 2.23 million square kilometers of inundated land and helps track flood exposure, accounting for population change in flood-prone regions.

Spatial data (GeoTIFF files) by country and flooding event can be downloaded from the interactive maps section at the bottom of the Global Flood Database website. These maps outline the flooded areas and the duration of the flood for each event, as well as impacts on displacement and casualties. The DFO estimates both the cause of the flood event and the casualties. The casualty figure is the number reported by the media or the government and could be much higher or lower than the number of people estimated to be exposed based on satellite data.

Population exposure per flooding event is calculated by intersecting the observed inundated flood data with the population data. The population exposed per event is reported using GHSL population estimates for 2000 and 2015. Tabular data in CSV format can be found in the “About the data” link at the top of the Global Flood Database website.
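The intersection described above amounts to summing population over inundated cells. The Python sketch below shows this on a toy grid; it assumes the flood and population layers are already aligned cell for cell, and every value in it is invented.

```python
# Toy inundation raster (1 = observed flooded) and a population count raster
# on the same grid. All numbers are made up for illustration.
flood_extent = [[1, 1, 0],
                [0, 1, 0],
                [0, 0, 0]]
population = [[120, 80, 300],
              [50, 10, 0],
              [400, 25, 5]]


def exposed_population(flood_grid, pop_grid):
    """Sum the population of every cell that is marked as flooded."""
    return sum(
        pop
        for flood_row, pop_row in zip(flood_grid, pop_grid)
        for flooded, pop in zip(flood_row, pop_row)
        if flooded
    )


exposed = exposed_population(flood_extent, population)
total = sum(sum(row) for row in population)
print(exposed, total, round(exposed / total, 3))
```

Running the same function with population grids for two different years would give the two exposure estimates reported per event.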

Although the database does not separate population exposure by age for each flooding event, researchers can still study older adults using the data. For example, researchers may combine the Global Flood Database with census or other population data of interest to assess the relationship between flood exposure and older adults (example found here).

Detailed GIS data descriptions and methodological notes can be found here: https://storage.googleapis.com/gfd_metadata/README_GFD.pdf

Citation: Tellman, B. et al. Satellite imaging reveals increased proportion of population exposed to floods. Nature 596, 80–86 (2021).

Creating “bins” for extreme temperature data and adjusting for known bias

Written by Julia Shipman on April 22, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate

Date: April 2026

Original Authors:

Benjamin Jones, Northwestern University

Jacob Moscona, Massachusetts Institute of Technology

Benjamin A. Olken, Massachusetts Institute of Technology

Cristine von Dessauer, Massachusetts Institute of Technology

Specific purpose of code: This Stata code and program take temperature data at fine temporal and spatial resolution (i.e., tract-day) and transform it into an aggregated panel dataset with temperature binned at the year-place or month-place level. The program calculates both the realized and the expected number of days in each temperature bin. The data are then ready for use in regression and similar analyses using standard temperature bin specifications. The data better capture the role of extreme temperatures by identifying extreme temperature days beyond what would be expected in a given area, and they are useful for assessing the causal role of increasing extreme temperature exposure.
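For readers unfamiliar with temperature binning, the Python sketch below shows only the "realized" half of the procedure: counting days per bin for each place-year. It is not the cftemp Stata command and does not reproduce its expected-day calculation. The bin cut points, tract identifiers, and temperatures are hypothetical.

```python
from collections import Counter

# Hypothetical Fahrenheit cut points; real studies choose their own bins.
BIN_EDGES = [50, 70, 90]


def bin_label(temp):
    """Assign a daily temperature to one of four illustrative bins."""
    if temp < BIN_EDGES[0]:
        return "<50"
    if temp < BIN_EDGES[1]:
        return "50-70"
    if temp < BIN_EDGES[2]:
        return "70-90"
    return "90+"


# Invented place-day records: (place id, ISO date, daily temperature).
daily = [
    ("tract_1", "2020-01-15", 42.0),
    ("tract_1", "2020-07-04", 93.5),
    ("tract_1", "2020-07-05", 95.0),
    ("tract_1", "2021-07-04", 88.0),
    ("tract_2", "2020-07-04", 71.0),
]


def realized_bin_counts(records):
    """Count days in each temperature bin by (place, year)."""
    counts = Counter()
    for place, date, temp in records:
        counts[(place, int(date[:4]), bin_label(temp))] += 1
    return counts


panel = realized_bin_counts(daily)
for key in sorted(panel):
    print(key, panel[key])
```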

General Application: Extreme temperature exposure is often operationalized in research using a binning procedure, wherein a researcher aggregates the number of extreme temperature days into binned temperature ranges. A common and serious bias can occur when using binned temperature data over time if the outcome variable of interest is associated with the baseline temperature, producing what are often called “U-shaped” results. When studying the impacts of climate change and increasing extreme temperature exposure, common binning procedures neglect the baseline temperature of a given area.

Let’s say that over a 20-year span, the average temperature increase is uniform across space. A place like Phoenix will see a large increase in extreme heat days (say, days at or above 90 degrees), while a place like Boston will see a smaller increase. This is because the baseline temperature for Phoenix 20 years ago was much warmer than the baseline temperature in Boston.
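A small simulation can make this intuition concrete. The Python sketch below generates synthetic daily temperatures for a warm and a cool location, applies the same uniform warming to both, and compares the increase in days at or above 90 degrees. The annual means, seasonal amplitudes, noise level, and 3-degree warming are invented parameters, not climatological values for Phoenix or Boston.

```python
import math
import random


def simulate_daily_temps(annual_mean, seasonal_amplitude, years=20, seed=0):
    """Synthetic daily temperatures: a seasonal sine wave plus random noise."""
    rng = random.Random(seed)
    temps = []
    for day in range(365 * years):
        seasonal = seasonal_amplitude * math.sin(2 * math.pi * (day % 365) / 365)
        temps.append(annual_mean + seasonal + rng.gauss(0, 6))
    return temps


def count_hot_days(temps, threshold=90.0):
    """Number of days at or above the extreme heat threshold."""
    return sum(t >= threshold for t in temps)


WARMING = 3.0  # the same uniform shift applied to both places (assumed value)

# (annual mean, seasonal amplitude) in degrees Fahrenheit; illustrative only.
cities = {"phoenix_like": (75.0, 20.0), "boston_like": (52.0, 22.0)}

increase = {}
for name, (mean_temp, amplitude) in cities.items():
    baseline = simulate_daily_temps(mean_temp, amplitude)
    warmed = [t + WARMING for t in baseline]
    increase[name] = count_hot_days(warmed) - count_hot_days(baseline)

print(increase)
```

Because many more of the warm location's days sit just below the threshold, the same shift pushes far more of them into the extreme heat bin.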

The bias arises when a given outcome is associated with both the number of extreme heat days and the baseline temperature of a given area. If an area’s baseline temperature is closer to the bin thresholds for “extreme heat”, the outcome may be associated with the baseline temperature as well as the increase in extreme heat days, introducing statistical bias into an analysis.

This code addresses the issue by providing the “expected” number of days in each temperature bin, as well as the observed number of days in each temperature bin. These data can then be used to control for different baseline temperatures and trends in warming for different areas. These expected and observed temperature bins allow the researcher to avoid regressing “trends on trends”, which produces the biased U-shaped results. The estimated area-year and area-month temperature bin data can be used in a wide variety of extreme temperature exposure studies.

How does or could this code allow researchers to assess research questions related to aging or life course?: This code can be applied to spatially and temporally specific temperature datasets to bin temperature exposure data while also accounting for varying baseline temperatures and varying trends in extreme temperature over time. Health and aging scholars can then integrate these data as a weather exposure variable and more accurately predict the impact of extreme temperature on age and health related outcomes.

Data sets used:

Population, socioeconomic, or health data: code generates binned temperature data that can be integrated with any data that has spatial and temporal specificity (such as lat/long, geographic administrative identifiers, panel data, or data with observation dates).
Climate, weather, disaster or environment data: code is applicable to any temperature data with detailed place/day resolution.

Are all the data publicly available or are some restricted-access? NA

Links to data: NA

Coding Language: Stata

Tools and Packages used: cftemp, a custom Stata command in an .ado file.

Output(s): Dataset of observed and counterfactual binned counts of extreme temperature days at the place/year or place/month level. Output comes from any user-supplied, highly detailed daily temperature dataset.

Spatial extent: Flexible. Depends on user supplied data.

Temporal extent: Flexible. Depends on user supplied data.

Published papers that use this code: Benjamin Jones, Jacob Moscona, Benjamin A. Olken, and Cristine von Dessauer, “With or Without U? Binning Bias and the Causal Effects of Temperature Extremes,” NBER Working Paper 34671 (2026), https://doi.org/10.3386/w34671.

Code linking NCHS mortality data with GFD flood event data

Written by Julia Shipman on April 14, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Date: April 2026

Authors/Creators/Team Members: Victoria D. Lynch, Jonathan A. Sullivan, Aaron B. Flores, Xicheng Xie, Sarika Aggarwal, Rachel C. Nethery, Marianthi-Anna Kioumourtzoglou, Anne E. Nigra, and Robbie M. Parks

Specific purpose of code: This code links National Center for Health Statistics (NCHS) mortality data with Global Flood Database (GFD) flood event data from 2001 to 2018 by US county. We used the NCHS data to identify monthly total and cause-specific deaths by age group, sex, and county and used Global Human Settlement Layer (GHSL) population data to calculate county-level flood exposure and mortality rates. We used a Bayesian formulation of the conditional quasi-Poisson model to analyze the county-level association between the number of flood events per month and monthly death rates, accounting for overdispersion in the mortality data. The conditional approach examines differences within matched strata (here, county-months), like a case-crossover study design, which removes confounding bias due to factors that vary across strata. Bayesian inference enables the ‘borrowing of information’ across county units and full distributional estimation of the parameters of interest.
All-cause and cause-specific mortality associated with flood events is likely differential by flood cause and severity; therefore, we conducted analyses separately by all-cause and cause-specific mortality (cancers, cardiovascular diseases, infectious and parasitic diseases, injuries, neuropsychiatric conditions, and respiratory diseases), flood cause (all floods, heavy rain, tropical cyclone, snowmelt, and ice jam or dam break), and flood severity (mild, moderate, severe, and very severe). Because very severe flood events were most strongly associated with increased mortality across all flood causes and mortality groups, we further analyzed associations stratified by age group (0-64 and 65+ years) and sex (female and male) for very severe floods only.
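The data-linking step that precedes the modeling can be sketched in Python as follows. This is not the authors' R code and it does not implement the conditional quasi-Poisson model. It only illustrates computing monthly death rates per 100,000 by county and age group and attaching a monthly flood count. The county code, populations, death counts, and field names are all invented.

```python
# Invented monthly death counts by county, month, and age group.
deaths = [
    {"fips": "99001", "month": "2018-09", "age_group": "65+", "deaths": 30},
    {"fips": "99001", "month": "2018-09", "age_group": "0-64", "deaths": 12},
    {"fips": "99001", "month": "2018-10", "age_group": "65+", "deaths": 22},
]

# Invented population denominators by county and age group.
population = {("99001", "65+"): 40000, ("99001", "0-64"): 160000}

# Invented number of flood events by county and month.
flood_events = {("99001", "2018-09"): 1}


def monthly_rates(death_rows, pop, floods, per=100_000):
    """Add a death rate and a flood count to each county-month-age row."""
    rows = []
    for row in death_rows:
        denominator = pop[(row["fips"], row["age_group"])]
        rows.append({
            **row,
            "rate": row["deaths"] * per / denominator,
            # County-months with no recorded flood get a count of zero.
            "n_floods": floods.get((row["fips"], row["month"]), 0),
        })
    return rows


linked = monthly_rates(deaths, population, flood_events)
for row in linked:
    print(row)
```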

General Application: This code links county-level flood exposure, categorized by flood type, with county-level mortality rates for the six primary causes of death in the US: cancers, cardiovascular disease, infectious and parasitic diseases, injuries, neuropsychiatric conditions, and respiratory diseases. The code could be used with any county-level health outcome and, with modification, with other county-level environmental exposures. The code specifically categorizes exposure by flood type and severity, which would not apply to other exposures.

How does or could this code allow researchers to assess research questions related to aging or life course?: The code is written to assess the association between flood exposure and mortality by age category; in our paper, we specifically stratified by age category (0-64 years old and 65+ years old) to examine flood exposure-related mortality among older adults. The NCHS data include individual-level age at death and would enable analyses with any subset of age groups.

Data sets used:

Population, socioeconomic, or health data:
National Center for Health Statistics (NCHS) mortality data; Global Human Settlement Layer (GHSL) population exposure.
Climate, weather, disaster or environment data:
Global Flood Database (GFD) flood event data; Dartmouth Flood Observatory (DFO) flood classification data; Parameter-elevation Regression on Independent Slopes Model (PRISM) temperature data.

Are all the data publicly available or are some restricted-access? Data on flood exposure are available without restrictions for individual flooding events. Temperature data and population data are also publicly available.

NCHS mortality data are restricted. To access the NCHS mortality data, applicants must submit a project review form (https://www.cdc.gov/nchs/data/nvss/nchs-research-review-application.pdf) to nvssrestricteddata@cdc.gov and allow four to six weeks for processing.

Links to data:

Coding Language: R

Tools and Packages used:

R: acs, BiocManager, dlnm, dplyr, ecm, Epi, fiftystater, foreign, fst, ggpubr, ggplot2, graph, graticule, haven, here, janitor, lubridate, mapproj, maptools, mapview, MetBrewer, pipeR, raster, RColorBrewer, readxl, rgdal, rgeos, rnaturalearth, rnaturalearthdata, scales, sf, sp, sqldf, survival, splines, table1, tidycensus, tidyverse, totalcensus, usmap, zipcodeR, zoo, INLA, Rgraphviz, fmesher

Output(s): Exploratory data analysis of flood and mortality data (maps, figures, tables), and output of statistical analysis (figures, tables)

Spatial extent: United States

Temporal extent: 2001-2018

Published papers that use this code: Lynch, Victoria D., et al. “Large floods drive changes in cause-specific mortality in the United States.” Nature Medicine 31.2 (2025): 663-671. doi: https://doi.org/10.1038/s41591-024-03358-z

Processing NDVI and VIIRS vegetation data for use in population health research

Written by Julia Shipman on March 17, 2026. Posted in Code & Data, Uncategorized.

Link to code

Click here

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate

Date: March 2026

Original Authors:

Finn Roberts, IPUMS Senior Data Analyst

Rebecca Luttinen, IPUMS Global Health Data Analyst

Devon Kristiansen, IPUMS Global Health Research Manager

Jude Mikal, Senior Research Fellow, University of Minnesota College of Pharmacy

Specific purpose of code: The code resources below offer a comprehensive outline for downloading, processing, aggregating, and integrating global vegetation coverage data for use in demographic and health research. Vegetation data come from the Normalized Difference Vegetation Index (NDVI), the Visible Infrared Imaging Radiometer Suite (VIIRS), and the Moderate Resolution Imaging Spectroradiometer (MODIS).

Ultimately, these resources allow users to aggregate environmental data into spatially relevant scales and integrate it into a variety of social and health data sources to better measure environmental context or exposure.
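As background, NDVI is computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red), and aggregation to a spatial unit is often a simple mean over the cells in that unit. The Python sketch below illustrates both steps on a toy 2x2 grid. The reflectance values are invented, and the linked resources use R and their own aggregation choices, which may differ from a plain mean.

```python
def ndvi(nir, red):
    """Standard NDVI formula; returns None when both bands are zero."""
    total = nir + red
    return None if total == 0 else (nir - red) / total


# Invented reflectance values for a 2x2 patch of cells.
nir_band = [[0.50, 0.60], [0.30, 0.0]]
red_band = [[0.10, 0.20], [0.30, 0.0]]

ndvi_grid = [
    [ndvi(n, r) for n, r in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]

# Aggregate to one value for the patch, skipping cells with no valid NDVI.
valid = [value for row in ndvi_grid for value in row if value is not None]
mean_ndvi = sum(valid) / len(valid)
print(ndvi_grid, mean_ndvi)
```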

The IPUMS DHS Spatial Analysis and Health Research Hub has numerous resources on using environmental data in health research. While many resources in the hub are used with DHS data integration, the data, code, and analysis resources can be altered for data integration into any spatially identified aging and health dataset.

Link to code:

[Start here!] From MODIS to VIIRS: The Latest Source for NDVI Data

Introducing NDVI as a Tool for Population Health Research

An Iterative Workflow for Loading NDVI Data in R

Aggregation Methods for NDVI Data

Vegetation and Land Cover Part 1: Vegetative Indices

Vegetation and Land Cover Part 2: Environmental Moisture

General Application: This code and associated resources allow researchers to build a vegetation coverage dataset that can be integrated into any individual or aggregate dataset that has temporal and spatial specificity. The data extend from 1981 to the present, with 10- to 20-day increments and up to 20-meter raster resolution.

How does or could this code allow researchers to assess research questions related to aging or life course?: This code can be used to create environmental context variables and measures of exposure to greenspace and vegetation that can be used cross-sectionally or longitudinally, and at spatially detailed scales. It can be integrated into health surveys to provide environmental context, or combined with aggregated data to identify locations with high concentrations of aging adults and changing vegetation or greenspace. In longitudinal datasets, researchers could chart an individual’s exposure to vegetation and other relevant environmental features over the life course.

Data sets used:

Publicly available climate and weather data.

Links to data (also repeated in code examples):

Normalized Difference Vegetation Index (NDVI)

Visible Infrared Imaging Radiometer Suite (VIIRS)

Moderate Resolution Imaging Spectroradiometer (MODIS)

Coding Language: R

Tools and Packages used: terra, sf, dplyr, ggplot2, ggspatial, patchwork, lubridate (likely others)

Output(s): datasets, maps

Spatial extent: Global dataset, raster data at 20- to 250-meter resolution

Temporal extent: 1981 to the present; 10- to 20-day increments

Published papers that use this code:

Moisa, M., Roba, Z., Purohit, S., Deribew, K., & Gemeda, D. (2025). Evaluating the impact of land use and land cover change on soil moisture variability using GIS and remote sensing technology in southwestern Ethiopia. Environmental Monitoring and Assessment, 197. https://doi.org/10.1007/s10661-025-14301-1

Grace, K., Kristiansen, D., Boyle, E. H., & Luetke, M. (2023). Investigating Seasonal Agriculture, Contraceptive Use, and Pregnancy in Burkina Faso. The Professional Geographer. https://www.tandfonline.com/doi/full/10.1080/00330124.2023.2199316

Resources and Data from the IPUMS DHS Spatial Analysis and Health Research Hub

Written by Julia Shipman on March 17, 2026. Posted in Code & Data, Uncategorized.

Link to data

Click here

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate

Date: March 2026

Author: The IPUMS DHS Spatial Analysis and Health Research Hub is designed to be a resource for researchers who are familiar with IPUMS DHS population health survey data but new to weather, environment, and disaster research that uses spatial data sources. Such resources include conceptual frameworks for environment/health research, introductions to datasets, spatial data processing, and analysis code. The code and data resources in the hub use R scripting to demonstrate basic spatial data processing techniques for integrating numerous environmental and weather-related data with social and health data in the DHS.

IPUMS Demographic and Health Surveys (IPUMS-DHS) is a database of thousands of consistently coded variables on the health and well-being of men, women, children, and births of randomly selected households in 42 African countries and 9 Asian countries. Data include records of all household members, effectively capturing social and demographic data across the life course and age groups for low- and middle-income countries.

The guides and resources in the IPUMS-DHS Spatial Analysis and Health Research Hub are oriented toward data integration with IPUMS-DHS data. However, many scripts can be applied to any other social, health, and aging datasets that have geographic data identifiers. This includes datasets that have variables for administrative geographies (unique identifiers or spatial data polygons), respondent address or lat/long variables, or other gridded and raster datasets examining aging and health.

To support such research, the IPUMS Global Health team received a 2023 supplemental grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, or NICHD (3R01HD069471-12S1).

Data are available on: Hub resources include data integration walkthroughs, dataset explanations, spatial data processes, and more. Datasets referenced include CHIRPS, CHIRTS, NDVI, VIIRS, and more. See a sample of the numerous resources below:

Flexible Workflows with CHIRTS Temperature Data

An Iterative Workflow for Loading NDVI Data in R

Vegetation and Land Cover Part 1: Vegetative Indices

Vegetation and Land Cover Part 2: Environmental Moisture

Incorporating Qualitative Methods into Spatial Health Research

More frameworks, walkthroughs, datasets, and sample analysis coding here

Citation: IPUMS. (2026, March 16). Supporting Research on Extreme Weather and Health. IPUMS DHS Spatial Analysis and Health Research Hub. https://tech.popdata.org/dhs-research-hub/about.html

HRS Workshop

Written by Mandy Loader on February 9, 2026. Posted in Code & Data, HRS.

Workshops

Thank you to everyone who participated in the October 2025 Health and Retirement Study (HRS) Workshop!

About the workshop: Climate change is influencing human health and is particularly challenging for older adults. The HRS holds tremendous potential to support research on the links among aging, health, and the environment. This 1.5-day workshop introduced the HRS and reviewed examples of environmental data that can be integrated for this research.

View the full HRS Workshop agenda, with links to slide decks for each day’s overview and presentations. You can also view and download our speakers’ insightful presentations, listed alphabetically below.

Sara Adar, University of Michigan: EPOCH and the Gateway to Global Aging

Jennifer Ailshire, University of Southern California
Contextual & Environmental Data: Resources for HRS and Other Aging Surveys

Deborah Balk, Mara Sheftel, Jennifer Brite, and Na Yin, City University of New York
Scorching Circumstances: The Role of Extreme Heat in Disability Among Older Workers in Heat Sensitive Jobs

Zhirui Chen, Boston College
Connections among individual- and community-level housing characteristics and disaster preparedness in a national sample of low income U.S. adults

Eun Young Choi, University of Southern California
Aging under Climate Stress: How Extreme Temperatures Shape Multi-System Biological Aging

Yanjun Dong, University at Albany
Aging, Climate, and the Social Determinants of Health: Disaster Preparedness and Inequities Among Older Adults

Jessica Finlay, University of Colorado Boulder
Contexts of Cognitive Health in the HRS

Melanie Gall, Arizona State University: Spatial Hazard Event & Loss Database for the US (SHELDUS)

Carina Gronlund, University of Michigan: Weather Resources for HRS in the Gateway to Global and NaNDA

Frank W. Heiland, City University of New York
Retirement and Family Demography in the Wake of Disasters

Hannah Malak, UC Santa Barbara
Heat Exposure among Older Adults by Race/ethnicity: a multi-scale investigation of thermal inequity

Xi Pan, Texas State
Environment and Cognitive Aging

Fernando Riosmena, University of Texas – San Antonio
Cumulative Disadvantage and the Aging of Mexican Immigrants in the United States

Hugh Roland, Alabama Birmingham
Climate Disaster Health Vulnerability Implications of Gulf Coast Demographic Dynamics

Amanda Sonnega, University of Michigan
HRS Overview

Jenna Tipaldo, City University of New York
Mortality among disaster-exposed older adults in the US Health and Retirement Study

Roger Wong, State University of New York Upstate Medical University
Age Differences in Climate Event Exposures in a National U.S. Sample

Code analyzing population pyramids for counties exposed to Low Elevation Coastal Zones (LECZs) in Puerto Rico

Written by Mandy Loader on December 22, 2025. Posted in Code & Data, Population.

Code analyzing population pyramids for counties exposed to Low Elevation Coastal Zones (LECZs) in Puerto Rico

Link to code (Quarto markdown version)

Click here

Link to code (Github Pages Version)

Click here

Date: December 2025

Authors/Creators/Team Members: Deborah Balk, Kytt MacManus, Hieu Tran, Camilla Greene, Shemontee Chowdhury, Juan F. Martinez

Specific purpose of code: The code integrates Python with the ArcGIS API, to access NASA SEDAC Low Elevation Coastal Zone (LECZ) data, and with the IPUMS API, to access U.S. Decennial Census data on age and sex groups at the block group and county levels. It creates interactive maps, summarizes changes in population pyramid structures, and compares these changes between areas inside and outside the LECZ in Puerto Rico.

General Application: This lesson demonstrates how to link U.S. Census data with the LECZ Merit-DEM dataset to analyze population and housing changes. It explores regional and local trends (at the county and block group levels) to highlight shifts in age groups within and outside of Low Elevation Coastal Zones (LECZ). The accompanying code enables users to explore census data at multiple geographic scales and integrate spatial environmental data to identify cohorts vulnerable to coastal flooding and observe how populations are changing in these areas.
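The inside-versus-outside comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the two broad age groups, and all counts are hypothetical, and the real lesson works with census block groups retrieved through the IPUMS API. The sketch only shows the basic idea of aggregating population by year and LECZ status and comparing the share of older adults.

```python
from collections import defaultdict

# Hypothetical block-group records: (year, in_lecz, age_group, population).
# Values are made up for illustration; they are NOT real census counts.
records = [
    (2010, True, "0-64", 800),
    (2010, True, "65+", 200),
    (2010, False, "0-64", 1700),
    (2010, False, "65+", 300),
    (2020, True, "0-64", 600),
    (2020, True, "65+", 300),
    (2020, False, "0-64", 1600),
    (2020, False, "65+", 400),
]


def older_adult_share(rows):
    """Return {(year, in_lecz): share of the population aged 65+}."""
    totals = defaultdict(int)
    older = defaultdict(int)
    for year, in_lecz, age_group, pop in rows:
        key = (year, in_lecz)
        totals[key] += pop
        if age_group == "65+":
            older[key] += pop
    # Skip any zone/year combination with no population to avoid dividing by zero.
    return {key: older[key] / totals[key] for key in totals if totals[key] > 0}


shares = older_adult_share(records)
for (year, in_lecz), share in sorted(shares.items()):
    zone = "inside LECZ" if in_lecz else "outside LECZ"
    print(f"{year} {zone}: {share:.1%} aged 65+")
```

Grouping on the pair (year, LECZ status) keeps the comparison symmetric, so the same function works whether the records come from block groups or counties.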

How does or could this code allow researchers to assess research questions related to aging or life course?: This code could be used with the Decennial data to assess any 5-year age groups from under 5 to 85+ years of age and generate population pyramid charts for 2010 and 2020 to assess changes in age groups over time and space.
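As a rough sketch of how population pyramid data for two census years might be prepared, the example below converts counts by 5-year age group and sex into percentages and computes the change per age group. The counts, the dictionary layout, and the function names `pyramid_rows` and `percent_change` are assumptions made for illustration; they are not taken from the linked code.

```python
# Hypothetical counts by 5-year age group and sex; NOT real census values.
# Only three of the groups from "Under 5" to "85+" are shown for brevity.
counts = {
    2010: {
        "Under 5": {"male": 120, "female": 110},
        "40-44": {"male": 150, "female": 160},
        "85+": {"male": 20, "female": 40},
    },
    2020: {
        "Under 5": {"male": 80, "female": 70},
        "40-44": {"male": 140, "female": 150},
        "85+": {"male": 30, "female": 60},
    },
}


def pyramid_rows(year_counts):
    """Convert counts to percent of total population.

    Male values are negated so that a horizontal bar chart draws them
    to the left of the central axis, as in a standard pyramid.
    """
    total = sum(c["male"] + c["female"] for c in year_counts.values())
    return {
        age: {
            "male": -100 * c["male"] / total,
            "female": 100 * c["female"] / total,
        }
        for age, c in year_counts.items()
    }


def percent_change(before, after):
    """Percent change in population (both sexes) for each age group."""
    changes = {}
    for age in before:
        old = before[age]["male"] + before[age]["female"]
        new = after[age]["male"] + after[age]["female"]
        changes[age] = 100 * (new - old) / old
    return changes


rows_2010 = pyramid_rows(counts[2010])
change = percent_change(counts[2010], counts[2020])
for age, pct in change.items():
    print(f"{age}: {pct:+.1f}% change, 2010 to 2020")
```

The signed percentages in `rows_2010` can be passed to a horizontal bar chart (for example, Matplotlib's `Axes.barh`) to draw the pyramid, with one set of bars per sex.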

Data sets used:

Population, socioeconomic, or health data: Decennial Census Data on Age/Sex, Occupancy Status (Vacancy), Social Vulnerabilities in Community Resilience Estimates (CRE)
Climate, weather, disaster or environment data: Low Elevation Coastal Zone (LECZ)

Are all the data publicly available or are some restricted-access? Some Community Resilience Estimates (CRE) data are restricted-access. The authors spoke with personnel at the U.S. Census Bureau regarding the restrictions and were advised to refer users to the first question of the Community Resilience Estimates Frequently Asked Questions. Researchers can access the data with an approved project through the Federal Statistical Research Data Centers. Researchers who would like to go that route can reach out to sehsd.cre@census.gov or refer to the Federal Statistical Research Data Centers.

Links to data: Community Resilience Estimates, Decennial Census of Population and Housing Data, Low Elevation Coastal Zones derived from MERIT-DEM – Overview

Coding Language: Python

Tools and Packages used: Quarto Markdown, GitHub, pandas, NumPy, Matplotlib, ipumspy, arcgis, folium.

Output(s): Maps, Scatterplot matrix, population pyramids, summary tables

Spatial extent: Puerto Rico

Temporal extent: 2010-2020