Daily Temperature Data Processing and Analysis: An Example for New York City
Link to code
Authors/Creators/Team Members:
Author: Selen Ozdogan
Team Members: Frank Heiland, Deborah Balk, Jennifer Brite, Peter Marcotullio
Specific purpose of code:
This code aims to provide a comprehensive guide to acquiring and cleaning daily temperature and precipitation data for New York City from 2015 to 2022 using two primary data sources: the Global Historical Climatology Network daily (GHCNd) from the U.S. National Centers for Environmental Information and the ERA5-Land Reanalysis from the European Union’s Copernicus Programme.
From these data sources, the code assembles daily air temperature and precipitation. It also calculates wet-bulb temperature and creates temperature exposure variables at varying temporal resolutions. The spatial extent of this example is New York City (NYC). Because GHCNd provides point (station) observations and ERA5-Land provides gridded values, the input data must be aggregated to produce estimates for NYC as a whole.
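The record does not say which wet-bulb formula the package uses. As a sketch only, the function below implements one common empirical approximation, Stull's (2011) fit, which takes air temperature in °C and relative humidity in percent. The function name is hypothetical, and the package's actual method may differ. Note that ERA5-Land reports 2 m dewpoint temperature rather than relative humidity, so RH would first have to be derived from temperature and dewpoint.

```python
import math


def wet_bulb_stull(temp_c: float, rh_pct: float) -> float:
    """Approximate wet-bulb temperature (deg C) from air temperature (deg C)
    and relative humidity (%) with Stull's (2011) empirical fit.

    The fit assumes standard sea-level pressure and is stated for roughly
    5-99% RH and -20 to 50 deg C; it is unreliable when it is both cold and dry.
    """
    t, rh = temp_c, rh_pct
    return (
        t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(t + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )


# Worked example: 20 deg C air at 50% relative humidity.
print(round(wet_bulb_stull(20.0, 50.0), 1))  # → 13.7
```

The 20 °C / 50% case is the worked example Stull gives for this fit, which makes it a convenient sanity check if the formula is re-implemented in R.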
The code is embedded in an R Markdown file rendered to PDF.
General Application:
This is a guide to obtaining climate data and constructing different temperature measures and temporal exposure lags. With minor modifications, the code can be adapted to other locations and time periods, and its output can be merged with any daily data set for analysis. Our example covers 2015-2022; this short period keeps the R Markdown demonstration manageable, but it can be extended, as we did in the underlying research.
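To show what a temporal exposure lag can look like, here is a minimal sketch, assuming a daily temperature series keyed by date. For one target day it returns the single-day lags lag0 to lagK and their moving average over days 0 to K. The names exposure_lags, lagK, and ma0_K are hypothetical. The package lists the R slider package for windowed calculations, and its actual lag definitions may differ from this plain-Python illustration.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, Optional


def exposure_lags(
    daily: Dict[date, float], day: date, max_lag: int
) -> Dict[str, Optional[float]]:
    """Return lag0..lag{max_lag} and their moving average for one target day.

    A missing day yields None for that lag, and the moving average is None
    if any day in the window is missing (no partial-window averaging).
    """
    window = [daily.get(day - timedelta(days=k)) for k in range(max_lag + 1)]
    out: Dict[str, Optional[float]] = {f"lag{k}": v for k, v in enumerate(window)}
    out[f"ma0_{max_lag}"] = None if None in window else mean(window)
    return out


# Toy series (made-up values): 20.0 on 1 July 2022, rising 1 degree per day.
temps = {date(2022, 7, 1) + timedelta(days=i): 20.0 + i for i in range(5)}
print(exposure_lags(temps, date(2022, 7, 5), max_lag=2))
# {'lag0': 24.0, 'lag1': 23.0, 'lag2': 22.0, 'ma0_2': 23.0}
```

This sketch returns None instead of averaging over a shortened window. That keeps every moving average on the same number of days, at the cost of losing the first K days of the series.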
How does or could this code allow researchers to assess research questions related to aging or life course?:
The output of this code, daily climate data, can be merged with any data at daily or coarser temporal frequency to study the impact of extreme weather events on aging populations, provided that the climate data and the population data (from administrative, census, or survey sources) are compatible in spatial and temporal resolution.
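The merge described above comes down to a join on date, or on a coarser period key when the population data are less frequent. The sketch below assumes each outcome record carries a "date" field and each climate row is keyed by date. The function names and toy records are hypothetical, and it handles only the temporal side of compatibility, not spatial matching.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple


def merge_by_date(records: List[dict], climate: Dict[date, dict]) -> List[dict]:
    """Left-join daily climate fields onto records that carry a 'date' key.

    Records with no matching climate day are kept unchanged; if a field name
    appears in both inputs, the climate value wins.
    """
    return [{**rec, **climate.get(rec["date"], {})} for rec in records]


def monthly_mean(daily: Dict[date, float]) -> Dict[Tuple[int, int], float]:
    """Collapse a daily series to (year, month) means for coarser-grained data."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, value in daily.items():
        groups[(day.year, day.month)].append(value)
    return {key: mean(vals) for key, vals in sorted(groups.items())}


# Toy inputs (made-up values, for illustration only).
records = [{"date": date(2022, 7, 1), "count": 3},
           {"date": date(2022, 7, 2), "count": 5}]
climate = {date(2022, 7, 1): {"tmax_c": 31.0}}
print(merge_by_date(records, climate))
print(monthly_mean({date(2022, 7, 1): 30.0, date(2022, 7, 2): 32.0,
                    date(2022, 8, 1): 28.0}))
```

A left join keeps every outcome record, including days without climate data. This makes gaps visible rather than silently dropping observations.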
Data sets used:
- Climate, weather, disaster or environment data:
Global Historical Climatology Network daily (GHCNd) – point (station) format
ERA5-Land Reanalysis data – gridded format
- All data are publicly available
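The two formats call for different aggregation to a citywide value. The sketch below shows one simple option for each, under stated assumptions. Station readings are averaged across all stations reporting on a given day, and grid-cell values are averaged with each cell weighted by the fraction of it that lies inside the city boundary. That second idea mirrors the coverage-fraction weighting used by the "mean" summary of exactextractr, which the package lists. Both function names are hypothetical, coverage fractions are assumed to be precomputed, and the package's actual station selection and weighting may differ.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def station_daily_mean(obs: Iterable[Tuple[str, date, float]]) -> Dict[date, float]:
    """Average point observations (station_id, date, value) across all
    stations that report on each day."""
    by_day: Dict[date, List[float]] = defaultdict(list)
    for _station, day, value in obs:
        by_day[day].append(value)
    return {day: mean(vals) for day, vals in by_day.items()}


def coverage_weighted_mean(cells: Iterable[Tuple[float, float]]) -> float:
    """Mean of grid-cell values, weighting each (value, fraction) pair by the
    fraction (0-1) of the cell covered by the city polygon."""
    pairs = [(v, w) for v, w in cells if w > 0]
    total_w = sum(w for _, w in pairs)
    if total_w == 0:
        raise ValueError("no grid cells overlap the boundary")
    return sum(v * w for v, w in pairs) / total_w


# Toy inputs (made-up values, for illustration only).
d = date(2022, 7, 1)
print(station_daily_mean([("A", d, 30.0), ("B", d, 32.0)]))
print(coverage_weighted_mean([(30.0, 1.0), (20.0, 0.5), (10.0, 0.0)]))
```

Weighting only by coverage fraction ignores small differences in cell area across latitudes. At the scale of a single city, that effect is minor.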
Links to data:
Coding Language: R, Python
Tools and Packages used:
R: tidyverse, lubridate, magrittr, here, sf, raster, exactextractr, openxlsx, fixest, slider
Python: os, cdsapi, time, pathlib (Path)
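The Python packages listed above point to scripted ERA5-Land downloads through the Copernicus Climate Data Store (CDS) API. The sketch below builds one hourly request per month and passes it to cdsapi.Client().retrieve. Several details are assumptions rather than the package's settings: the monthly split, the variable names, the approximate NYC bounding box, the "data_format" key (copied from the current CDS request builder, so verify it there), and the file naming. The function names are hypothetical. Running download_all requires a CDS account with credentials in ~/.cdsapirc.

```python
import calendar
import time
from pathlib import Path
from typing import Dict, List

# Approximate NYC bounding box [North, West, South, East] in degrees
# (an illustrative assumption; check it against your boundary file).
NYC_AREA = [40.92, -74.26, 40.49, -73.70]


def era5_land_requests(first_year: int, last_year: int,
                       variables: List[str]) -> List[Dict]:
    """Build one hourly ERA5-Land CDS request dict per calendar month."""
    requests = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            n_days = calendar.monthrange(year, month)[1]  # handles leap years
            requests.append({
                "variable": variables,
                "year": str(year),
                "month": f"{month:02d}",
                "day": [f"{d:02d}" for d in range(1, n_days + 1)],
                "time": [f"{h:02d}:00" for h in range(24)],
                "area": NYC_AREA,
                "data_format": "netcdf",  # assumed key name; verify on CDS
            })
    return requests


def download_all(requests: List[Dict], out_dir: str = "era5_land",
                 pause_s: float = 1.0) -> None:
    """Retrieve each request to its own file, skipping files already present."""
    import cdsapi  # imported here so the request builder works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    client = cdsapi.Client()  # reads url/key from ~/.cdsapirc
    for req in requests:
        target = out / f"era5land_{req['year']}_{req['month']}.nc"
        if target.exists():
            continue
        client.retrieve("reanalysis-era5-land", req, str(target))
        time.sleep(pause_s)  # brief pause between queued requests
```

A typical call would be download_all(era5_land_requests(2015, 2022, ["2m_temperature", "2m_dewpoint_temperature", "total_precipitation"])). Splitting the work by month keeps each request small and lets an interrupted run resume without downloading completed months again. ERA5-Land timestamps are in UTC, so forming local NYC calendar days requires a time-zone shift before daily aggregation.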
Output(s): Dataset
Spatial extent: New York City (roughly 300 sq. miles or 778 sq. km.)
Temporal extent: 2015-2022
Comments: A replication package for the Demography article will be available here.
Published papers that use this code:
Forthcoming paper “Extreme Weather and Mortality of Vulnerable Urban Populations: An Examination of Temperature and Unclaimed Deaths in New York City”, in Demography (2026).
Related Content:
Demonstration Project: Impact of Extreme Weather on Hard-to-Capture, Vulnerable Populations: Evidence from Hart Island — New York’s Public Burial Ground
Seminar: Measuring Extreme Temperatures and Thermal Comfort in Aging and Demographic Research