

Prepared by: Alex Mikulas, PhD, CACHE postdoctoral associate 

Date: April 20, 2026


Original Authors:

Benjamin Jones, Northwestern University 

Jacob Moscona, Massachusetts Institute of Technology 

Benjamin A. Olken, Massachusetts Institute of Technology  

Cristine von Dessauer, Massachusetts Institute of Technology 

Specific purpose of code: This Stata code and program take temperature data at fine temporal and spatial resolution (e.g., tract-day) and transform them into an aggregated panel dataset with temperature binned at the year-place or month-place level. The program calculates both the realized and the expected number of days in each temperature bin. The data are then ready for use in regression and similar analyses using standard temperature-bin specifications. Because they identify extreme temperature days beyond what would be expected in a given area, the data better capture the role of extreme temperatures and are useful for assessing the causal role of increasing extreme temperature exposure.
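The realized half of this aggregation can be sketched with built-in Stata commands. This is a minimal illustration and not the cftemp program itself: the variable names (tract, date, tmean), the simulated data, the Fahrenheit scale, and the 10-degree bin edges are all assumptions made for the example.

```stata
* Hypothetical input: one row per tract-day. Names and values are illustrative.
clear
set seed 12345
set obs 730
gen tract = 1
gen date  = mdy(1, 1, 2020) + _n - 1
format date %td
* Simulated daily mean temperature (assumed deg F) with a seasonal cycle
gen tmean = 60 + 25*sin(2*_pi*(doy(date) - 110)/365) + rnormal(0, 5)
gen year  = year(date)

* Assign each day to a 10-degree bin, with open-ended bottom and top bins
gen bin = floor(tmean/10)*10
replace bin = 30 if tmean < 40                      // "below 40" bin
replace bin = 90 if tmean >= 90 & !missing(tmean)   // "90 and above" bin

* One indicator per bin, then count days within each tract-year
foreach b of numlist 30(10)90 {
    gen days_bin`b' = (bin == `b')
}
collapse (sum) days_bin*, by(tract year)
list
```

Replacing year with a year-month variable in the by() option would give the month-place version. The expected counts are what cftemp adds on top of this step, and they are not reproduced here.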

General Application: Extreme temperature exposure is often operationalized in research using a binning procedure, wherein a researcher counts the number of days that fall within each of a set of temperature ranges (bins). A common and serious bias can occur when using binned temperature data over time if the outcome variable of interest is associated with the baseline temperature, producing what are often called “U-shaped” results. When applied to the impacts of climate change and increasing extreme temperature exposure, common binning procedures neglect the baseline temperature of a given area.

Let’s say that over a 20-year span, the average temperature increase is uniform across space. A place like Phoenix will see a large increase in extreme heat days (say, 90-degree+ days), while a place like Boston will see a smaller increase. This is because the baseline temperature in Phoenix 20 years ago was much warmer than the baseline temperature in Boston, so many more Phoenix days sat just below the threshold and were pushed over it by the same amount of warming.
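A stylized calculation makes the point concrete. The sketch below assumes daily temperatures are normally distributed around a place-specific mean; the means, the standard deviation, and the 2-degree shift are invented for illustration and are not real climatology for either city.

```stata
* Stylized illustration only: every number here is hypothetical
local sd    = 15    // assumed SD of daily temperature
local shift = 2     // assumed uniform warming applied to both places

foreach place in Phoenix Boston {
    if "`place'" == "Phoenix" {
        local mu = 75    // assumed warm baseline mean
    }
    else {
        local mu = 52    // assumed cool baseline mean
    }
    * Expected days per year at or above 90 degrees, before and after the shift
    local before = 365*(1 - normal((90 - `mu')/`sd'))
    local after  = 365*(1 - normal((90 - `mu' - `shift')/`sd'))
    local change = `after' - `before'
    display "`place': " %5.1f `before' " -> " %5.1f `after' " days at 90+ (change " %4.1f `change' ")"
}
```

Under these assumptions the same shift adds far more 90-degree+ days in the warmer place, because more of its temperature distribution lies just below the threshold.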

The bias arises when a given outcome is associated both with extreme heat days and with the baseline temperature of a given area. If an area’s baseline temperature is closer to the bin thresholds for “extreme heat”, the outcome may be associated with the baseline temperature as well as with the increase in extreme heat days, introducing statistical bias into an analysis.

This code addresses the issue by providing the “expected” number of days in each temperature bin alongside the observed number of days in each bin. These data can then be used to control for different baseline temperatures and warming trends across areas. Having both expected and observed temperature bins allows the researcher to avoid regressing “trends on trends”, which produces the biased U-shaped results. The estimated area-year and area-month temperature bin data can be used in a wide variety of extreme temperature exposure studies.
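As a rough sketch of how expected bins might enter an analysis as controls, the example below simulates a panel and regresses an outcome on the observed count of extreme heat days while controlling for the expected count. The variable names (obs_90plus, exp_90plus, y), the data-generating process, and the specification are assumptions for illustration; the working paper should be consulted for the authors' actual estimator.

```stata
* Hypothetical simulated panel: names and parameters are illustrative only
clear
set seed 2026
set obs 50
gen place = _n
gen base  = runiform(20, 60)      // place-specific baseline count of 90+ days
gen trend = runiform(0.2, 1)      // place-specific warming trend in that count
expand 20
bysort place: gen year = 2000 + _n

gen exp_90plus = base + trend*(year - 2000)     // stand-in for the expected count
gen obs_90plus = exp_90plus + rnormal(0, 5)     // realized count = expected + shock

* Outcome responds to the unexpected part of heat exposure, and also follows
* a trend that is correlated with the baseline (the source of the bias)
gen y = 0.1*(obs_90plus - exp_90plus) + 0.02*base*(year - 2000) + rnormal()

xtset place year
* Observed count, controlling for the expected count, with place and year effects
xtreg y obs_90plus exp_90plus i.year, fe vce(cluster place)
```

With the expected count in the regression, the coefficient on the observed count is identified from deviations around what each place's baseline and trend would predict, rather than from the trend itself.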

How does or could this code allow researchers to assess research questions related to aging or life course?: This code can be applied to spatially and temporally specific temperature datasets to bin temperature exposure data while also accounting for varying baseline temperatures and varying trends in extreme temperature over time. Health and aging scholars can then integrate these data as a weather exposure variable and more accurately estimate the impact of extreme temperature on age- and health-related outcomes.

Data sets used: 

  • Population, socioeconomic, or health data: The code generates binned temperature data that can be integrated with any data that have spatial and temporal specificity (such as lat/long, geographic administrative identifiers, panel data, or data with observation dates).
  • Climate, weather, disaster or environment data: The code is applicable to any temperature data with detailed place/day resolution.
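Linking the binned output to outcome data can be sketched as a many-to-one merge on the place and time identifiers. The datasets and variable names below (place, year, days_obs_90plus, days_exp_90plus, person, outcome) are hypothetical stand-ins, not the actual layout of the cftemp output.

```stata
* Hypothetical place-year bin counts, standing in for the program's output
clear
input place year days_obs_90plus days_exp_90plus
1 2019 12 10
1 2020 15 11
2 2019 40 38
2 2020 47 39
end
tempfile bins
save `bins'

* Hypothetical person-year outcome data carrying the same identifiers
clear
input person place year outcome
101 1 2019 0
101 1 2020 1
102 2 2019 0
102 2 2020 0
103 2 2020 1
end

* Many person-years map to one place-year row of temperature bins
merge m:1 place year using `bins', keep(master match)
tab _merge
```

Data keyed by coordinates or observation dates would first need to be mapped to the same place and period identifiers used in the temperature panel.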

Are all the data publicly available or are some restricted-access? NA

Links to data: NA

Coding Language: Stata 

Tools and Packages used: cftemp, a custom Stata command distributed as an .ado file.

Output(s): Dataset of observed and counterfactual binned counts of extreme temperature days at the place/year or place/month level. Output can be generated from any user-supplied daily temperature dataset with fine spatial resolution.

Spatial extent: Flexible. Depends on user-supplied data.

Temporal extent: Flexible. Depends on user-supplied data.

Published papers that use this code: Benjamin Jones, Jacob Moscona, Benjamin A. Olken, and Cristine von Dessauer, “With or Without U? Binning Bias and the Causal Effects of Temperature Extremes,” NBER Working Paper 34671 (2026), https://doi.org/10.3386/w34671.