Using Health and Retirement Study Data: A Guide for New Users

Link to code

Authors/Creators/Team Members: Amanda Sonnega, with statistical code provided by Ryan McCammon, Chichun Fang, Christopher Greene, and Sergio Martinez.

Specific purpose of code: Provides examples in four programming languages for working with Health and Retirement Study (HRS) data. A dozen examples demonstrate how to merge or join various types of data files at the respondent and household levels, combine household members, summarize information from a file that may have multiple rows per respondent, combine strata for variance estimation, and conduct analyses such as two-way tables and logistic regression.
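
The merge and summarize steps described above can be sketched in a language-neutral way. The following is a minimal illustration in plain Python, not code taken from the guide, which provides its examples in R, SAS, Stata, and SPSS. The field names (hhid, pn, hh_income, hours) and all values are hypothetical and chosen only for illustration; consult the guide and the HRS codebooks for the actual identifiers and variables.

```python
from collections import defaultdict

# Hypothetical respondent-level rows, keyed by household ID and person number.
respondents = [
    {"hhid": "000001", "pn": "010", "age": 67},
    {"hhid": "000001", "pn": "020", "age": 64},
    {"hhid": "000002", "pn": "010", "age": 71},
]

# Hypothetical household-level rows, keyed by household ID only.
households = [
    {"hhid": "000001", "hh_income": 52000},
    {"hhid": "000002", "hh_income": 31000},
]

# Hypothetical file with zero, one, or several rows per respondent.
helper_rows = [
    {"hhid": "000001", "pn": "010", "hours": 10},
    {"hhid": "000001", "pn": "010", "hours": 5},
    {"hhid": "000002", "pn": "010", "hours": 8},
]


def merge_household(resp_rows, hh_rows):
    """Left-join household-level fields onto each respondent row by hhid."""
    hh_by_id = {h["hhid"]: h for h in hh_rows}
    merged = []
    for r in resp_rows:
        row = dict(r)  # copy so the input rows are not modified
        row.update(hh_by_id.get(r["hhid"], {}))
        merged.append(row)
    return merged


def summarize_per_respondent(rows, value_field):
    """Collapse a multi-row file to one row count and total per (hhid, pn)."""
    summary = defaultdict(lambda: {"n_rows": 0, "total": 0})
    for row in rows:
        key = (row["hhid"], row["pn"])
        summary[key]["n_rows"] += 1
        summary[key]["total"] += row[value_field]
    return dict(summary)


merged = merge_household(respondents, households)
summary = summarize_per_respondent(helper_rows, "hours")

# Attach the per-respondent summary; respondents with no rows get zeros.
for row in merged:
    s = summary.get((row["hhid"], row["pn"]), {"n_rows": 0, "total": 0})
    row["n_helpers"] = s["n_rows"]
    row["helper_hours"] = s["total"]
```

This sketch covers only the join and collapse logic. It does not apply survey weights or strata, which the listed survey procedures handle in the guide's own examples.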

General Application: The examples are written for the HRS files specified, but the code could be adapted to other HRS files or to other longitudinal surveys with individual-level responses, survey weights, and strata.

How does or could this code allow researchers to assess research questions related to aging or life course?: The Health and Retirement Study, the data set for which this sample code was created, is commonly used by researchers to study older adults in the United States. The survey is nationally representative of the U.S. population over age 50 and contains many questions related to aging, as well as modules that capture early-life exposures.

Data sets used:

  • Population, socioeconomic, or health data: Health and Retirement Study (HRS)
  • Climate, weather, disaster or environment data: N/A

Are all the data publicly available or are some restricted-access? Some HRS data are publicly available, and researchers can apply for access to restricted data.

Links to data: https://hrsdata.isr.umich.edu/data-products/public-survey-data

Coding Language: R, SAS, Stata, SPSS

Tools and Packages used:

  • R: srvyr, survey, haven, knitr, kableExtra, tidyverse
  • Stata: svy, merge
  • SAS: data MERGE, proc SURVEYLOGISTIC, proc SURVEYFREQ
  • SPSS: MATCH FILES, VARSTOCASES, SUMMARIZE, CSPLAN ANALYSIS, CSLOGISTIC

Output(s): Tables, datasets (joined/merged/stacked, reshaped)

Spatial extent: United States 

Temporal extent: 1992-2022 (span of the HRS data)