Using Health and Retirement Study Data: A Guide for New Users
Link to code
Authors/Creators/Team Members: Amanda Sonnega, with statistical code provided by Ryan McCammon, Chichun Fang, Christopher Greene, and Sergio Martinez.
Specific purpose of code: The code provides examples in four programming languages for working with Health and Retirement Study (HRS) data. A dozen examples demonstrate how to merge/join various types of data files at the respondent and household levels, combine household members, summarize information from a file that may have multiple rows per respondent, combine strata (for variance estimation), and conduct analyses such as two-way tables and logistic regression.
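Illustrative sketch (not part of the HRS sample code, and written in Python rather than one of the four languages listed): the respondent-level and household-level merges and the multiple-rows-per-respondent summary described above can be pictured roughly as follows. HHID and PN are the HRS household and person identifiers; every other name (age, hh_income, hours, n_jobs, total_hours) and every value is invented for illustration, and merging household records on HHID alone is a simplifying assumption. Consult the HRS codebooks for the keys each real file requires.

```python
from collections import defaultdict

# Toy respondent-level rows keyed by (HHID, PN). All values are invented.
respondents = [
    {"HHID": "000001", "PN": "010", "age": 67},
    {"HHID": "000001", "PN": "020", "age": 64},
    {"HHID": "000002", "PN": "010", "age": 71},
]

# Toy household-level rows, keyed here by HHID only (a simplification).
households = [
    {"HHID": "000001", "hh_income": 52000},
    {"HHID": "000002", "hh_income": 31000},
]

# Toy file with multiple rows per respondent (hypothetical: one row per job).
jobs = [
    {"HHID": "000001", "PN": "010", "hours": 20},
    {"HHID": "000001", "PN": "010", "hours": 10},
    {"HHID": "000002", "PN": "010", "hours": 40},
]


def summarize_jobs(rows):
    """Collapse the multi-row file to one summary record per respondent."""
    totals = defaultdict(lambda: {"n_jobs": 0, "total_hours": 0})
    for row in rows:
        key = (row["HHID"], row["PN"])
        totals[key]["n_jobs"] += 1
        totals[key]["total_hours"] += row["hours"]
    return dict(totals)


def merge_files(respondents, households, job_summary):
    """Left-join household data and job summaries onto respondent rows."""
    hh_by_id = {h["HHID"]: h for h in households}
    merged = []
    for r in respondents:
        row = dict(r)
        # Household-level merge: every member of a household gets the same value.
        row["hh_income"] = hh_by_id.get(r["HHID"], {}).get("hh_income")
        # Respondent-level merge: respondents with no job rows get zeros.
        row.update(job_summary.get((r["HHID"], r["PN"]),
                                   {"n_jobs": 0, "total_hours": 0}))
        merged.append(row)
    return merged


merged = merge_files(respondents, households, summarize_jobs(jobs))
for row in merged:
    print(row)
```

The sketch keeps one output row per respondent, which mirrors the usual goal of summarizing a multi-row file before merging it so that respondent rows are not duplicated.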
General Application: The examples are written for the HRS files specified, but the code could be adapted to other HRS files or to other longitudinal surveys with individual-level responses, survey weights, and strata.
How does or could this code allow researchers to assess research questions related to aging or life course?: The Health and Retirement Study, the data source for which this sample code was created, is commonly used by researchers to study older adults in the United States. The survey is nationally representative of the U.S. population over age 50 and contains many questions related to aging as well as modules that capture early life exposures.
Data sets used:
- Population, socioeconomic, or health data: Health and Retirement Study (HRS)
- Climate, weather, disaster or environment data: N/A
Are all the data publicly available or are some restricted-access? Some HRS data is publicly available and researchers can apply for access to restricted data.
Links to data: https://hrsdata.isr.umich.edu/data-products/public-survey-data
Coding Language: R, SAS, Stata, SPSS
Tools and Packages used:
- R: srvyr, survey, haven, knitr, kableExtra, tidyverse
- Stata: svy, merge
- SAS: data MERGE, proc SURVEYLOGISTIC, proc SURVEYFREQ
- SPSS: MATCH FILES, VARSTOCASES, SUMMARIZE, CSPLAN ANALYSIS, CSLOGISTIC
Output(s): Tables, datasets (joined/merged/stacked, reshaped)
Spatial extent: United States
Temporal extent: 1992-2022 (span of the HRS data)