The Agency for Healthcare Research and Quality (AHRQ) hosted a panel at the 2022 AcademyHealth Annual Research Meeting (ARM). The panel was moderated by Dr. Herbert Wong, Director of AHRQ's Division of Markets and Systems Research, and featured AHRQ panelists Drs. Patricia Keenan and Zeynal Karaca and NORC staff Drs. Jennifer Smith and Quentin Brummet. It spotlighted three new AHRQ databases and their implications for future health services research (HSR).
Health services and policy researchers often use secondary data (i.e., data they did not collect themselves). Secondary data, such as those collected by the federal government, have historically been used to answer research questions about the U.S. health system and health outcomes, on topics such as patient safety and insurance coverage. However, some federal datasets carry administrative burdens, such as steep learning curves and paywalls. AHRQ recognizes the need for innovative, high-quality, and accessible data to support a well-informed research community. AHRQ already offers well-known datasets, such as the Medical Expenditure Panel Survey (MEPS), a nationally representative survey that collects data on health services use, cost, and health insurance coverage; and the Healthcare Cost and Utilization Project (HCUP), a set of health care databases used to describe health care delivery and patient outcomes. At this ARM panel, AHRQ and NORC staff discussed three newly released or trial-phase databases: Synthetic Healthcare Data for Research (SyH-DR), the Social Determinants of Health Database (SDOH-RD), and the Physician and Physician Practice Research Database (3P-RD).
Synthetic Healthcare Data for Research (SyH-DR)
With advances in computing and statistical methods, synthetic data are now being introduced into health care research. As the name suggests, synthetic data are not drawn entirely from actual people or events; instead, they consist of statistical approximations of real data. Synthetic datasets range from partially synthetic, which mix real and fabricated values, to fully synthetic, in which all values are fabricated. Both forms are cited as ways to protect the privacy of real people while reducing administrative burden and improving usability and educational opportunities.
AHRQ released its Synthetic Healthcare Data for Research (SyH-DR) this month. SyH-DR originated from real Medicare, Medicaid, and commercial insurance claims data. Claims data contain detailed insurance billing codes that correspond to health care procedures, diagnoses, and prescriptions. SyH-DR is considered partially synthetic because it is built from real data sources while masking actual person-level records. The database is considered nationally representative for 2016 and is intended for researchers and students who are familiar with health care claims (i.e., insurance billing) data and who want to answer questions about health care utilization. Although SyH-DR is free, it requires an application and a data use agreement before use. AHRQ hopes SyH-DR will remove much of the administrative overhead typically associated with health insurance claims data and enable wide-ranging analyses by diverse researchers.
Social Determinants of Health Database (SDOH-RD)
High-quality data have the potential to deepen the health services research field's understanding of the social determinants of health (SDOH). SDOH are the external factors, such as environmental context, that affect health outcomes. AHRQ has developed an innovative beta research database, the Social Determinants of Health Database (SDOH-RD), to help researchers study the connections between SDOH and health outcomes. Funded by the Patient-Centered Outcomes Research (PCOR) Trust Fund, SDOH-RD is derived from several national datasets, such as the American Community Survey, County Health Ratings, the Social Vulnerability Index, and U.S. Cancer Statistics. While each of these datasets is extremely valuable on its own, siloed data can place undue burden on researchers through learning curves and paywalls, limiting opportunities for analysis. SDOH-RD does this work for researchers by bringing these datasets together into one file. The database currently covers five domains: social context, economic context, education, physical infrastructure, and health care context. In addition, the data can be linked to MEPS and HCUP at the county and ZIP Code Tabulation Area (ZCTA) levels. AHRQ hopes this database will advance the field's understanding of SDOH. For example, comparing county-level percentages of households with technology access against percentages of individuals in poverty can reveal disparities in access to telemedicine and remote health options.
Physician and Physician Practice Research Database (3P-RD)
The Physician and Physician Practice Research Database (3P-RD) is currently housed internally at AHRQ and is not yet available to the public. This database is a census of active U.S. physicians and practices. AHRQ is working with NORC to assess the database's viability, with the goals of identifying the number of actively practicing physicians, determining their scope of practice and where they practice, and identifying workforce trends and areas of need.
As the health care system adopts new technologies and researchers draw attention to the social context of health, AHRQ is responding to the need for innovative and accessible databases to answer the field's toughest questions. Researchers can now explore the newly released SyH-DR and SDOH-RD. Because the viability of these databases depends on their use, AHRQ strongly encourages users to provide feedback and engage with them.