Earlier this year, the U.S. Department of Health and Human Services (HHS) released the Medicaid Provider Spending dataset, which contains provider-level spending data that the Centers for Medicare & Medicaid Services (CMS) intends the public to use to identify unusual billing patterns and waste, fraud, and abuse. Although Medicaid researchers and AcademyHealth have long supported identifying and eliminating low-value or wasteful care, there are significant concerns about whether these data can be used effectively.
The released dataset is designed to provide insight into provider-level Medicaid spending, using aggregated data from outpatient providers and facilities with Healthcare Common Procedure Coding System (HCPCS) Codes, to better understand how Medicaid funding was distributed nationwide.
What Data Are Included and Excluded?
Included Data:
- Provider Information: The National Provider Identifier (NPI) of the billing provider and of the servicing provider.
- Procedure Codes: Healthcare Common Procedure Coding System (HCPCS) codes denoting which service(s) were delivered.
- Claim Month: Presented in YYYY-MM-01 format.
- Beneficiaries Seen: The number of beneficiaries for each provider (NPI)/procedure (HCPCS)/month combination.
- Total Claims: The total number of claims (count of claims) for each provider (NPI)/procedure (HCPCS)/month combination.
- Total Paid by Medicaid
Excluded Data:
- Prescription Drug Data: This category is heavily affected by high-cost specialty medications, which increased spending by 72 percent between 2017 and 2023.
- Cost of Emergency and Long-Term Care: Medicaid spending is primarily driven by these two expenditures, which together account for over 70 percent of spending (hospital services, about 38 percent; long-term care, about 36 percent).
- Place of Service: For example, whether care was provided in person or through telemedicine channels.
- Payment Rates: These vary state by state.
- Enrollment Numbers: These numbers help to identify coverage gaps, especially among vulnerable populations (e.g., people living with disabilities, senior citizens, and children).
What Does This Mean?
- The sheer size of the dataset, more than 227 million rows, limits its practical use.
- Payment-per-claim metrics can be misleading without the context that is missing from the dataset.
- The dataset lacks user guides or standardized protocols, which can lead to mistaken conclusions.
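To make the first two points concrete, here is a minimal sketch of how an analyst might stream a file of this size one row at a time rather than loading it into memory, and compute paid dollars per claim for each procedure code. The column names (`billing_npi`, `hcpcs_code`, `claim_month`, `total_claims`, `total_paid`) and the sample rows are assumptions made for illustration, not the published schema or real figures.

```python
import csv
import io
from collections import defaultdict


def paid_per_claim_by_code(rows):
    """Stream rows once and return {hcpcs_code: total_paid / total_claims}.

    Only running totals are kept in memory, so the same loop works on a
    very large file read through csv.DictReader.
    """
    paid = defaultdict(float)
    claims = defaultdict(int)
    for row in rows:
        code = row["hcpcs_code"]  # hypothetical column name
        paid[code] += float(row["total_paid"])
        claims[code] += int(row["total_claims"])
    # Skip codes with zero claims to avoid dividing by zero.
    return {code: paid[code] / claims[code] for code in paid if claims[code] > 0}


# Made-up sample rows standing in for the real file.
SAMPLE = """billing_npi,hcpcs_code,claim_month,total_claims,total_paid
1111111111,99213,2023-01-01,10,500.00
2222222222,99213,2023-01-01,30,900.00
1111111111,T1019,2023-01-01,4,1000.00
"""

ratios = paid_per_claim_by_code(csv.DictReader(io.StringIO(SAMPLE)))
for code, value in sorted(ratios.items()):
    print(f"{code}: ${value:.2f} per claim")
```

A ratio computed this way says nothing about state payment rates, place of service, or how many units of service a single claim covers, so a high or low value on its own is not evidence of unusual billing.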
AcademyHealth Medicaid researchers and members have raised some key initial concerns that warrant further investigation into the newly published dashboard. Medicaid experts who routinely use Medicaid data have noted that the online dataset does not appear to reliably present information in readily usable metrics. Instead, these experts have observed that the data give expenditures across a 7-or-9-year timespan, which would be more useful if broken out by year. There is also concern about the presentation of incomplete data from the last few months of 2024, which do not provide enough reportable data for that year and draw attention to poor data collection. The presentation of broad, aggregated data also makes hypothesis testing difficult. Without hypothesis testing, scientific advancement cannot occur.
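The year-by-year breakdown the experts ask for can be derived from the claim month field. The sketch below assumes rows carry a `claim_month` string in YYYY-MM-01 format and a `total_paid` amount; both column names and all figures are hypothetical.

```python
from collections import defaultdict


def paid_by_year(rows):
    """Sum total_paid per calendar year, taking the year from claim_month."""
    totals = defaultdict(float)
    for row in rows:
        year = int(row["claim_month"][:4])  # "2018-07-01" -> 2018
        totals[year] += float(row["total_paid"])
    return dict(sorted(totals.items()))


# Made-up rows for illustration only.
rows = [
    {"claim_month": "2018-01-01", "total_paid": "100.00"},
    {"claim_month": "2018-07-01", "total_paid": "150.00"},
    {"claim_month": "2019-03-01", "total_paid": "300.00"},
]

yearly = paid_by_year(rows)
print(yearly)
```

Annual totals built this way are only comparable across years if every month in each year is fully reported.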
The release does not explain how the Medicaid data were aggregated to produce the summary file. Knowing how the data were aggregated helps ensure accuracy and validity, avoid misuse of sensitive information, and prevent flawed conclusions from the presented information. In addition, KFF has observed that the release does not seem to address data quality challenges, which arise in specific states for certain topics, in the underlying Transformed Medicaid Statistical Information System (T-MSIS) data. T-MSIS data are complex, with reporting variability across states. To assist with the reporting of Medicaid claims data, the Medicaid Data Learning Network (MDLN) created the T-MSIS Analytic Files (TAF) Analysis Reporting Checklist, which outlines items that should be reported in analyses of T-MSIS data to generate high-quality, reproducible research. Despite these challenges, this group of experts believes the data could be useful for policy and HHS program evaluation, but further exploration is needed to fully understand how. An overview of the data included in and missing from the dashboard appears above.
An initial analysis of the presence or absence of various data points raised additional concerns among our Medicaid experts. Because the dataset includes NPIs that publicly identify providers and links them to claim volumes and expenditures, there is concern that individual providers (such as those who offer gender-affirming care) or care groups could be targeted or disincentivized from participating in the Medicaid program. Independent analysts who reviewed total monthly paid amounts across the full timespan (2018-2024) have also observed that the 2024 data are likely incomplete: total dollars dropped significantly in November and December 2024.
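A completeness check of the kind the independent analysts describe can be approximated by comparing each month's total against a trailing baseline. The sketch below is one possible approach, not the analysts' actual method: the six-month window, the 80 percent threshold, and the monthly figures are all assumptions chosen for illustration.

```python
from statistics import median


def flag_low_months(monthly_totals, window=6, threshold=0.8):
    """Return months whose total falls below threshold * trailing median.

    monthly_totals is a list of (month, total_paid) pairs sorted by month.
    The first `window` months have no full baseline and are never flagged.
    """
    flagged = []
    for i in range(window, len(monthly_totals)):
        month, total = monthly_totals[i]
        baseline = median(t for _, t in monthly_totals[i - window:i])
        if total < threshold * baseline:
            flagged.append(month)
    return flagged


# Made-up monthly totals (in arbitrary units), not real figures.
monthly = [
    ("2024-05-01", 100.0),
    ("2024-06-01", 102.0),
    ("2024-07-01", 98.0),
    ("2024-08-01", 101.0),
    ("2024-09-01", 99.0),
    ("2024-10-01", 100.0),
    ("2024-11-01", 70.0),
    ("2024-12-01", 40.0),
]

low_months = flag_low_months(monthly)
print(low_months)
```

A flagged month is only a prompt for further review. A drop may reflect claims that had not yet been submitted or processed when the file was produced, rather than a real change in spending.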
As people access the dataset, it becomes important to think critically about how the data were put together, particularly the methods used to develop the dataset, which raise questions about the quality and appropriateness of the data presented. Medicaid experts at AcademyHealth raised a methodological question: whether providers were required to link claims to the beneficiary's enrollment record. If claims are not linked, this would suggest that the claims in this dataset are part of the bulk supply of services that, the experts note, cannot be linked with beneficiaries.
The absence of public guidance on how to use the data raises another concern: as currently presented, the data can lead to mistaken observations. For instance, the dataset covers the years 2018-2024 but does not appear to account for the impact of the COVID-19 pandemic on Medicaid spending. In the first year of the pandemic (2019-2020), there was a 10.4 percent increase in health spending, with FY2020 Medicaid expenditures rising to $683 billion. Excluding context on Medicaid enrollment and spending during the pandemic adds to the dataset's overall lack of context, reduces confidence in it, and increases concerns regarding the overall integrity of the data.
While this release aims to promote transparency, the dataset, which is still available only in its original version, requires thorough interpretation and validation to meet its claim of accessibility and transparency. Without additional guidance and clarification on how best to use the dataset, the current presentation of the aggregated Medicaid claims data may undermine the transparency HHS is striving for.