Medicaid Data
Dataset Website
Medicaid General Data Information and Medicaid Data Documentation
Vital Statistics
Sample size: Over 100 million
Sampling frame: Individuals from throughout the United States enrolled in Medicaid
Years of data: 2011 - 2019
Type of data: Public health insurance claims data
Dataset Description
The Medicaid data are claims data. Claims data are derived from billing data collected in the course of administering insurance programs. They can include things like patient demographics, procedure and diagnosis codes, prescription fills, insurance information, and other patient-provider communications.
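As a rough illustration of what a single claim record might hold, the sketch below models the kinds of fields listed above. The class name, field names, helper function, and example codes are hypothetical and are not taken from the Medicaid file layouts; consult the data documentation for the real variable names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Claim:
    """One billing record (hypothetical layout, for illustration only)."""
    beneficiary_id: str                      # links the claim to a patient
    service_date: date
    diagnosis_codes: List[str] = field(default_factory=list)
    procedure_codes: List[str] = field(default_factory=list)
    prescription_ndc: Optional[str] = None   # set only for prescription fills
    payer: str = "Medicaid"                  # insurance information


def claims_with_diagnosis(claims, code):
    """Return the claims that list the given diagnosis code."""
    return [c for c in claims if code in c.diagnosis_codes]
```

A study cohort is often defined this way in principle: filter claims on diagnosis or procedure codes, then collect the matching beneficiary identifiers.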
The Medicaid program operates through a complex web of data management systems. Each state compiles information on enrollment, service utilization, and payments in its Medicaid Management Information System (MMIS) and related platforms. To standardize these varied state-level data, the Centers for Medicare & Medicaid Services (CMS) furnishes states with a data dictionary for aligning their data elements. Historically, states submitted reconciled Medicaid Statistical Information System (MSIS) data files to CMS, which aggregated them into uniform annual state segment files known as the Medicaid Analytic eXtract (MAX) data files. MAX data, which are derived from MSIS and span 1999 to 2014, were integral to research and policy analysis.
As states transitioned to the Transformed Medicaid Statistical Information System (T-MSIS), MSIS was phased out and the MAX files gave way to a new generation of research files. T-MSIS is a substantial expansion of Medicaid and Children's Health Insurance Program (CHIP) data: states submit data monthly, and the number of reported data elements is nearly four times that of MSIS. States now compile their Medicaid and CHIP data in the T-MSIS format and provide these files to CMS.
From the T-MSIS data provided by states, CMS creates the T-MSIS Analytic Files (TAFs), representing the next-generation national data source for Medicaid and CHIP beneficiaries and their service utilization. The TAFs undergo further processing by the CMS Chronic Conditions Data Warehouse (CCW) team, who load them into a database and generate Research Identifiable Files (RIFs) containing claims and enrollment information. These TAF RIFs are made accessible to academic researchers and select government agencies under approved Data Use Agreements (DUAs) for research purposes.
Strengths and limitations of the dataset
Medicaid data are well suited to studying children's health, pregnancy, and vulnerable or low-income populations, because these groups are well represented in the dataset.
You can find more information here: Strengths and Limitations of CMS Data
Demographics
In the 2019 TAF data (approximately 98 million beneficiaries):
Age: Mean 28.8 (SD 22.9), Missing 1.9%
Sex: Male 44%, Female 54%, Missing 2%
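For readers who want to reproduce this kind of summary on their own extract, here is a minimal sketch in plain Python. It assumes each beneficiary is a dict with hypothetical keys "age" and "sex" (coded "M", "F", or None when missing); the real TAF variable names and codings differ and should be taken from the data documentation. Percentages are computed over all records, so the three sex categories sum to 100%, and the SD is the sample standard deviation, which may not match how the figures above were produced.

```python
import statistics
from collections import Counter


def summarize_demographics(rows):
    """Summarize age and sex for a list of beneficiary records.

    Keys "age" and "sex" are illustrative names, not TAF variables.
    Requires at least two non-missing ages to compute the SD.
    """
    n = len(rows)
    ages = [r["age"] for r in rows if r["age"] is not None]
    sex_counts = Counter(r["sex"] for r in rows)
    return {
        "age_mean": statistics.mean(ages),
        "age_sd": statistics.stdev(ages),  # sample SD (n - 1 denominator)
        "age_missing_pct": 100 * (n - len(ages)) / n,
        "male_pct": 100 * sex_counts["M"] / n,
        "female_pct": 100 * sex_counts["F"] / n,
        "sex_missing_pct": 100 * sex_counts[None] / n,
    }
```

On the full data, with roughly 98 million rows, the same calculation would normally be run inside the database or with a dataframe library rather than on Python lists.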
Selected Publications
There are not yet any publications arising from the Stanford copy of the Medicaid data.
Process for accessing the data and associated costs
The process for accessing the data is described here: Medicaid Data Access Instructions. You will need IRB approval (Expedited category 5) and approval from the Centers for Medicare & Medicaid Services (CMS). Access takes about 6 to 12 months.
Data Support
PHS offers Medicaid office hours, Slack channels, data documentation, and recorded trainings.
ResDAC also provides trainings: ResDAC Medicaid Trainings
Other Datasets: