Medicare Data
Dataset Website: Medicare Data Documentation
Vital Statistics
Sample size: Over 19 million
Sampling frame: Individuals from throughout the United States enrolled in Medicare
Years of data: 2006 - 2020
Type of data: Public health insurance claims data
Dataset Description
The Medicare data are administrative claims data. Claims data are derived from billing data collected in the course of administering insurance programs. They can include attributes such as patient demographics, procedure and diagnosis codes, prescription fills, insurance benefit design information, and other patient-provider communications.
Overview
The Centers for Medicare and Medicaid Services (CMS) offers a wide variety of data products on Medicare enrollees, including inpatient and outpatient utilization, prescription drug purchases through the Part D program, home health services, skilled nursing facilities, rehabilitative services, and more. CMS also collects several other types of data, for example the Medicare Current Beneficiary Survey (MCBS, a national, longitudinal panel survey), Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey data, Healthcare Effectiveness Data and Information Set (HEDIS) data, and extended data on nursing home residents.
Sampling Frame
Stanford hosts a 20% random, representative sample of Medicare beneficiaries. Stanford holds the Research Identifiable Files (RIF), which are the richest and most detailed files and also the highest-risk. Medicare is public health insurance for adults aged 65 and older in the United States and for some younger populations, including people with permanent disability and end-stage renal disease. These data also include beneficiaries who qualify for and maintain partial or full dual-eligibility status with their state-sponsored Medicaid programs.
Data Types
Beneficiary enrollment information (Master Beneficiary Summary File, MBSF) includes dates of Medicare enrollment, the services in which the patient is enrolled, and basic demographic information (e.g., age, sex, race, postal ZIP code). Also included are indicators for a variety of chronic conditions, each defined using criteria applied to other Medicare files, and indicators for a series of “other chronic or potentially disabling conditions” including disability-related conditions, mental health, and substance use disorders.
Part A includes information on inpatient utilization, including summary information from hospitalizations, detailed hospital claims, and claims information for Medicare-related expenses from skilled nursing facilities (for example, the post-hospitalization SNF benefit). Part A also covers the Medicare hospice benefit, and claims information for hospice services is available.
Part B includes information from ambulatory visits, including claims from community-based and hospital-based doctors’ offices, as well as claims from other ambulatory services such as billing data from clinical laboratories and suppliers of durable medical equipment. Outpatient IV medications that are administered in doctors’ offices or infusion centers are also typically covered under Medicare Part B. These can be evaluated from Part B data files using Healthcare Common Procedure Coding System (HCPCS) codes.
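As a rough illustration of evaluating office-administered medications through HCPCS codes, the sketch below filters Part B line items to drug codes. It is a minimal sketch, not the documented workflow: the records are made up, the field names (bene_id, hcpcs_cd, line_pmt) are hypothetical stand-ins for whatever the actual files use, and treating every code that starts with "J" as a drug code is a simplifying assumption that a real study would replace with a curated code list.

```python
# Hypothetical Part B line items; field names and values are illustrative only.
part_b_lines = [
    {"bene_id": "A1", "hcpcs_cd": "J9035", "line_pmt": 1200.00},
    {"bene_id": "A1", "hcpcs_cd": "99213", "line_pmt": 75.00},
    {"bene_id": "B2", "hcpcs_cd": "j1745", "line_pmt": 900.00},
    {"bene_id": "C3", "hcpcs_cd": "36415", "line_pmt": 3.00},
]


def is_drug_code(hcpcs_code: str) -> bool:
    """Assumption: HCPCS Level II codes beginning with 'J' denote drugs."""
    return hcpcs_code.strip().upper().startswith("J")


def drug_lines(lines):
    """Keep only the line items whose HCPCS code looks like a drug code."""
    return [row for row in lines if is_drug_code(row["hcpcs_cd"])]


drug_claims = drug_lines(part_b_lines)

# Distinct beneficiaries with at least one drug line item.
beneficiaries_with_drug = sorted({row["bene_id"] for row in drug_claims})
```

Normalizing the code (strip and upper-case) before matching guards against inconsistent formatting in extracts; whether that is needed depends on how the files were prepared.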
Part D includes information from the Medicare prescription drug benefit, in particular claims for drugs dispensed in outpatient settings (e.g., retail pharmacies). Data include information such as the name of the medication dispensed, dose, quantity supplied, days supplied, and cost/payment data. Most drugs dispensed in inpatient settings are covered under Medicare Part A bundled payment mechanisms and cannot be evaluated using Medicare data sources.
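To show how fields such as days supplied and cost might be used, the following sketch totals fills, days supplied, and cost per beneficiary and drug. The records and the field names (bene_id, drug_name, days_supply, total_cost) are hypothetical and do not reflect the actual Part D file layout.

```python
from collections import defaultdict

# Hypothetical Part D dispensing events; names and values are illustrative only.
part_d_events = [
    {"bene_id": "A1", "drug_name": "metformin", "days_supply": 30, "total_cost": 4.00},
    {"bene_id": "A1", "drug_name": "metformin", "days_supply": 90, "total_cost": 10.50},
    {"bene_id": "B2", "drug_name": "atorvastatin", "days_supply": 30, "total_cost": 6.25},
]


def summarize_fills(events):
    """Total fills, days supplied, and cost for each (beneficiary, drug) pair."""
    totals = defaultdict(lambda: {"fills": 0, "days_supply": 0, "total_cost": 0.0})
    for event in events:
        key = (event["bene_id"], event["drug_name"])
        totals[key]["fills"] += 1
        totals[key]["days_supply"] += event["days_supply"]
        totals[key]["total_cost"] += event["total_cost"]
    return dict(totals)


fill_summary = summarize_fills(part_d_events)
```

Keying on the beneficiary and drug together keeps each person's exposure to each medication separate, which is usually the unit needed for adherence or spending analyses.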
Strengths and limitations of the dataset
Medicare data are excellent for studying the health of Americans aged 65 and older. Some individuals elect private coverage through Medicare Part C (Medicare Advantage); they tend to be a self-selected group of more affluent individuals and are less representative of the overall population. For beneficiaries enrolled in both Part A and Part B, the data capture the full breadth of service utilization. These data also make it possible to study long-term outcomes of aging, service utilization, and episodic costs, and to identify key populations that are representative of older U.S. adults. Another strength is that, compared with private payer datasets, these administrative claims offer slightly more geographic granularity (e.g., postal ZIP code), demographic specificity (e.g., race and ethnicity), and the opportunity for linkage to other registries. Some linkages let researchers obtain specific information on resources by geography (e.g., American Hospital Association linkage based on hospital identifier, Area Health Resource File linkage by county codes, American Community Survey linkage by postal ZIP code), information on physicians and other providers of care (e.g., American Medical Association MasterFile by individual physician National Provider Identifier), and cancer staging and other clinically relevant variables for oncologic studies (e.g., Medicare linked with Surveillance, Epidemiology, and End Results data (SEER-Medicare)).
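The geographic linkages described above amount to joining beneficiary records to an external table on a shared key. The sketch below shows a left join on postal ZIP code under stated assumptions: the ZIP codes, the area_median_income measure, and all field names are invented placeholders, not values or variables from any of the named sources.

```python
# Hypothetical beneficiary records and area-level lookup; all values are placeholders.
beneficiaries = [
    {"bene_id": "A1", "zip_code": "11111"},
    {"bene_id": "B2", "zip_code": "22222"},
    {"bene_id": "C3", "zip_code": "33333"},
]

area_by_zip = {
    "11111": {"median_income": 50000},
    "22222": {"median_income": 72000},
}


def link_by_zip(benes, area_lookup):
    """Left join: keep every beneficiary, with None where no area record matches."""
    linked = []
    for bene in benes:
        area = area_lookup.get(bene["zip_code"])
        income = area["median_income"] if area is not None else None
        linked.append({**bene, "area_median_income": income})
    return linked


linked_records = link_by_zip(beneficiaries, area_by_zip)
unmatched = [row["bene_id"] for row in linked_records if row["area_median_income"] is None]
```

A left join is used so that beneficiaries without a matching area record stay in the cohort and can be counted, rather than silently dropped.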
While Medicare data offer substantial clinical depth and breadth for older adults and other vulnerable populations (e.g., people with disabilities and individuals with end-stage renal disease (ESRD)), there are a few noteworthy limitations. Beneficiaries who elect Part C coverage are documented in the enrollment files, but their service utilization is not reliably tracked. Also, because of the nature of beneficiary enrollment, incident disease is difficult to quantify, as is remission or the time when treatment for a condition may have stopped. Because coding practices differ across regions and providers, certain types of conditions or diagnoses may not be well documented in structured billing data and would be more appropriately documented in clinical notes.
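One common way to work around the Part C limitation is to restrict a cohort to beneficiaries with complete Part A and Part B coverage and no managed care enrollment in the study year. The sketch below illustrates that restriction; the field names (part_a_months, part_b_months, hmo_months) are hypothetical summaries of monthly enrollment, and the 12-month requirement is an assumption that a given study may relax.

```python
# Hypothetical yearly enrollment summaries; field names are illustrative only.
enrollment = [
    {"bene_id": "A1", "part_a_months": 12, "part_b_months": 12, "hmo_months": 0},
    {"bene_id": "B2", "part_a_months": 12, "part_b_months": 12, "hmo_months": 5},
    {"bene_id": "C3", "part_a_months": 12, "part_b_months": 7, "hmo_months": 0},
]


def is_full_year_ffs(row, months_required=12):
    """True if enrolled in Parts A and B for the required months with no Part C months."""
    return (
        row["part_a_months"] >= months_required
        and row["part_b_months"] >= months_required
        and row["hmo_months"] == 0
    )


# Beneficiaries whose claims should reflect their full service utilization.
ffs_cohort = [row["bene_id"] for row in enrollment if is_full_year_ffs(row)]
```

Applying the same filter to one or more years before an index date is one way to approximate incident disease, since it gives a window in which prior claims for the condition could have been observed.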
You can find more information here: Strengths and Limitations of CMS Data
Demographics
In the 2020 data (approximately 13 million beneficiaries):
- Age: Mean 71.7 (SD = 10.9)
- Sex: Male 45.3%, Female 54.7%
Process for accessing the data and associated costs
The process for accessing the data is described here: Medicare Data Access Instructions. You will need IRB approval (Expedited category 5) and approval from the Centers for Medicare and Medicaid Services. Access takes about 6 to 12 months.
Data Support
PHS has Medicare office hours, Slack channels, data documentation, and recorded trainings.
ResDAC also provides trainings: ResDAC Medicare Trainings
Selected Publications
- Baker, L. C., Kessler, D. P., & Vaska, G. K. (2022). The Relationship Between Provider Age and Opioid Prescribing Behavior. American Journal of Managed Care, 28(5).
- Einav, L., Finkelstein, A., Mullainathan, S., & Obermeyer, Z. (2018). Predictive modeling of US health care spending in late life. Science, 360(6396), 1462-1465.
- Ratakonda S, Lin P, Kamdar N, Meade M, McKee M, Mahmoudi E. Potentially Preventable Hospitalization Among Adults with Hearing, Vision, and Dual Sensory Loss: A Case and Control Study. Mayo Clin Proc Innov Qual Outcomes. 2023 Jul 21;7(4):327-336. doi: 10.1016/j.mayocpiqo.2023.06.004. PMID: 37533599; PMCID: PMC10391598.
- Conner BC, Xu T, Kamdar NS, Haapala H, Whitney DG. Physical and occupational therapy utilization and associated factors among adults with cerebral palsy: Longitudinal modelling to capture distinct utilization groups. Disabil Health J. 2022 Jul;15(3):101279. doi: 10.1016/j.dhjo.2022.101279. Epub 2022 Feb 15. PMID: 35264292; PMCID: PMC9308687.
- Mahmoudi E, Kamdar N, Furgal A, Sen A, Zazove P, Bynum J. Potentially Preventable Hospitalizations Among Older Adults: 2010-2014. Ann Fam Med. 2020 Nov;18(6):511-519. doi: 10.1370/afm.2605. PMID: 33168679; PMCID: PMC7708283.