MarketScan Data
Dataset Website
Vital Statistics
Sample size: 250 million patients
Sampling frame: United States
Years of data: 2007 - 2022
Type of data: Primarily private insurance (many payers). Some Medicare Advantage.
Dataset Description
MarketScan Claims data are primarily derived from commercial claims. Claims data are derived from billing data collected in the course of administering insurance programs. They can include patient demographics, procedure and diagnosis codes, prescription fills, insurance information, and other patient-provider communications.
The MarketScan Research Databases integrate many types of data for healthcare research, including:
- De-identified records of more than 250 million patients (medical, drug and dental)
- Laboratory results derived from laboratory claims with selected documentation of values
- Hospital discharge records, including inpatient rehabilitation and skilled nursing facility stays
- Dental claims
The MarketScan data include person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. The data come from a selection of large employers, health plans, and government and public organizations. The MarketScan Research Databases link paid claims and encounter data to detailed patient information across sites and types of providers and over time. The annual medical databases include private-sector health data from approximately 350 payers. Historically, more than 20 billion service records are available in the MarketScan databases. These data represent the medical experience of insured employees and their dependents, covering active employees, early retirees, Consolidated Omnibus Budget Reconciliation Act (COBRA) continuees, and Medicare-eligible retirees with employer-provided Medicare Supplemental plans. The Merative MarketScan Research Databases are composed of six individual databases available on the PHS Data Portal.
Strengths and limitations of the dataset
Since many medical record systems are optimized for billing, these data document covered, billable services very completely and accurately (including both the care received and the costs associated with that care). The large sample size of these data also allows researchers to examine rare illnesses and medical conditions. Since individuals on private payer insurance plans usually stay with their insurance plan for extended periods of time, longitudinal follow-up makes it possible to measure episode-based expenditures and clinical outcomes over time. Because of the extended follow-up time, drug adherence and inferred consumption can be readily estimated from the outpatient prescription fills.
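As an illustration of how adherence might be estimated from outpatient prescription fills, the sketch below computes the proportion of days covered (PDC), one commonly used adherence measure. It is not taken from the MarketScan documentation: the function name, the (fill date, days supply) input layout, and the sample fills are all hypothetical, and the calculation is simplified in that overlapping supply is not carried forward.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, window_start, window_end):
    """Estimate PDC for one patient and one drug.

    fills: iterable of (fill_date, days_supply) tuples. This layout is an
    assumption for illustration, not the actual MarketScan table structure.
    The window is inclusive of both window_start and window_end.
    """
    if window_end < window_start:
        raise ValueError("window_end must not precede window_start")

    covered = set()
    for fill_date, days_supply in fills:
        # Mark each day the dispensed supply would last, if it falls in the window.
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)

    window_days = (window_end - window_start).days + 1
    return len(covered) / window_days


# Made-up fills for a single hypothetical patient.
fills = [
    (date(2020, 1, 1), 30),
    (date(2020, 2, 5), 30),
    (date(2020, 3, 20), 30),
]
pdc = proportion_of_days_covered(fills, date(2020, 1, 1), date(2020, 3, 30))
print(f"PDC: {pdc:.2f}")
```

Using a set of covered days means that days supplied by two overlapping fills are counted once. A study that credits early refills toward later days would need to shift overlapping supply forward instead.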
While MarketScan affords the opportunity for breadth, its main limitations stem from the lack of clinical depth and demographic data. For instance, in oncology studies, tumor stage, cancer incidence, and the point at which treatment results in remission are not directly obtainable from administrative claims. Demographic information is limited to some geographic and employment information to maintain de-identification, and there is no race/ethnicity data. Clinical notes are not available; therefore, billing codes (e.g. diagnosis, procedure, and other code sets) serve as the basis for documentation on the claim and are subject to coding practices that encourage reimbursement. While MarketScan is a large private payer database, it is a convenience sample of enrollees across the country, drawn from private payers whose market penetration varies by region.
Process for accessing the data and associated costs
The MarketScan data are available on the PHS Data Portal. It takes about 3 - 7 days to get access.
Data Support
PHS offers MarketScan office hours and Slack channels. Data documentation and recorded trainings are available to data users who have completed metadata access requirements.
Other Datasets: