Consulting groups provide access to a petascale data warehouse of over 170 datasets. These include clinical data gathered by Stanford itself as well as a variety of population health datasets.

Stanford Clinical Data

STARR, a research data repository with 20 years of fully identified clinical data (since 1998), includes, but is not limited to, nightly clinical data from Epic Clarity for both Stanford Health Care (SHC, the adult hospital) and Stanford Children’s Health (SCH, also known as Lucile Packard Children’s Hospital or LPCH). The electronic health record systems at the hospitals have changed several times over the years:

  • Legacy EMRs at SHC before 2008
  • Legacy EMRs at SCH before 2014
  • Epic Clarity at SHC since 2008
  • Epic Clarity at SCH since 2014


STARR contains fully identified, up-to-date Epic data (refreshed within the last 24-36 hours) from both SHC and SCH, fully identified and current imaging data from Radiology, and some historic clinical data from earlier EMRs that is not present in the current EMR systems.

Population Health Data

These datasets, totaling approximately 50 terabytes, have been acquired by the Center for Population Health Sciences (PHS) and are accessed through the PHS Data Portal, where you can find the most up-to-date information about them.

  • MarketScan
    The Truven Health MarketScan Research Databases capture person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services.
  • MarketScan OMOP
    Truven MarketScan datafiles in the OMOP Common Data Model (CDM) format.
  • Optum Clinformatics
    Optum Clinformatics Data Mart contains commercial health plan data and historic Medicare+Choice claims, consisting of administrative health claims for members of a nationwide managed care company affiliated with Optum.
  • Health Inequality Project
    The Health Inequality Project uses big data to measure differences in life expectancy by income across areas and identify strategies to improve health outcomes for low-income Americans.
  • CMS Public Use Files
    From the Centers for Medicare & Medicaid Services, these enable researchers to evaluate geographic variation in the utilization and quality of health care services for the Medicare fee-for-service population.
  • CMS Research Identifiable Files
    These files from the Centers for Medicare & Medicaid Services cover a 20% sample of the entire Medicare population and contain beneficiary-level and provider-level data.
  • STARR Tahoe
    A de-identified subset of data from STARR, with clinical data from both hospitals, updated annually and pre-approved by the IRB. For identified or up-to-date (24-36 hours old) data, see STARR under Stanford Clinical Data above.
  • American Manufacturing Cohort (AMC)
    A collection of interlinked corporate datasets, including human resources, payroll, medical claims, biometrics, injuries, industrial hygiene, and limited survey data on job demand for manufacturing workers.
  • HCUP
    The Healthcare Cost and Utilization Project (HCUP) includes the largest collection of longitudinal hospital data in the United States. HCUP's state-specific databases can be used to investigate state-specific and multi-state trends in health care.
  • IPUMS
    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data consist of historic individual and household census records and are a unique source for research on social and economic change.
  • Partner datasets
    The Stanford Center for Population Health Sciences has cultivated partnerships with a number of national and international organizations that hold large, rich datasets useful for research, including:
         • Born in Bradford
         • Clalit Research Institute
         • Danish population registries