Data

We offer a diverse collection of high-value data sets and partner with several data sources around the world. 

PHS Data

Effective population health research requires rich and diverse data with opportunities for linkage and long-term follow-up. Stanford PHS offers Stanford researchers and affiliates access to a growing portfolio of population-level data. These diverse data, which offer both precision and broad representation of the population, are a catalyst for transdisciplinary research and enable researchers to study a wide range of vital outcomes.

An overview of the datasets available from PHS is below.

DATASET | DATASET TYPE | POPULATION | SMALLEST GEO UNIT | SAMPLE SIZE | DATE RANGE | TIME TO ACCESS | STRENGTH
--- | --- | --- | --- | --- | --- | --- | ---
American Family Cohort (AFC) | EMR - Primary Care | US | Census Block | 8 million | 2010 - 2024 | 1 month | Linkable by individual
MarketScan | Claims - Commercially Insured | US | Metropolitan Area | 149 million | 2006 - 2022 | 7 days | Prices, variability in insurance type
Medicare 20% sample | Claims - Medicare | US | 9 digit zip | 11 million | 2006 - 2020 | 6 - 9 months | Representative of Americans over 65; rich, longitudinal
Medicaid 100% | Claims - Medicaid | US | 5 digit zip | Over 100 million | 2011 - 2019 | 6 - 9 months | Representative of Americans enrolled in Medicaid
SEER and CA Cancer Registry - CMS linked data | SEER and CA Cancer Registry will do linkages w/CMS | US | 5 digit zip | Varies | Varies | 3 - 6 months | Linked dx/treatment data
Aarhus Danish Registers | National cohort, Surveys, Administrative data, Biologic samples | Denmark | Census Block | 5 million | 1968 - 2020 | No direct access. 3 - 6 months | Rich, longitudinal, individual linkages

Data Portal

The Data Core at the Stanford Center for Population Health Sciences offers researchers:

  1. A central hub to efficiently access, link, visualize and analyze data from a wide variety of sources; and,
  2. A library of data assets to facilitate transdisciplinary population health science projects and collaboration.

Powered by Redivis, our Data Portal lets you explore datasets and learn about tools for manipulating millions of records. Once you have identified data of interest, you will need to meet several requirements that ensure responsible use of sensitive data. You can read more about these requirements in the access section of each dataset on the PHS Data Portal.

Getting started

On our Data Portal, you can apply for membership and access, explore datasets, and use Redivis to extract your analytical sample. After extracting your sample and learning about the data, you will run your analyses on our secure PHS server or on Nero. Please consult our PHS data docs to get started.

Data Requests

You can file a data request if you are interested in datasets we do not yet offer. Please use our Data Request Dashboard to submit your request.

New Data Portal Training

Learn how to use Redivis, the data platform that stores and queries data on the PHS Data Portal, at every stage of your analytical workflow. This presentation showcases common methodologies for working with large claims datasets, including scalable cohort generation and analytical workflows in R, Python, Stata and SAS. The session concludes with an exploration of modern ML techniques for classifying patient notes and other unstructured data.

Watch presentation

Access presentation slides

Explore a sample project on Redivis

Getting help

We offer several sources of support.

PHS data docs

Read all about working with PHS data from start to finish in this step-by-step guide, which also includes more information about our systems and FAQs.


Slack user channel

Your second line of support is more interactive and great for quick questions that you can't resolve from the PHS data docs alone. You can also search the channels for your question, as it may have been asked before.


Office hours

Your third line of support is to schedule a meeting with us. We are happy to sit down with you for more complicated questions and issues that are best resolved in conversation.


Contact us

We are happy to help with your data questions and to hear your suggestions. You can contact us at phsdatacore@stanford.edu.