PHS offers a diverse collection of high-value data sets and partners with several data custodians around the world. 

Effective population health research requires rich and diverse data with opportunities for linkage to social and environmental determinants of health and long-term follow-up. Stanford PHS offers access to a growing portfolio of population-level data to Stanford researchers and affiliates. These diverse data are a catalyst for transdisciplinary research and enable researchers to study a myriad of vital outcomes.

An overview of the datasets available at PHS is below.

 

| DATASET | DATASET TYPE | SMALLEST GEO UNIT | SAMPLE SIZE | DATE RANGE |
|---|---|---|---|---|
| American Family Cohort (AFC) | EMR - Primary Care | Census Block | 8 million | 2010 - 2024 |
| MarketScan | Claims - Commercially Insured | Metropolitan Area | 149 million | 2006 - 2022 |
| Medicare 20% RIF | Claims - Medicare | 9 digit zip | 11 million | 2006 - 2020 |
| Medicaid 100% RIF | Claims - Medicaid | 5 digit zip | Over 100 million | 2011 - 2019 |
| SEER and CA Cancer Registry - CMS linked data | SEER and CA Cancer Registry will do linkages w/CMS | 5 digit zip | Varies | Varies |
| Aarhus Danish Registers | National cohort, Surveys, Administrative data, Biologic samples | Census Block | 5 million | 1968 - 2020 |

Data Portal

The Data Core at the Stanford Center for Population Health Sciences offers researchers:

  • A central hub to efficiently access, link, visualize and analyze data from a wide variety of sources;

  • A library of data assets to facilitate transdisciplinary population health science projects and collaboration.

Powered by Redivis, our Data Portal includes tools optimized for large health datasets that can query billions of records in seconds. Because these are high-value health data, you will need to complete several requirements to ensure responsible use of sensitive data. You can read more about requirements in the access section of each dataset on the PHS Data Portal.

Getting started

On our Data Portal, you can apply for membership and access, explore datasets, and use the Redivis tool to identify your analytical sample. After cutting your analytical sample and learning about the data, you can run your analyses in a variety of compliant, secure computational environments. We encourage you to use the Redivis Native Jupyter Notebooks. Please consult our PHS Documentation resources to get started.
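For readers new to the idea of cutting an analytical sample, here is a minimal sketch in plain Python. It is not Redivis code and does not use any PHS data: the rows are synthetic, and the field names (patient_id, dx_code, service_date) and the select_cohort helper are hypothetical, chosen only for illustration. It shows one common pattern, keeping patients who have at least one claim with a matching diagnosis code prefix inside a study window.

```python
from datetime import date

# Hypothetical, synthetic claim rows. Field names are illustrative only and
# do not reflect the schema of any PHS dataset.
claims = [
    {"patient_id": "A", "dx_code": "E11.9", "service_date": date(2015, 3, 1)},
    {"patient_id": "A", "dx_code": "I10", "service_date": date(2016, 7, 9)},
    {"patient_id": "B", "dx_code": "E11.9", "service_date": date(2005, 1, 20)},
    {"patient_id": "C", "dx_code": "J45.909", "service_date": date(2018, 5, 2)},
    {"patient_id": "D", "dx_code": "E11.65", "service_date": date(2020, 11, 30)},
]


def select_cohort(rows, dx_prefix, start, end):
    """Return sorted patient IDs with at least one claim whose diagnosis code
    starts with dx_prefix and whose service date falls within [start, end]."""
    cohort = {
        row["patient_id"]
        for row in rows
        if row["dx_code"].startswith(dx_prefix)
        and start <= row["service_date"] <= end
    }
    return sorted(cohort)


# Assumed study window and diagnosis prefix, picked only for this example.
cohort_ids = select_cohort(claims, "E11", date(2006, 1, 1), date(2022, 12, 31))
print(cohort_ids)  # ['A', 'D']
```

On the Data Portal itself, this filtering step would typically be done with the Redivis tools described above, so that only the resulting sample is carried into your analysis environment.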

New Data Portal Training

Learn how to use Redivis, a data platform used to store and query data on the PHS Data Portal, at every stage of your analytical workflow. This presentation showcases common methodologies for working with large claims datasets, including scalable cohort generation and analytical workflows in R, Python, Stata and SAS. The session concludes with an exploration of modern ML techniques for classifying patient notes and other unstructured data.

Watch presentation

Access presentation slides

Explore a sample project on Redivis

Getting help

We offer several sources of support.

PHS data docs

Read all about working with PHS data from start to finish in this step-by-step guide, which also includes more information about our systems and FAQs.


Slack user channel

Your second line of support is more interactive and great for quick questions that you can't resolve from the PHS data docs alone. You can also search the channels for your question, as it may have been asked before.


Office hours

Your third line of support is to schedule a meeting with us. We are happy to sit down with you for more complicated questions and issues that are best resolved in conversation.


We are happy to support your data questions and suggestions.