PHS offers a diverse collection of high-value datasets and partners with several data sources around the world.
Effective population health research requires rich and diverse data with opportunities for linkage and long-term follow-up. Stanford PHS offers Stanford researchers and affiliates access to a growing portfolio of population-level data. These diverse data, which offer both precision and representation of the population, are a catalyst for transdisciplinary research and enable researchers to study a myriad of vital outcomes.
An overview of the datasets available from PHS is below.
DATASET | DATASET TYPE | POPULATION | SMALLEST GEO UNIT | SAMPLE SIZE | DATE RANGE | TIME TO ACCESS | STRENGTH |
American Family Cohort (AFC) | EMR - Primary Care | US | Census Block | 8 million | 2010 - 2024 | 1 month | Linkable by individual |
MarketScan | Claims - Commercially Insured | US | Metropolitan Area | 149 million | 2006 - 2022 | 7 days | Prices, variability in insurance type |
Medicare 20% sample | Claims - Medicare | US | 9 digit zip | 11 million | 2006 - 2020 | 6 - 9 months | Representative of Americans over 65; rich, longitudinal |
Medicaid 100% | Claims - Medicaid | US | 5 digit zip | Over 100 million | 2011 - 2019 | 6 - 9 months | Representative of Americans enrolled in Medicaid |
SEER and CA Cancer Registry - CMS linked data | SEER and CA Cancer Registry will do linkages with CMS | US | 5 digit zip | Varies | Varies | 3 - 6 months | Linked diagnosis/treatment data |
Aarhus Danish Registers | National cohort, Surveys, Administrative data, Biologic samples | Denmark | Census Block | 5 million | 1968 - 2020 | No direct access. 3 - 6 months | Rich, longitudinal, individual linkages |
Data Portal
The Data Core at the Stanford Center for Population Health Sciences offers researchers:
- A central hub to efficiently access, link, visualize and analyze data from a wide variety of sources; and,
- A library of data assets to facilitate transdisciplinary population health science projects and collaboration.
Powered by Redivis, our Data Portal enables you to explore datasets and learn about tools for manipulating millions of records. Once you have identified data of interest, you will need to meet several requirements to ensure responsible use of sensitive data. You can read more about these requirements in the access section of each dataset on the PHS Data Portal.
Getting started
On our Data Portal, you can apply for membership and access, explore datasets, and use the Redivis tool to acquire your analytical sample. After cutting your analytical sample and learning about the data, you will run your analyses on our secure PHS server or Nero. Please consult our PHS data docs resource to get started.
New Data Portal Training
Learn how to utilize Redivis, a data platform used to store and query data on the PHS Data Portal, for every stage of your analytical workflow. This presentation showcases common methodologies in working with large claims datasets, including scalable cohort generation and analytical workflows in R, Python, Stata and SAS. The session concludes with an exploration of using modern ML techniques to classify patient notes and other unstructured data.
Getting help
We offer several sources of support.
PHS data docs
Read all about working with PHS data from start to finish in this step-by-step guide, which includes more information about our systems and FAQs.
Slack user channel
Your second line of support is more interactive and great for quick questions that you can't resolve from the PHS data docs alone. You can also search the channels for your question, as it may have been asked before.
Office hours
Your third line of support is to schedule a meeting with us. We are happy to sit down with you for more complicated questions and issues that are best resolved in conversation.
Contact us
We are happy to support your data questions and suggestions. You can contact us at phsdatacore@stanford.edu.
Have suggestions for additional datasets PHS should offer? Submit requests here.