New Stanford data science portal to accelerate innovative biomedical research

The newly launched Stanford Data Science Resources web portal is a central hub from which data scientists can access advanced tools, data platforms, and experts in diverse methodologies for conducting biomedical research. It also provides researchers with an overview of the more than 170 health-related datasets available to Stanford researchers.

At the heart of this portal is an automated request form that quickly connects researchers with the experts, advice and resources needed for a given project. Requestors may submit a research question, a data need, or a query on study design or methodology, and the staff will match requestors with the appropriate Stanford research support team. Experts are available from the Center for Population Health Sciences, Department of Biomedical Data Science, Research Informatics Center, Research Information Technologies group and Quantitative Sciences Unit. Their specialists can provide advice in a variety of areas, including general project support, biostatistics, informatics, mobile solutions, and research information technologies. The portal also provides researchers with a well-organized starting point for selecting secure electronic data capture and research management platforms.

The datasets webpage summarizes the wealth of clinical information stored in the STAnford Research Repository, called STARR, which includes 20 years of patient data from Stanford Health Care and Stanford Children’s Health. There’s also an overview of the Center for Population Health Sciences’ ever-growing number of population-level datasets, which enable Stanford researchers to better analyze the effects of factors such as poverty, inequality, climate change and forced migration on health and wellbeing.

These data sources include:

  • The IBM MarketScan Research Database has person-specific clinical expenditures segmented into inpatient, outpatient, prescription drug and service categories for more than 150 million covered people.
  • The Optum Clinformatics Data Mart stores administrative health claims for more than 72 million members of a large national managed-care company affiliated with Optum. It includes data from Medicare + Choice enrollees.
  • The Health Inequality Project has data on the differences in life expectancy by income and identifies strategies to improve health outcomes for low-income Americans.
  • Centers for Medicare and Medicaid Services data allow researchers to evaluate geographic variations in the use and quality of health-care services for the Medicare fee-for-service population, based on a 20 percent sample of the national Medicare population.
  • The Integrated Public Use Microdata Series, which stores individual and household census records, is a good source for research on social and economic changes.
  • The “Born in Bradford” cohort study has data on 12,500 pregnant women from 2007 to 2010 and subsequent data on 13,500 offspring, all from a resource-poor town in the United Kingdom.


For this beta test version of the portal, the design team is actively soliciting feedback to make the site even better. Please send any suggestions to Stacyann Forrester at

The development of this web portal was funded by an NIH Clinical and Translational Science Award and the School of Medicine Dean’s office.