STARR (formerly known as STRIDE)

The Research IT team has managed the STRIDE (Stanford Translational Research Integrated Database Environment) data warehouse since its inception in 2009. The scope of the data entering the warehouse has expanded well beyond the initial STRIDE vision and capabilities. To capture new types of biomedical big data, Research IT is designing a next-generation, petascale data warehouse that supports radiology PACS, pathology slides, genomics, biobank, mHealth, and other sources. We are leveraging cloud technologies to support this scale.

At the core of the Stanford Medicine Research Data Repository (STARR) clinical data warehouse is Electronic Medical Record (EMR) data going back to 1998. The clinical data warehouse contains data from both hospitals and spans multiple EMR implementations: Epic Clarity at Stanford Hospitals and Clinics (SHC) since 2005, Epic Clarity at Stanford Children's Health (SCH) since 2015, Cerner at SCH from 2005 to 2014, and CareCast and LastWord at both SHC and SCH prior to 2005.
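The EMR timeline above can be read as a simple lookup from hospital and year to source system. The following is a minimal sketch of that reading, not part of STARR itself: the names `EMR_TIMELINE` and `emr_source` are hypothetical, the boundaries are assumed to fall on whole calendar years, and CareCast and LastWord are grouped together because the text does not say which system served which hospital.

```python
from typing import Optional

# Hypothetical lookup table built from the timeline described above.
# Each entry: (hospital, first_year, last_year or None if still in use, system).
# Assumption: transitions happen on calendar-year boundaries, and 1998 is the
# earliest year because the EMR data is described as going back to 1998.
EMR_TIMELINE = [
    ("SHC", 1998, 2004, "CareCast/LastWord"),
    ("SHC", 2005, None, "Epic Clarity"),
    ("SCH", 1998, 2004, "CareCast/LastWord"),
    ("SCH", 2005, 2014, "Cerner"),
    ("SCH", 2015, None, "Epic Clarity"),
]


def emr_source(hospital: str, year: int) -> Optional[str]:
    """Return the EMR system assumed to hold a record, or None if not covered."""
    for site, start, end, system in EMR_TIMELINE:
        if site == hospital and year >= start and (end is None or year <= end):
            return system
    return None


if __name__ == "__main__":
    print(emr_source("SHC", 2010))  # Epic Clarity
    print(emr_source("SCH", 2010))  # Cerner
```

A lookup of this kind is mainly useful when interpreting longitudinal records, since a single patient's history may span more than one source system.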

STARR uses standardized terminologies, such as SNOMED, RxNorm, ICD, and CPT, to represent biomedical concepts and their relationships, and adheres to the FAIR Guiding Principles for scientific data management. The STARR database contains data from ~4 million patients spanning over 75 million visits, and represents over 100 million diagnoses (ICD-9, ICD-10), over 67 million procedures (CPT-4, HCPCS, ICD), over 350 million lab results (LOINC), over 55 million medications (RxNorm), and over 65 million notes. The content of the STARR database lags live hospital data by no more than 24 hours, and the data dictionary is readily available on the Research IT website.


  1. STARR Data Synopsis (Link)
  2. STARR Data Dictionary (Link)

The overall vision for the STARR data warehouse is represented in the figure below.

Getting access to STARR data

Research Informatics Center

Consultants at the Research Informatics Center review the clinical data needs of your research project, advise on requesting IRB approval to obtain clinical data from the clinical data warehouse, and discuss options for clinical data abstraction, reporting, and storage to meet your research needs.

Center for Population Health Science

The Center for Population Health Science (PHS) maintains a de-identified subset of STARR structured data (STARR-Tahoe). PHS researchers can access the data on the PHS portal after meeting PHS compliance requirements. The data can be explored in an intuitive GUI running on the Google BigQuery platform.