How to Get Data

Refer to the guide below for obtaining clinical data for research.

Guide to Obtaining Clinical Data for Research

We encourage you to use the STARR self-service tools as much as possible to avoid incurring costs for your research. If none of the self-service options meets your needs, see Step 3 below on obtaining additional support.

The choice of self-service tool to use is guided by three considerations:

  1. Your familiarity with common practices in conducting data science work (such as using Jupyter notebooks or working with cluster computing).
  2. Whether you want raw data or standardized data in a common data model.
  3. Whether you envision taking your study beyond Stanford.
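As a rough illustration, the routing that Step 1 describes from these three considerations can be sketched as a small function. This is only a reading of the guidance on this page, not an official rule: the function name and parameter names are hypothetical, and the sketch assumes that any one of the three conditions is enough to point you to STARR-OMOP-deid.

```python
def recommend_tool(familiar_with_data_science: bool,
                   wants_standardized_data: bool,
                   beyond_stanford: bool) -> str:
    """Suggest a self-service starting point from the three considerations.

    Assumption: if any consideration applies, STARR-OMOP-deid is suggested;
    otherwise the STARR-STRIDE-web tools are suggested.
    """
    if beyond_stanford or familiar_with_data_science or wants_standardized_data:
        return "STARR-OMOP-deid"
    return "STARR-STRIDE-web tools"


# A Stanford-only study, raw data, no data science background.
print(recommend_tool(False, False, False))
# A multi-site study.
print(recommend_tool(False, False, True))
```

If more than one option seems to fit your study, treat the result as a starting point and read the descriptions in Step 1 before choosing.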

Step 1: How many patients meet my criteria of interest?

If you do not plan to run your study at non-Stanford sites, then use the STARR-STRIDE-web tools. Available since 2008, the STRIDE tools provide access to raw data, which are not standardized into a common data model.

  • The STARR Cohort Discovery Tool will help you find out how many Stanford patients match your criteria of interest and will show summary characteristics of these patients. This tool provides aggregated, approximate results for the preparatory-to-research phase. Before running queries, familiarize yourself with the documentation.

If you envision taking your study to sites beyond Stanford, are familiar with common practices in conducting data science work, or want data in a standardized form, then use the STARR-OMOP-deid database. Launched in late 2019, the STARR-OMOP-deid database provides pre-IRB direct access to a de-identified dataset in the OMOP Common Data Model. The dataset contains both de-identified clinical notes and outputs from advanced text processing.

  • To count patients meeting your criteria of interest, you can either use the ATLAS Cohort tool or query the STARR-OMOP database directly via SQL.
  • Free training is available for researchers to learn about the OMOP CDM, how to use the ATLAS Cohort tool, and how to analyze data in the STARR-OMOP database. Check out the section "STARR-OMOP and ATLAS, online and hands-on learning" to find out more.
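To give a sense of what counting patients via SQL looks like, here is a minimal sketch against a toy in-memory SQLite database. The table and column names (condition_occurrence, person_id, condition_concept_id) follow the OMOP CDM, but everything else is an assumption made for illustration: the rows are invented, the concept IDs 111 and 222 are placeholders rather than real vocabulary concepts, and the way you connect to the actual STARR-OMOP database is not covered here and will differ.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like OMOP's condition_occurrence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "condition_occurrence_id INTEGER PRIMARY KEY, "
        "person_id INTEGER, "
        "condition_concept_id INTEGER)"
    )
    # Invented rows: person 101 has two records for placeholder concept 111.
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
        [(1, 101, 111), (2, 101, 111), (3, 102, 111), (4, 103, 222)],
    )
    return conn


def count_patients(conn: sqlite3.Connection, condition_concept_id: int) -> int:
    """Count distinct persons with at least one matching condition record."""
    sql = (
        "SELECT COUNT(DISTINCT person_id) "
        "FROM condition_occurrence "
        "WHERE condition_concept_id = ?"
    )
    (n,) = conn.execute(sql, (condition_concept_id,)).fetchone()
    return n


conn = build_toy_db()
print(count_patients(conn, 111))  # 2 distinct persons in the toy data
```

Counting DISTINCT person_id matters because a patient can have many condition records; counting rows instead would overstate the cohort size.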

Note that data for research obtained from the Research Informatics Center (RIC) is subject to compliance agreements implemented by Stanford Health Care (SHC) or Stanford Children’s Health (SCH). Such agreements change over time and may result in the filtering of patient and/or encounter records. This means that a data record obtained on one date may be filtered out at a later date, even for the same query.
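Because results for the same query can change between dates, it may help to save the identifiers returned by each run along with the date of the run. The sketch below compares two such saved lists; the function name, the variable names, and the identifiers are all hypothetical and are not part of any STARR tool.

```python
def diff_snapshots(earlier_ids, later_ids):
    """Report which record identifiers disappeared or appeared between runs."""
    earlier, later = set(earlier_ids), set(later_ids)
    return {
        "dropped": sorted(earlier - later),  # present before, filtered out now
        "added": sorted(later - earlier),    # new since the earlier run
    }


# Invented identifiers from two runs of the same query on different dates.
snapshot_first_run = [101, 102, 103]
snapshot_second_run = [101, 103, 104]
print(diff_snapshots(snapshot_first_run, snapshot_second_run))
```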

Step 2: Enough patients meet my criteria! How do I begin my study?

If you are not familiar with common practices in conducting data science work, then read up on compliance steps, obtain IRB approval, and complete a Data Privacy Attestation (DPA).

  • You can then use the Cohort Discovery Tool to send a list of patient records for review in the Chart Review Tool using this step-by-step guide. The Chart Review Tool can also export data as .csv files for further analysis.
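Once you have a .csv export, a first pass at the data can be done with Python's standard library. The sketch below is only an illustration: the column names record_id and encounter_type and the sample rows are invented, so check the header of your own export before adapting it.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Tally how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Stand-in for an exported file; in practice, use open("your_export.csv", newline="").
sample_export = io.StringIO(
    "record_id,encounter_type\n"
    "1,inpatient\n"
    "2,outpatient\n"
    "3,outpatient\n"
)
counts = count_by_column(sample_export, "encounter_type")
print(counts.most_common())
```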

If you are familiar with common practices in conducting data science work, then access the STARR-OMOP database.

Step 3: What if my needs are not satisfied by the self-service options?

If you have created a de-identified cohort using the STARR-OMOP-deid database and would like to identify patients or obtain additional data from their records, you will need to go through the compliance steps, obtain IRB approval, and complete a Data Privacy Attestation (DPA).

After this is done, request help from the Research Informatics Center. RIC is the designated team to provide clinical data for research use at Stanford. The team can also extract data that are in the EHR but not available through either the STARR-STRIDE tools or via the STARR-OMOP database. Note that such custom requests will typically require you to have funding for the work you ask RIC team members to do for you.

If you need to download images (X-ray, MRI, CT scans), please request a consultation with RIC after going through the compliance steps. Currently, image data are not available via self-service tools.

If you are interested in data from sources not yet listed, such as the Stanford Cancer Registry (SCIRDB), the Bone Marrow Transplant / CAR-T Database, the Solid Tumor and Lymphoma Databases, other specialized oncology clinical datasets (e.g., OnCore, Pathology, Mutations, Imaging, Radiation Therapy), or ancillary systems such as ARIA (Radiation Oncology), PowerPath (Pathology), TraceMasterVue (EKG), XCelera (Cardiac imaging), or Bedside Monitoring Data (LPCH only), please request a consultation with RIC after going through the necessary compliance steps.