Access to OMOP data, tools, training and more

OMOP and STARR-OMOP documentation

Also review the training section and publication section.

OMOP and ATLAS documentation by OHDSI (public access):

  • OMOP CDM Specification: The OHDSI consortium provides a detailed specification of the OMOP CDM v5.3.1 data model.
  • Themis rules: These rules guide CDM development, with the goal of developing standardized tools and methods and driving quality, reproducibility, and efficiency.
  • Book Of OHDSI: A central knowledge repository for OHDSI describing the OHDSI community, OHDSI data standards, and OHDSI tools.
  • ATLAS wiki and GitHub: The ATLAS wiki and GitHub repository are maintained by the OHDSI community.


STARR-OMOP documentation by Research IT:
  • STARR-OMOP Technical Specifications document: This document provides details regarding the underlying STARR-OMOP data, transformations, quality metrics and techniques such as de-identification. This g-doc is accessible with SUNetID.
  • Questions users like you have asked: FAQ and much more. This g-doc is accessible with SUNetID.
  • STARR-OMOP manuscript: We specifically recommend reading the Supplementary Material for methodologies and analytics.
  • STARR-OMOP data dictionary: In addition to the tables required for OMOP CDM 5.3.1, the dataset contains some extra columns that are not strictly part of the CDM definition but have been added for increased source/patient traceability. The STARR OMOP identified dataset is created using Clarity tables, which only include the patient and encounter data that is permissible for research. The de-identified dataset does not contain psychiatric notes or other confidential notes. This g-sheet is publicly accessible.

Get access to STARR-OMOP-deid dataset

STARR-OMOP-deid is a fully de-identified pre-IRB dataset. It is High Risk by University Privacy Office (UPO) determination. It contains fully de-identified text. We also have STARR-OMOP-deid-lite, a fully de-identified pre-IRB dataset that doesn't contain the clinical text. There are also 1% versions of these datasets to ease query development.

The two prerequisites for access to STARR-OMOP-deid (and its variants) are a data privacy attestation and a Nero account. Think of Nero as a secure research enclave for working with sensitive data.

  • Sign a Data Privacy Attestation (DPA): Click the link, open the "new" tab, and under “Project Type” select “STARR Nav deID data”. Note that “Nav” stands for Navigator and signifies a collection of pre-IRB clinical datasets.
  • Getting started on Nero: A Stanford PI affiliation is required for a Nero account. In some exceptional cases, Stanford Directors and C-suite executives can act as PI.
    • Specify that you are requesting access for STARR-OMOP-deid. You will need a Nero GCP account and a Nero on-premise account.
    • Once a Nero project has been created, you will get an email from the Stanford Research Computing Center (SRCC) informing you that access has been granted, along with some introductory information.
  • Email Priya Desai, Biomedical Informatics Product Manager, Research IT, Technology & Digital Solutions, with “Requesting STARR OMOP deid data in BigQuery” in the subject line. Include your Nero GCP project name and your PI's ORCID.
  • Documentation: Review the step-by-step guide to getting access in the user guide. You need to be logged into the Stanford network to access the contents of the guide.
  • Get an ORCID: Share your ORCID with Priya. This will help us track your publications resulting from access to the OMOP dataset.
  • Support: Documentation, trainings, Slack channel and much more.

Get access to ATLAS cohort analysis tool

ATLAS is an open-source, web-based integrated software tool developed by the OHDSI community for database exploration, standardized vocabulary browsing, cohort definition, and population-level analysis on observational data that has been standardized to the OMOP Common Data Model. A public instance of ATLAS is maintained by the OHDSI community and points to a subset of the Medicare Claims Synthetic Public Use (SynPUF) data in OMOP.

  • Stanford ATLAS supports STARR-OMOP-deid-lite, Optum-OMOP (from Population Health Science) and the SynPUF dataset.
  • A fully sponsored SUNetID is a minimum requirement to get access to Stanford ATLAS.
  • Log in to Stanford ATLAS with your SUNetID using DUO authentication. Note: At this point you will probably not be able to explore any cohorts, and you will probably see an error. This step is still necessary, because it lets the system create a user ID for you.
  • To gain access to the STARR-OMOP-deid-lite dataset in ATLAS, sign the Data Privacy Attestation (DPA). Click the "new" tab and under “Project Type” select “STARR Nav deID data”. Do this step only if you haven’t signed the STARR Nav DPA before. Note that “Nav” stands for Navigator and signifies a collection of pre-IRB clinical datasets. If you already have access to STARR-OMOP-deid data on BigQuery via Nero, you have already completed your DPA requirements.
  • To gain access to the Optum datasets in ATLAS, you need to have completed the requirements set by PHS for Optum and have access to Optum data on the Redivis platform.
  • Once you have logged in, please email Priya Desai with “Requesting ATLAS access - STARR OMOP” or “Requesting ATLAS access - OPTUM OMOP” in the subject line, along with your SUNetID.
  • You will receive an email with more information once you have been granted access.
  • Documentation: Review the step-by-step guide to getting access in the user guide. You need to be logged into the Stanford network to access the contents of the guide.
  • Support: Documentation, trainings, Slack channel and much more.

STARR-OMOP and ATLAS online and hands-on learning

Also review the documentation section.
  • External training material by OHDSI community and EHDEN Academy:
    • ATLAS Tutorials: This is a set of video tutorials on the use and functionality of ATLAS and has been released by the OHDSI community. (link)
    • European Health Data and Evidence Network (EHDEN): EHDEN Academy has a number of training modules, including Extract, Transform, and Load (ETL) processes to go from raw observational data to the OMOP Common Data Model; OHDSI tools; a deep dive into ATLAS with a focus on phenotype definition, characterisation and evaluation; population-level effect estimation; and patient-level prediction. (link)
  • Research IT training:
    • We are now offering a clinical data science training program. The program is a series of four 1-day hands-on tutorials of increasing complexity that introduce modern data science tools and resources for analyzing the clinical data available to Stanford researchers. The focus of these tutorials is to familiarize researchers with the underlying data, tools, and resources. Check out the syllabus. Video recordings of all four tutorials have been uploaded to our Stanford Starr YouTube channel as playlists. Please visit us during our regular STARR OMOP/ATLAS office hours, and we will be happy to provide 1:1 help based on your specific questions.

Request Support

Once you have access to OMOP, you will also be added to the starrdatauser Slack channel. The Slack channel is your primary mode of communication.
  • Slack channel: You can ask the community any STARR-OMOP-related questions on the channel. Research IT staff and the Nero team monitor this channel, and your questions may also be answered by other knowledgeable users. We also announce new releases and features on the channel.
  • Code: Access to sample code in Stanford GitLab is made available to Stanford researchers with access to OMOP data. You get access to Python notebooks that use SynPUF (Medicare Claims Synthetic Public Use data files in OMOP CDM v5.3.1). Our goal is to help users gain familiarity with the OMOP CDM using publicly available data. The same code is applicable to the STARR-OMOP dataset.
  • Office hours: Office hour details are available here. Office hour timings are also announced and updated on the Slack channel.
  • Documentation: OHDSI and Research IT bring you online resources such as the CDM specification, THEMIS rules, data dictionary, technical specifications and more.
  • Training: The OHDSI community and Research IT bring you online and hands-on resources.
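
As a rough illustration of the kind of OMOP CDM query the sample notebooks are meant to build familiarity with, the sketch below finds a condition-based cohort in a toy in-memory database. It is not taken from the Stanford GitLab notebooks. The `person` and `condition_occurrence` tables and their columns follow the OMOP CDM, but the rows are made up, SQLite stands in for BigQuery, and the concept ID 201826 is used only as a placeholder condition.

```python
import sqlite3


def build_toy_cdm() -> sqlite3.Connection:
    """Create an in-memory database with two OMOP-style tables and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            year_of_birth INTEGER,
            gender_concept_id INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        """
    )
    # Synthetic rows for illustration only; they describe no real patients.
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 1950, 8507), (2, 1980, 8532), (3, 1975, 8507)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (10, 1, 201826, "2015-03-01"),
            (11, 1, 201826, "2016-07-15"),
            (12, 2, 320128, "2018-01-20"),
            (13, 3, 201826, "2019-11-05"),
        ],
    )
    return conn


def cohort_person_ids(conn: sqlite3.Connection, condition_concept_id: int) -> list:
    """Return the distinct person_ids with at least one matching condition record."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.person_id
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        ORDER BY p.person_id
        """,
        (condition_concept_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_toy_cdm()
cohort = cohort_person_ids(conn, 201826)
print(cohort)  # [1, 3]
```

Person 1 has two matching records but appears once, which is why the query uses `SELECT DISTINCT`.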

Sharing Stanford patient data with non-Stanford entities

The process of sharing data with non-Stanford entities is complex. One or more Stanford organizations will partner with you to support your research and data sharing needs. Because the landscape is complex, we do not have a single clear workflow, but we have compiled a list of resources for you.

  • Security and Privacy guidelines: Understanding security and privacy when working with patient data
  • Glossary: Terminologies, Stanford workflows, policies, and offices when working with patient data (or other similar High Risk data)

Request STARR-OMOP PHI data

You need concierge service for access to human subject data (IRB approval needed) or a limited data set (an eProtocol needed). If you identify a cohort using a SQL query in STARR-OMOP-deid, the same cohort (now identified) can be extracted using the identical SQL query in STARR-OMOP. Once you have a cohort with identified MRNs, you can join the cohort with other data types, e.g. radiology DICOM.

  • File an eProtocol if you want access to a limited data set.
  • File an IRB: Follow Research Compliance Office guidelines to file for an IRB
  • If sharing data with non-Stanford entities, meet data sharing requirements
  • If data is leaving Nero or Box environment, request a Data Risk Assessment.
  • STARR concierge service: Request concierge service from the Research Informatics Center, one of Stanford’s Honest Broker service providers.
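
The point above about reusing an identical SQL query across STARR-OMOP-deid and STARR-OMOP can be sketched as a query template in which only the dataset name changes. This is a minimal sketch under stated assumptions: the three dataset names are hypothetical placeholders (the real BigQuery names come from the user guide or the concierge team), the concept ID 201826 is a placeholder condition, and nothing here runs against BigQuery. In practice the identified extraction is performed through the concierge service once approvals are in place.

```python
import re

# Cohort definition written once; only the dataset qualifier varies.
COHORT_SQL = (
    "SELECT DISTINCT co.person_id\n"
    "FROM `{dataset}.condition_occurrence` AS co\n"
    "WHERE co.condition_concept_id = {concept_id}"
)

# Accept only plain identifier characters so the template cannot be abused.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def render_cohort_sql(dataset: str, concept_id: int) -> str:
    """Fill the cohort template for one dataset."""
    if not _DATASET_NAME.fullmatch(dataset):
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    return COHORT_SQL.format(dataset=dataset, concept_id=int(concept_id))


# Hypothetical dataset names, for illustration only.
DEID_1PCT = "example-project.starr_omop_deid_1pcnt"
DEID_FULL = "example-project.starr_omop_deid"
IDENTIFIED = "example-project.starr_omop_identified"

queries = {
    name: render_cohort_sql(name, 201826)
    for name in (DEID_1PCT, DEID_FULL, IDENTIFIED)
}

for name, sql in queries.items():
    print(f"-- {name}\n{sql}\n")
```

A workflow along these lines lets you develop the query on a 1% dataset, confirm it on the full de-identified dataset, and then hand the unchanged query text to the concierge team for the identified extraction.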

Request concierge service

When self-service tools do not do what you need, please request concierge service. If you are not sure what the next step is, request concierge service.

  • STARR concierge service: Request concierge service from the Research Informatics Center, one of Stanford’s Honest Broker service providers.

STARR-OMOP Acknowledgement

STARR is subsidized by the SoM Dean’s Office. Your citation is important for the continuation of these subsidies. If you use the OMOP database, the ATLAS analysis tool, training, or data extraction concierge services via the Research Informatics Center, please use the following wording: “This research used data or services provided by STARR, “STAnford medicine Research data Repository,” a clinical data warehouse containing live Epic data from Stanford Health Care (SHC), the Stanford Children’s Hospital (SCH), the University Healthcare Alliance (UHA) and Packard Children's Health Alliance (PCHA) clinics and other auxiliary data from Hospital applications such as radiology PACS. STARR platform is developed and operated by Stanford Medicine Research IT team and is made possible by Stanford School of Medicine Research Office.”

STARR-OMOP Publication

A new paradigm for accelerating clinical data science at Stanford Medicine, Datta S, Posada J, Olson G, Li W, O'Reilly C, Balraj B, Mesterhazy J, Pallas J, Desai P, Shah N, Mar 2020. (link)