OHDSI 2021 Symposium

Exploration, learning and participation

Research IT presented at the OHDSI 2021 Symposium and participated in a study-a-thon.

Oct 30, 2021: Jose Posada, Ph.D., and Priya Desai presented two posters showcasing Research IT work.

 

Linking Analysis-Ready Multi-Modal Clinical Data

Priya Desai, Somalee Datta

Abstract: The STAnford medicine Research data Repository (STARR) is a research ecosystem that contains a collection of linked, research-ready data warehouses built from disparate clinical ancillary systems, along with a secure data science facility. The ecosystem is designed on the principles of Data Commons and includes reusable data processing pipelines, cohort and analysis tools, training, user support, and much more. STARR data currently includes electronic medical records, clinical images (radiology, cardiology) and text, bedside monitoring data, and near-real-time HL7 messages. Processed, “analysis ready” linked data is available to all Stanford researchers in a “self-service” mode and currently consists of:

  1. De-identified Electronic Health Records (EHR) from the two Stanford hospitals and clinics in the OMOP Common Data Model (CDM). 
  2. De-identified bedside monitoring (waveform) data from Stanford Children’s Hospital.

Linked patient data in the ecosystem are primarily anchored using person_id, the auto-generated identifier for the patient in the CDM from the OHDSI community. When the data are refreshed, the person_id stays stable. Other data, such as imaging metadata from radiology (including MRIs, X-rays, ultrasounds, and CT scans) and cardiology, are coming soon. These analysis-ready datasets reside in BigQuery, a cloud-based data warehouse that leverages the infrastructure of the Google Cloud Platform and offers rapid SQL queries and interactive analysis of massive datasets.
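
As a minimal sketch of what such a person_id-anchored query can look like from Python, the example below joins two datasets with the google-cloud-bigquery client; the project, dataset, and table names are placeholders for illustration, not actual STARR identifiers.

```python
# A minimal sketch of a linked query across two STARR-style datasets,
# anchored on person_id. All project, dataset, and table names below are
# hypothetical placeholders, not real STARR resources.
from google.cloud import bigquery

client = bigquery.Client(project="my-research-project")  # hypothetical project

query = """
SELECT p.person_id, p.year_of_birth, w.recording_start
FROM `my-research-project.starr_omop.person` AS p
JOIN `my-research-project.starr_waveform.recordings` AS w
  ON p.person_id = w.person_id
LIMIT 10
"""

# Run the query and iterate over the result rows.
for row in client.query(query).result():
    print(row.person_id, row.year_of_birth, row.recording_start)
```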

ATLAS with a BigQuery Backend Running Execution Engine: A Software Demo

Jose Posada, Priya Desai, Konstantin Yaroshovets, Gregory Klebanov

Abstract: Stanford has adopted an ecosystem view of modern clinical research tools. Built on the foundation of STRIDE, the ecosystem has since expanded into the STAnford medicine Research data Repository (STARR) ecosystem. The overall design of the ecosystem is based on Data Commons principles and includes compute and storage infrastructure, a data lake, data warehouses, data processing pipelines, APIs, tools, user training, and support. Our overarching goal is to streamline science for researchers.

The backbone of the STARR ecosystem is STARR-OMOP, an analytical clinical data warehouse that uses the OMOP Common Data Model. One of the reasons Stanford chose OMOP was OHDSI in its entirety, not just the data model: we wanted the tools, the network, and the community. Another critical part of our ecosystem is our data center. The compute and storage infrastructure has grown from an on-premises data center to embrace the cloud, not just for its larger storage and compute capacity, but also for specialized solutions. One such specialized solution is Google BigQuery, a managed, distributed data warehousing solution. Stanford had previously implemented Google Cloud BigQuery for a big data genetics initiative, so it was natural to try BigQuery for STARR-OMOP. BigQuery brings two very significant features. First, it is a managed service: unlike traditional databases, it does not require DBA tinkering for performance and is performant out of the box. The data engineering team can focus on data standardization, completeness, and quality instead of indexing, sharding, and scaling. Second, its APIs are data-science friendly. Researchers can work from their laptops or HPC environments in their Jupyter Notebooks and never leave the tools they already use for data science.
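
The sketch below illustrates that notebook workflow, assuming the google-cloud-bigquery client library: a query against an OMOP table is issued from Python and the result lands directly in a pandas DataFrame. The project and dataset names are placeholders.

```python
# A hedged sketch of the notebook workflow described above: query an OMOP
# table in BigQuery and load the result into a pandas DataFrame. The project
# and dataset names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-research-project")

sql = """
SELECT gender_concept_id, COUNT(*) AS n_persons
FROM `my-research-project.starr_omop.person`
GROUP BY gender_concept_id
ORDER BY n_persons DESC
"""

# to_dataframe() requires pandas (plus pyarrow/db-dtypes) to be installed.
df = client.query(sql).to_dataframe()
print(df)
```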

In a previously published manuscript, we show that the ATLAS benchmarking suite using SynPUF runs 3 to 10x faster on BigQuery than on PostgreSQL (Manuscript, Supplementary Table S9.3). We also show that Achilles queries run in ATLAS against STARR-OMOP data provide a near-real-time user experience: out of 725 total queries available in Achilles, 660 took less than 17 seconds, and the median execution time was 3 seconds (Manuscript, Supplementary Table S9.1). While direct or API-based SQL querying of BigQuery is highly performant, the OHDSI toolkits do not use BigQuery directly. Instead, the tools rely on shared libraries such as DatabaseConnector and SqlRender that translate queries into the BigQuery SQL dialect. Optimizing the OHDSI toolkits to run on BigQuery is a journey we embarked on nearly two years ago, and it has since led to the successful deployment and utilization of ATLAS at Stanford. We have also embraced the execution of ATLAS population-level estimation (PLE) and patient-level prediction (PLP) analyses through the ARACHNE Execution Engine, which allows us to fully execute estimation and prediction studies right inside ATLAS. This presentation will demonstrate Stanford ATLAS running on top of STARR-OMOP, including the ARACHNE Execution Engine.
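
To give a sense of the kind of translation those shared libraries perform, the hand-written comparison below shows an OHDSI-style SQL fragment in its SQL Server-flavored template dialect next to an equivalent BigQuery Standard SQL rewrite. This is an illustration of the transformation, not actual SqlRender output, and the dataset path in the BigQuery version is a placeholder.

```python
# A hand-written illustration of the kind of rewrite the shared OHDSI
# libraries perform when targeting BigQuery; not actual SqlRender output.

# OHDSI template SQL is written in a SQL Server-flavored dialect, with
# @-parameters filled in at render time.
ohdsi_sql = """
SELECT person_id
FROM @cdm_database_schema.observation_period
WHERE DATEADD(day, 30, observation_period_start_date)
      <= observation_period_end_date
"""

# An equivalent query in BigQuery Standard SQL: the schema parameter becomes
# a project.dataset path, and DATEADD becomes DATE_ADD with an INTERVAL.
bigquery_sql = """
SELECT person_id
FROM `my-research-project.starr_omop.observation_period`
WHERE DATE_ADD(observation_period_start_date, INTERVAL 30 DAY)
      <= observation_period_end_date
"""
```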