OHDSI 2020 Symposium

Exploration, learning and participation

Research IT presented at the OHDSI 2020 Symposium and participated in a study-a-thon.

Oct 30, 2020: Jose Posada, Ph.D., Sr. Clinical Data Scientist from Research IT and a member of the TDS Data Science team, presented Research IT's clinical text de-identification method, TiDE, at the symposium collaborator showcase. De-identified clinical text data is an essential need in modern clinical informatics research. The cloud-based TiDE produces high-quality, cost-efficient de-identified clinical text. His talk was one of the 12 lightning talks on data standards and methods research. There were more than 100 presentations of OHDSI research and collaboration at this year’s collaborator showcase. The TiDE pipeline is part of Stanford's STARR-OMOP portfolio.

Our work builds on the efforts led by the OHDSI NLP working group, which created the specifications for the OMOP-CDM v5.3.1 containing the NOTE and the NOTE_NLP tables. The former is the primary data source used in this work. Our work is complementary to efforts aimed at processing the note content to recognize mentions of biomedical concepts, and aims to facilitate broader sharing of clinical notes content for use with NLP tools.

TiDE: Open Source Text De-identification Pipeline

Abstract: TiDE combines pattern matching techniques with machine learning-based named entity recognition to find protected health information, and applies techniques such as Hiding in Plain Sight as an additional privacy enhancement strategy. TiDE is built from easily accessible best-in-class methods deployed in a cloud architecture, and is computationally resource-intensive yet cost-efficient. TiDE can process approximately 100 million clinical notes in roughly 7 hours by deploying 800 dataflow workers in parallel, at a total cost of USD 440. The total processing time translates to about 0.00025 s/note, which is roughly three orders of magnitude less than the recently reported fastest process (0.24 s/note) by Heider et al.
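The throughput figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation using the rounded numbers quoted above (100 million notes, 7 hours, USD 440, and the 0.24 s/note baseline); the variable names are illustrative and are not part of TiDE. Note that the per-note figure is wall-clock time divided by the number of notes, so it reflects aggregate throughput across all parallel workers.

```python
import math

# Rounded figures quoted in the abstract (approximate by construction).
NOTES = 100_000_000                # clinical notes processed
WALL_CLOCK_HOURS = 7               # total run time of the parallel job
TOTAL_COST_USD = 440               # total cost of the run
BASELINE_SECONDS_PER_NOTE = 0.24   # fastest previously reported process

# Wall-clock seconds divided by notes gives the aggregate per-note time.
wall_clock_seconds = WALL_CLOCK_HOURS * 3600
seconds_per_note = wall_clock_seconds / NOTES

# Ratio to the baseline, and the same ratio on a log10 scale.
speedup = BASELINE_SECONDS_PER_NOTE / seconds_per_note
orders_of_magnitude = math.log10(speedup)

# Cost normalized to one million notes.
cost_per_million_notes = TOTAL_COST_USD / (NOTES / 1_000_000)

print(f"seconds per note:        {seconds_per_note:.6f}")
print(f"speedup over baseline:   {speedup:.0f}x")
print(f"orders of magnitude:     {orders_of_magnitude:.2f}")
print(f"USD per million notes:   {cost_per_million_notes:.2f}")
```

With these inputs the per-note time comes out at about 0.00025 s, the speedup is just under 1000x (close to three orders of magnitude), and the cost is USD 4.40 per million notes.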

Study-a-thon: atherosclerotic cardiovascular disease

Stanford also participated in a virtual Study-a-thon following the 2020 OHDSI Symposium. The Data Quality Assessment team included the Research IT team and was tasked with determining the data quality of participating databases and their fitness for use in two cardiovascular clinical prediction models routinely used in prevention of atherosclerotic cardiovascular disease. STARR-OMOP database quality met or exceeded the requirements based on the “DataQualityDashboard” package, which runs 3000+ tests for conformance, plausibility, and completeness.

In the coming months, the study teams will focus on two cardiovascular clinical prediction models (CPMs) routinely used in clinical practice: the Revised Cardiac Risk Index (RCRI) and the Pooled Cohort Equations. The RCRI is used to predict 30-day risk of cardiovascular complications among patients undergoing non-emergent surgical procedures. The Pooled Cohort Equations predict 10-year risk of atherosclerotic cardiovascular disease among adults without pre-existing cardiovascular disease. The community will collaborate to implement these two existing prediction models against the OMOP Common Data Model using the OHDSI PatientLevelPrediction package.

A Stanford research team is participating in this collaboration. Our new research clinical data warehouse (CDW), STARR-OMOP, launched a year ago and was previously used in a COVID-19 characterization study published in Nature Communications. You can look at Stanford summary data on the interactive website.

Other highlights from the OHDSI 2020 Symposium

Highlights of the OHDSI 2020 Symposium, brought to you by Jose Posada:
  • Network studies Stanford is participating in right now
  • The growth of the OHDSI community
  • OHDSI symposium panel on "Building Trust: Evidence and Communication"
  • Stanford speakers at the symposium
  • New tools and resources since the last symposium