Electronic Medical Record (EMR) Data Models

OMOP

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an open community data standard, designed to standardize the structure and content of observational data and to enable efficient analyses that can produce reliable evidence.

By standardizing the capture of clinical information such as encounters, patients, providers, diagnoses, drugs, measurements and procedures, institutions that have implemented OMOP are able to participate in networked studies without sharing their data. Each site runs the same queries locally and shares only the results, while keeping the sensitive detailed data in-house.
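As a rough illustration of the networked-study idea, the sketch below sends one query to two toy sites and collects only the aggregate counts. It assumes each site can be reached as a database connection; the in-memory SQLite databases, the site names, the helper functions and the concept ID 100 are all hypothetical stand-ins, and only the table and column names come from the OMOP CDM.

```python
import sqlite3

# The same query text is run at every site; only the aggregate leaves the site.
COUNT_QUERY = """
    SELECT COUNT(DISTINCT person_id)
    FROM condition_occurrence
    WHERE condition_concept_id = ?
"""


def make_site(rows):
    """Build a toy in-memory database standing in for one site's OMOP instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "condition_occurrence_id INTEGER PRIMARY KEY, "
        "person_id INTEGER, "
        "condition_concept_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence (person_id, condition_concept_id) "
        "VALUES (?, ?)",
        rows,
    )
    return conn


def run_at_site(conn, concept_id):
    """Run the shared query locally and return only the patient count."""
    return conn.execute(COUNT_QUERY, (concept_id,)).fetchone()[0]


# Hypothetical sites with made-up (person_id, condition_concept_id) rows.
sites = {
    "site_a": make_site([(1, 100), (1, 100), (2, 100), (3, 200)]),
    "site_b": make_site([(10, 100), (11, 200), (12, 200)]),
}

# Row-level data never leaves a site; the coordinator sees counts only.
counts = {name: run_at_site(conn, 100) for name, conn in sites.items()}
total = sum(counts.values())
print(counts, total)
```

In a real network the query would be distributed to each institution and executed behind its firewall; the loop here only mimics that step.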

The most salient feature of the OMOP data model is that information is captured as much as possible in associated vocabulary terms. One simple example is diagnoses, referred to as 'condition occurrence' in OMOP. Rather than inserting the text of a diagnosis code into the condition occurrence table, the table instead contains references to the appropriate entries in the 'concept' table, where all vocabulary terms are documented. The idea is that if all institutions implementing OMOP have the same set of concepts, and use the same mapping rules when linking to those concepts, you can run a query against any OMOP instance and get back results that should be similar enough to support a sound meta-analysis.
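The reference from a condition occurrence to its concept can be sketched as a simple join. The example below is a minimal illustration using an in-memory SQLite database; the table and column names follow the OMOP CDM, but the tables are cut down to a few columns, and the concept IDs, names, vocabulary and patient rows are invented toy values rather than real vocabulary entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down versions of two OMOP tables (real tables have more columns).
    CREATE TABLE concept (
        concept_id    INTEGER PRIMARY KEY,
        concept_name  TEXT,
        vocabulary_id TEXT,
        concept_code  TEXT
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id               INTEGER,
        condition_concept_id    INTEGER REFERENCES concept (concept_id),
        condition_start_date    TEXT
    );

    -- Toy rows: these IDs and names are NOT real OMOP concepts.
    INSERT INTO concept VALUES
        (1001, 'Toy condition A', 'ToyVocab', 'A-1'),
        (1002, 'Toy condition B', 'ToyVocab', 'B-2');
    INSERT INTO condition_occurrence VALUES
        (1, 42, 1001, '2020-01-15'),
        (2, 42, 1002, '2021-06-03'),
        (3, 77, 1001, '2019-11-30');
""")


def conditions_for(person_id):
    """Return (start_date, concept_name) pairs for one person, oldest first."""
    return conn.execute(
        """
        SELECT co.condition_start_date, c.concept_name
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        WHERE co.person_id = ?
        ORDER BY co.condition_start_date
        """,
        (person_id,),
    ).fetchall()


print(conditions_for(42))
```

Because the condition occurrence row stores only `condition_concept_id`, the human-readable name lives in one place, and the same join text works at any site that loads the same concept table.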

Concepts can have synonyms, and can also be related to one another. Ontology curation is a separate activity for OMOP sites wishing to augment the common data model. If your data does not map to existing ontologies, you can contribute your suggested enhancements to the consortium; if they are approved and adopted, they will be rolled out for use by all other sites.
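Synonyms and relationships are stored in their own vocabulary tables. The sketch below shows, under simplifying assumptions, how they might be queried: `concept_synonym` and `concept_relationship` and their columns are OMOP CDM names, and 'Is a' and 'Maps to' are relationship types used in the OMOP vocabularies, but the tables are reduced to a few columns, the helper functions are hypothetical, and every row is invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down vocabulary tables; all rows below are toy data.
    CREATE TABLE concept (
        concept_id   INTEGER PRIMARY KEY,
        concept_name TEXT
    );
    CREATE TABLE concept_synonym (
        concept_id           INTEGER,
        concept_synonym_name TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1    INTEGER,
        concept_id_2    INTEGER,
        relationship_id TEXT
    );

    INSERT INTO concept VALUES
        (1, 'Toy disorder'),
        (2, 'Toy disorder, severe'),
        (3, 'Local code XYZ');
    INSERT INTO concept_synonym VALUES
        (1, 'Toy disease'),
        (1, 'Toy syndrome');
    INSERT INTO concept_relationship VALUES
        (2, 1, 'Is a'),      -- hierarchy: the severe form is a kind of the disorder
        (3, 1, 'Maps to');   -- mapping: a source code points at a standard concept
""")


def synonyms(concept_id):
    """Return the alternative names recorded for a concept, alphabetically."""
    rows = conn.execute(
        "SELECT concept_synonym_name FROM concept_synonym "
        "WHERE concept_id = ? ORDER BY concept_synonym_name",
        (concept_id,),
    ).fetchall()
    return [name for (name,) in rows]


def related(concept_id, relationship_id):
    """Return names of concepts linked from concept_id by the given relationship."""
    rows = conn.execute(
        """
        SELECT c.concept_name
        FROM concept_relationship AS cr
        JOIN concept AS c ON c.concept_id = cr.concept_id_2
        WHERE cr.concept_id_1 = ? AND cr.relationship_id = ?
        ORDER BY c.concept_name
        """,
        (concept_id, relationship_id),
    ).fetchall()
    return [name for (name,) in rows]


print(synonyms(1))
print(related(3, "Maps to"))
```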

For more information on Stanford's implementation of OMOP, please see the STARR OMOP website.

Stanford In-House Data Model

Before OMOP became a popular standard, Stanford developed its own internal data model, originally referred to as STRIDE. This data model has been in use at Stanford since 2008.

While at first we endeavored to map Stanford data to controlled clinical vocabularies, over time we increasingly favored supplying the original clinical data to research projects. We do offer mappings of drugs to RxNorm and of lab results to LOINC codes, but that is the extent to which the in-house data model has vocabulary mappings.

OMOP vs. Stanford In-House

Both data models contain data from the Adult (SHC) and Children's (SCH) Hospitals, with some data elements (notably billing codes) going back to the late 1990s, suitably filtered for research purposes.

If you plan to collaborate with other OMOP institutions, or wish to publish your data algorithms for validation and verification by others, you should choose OMOP.

If, on the other hand, you are conducting a Stanford-internal project, you will likely find that the in-house data model bears a closer resemblance to the source system data and is easier to work with.