In an interview, computational biologist Tina Hernandez-Boussard discusses analyzing the value of electronic health records as a source of information in the clinic.
September 3, 2019 - By Hanae Armitage
Electronic health records have paved the way for a new approach to biomedical research: the analysis of real-world data. The term “real-world data” comes from the idea that EHRs aren’t just health histories of single individuals; they’re a trove of information that can help guide clinical decisions, from diagnosis to treatment.
In 2016, the 21st Century Cures Act became federal law. It helped bring about a new focus on harnessing real-world data in medicine, as opposed to letting clinical trials be the sole guiding source. Today, doctors and scientists are increasingly turning toward algorithms trained on EHRs to guide diagnoses and determine treatment options for patients.
The problem is that EHRs can be quite messy, making data extraction complex, said Tina Hernandez-Boussard, MD, PhD, associate professor of medicine, of biomedical data science and of surgery at the School of Medicine. So she and a team of scientists set up a case study to determine how accurately algorithms trained on real-world data can answer medically relevant questions. In it, they posed a simple question: How well can an EHR-trained algorithm predict if a patient has a cardiovascular condition?
In a paper published Aug. 15 in the Journal of the American Medical Informatics Association, Hernandez-Boussard, lead author of the study, shows that the answer is very well — and also not so well. It turns out the accuracy of the prediction depends on how the data is organized and gathered.
Science writer Hanae Armitage spoke with Hernandez-Boussard about the value of real-world data, how it fits into the current clinical care landscape and what her study revealed about harnessing this immense dataset.
1. What’s the most important contribution of real-world data to medical decision-making?
Hernandez-Boussard: It’s really the generalizability that’s key; pulling from real-world data provides information that’s representative of the way most of the population receives their care. Real-world data doesn’t have strict inclusion or exclusion criteria, so any clinical assertions made with that data can apply to a typical patient seeking routine care.
Very often “gold standard” clinical trials are looking at a particular drug for a particular disease, and patients can end up excluded if they have conflicting comorbidities, some of which can be quite common. So say, for example, there’s a trial for a cancer drug; it might exclude a patient who has hypertension or diabetes. But in reality, the patients with hypertension and diabetes might be the ones who really benefit from the drug. And so looser inclusion and exclusion criteria enable us to see how this drug works, or doesn’t work, in much broader populations and subpopulations of patients.
2. Your team set up a case study to evaluate real-world data in the context of cardiovascular medicine. What did the study investigate and what were the most significant takeaways?
Hernandez-Boussard: With our case study, we set out to see if and how real-world evidence could be used to guide clinical assertions, such as identifying a patient with a particular disease or guiding treatment options. It seems reasonable to expect that real-world data could help inform those clinical assertions, but this data largely comes from electronic health records and insurance claims. Those records aren’t intended to guide care for others; they’re meant to capture information for billing and patient history purposes, and they’re really complex and messy. So pulling any data out of them has to be done with great caution. We wanted to assess how accurate EHR-based algorithms are when predicting a cardiovascular condition.
In general, we saw that people using this EHR data are mostly accessing something called “structured data,” which are things like vitals and medical codes that denote different diseases. We tend to think of this data as more regularly curated, and therefore more accurate, but our case study showed us that the richest information is actually in the clinical narrative text, or unstructured data: for example, the free text notes that a clinician takes during a patient visit.
In our study, we found that we could more accurately identify different aspects of the population, like diseases or procedures they’d had done, when using just the unstructured data. For example, we wanted to see if we could identify the population of patients who had coronary artery disease. When we used structured data, we could do so with 80% accuracy. But when we used the unstructured data, that jumped to 95%.
When we think about how clinicians and patients interact, the clinician is an active listener and note taker, capturing much of their conversation in written narrative form. That’s where they’re likely able to provide the most details about the patient, as opposed to structured data, which fall more into the numbers and codes category. So we end up getting richer, more accurate information when we just analyze free text.
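To make the structured-versus-unstructured comparison concrete, here is a minimal Python sketch. It is not the study's method: the study used artificial intelligence technologies, whereas this stand-in uses a plain diagnosis-code lookup and a keyword search with a crude negation check. The patient records, field names and helper functions are all hypothetical and invented for illustration, so the resulting accuracies say nothing about the study's 80% and 95% figures.

```python
import re

# Hypothetical toy records, invented for illustration only (not study data).
# "codes" stands in for structured data, "note" for unstructured free text.
RECORDS = [
    {"codes": ["I25.10", "I10"], "note": "History of coronary artery disease, on statin.", "has_cad": True},
    {"codes": ["I10"], "note": "Known CAD with prior stent placement.", "has_cad": True},
    {"codes": ["E11.9"], "note": "Type 2 diabetes. No evidence of coronary artery disease.", "has_cad": False},
    {"codes": ["I25.10"], "note": "Coronary artery disease, stable angina.", "has_cad": True},
    {"codes": ["J45.909"], "note": "Asthma follow-up, doing well.", "has_cad": False},
    {"codes": ["I25.10"], "note": "Chest pain workup; ruled out, no CAD.", "has_cad": False},
    {"codes": ["Z82.49"], "note": "Family history of CAD in father.", "has_cad": False},
]

# Assumption: ICD-10 codes beginning with I25 are treated as coronary artery disease.
CAD_CODE_PREFIX = "I25"
CAD_MENTION = re.compile(r"\b(coronary artery disease|CAD)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b", re.IGNORECASE)


def predict_from_codes(record):
    """Structured approach: flag the patient if any diagnosis code matches."""
    return any(code.startswith(CAD_CODE_PREFIX) for code in record["codes"])


def predict_from_note(record):
    """Unstructured approach: flag the patient if the note mentions the
    condition and no negation word precedes it in the same sentence."""
    note = record["note"]
    for match in CAD_MENTION.finditer(note):
        preceding = note[:match.start()]
        sentence_start = preceding.rfind(".") + 1
        if not NEGATION.search(preceding[sentence_start:]):
            return True
    return False


def accuracy(predict, records):
    """Fraction of records where the prediction matches the reference label."""
    correct = sum(predict(r) == r["has_cad"] for r in records)
    return correct / len(records)


if __name__ == "__main__":
    print("structured  :", round(accuracy(predict_from_codes, RECORDS), 2))
    print("unstructured:", round(accuracy(predict_from_note, RECORDS), 2))
```

Both toy approaches make mistakes here: the code lookup misses a patient whose condition appears only in the note and trusts a code that the note contradicts, while the keyword search mistakes a family history for a diagnosis. Errors like these are one reason extracting data from EHRs has to be done with caution.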
3. What were the limitations of using real-world data in your case study?
Hernandez-Boussard: I would say that the technology to actually run the algorithms that harness real-world data is, in many ways, a limiting factor. For this study, we used state-of-the-art artificial intelligence technologies. That technology is a key part of harnessing real-world data effectively, but not everyone has access to it or the expertise to conduct that kind of analysis. The second limitation is actually having access to data on a large scale while being compliant with privacy laws. While we were able to conduct our study with the dataset from Verantos, a collaborator on the study, data silos are a very real problem, and it’s not always easy to gain access to such a large amount of data that encompasses different health care settings from different geographical locations. These diverse datasets are important to address the generalizability of the technologies.
4. Why is there an increasing trend toward using real-world evidence instead of, or in conjunction with, clinical trial data to inform medical decisions?
Hernandez-Boussard: There are a handful of reasons. First, clinical trials are pricey — we’re talking millions and millions of dollars — and they only encompass a very small portion of the population. Real-world data repurposes EHRs to guide clinical care at a fraction of the cost. The second big criticism of clinical trials is that they’re often not broadly generalizable. The patients that are included in trials often do not fully represent the range of patients that could benefit from a new drug or therapy.
Third, these clinical trials are highly controlled. Patients come in for an appointment or treatment and in, say, exactly two weeks they must come back for a follow-up or their next dose of medication. But that’s not really the way that patients receive care. In reality, patients aren’t always able to stick to a regimented calendar. Maybe they’re on vacation and need to wait three weeks until they schedule their next appointment.
5. How do you think real-world evidence can best fit into a clinical care context?
Hernandez-Boussard: Clinical trial data still provides the highest level of certainty for guiding clinical care. But if, for example, you run a clinical trial, and end up with a very low representation of a specific subpopulation, how would you know how well that drug works in that subpopulation? And that’s where we think real-world data could fill in a gap.
We’re not suggesting that real-world evidence should replace clinical trials by any means — clinical trials are still held as the gold standard. Instead, we see it as more of a hybrid situation, with patient care benefiting from both sides.