Mining consumers' web searches can reveal unreported side effects of drugs, researchers say

By Sarah Williams

Researchers at the Stanford University School of Medicine and Microsoft Research have revealed that the Internet search history of consumers can yield information on the unreported side effects of drugs or drug combinations.

By analyzing 12 months of search history from 6 million Internet users who consented to share anonymized logs of their Web searches for research purposes, the team was able to pinpoint an interaction between two drugs that was unknown at the time of data collection.    

"Seeking health information is a major use of the Internet now," said Russ Altman, MD, PhD, a co-author of the new paper and Stanford professor of bioengineering, of genetics and of medicine. "So we thought people are likely typing in drugs they are taking and the side effects they are experiencing and that there must be a way for us to use this data."

The study was published March 6 in the Journal of the American Medical Informatics Association.

The goal of this and previous research is to find fast, accurate methods of determining when a drug or combination of drugs causes unexpected side effects in some patients. The U.S. Food and Drug Administration encourages physicians to report any possible side effects through the agency's Adverse Event Reporting System. Such reporting is voluntary, however, and relies on a patient or a physician noticing that something unusual has happened.

Altman's lab group had previously studied whether it was possible to comb through data from FDA reports to discover drug-drug interactions in an automated way. Using their data-mining methods on the FDA reports, the group reported in May 2011 that it had found a never-before-reported side effect of combining paroxetine, an antidepressant medication, and pravastatin, a cholesterol-lowering drug. When a patient was taking both paroxetine (marketed as Paxil) and pravastatin (marketed as Pravachol or Selektine), the researchers found that the patient's risk of developing hyperglycemia — high levels of blood glucose — was greater than the risk of hyperglycemia from taking either drug individually.

Altman and his colleagues wondered whether they could have pinpointed the side effect any other way, since the adverse-event reports they relied on for the original discovery are only generated when a doctor takes the initiative and believes the side effect warrants reporting.

"Historically, it's been really hard to detect synergistic effects of drug combinations that aren't necessarily side effects of any of the drugs alone," Altman said.

Public Internet search history had previously been used to track flu outbreaks: a 2010 paper concluded that looking at the location and frequency of flu and flu-symptom-related searches was as accurate at following the flu's spread as the hospital-based tracking methods used by the U.S. Centers for Disease Control and Prevention. To see whether a similar approach could work for detecting drug interactions and side effects, Altman teamed up with Nigam Shah, MBBS, PhD, assistant professor of medicine at Stanford. They collaborated with Eric Horvitz, MD, PhD, distinguished scientist and managing co-director at Microsoft Research; senior Microsoft researcher Ryen White, PhD, who is the study's lead author; and a colleague at Columbia University.

The Microsoft team developed automated tools for mining anonymized data from 82 million drug, symptom and condition searches performed by 6 million Internet users who had agreed, when they installed a Microsoft browser plugin, that the company could use their search history for research purposes.

White said the team used the automated tools to identify searches for information on paroxetine, pravastatin or both during 2010. The tools then computed the likelihood that users in each group would also search for hyperglycemia or for any of almost 80 of its symptoms or descriptors, such as "high blood sugar," "blurry vision," "frequent urination" or "dehydration."

"We really had to take into consideration this difficulty in predicting people's language," said Altman. "We could miss things because, through no fault of their own, the public doesn't know medical jargon."

Among people who searched for the drug paroxetine or its brand names in 2010, about 5 percent also searched for one of the hyperglycemia-related terms. For pravastatin and its brand names, the rate was below 4 percent. But for those who searched for both drugs, suggesting that they might be taking both drugs, the search rate for hyperglycemia was 10 percent.
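The comparison described above can be illustrated with a minimal Python sketch. It is not the Microsoft team's tooling: the term lists, the substring matching, the toy log and the choice to make the three user groups disjoint are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, heavily shortened term lists. The study matched brand names
# and almost 80 hyperglycemia-related terms; only a few are shown here.
PAROXETINE_TERMS = {"paroxetine", "paxil"}
PRAVASTATIN_TERMS = {"pravastatin", "pravachol"}
HYPERGLYCEMIA_TERMS = {"hyperglycemia", "high blood sugar", "blurry vision",
                       "frequent urination", "dehydration"}


def mentions(query, terms):
    """Return True if any term occurs as a substring of the lowercased query."""
    q = query.lower()
    return any(term in q for term in terms)


def cosearch_rates(log):
    """log: iterable of (user_id, query) pairs.

    Returns, for each drug group, the fraction of users who also issued
    at least one hyperglycemia-related query.
    """
    flags = defaultdict(lambda: {"parox": False, "prava": False, "hyper": False})
    for user_id, query in log:
        f = flags[user_id]
        f["parox"] |= mentions(query, PAROXETINE_TERMS)
        f["prava"] |= mentions(query, PRAVASTATIN_TERMS)
        f["hyper"] |= mentions(query, HYPERGLYCEMIA_TERMS)

    # [users in group, users in group with a hyperglycemia search]
    counts = {"paroxetine only": [0, 0], "pravastatin only": [0, 0], "both": [0, 0]}
    for f in flags.values():
        if f["parox"] and f["prava"]:
            group = "both"
        elif f["parox"]:
            group = "paroxetine only"
        elif f["prava"]:
            group = "pravastatin only"
        else:
            continue  # user searched for neither drug
        counts[group][0] += 1
        counts[group][1] += f["hyper"]

    return {g: (hits / total if total else 0.0) for g, (total, hits) in counts.items()}


# Toy log, invented for illustration only.
toy_log = [
    ("u1", "paxil side effects"), ("u1", "high blood sugar"),
    ("u2", "paroxetine dose"),
    ("u3", "pravachol"),
    ("u4", "paxil and pravastatin"), ("u4", "frequent urination at night"),
    ("u5", "weather"),
]
print(cosearch_rates(toy_log))
```

A signal of the kind reported in the study would appear as a rate for the "both" group that is clearly higher than either single-drug rate.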

To test the accuracy of the search engine analysis, the team looked at 31 drug-drug interactions already known to cause hyperglycemia, and 31 drug combinations known to be safe. Overall, the drug pairs with known interactions led to more search queries on hyperglycemia. But the results also suggested that around 12 percent of users searching for drug combinations known to have no interactions also had an unusually high rate of hyperglycemia searches, which would lead researchers down dead ends if they pursued them.
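A check of this kind, with known interacting pairs as positives and known safe pairs as negatives, can be sketched as follows. The per-pair scores, the threshold and the function name are hypothetical; the article does not say how the team scored pairs or where it drew the line.

```python
def rates_at_threshold(scored_pairs, threshold):
    """scored_pairs: list of (score, is_known_interaction) tuples.

    A pair is flagged when its score is at or above the threshold.
    Returns (true_positive_rate, false_positive_rate).
    """
    tp = fp = pos = neg = 0
    for score, is_known in scored_pairs:
        flagged = score >= threshold
        if is_known:
            pos += 1
            tp += flagged
        else:
            neg += 1
            fp += flagged
    tpr = tp / pos if pos else 0.0
    fpr = fp / neg if neg else 0.0
    return tpr, fpr


# Invented scores: three known interacting pairs, four known safe pairs.
example_pairs = [
    (0.9, True), (0.7, True), (0.2, True),
    (0.8, False), (0.1, False), (0.3, False), (0.05, False),
]
tpr, fpr = rates_at_threshold(example_pairs, threshold=0.5)
print(f"true-positive rate: {tpr:.2f}, false-positive rate: {fpr:.2f}")
```

Raising the threshold lowers the false-positive rate at the cost of missing more real interactions, which is the trade-off behind the dead ends mentioned above.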

"We were surprised how good the signal was," said Shah. "The challenge now is to figure out what application this has in continuous monitoring for such side effects."

One way to improve this false-positive rate, Altman and Shah agree, is to combine the search history data with other sources of data, such as social media, patient support forums and information from medical records and doctors. Shah and his team are already studying how to search in an automated way through anonymized versions of patients' electronic medical records to find drug interactions. If the mining of consumer search engine data is combined with other information, such as reports to the FDA and searches by doctors on professional medical search programs, it could provide lists of potential drug-drug interactions for researchers to look into more closely through traditional clinical trials, they say.

"If we cross-reference multiple data sources, then we can triangulate based on what doctors and patients are both concerned about," said Shah.
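One simple way to picture this cross-referencing is to keep only the drug pairs that more than one independent source flags. The sketch below assumes each source has already produced its own list of candidate pairs; the source names, the placeholder drug names and the two-source cutoff are hypothetical, not part of the researchers' published method.

```python
from collections import Counter


def triangulate(sources, min_sources=2):
    """sources: dict mapping a source name to a list of (drug_a, drug_b) pairs.

    Returns the set of pairs flagged by at least min_sources sources.
    Pairs are stored as frozensets so that order does not matter.
    """
    votes = Counter()
    for flagged_pairs in sources.values():
        # De-duplicate within a source so each source votes at most once per pair.
        unique = {frozenset(pair) for pair in flagged_pairs}
        votes.update(unique)
    return {pair for pair, n in votes.items() if n >= min_sources}


# Invented candidate lists for illustration.
candidates = {
    "search_logs": [("paroxetine", "pravastatin"), ("drug_a", "drug_b")],
    "fda_reports": [("pravastatin", "paroxetine")],
    "medical_records": [("drug_c", "drug_d")],
}
print(triangulate(candidates))
```

Pairs that survive this filter would still only be leads for closer study, not confirmed interactions.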

The search data will always be messy, Shah admitted, since an Internet user's search history doesn't tell the complete story. Users could perform one search on their own symptoms, and the next on a symptom or drug related to someone else in their household, for example. In addition, a news story on a known or suspected drug-drug interaction could lead to excessive searches on that side effect, artificially inflating the results. But even if the data are messy, he said, enough messy data — like millions of search records — can reveal directions for researchers to pursue.

"I believe patients are telling us lots of things about drugs, and we need to figure out ways to listen," said Altman. "This is just one way of listening and one application."

The study was funded in part by the National Institutes of Health (grants HG004028 and GM61374). Stanford's Department of Medicine also supported the work.
