Scientists consider potential of abundant biomedical data

- By Bruce Goldman

Photo: Norbert von der Groeben

Anne Wojcicki, co-founder and CEO of the personalized genomics company 23andMe, spoke at the conference.

A “tsunami of digital data” now surges through medical research and health care, Lloyd Minor, MD, dean of the School of Medicine, told a packed auditorium May 22 at the start of the Big Data in Biomedicine Conference at Stanford.

“We have electronic patient records, DNA-sequencing data, comprehensive biological data on disease mechanisms, treatment monitoring reports, clinical trial results, pharmaceutical records, disease registries, and the list goes on,” Minor said. The goal of the conference, held May 22-24 in the Li Ka Shing Center for Learning and Knowledge, was to harness the power of that tsunami. The conference was presented by Stanford Medicine and the University of Oxford, and sponsored by the Li Ka Shing Foundation.

Minor described the undertaking as “a challenge so big and so complex that single individuals or companies or institutions cannot solve it alone.” It would, he said, require collaboration among academia, philanthropy, industry and government.

More than 40 speakers from across the United States and several foreign countries brainstormed ways to improve health care by using “big data.”

“If cars had made as much progress as computers over the past several decades, you’d be able to drive across the country for a dime in one of them and then pack it up and stick in your shirt pocket,” Stanford University President John Hennessy, PhD, a computer scientist, told the audience of roughly 300 people in the Li Ka Shing Center’s Berg Auditorium. (Another 400 people watched parts of the conference online.)

Vast increases in data-processing capacity, coupled with faster data transmission, have opened up prospects for improving patients’ compliance, providers’ diagnostic and therapeutic marksmanship, and researchers’ ability to tease apart causality from mere correlation. “The amount of data being generated worldwide each year now falls in the zettabyte range,” said Atul Butte, MD, PhD, chief of systems medicine and associate professor of pediatrics and of genetics, and the conference’s principal organizer. The prefix “zetta-” refers to the number 1 followed by 21 zeroes.

This data includes studies of substances and gene activity in healthy and diseased blood and tissue samples, as well as analyses showing the effects of drugs on such samples. Butte described one set of studies conducted in his lab as “a kind of for medical molecules”: Exploiting publicly available databases, his team found several instances in which a drug’s effects on gene activity in a particular tissue were the opposite of the changes in gene activity wrought by a specific disease. Even though the drug had never previously been considered as a therapy for that disease, pairing the two produced promising therapeutic results. Moreover, the candidate drugs are off-patent (therefore potentially cheap) and well-studied (therefore known to be safe), which could accelerate their progress through the development mill.
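The matching idea can be illustrated with a toy calculation. The sketch below is not the Butte lab’s method; it assumes each disease and each drug is summarized as a list of per-gene activity changes, and it swaps in a plain Pearson correlation as the score. The gene values, the drug names and the function names are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_reversing_drugs(disease_signature, drug_signatures):
    """Sort drugs so the most anti-correlated (most 'opposite') comes first."""
    scores = {
        name: pearson(disease_signature, signature)
        for name, signature in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical per-gene activity changes (positive = more active).
disease = [2.0, 1.5, -1.0, -2.0, 0.5]
drugs = {
    "drug_A": [-1.8, -1.2, 0.9, 2.1, -0.4],  # roughly mirrors the disease
    "drug_B": [1.9, 1.4, -1.1, -1.7, 0.6],   # roughly mimics the disease
    "drug_C": [0.1, -0.2, 0.0, 0.2, -0.1],   # small, mixed changes
}

ranking = rank_reversing_drugs(disease, drugs)
for name, score in ranking:
    print(f"{name}: correlation {score:+.2f}")
```

In this toy data, the drug whose changes mirror the disease lands at the top of the list. A real analysis would involve thousands of genes and far more careful statistics.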

As many as 1.2 million “gene-expression analyses,” in which each gene in the genome is assessed by a high-tech, high-speed device as to how active it is, are already stored on the Internet, Butte said, and that number doubles every two to three years. “Instead of dissecting a frog, any high-school kid today who needs to do a science project can go to a public National Center for Biotechnology Information database, type ‘breast cancer’ in the appropriate field, and pull down 37,000 digital samples of breast cancer as easily as she could find a song on iTunes.”

Photo: Norbert von der Groeben

Carlos Bustamante, professor of genetics, said that most genetic variants are rare. Learning about the existence, let alone the significance, of such variants requires large-scale studies, he said.

Nor is digitally accessible biomedical data limited to scientific studies. The United States is rapidly moving to a time when 100 percent of medical providers use electronic health records, Butte said. More than half of U.S. doctors have already made the switch to EHRs, according to a recent report.

“An entire revolution is coming from us measuring ourselves,” Butte told the crowd, noting that by using a personal tracking gadget to tally his caloric intake and exercise output, he has dropped more than 3 units in body mass index, a measure of whether a person’s weight is healthy for his or her height. Such gadgetry, which can also transmit personal medical data such as pulse rate and blood sugar levels in real time, will become increasingly useful to medical practitioners and researchers, Butte said. “We don’t live with our patients. We get to see them only in the hospital or when they come for a visit to the doctor’s office. How can we track these patients? How can we keep them compliant?”

With patient medical records, census data, environmental samplings and more increasingly available, a big challenge will be to integrate those disparate sources. One speaker, John Bell, MD, of Oxford, referred to this challenge as “big data in the wild: not necessarily well-controlled, carefully collected or consistently organized.” Bell highlighted the need to create “safe data havens,” where data can be stored and retrieved for research purposes under conditions that absolutely ensure patients’ privacy — a precondition for obtaining their buy-in — and to give patients a sense that this research will pay dividends to them in the form of a more efficient health-care system and fewer adverse drug effects, for example.

Another challenge is cost. Carlos Bustamante, PhD, professor of genetics and a conference organizer, explained that most genetic variants are rare, occurring in perhaps one of every 3,000 individuals. Learning about the existence, let alone the significance, of such variants requires large-scale studies. “But epidemiological studies of 1 million people are extraordinarily expensive. To do 50 million or 200 million is all the more so.”
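A rough calculation shows why cohort size matters so much here. The sketch below takes the one-in-3,000 figure from Bustamante’s remarks and assumes, for simplicity, that participants are independent and that each carries the variant with the same probability. The cohort sizes and function names are illustrative only.

```python
def expected_carriers(cohort_size, carrier_frequency):
    """Average number of carriers expected in a cohort of the given size."""
    return cohort_size * carrier_frequency


def prob_at_least_one_carrier(cohort_size, carrier_frequency):
    """Chance that a cohort contains one or more carriers, assuming independence."""
    return 1.0 - (1.0 - carrier_frequency) ** cohort_size


CARRIER_FREQUENCY = 1 / 3000  # "perhaps one of every 3,000 individuals"

for cohort_size in (1_000, 100_000, 1_000_000):
    carriers = expected_carriers(cohort_size, CARRIER_FREQUENCY)
    chance = prob_at_least_one_carrier(cohort_size, CARRIER_FREQUENCY)
    print(f"{cohort_size:>9,} people: about {carriers:,.1f} carriers expected, "
          f"{chance:.1%} chance of seeing at least one")
```

Under these assumptions, a study of 1,000 people would usually miss such a variant altogether, while a study of 1 million would be expected to include a few hundred carriers, enough to begin asking what the variant does.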

But there may be a way around that. Keynote speaker Anne Wojcicki, co-founder and CEO of the personalized genomics company 23andMe, described her business’s success in connecting people with their genetic data, which she said has often been discouraged in the past.

“Keeping you from accessing your genetic data is like telling you that you can’t look in a mirror,” Wojcicki said. 23andMe’s database now numbers 250,000 genotypes, each labeling the results of 1 million separate searches at specific spots on an individual’s DNA for genetic variants. 23andMe hopes to have logged 1 million such genotypes in its database by year’s end, she said. These individuals’ genomes, in the aggregate, constitute a vast source of data for low-cost, large-scale studies.

People who have been genotyped, Wojcicki said, are remarkably willing to share their personal data if they can be sure that they’re still in control of who, exactly, makes use of it; if privacy is ensured; if it’s really going to do some good; and if they’re going to get feedback about the results of the studies in which their data plays a part. Wojcicki offered several examples of customers’ willingness to enroll in studies, answer survey questions and even, in some cases, submit to biopsies.

Wojcicki asserted that knowing about your genetic predispositions can change your behavior, potentially also reducing national health costs. Noting that her husband, Google co-founder Sergey Brin, carries a gene variant predisposing him to increased risk of Parkinson’s disease, Wojcicki said this has spurred their family to take steps to mitigate the risk via lifestyle changes. “We drink more coffee, which many studies have found to be protective. We exercise all the time.”

Michael Snyder, PhD, professor and chair of genetics, is another beneficiary of detailed research on his genome. He recounted the story of his voluntary submission to an ongoing analysis of the relative activity levels of each of his genes, myriad proteins and other blood-borne substances as part of a scientific experiment to determine the value of such intensive personalized analysis. “I’ve been getting profiled for more than three years now and had 60 samples taken, followed by the measurement of billions of different molecules,” he said. The assessments revealed that, shortly after a viral infection, Snyder’s blood-glucose level climbed into the diabetic range. Because he caught this medically significant development early, modified his diet and boosted his exercise regimen, his blood-glucose levels have returned to the normal range without any need for ongoing drug therapy, he said.

Photo: Norbert von der Groeben

Audience members packed Berg Hall, in the Li Ka Shing Center.

Euan Ashley, MD, PhD, assistant professor of cardiovascular medicine and one of the scientists monitoring Snyder’s molecules, discussed the difficulty inherent in ultra-detailed personalized analysis. “Most people are average in most things,” he said. “What you want to know is where people are not average. ... You’re looking for not needles in haystacks, but needles in stacks of needles.” With 6 million data points in each person’s genome, accuracy is an imperative. Fortunately, accuracy is increasing as costs are plummeting. Ferraris, Ashley pointed out, cost upward of $300,000 each. If that brand’s cost had dropped as much as that of gene sequencing has, you could get one for 40 cents.
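The arithmetic behind Ashley’s comparison is easy to check. The sketch below uses only the two prices quoted above and works in cents to avoid rounding issues. The variable names are illustrative, and the resulting factor describes the analogy as stated, not an independently sourced figure for sequencing costs.

```python
# Prices from the Ferrari analogy, expressed in cents.
FERRARI_PRICE_CENTS = 300_000 * 100  # "upward of $300,000 each"
ANALOGY_PRICE_CENTS = 40             # "you could get one for 40 cents"

# How many times cheaper the hypothetical Ferrari would be.
cost_drop_factor = FERRARI_PRICE_CENTS / ANALOGY_PRICE_CENTS

print(f"Implied cost reduction: {cost_drop_factor:,.0f}-fold")
```

In other words, the analogy implies that the price fell by a factor of several hundred thousand.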

Among this conference’s final announcements: the next one, in what is expected to become an annual event, is already scheduled to take place in the same building on May 21-23, 2014.

Save the date.


