May 23, 2014 - By Bruce Goldman
Bioinformatics expert Atul Butte leads a panel discussion on the diverse sources of data at the Big Data in Biomedicine Conference on the Stanford campus.
Massive, ongoing advances in computational processing power and interconnectedness are already changing the way medical research is done. But even more disruptive outcomes, including changes in the very practice of medicine at the day-to-day clinical level, lie just ahead.
That was the message conveyed by several speakers at Stanford's second annual Big Data in Biomedicine Conference, held on campus May 21-23.
"We're all here because we believe in the vast potential of technology, data and biomedicine to transform human health for the 21st century," Lloyd Minor, MD, dean of the School of Medicine, told an audience of close to 500 people at Stanford's Li Ka Shing Center for Learning and Knowledge and more than 1,000 virtual attendees who watched a live stream of the event.
Presented by Stanford Medicine and the University of Oxford and sponsored by the Li Ka Shing Foundation, the conference featured more than 60 speakers and panelists, including Stanford biologists and data scientists, researchers from universities around the world, and government and industry professionals.
Minor challenged the audience to harness computer technology, biomedical informatics and social media, collectively known as big data, for the benefit of clinical practice. "Data is not just numbers," he said. "It is also clinical notes and MRI scans. Gather as much data as you can, and let the data find patterns for you, before you instruct it on what patterns to look for."
Lloyd Minor, dean of the School of Medicine, gave introductory remarks at the conference.
Faces and cats
David Glazer, director of engineering at Google and one of several keynote speakers at the conference, made the pattern-recognition capabilities of computers palpable by describing how computers, which in the human sense understand absolutely nothing, can fish out recurring patterns from oceans of raw data.
When Glazer and his colleagues fed 10 million random YouTube videos to a computer network and told it to look for patterns, the network came back with one: a rendering of a generic human face, the most frequently occurring pattern in the cacophony of 1s and 0s the network scanned. (The second-most-frequent pattern? Why, the face of a cat, of course.)
When you add a few zeroes to the computer power you're willing to throw at a problem, "a simple, boring algorithm starts to work pretty well," Glazer said.
But the story isn't just that machines can find patterns. Several speakers throughout the three-day conference described how computers had led them down unexpected paths.
Treasure in the trash
Stanford assistant professor of biochemistry Julia Salzman, PhD, described how she discovered a new and probably significant biological entity by giving her computers a long leash. "A huge amount of data is being thrown in the trash because the data don't fit our sense of what they should look like," she said.
For instance, RNA is best known as a long, linear, information-carrying intermediary molecule. It is analogous to DNA, but unlike its nucleus-bound counterpart, it is free to travel throughout the cell. There it can instruct massive molecular machines to assemble specific sequences of raw material into one of that cell's myriad proteins. Other, profoundly different functions have also been identified for RNA in recent years. Still, Salzman said, a substantial portion of all RNA is ignored in most analyses.
"We looked more carefully at this data that was being thrown into the trash," she said.
Using computational pattern-recognition software, her team discovered numerous instances in which pieces of RNA that normally are stitched together in a particular linear sequence were, instead, assembled in the "wrong" order (with what's normally the final piece in the sequence preceding what's normally the first piece, for example). The anomaly was resolved with the realization that what Salzman and her group were seeing were breakdown products of circular RNA — a novel conformation of the molecule.
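To make the "wrong order" signal concrete, here is a minimal sketch in Python. It is not Salzman's actual method, which is not described here. It assumes a hypothetical, simplified input in which each sequencing read has already been mapped to the ordered list of exon numbers (1 = the gene's first piece) that it covers. A read that jumps from a later piece back to an earlier one is the kind of out-of-order junction that points to a circular molecule. The function name `find_backsplice_junctions` and the sample reads are illustrative only.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical simplified input: each read is the ordered list of exon
# numbers its segments align to (1 = first exon of the gene).
Read = List[int]


def find_backsplice_junctions(reads: Iterable[Read]) -> Counter:
    """Count adjacent exon pairs that appear in 'wrong' (descending) order.

    In a linear transcript, exons follow one another in ascending order,
    so a read that jumps from a later exon back to an earlier one
    (e.g. exon 4 -> exon 2) is consistent with a back-splice junction,
    the signature of circular RNA.
    """
    junctions: Counter = Counter()
    for exons in reads:
        # Examine each consecutive pair of exons within the read.
        for upstream, downstream in zip(exons, exons[1:]):
            if downstream < upstream:  # later exon followed by an earlier one
                junctions[(upstream, downstream)] += 1
    return junctions


if __name__ == "__main__":
    sample_reads = [
        [1, 2, 3],  # canonical linear order
        [3, 4],     # canonical linear order
        [4, 2],     # out of order: exon 4 joined back to exon 2
        [3, 4, 2],  # spans the same out-of-order junction
        [2, 3],     # canonical linear order
    ]
    print(find_backsplice_junctions(sample_reads))
```

With these made-up reads, only the exon 4 to exon 2 junction is flagged, and it is counted twice. A real pipeline would also have to rule out alignment errors and other artifacts before concluding that such reads come from circular RNA.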
"This has been overlooked. The textbooks are incomplete," Salzman said.
In its circular form, she noted, an RNA molecule is far more resistant to degradation by ubiquitous RNA-snipping enzymes, so it is more likely than its linear counterparts to persist in a person's blood. Every cell in the body produces circular RNA, she said, but it seems to be produced at greater levels in many human cancer cells. While its detailed functions remain to be revealed, these features of circular RNA may make it an excellent target for a blood test, she said.
Aiding food-safety efforts
Another keynote speaker was Taha Kass-Hout, MD, chief health informatics officer at the Food and Drug Administration. "The FDA has made a large bet on Big Data," he said. As an example, he described FDA efforts to bring computational power to bear on food safety.
From left, Colin Mahony of HP joined Stanford's Stephen Quake, Julia Salzman and Michael Snyder on a panel moderated by Atul Butte.
"If you eat a salad, you're pretty much a global citizen," said Kass-Hout, noting that the ingredients of a typical salad may travel halfway around the world to get to our table. Unfortunately, the well-traveled salad can pick up a host of microbial free-riders en route. Over the past year the FDA has assembled a publicly accessible database holding the genomic sequences of more than 5,000 food-poisoning culprits such as Salmonella and Listeria, he said.
In a new initiative, the FDA has been monitoring social media to enhance its surveillance capabilities. "Maybe we'll find that we can detect outbreaks earlier that way," he said. Using these methods, it may also be possible to draw inferences about the beneficial or adverse effects of drugs prescribed off-label, that is, for indications other than the ones for which they were approved. This could speed the discovery of new uses for existing drugs.
Michael Snyder, PhD, professor and chair of genetics at Stanford, recounted his personal experience of using large numbers of measurements of many different kinds (of the RNA and proteins produced by his cells, and of the metabolites and signaling molecules in his blood) to personalize medicine. (His experience is described in a paper published in 2012 in Cell.)
One of the biggest remaining gaps in society's burgeoning body of biomedical data, Snyder said, is the accurate recording of what he called the "exposome": an individual's diet, exercise, infectious diseases and physical traumas, as well as social, educational and occupational experiences. But several speakers at the conference described advances in wearable-sensor technologies that may eventually help fill that gap.
For his part, Snyder offered a new twist on a more traditional technology. "In my world, there would be a toilet that measures everything that comes out of you. It's harder to track everything that goes into you, but we have ways of approaching that, too."
Stanford Medicine integrates research, medical education and health care at its three institutions: Stanford School of Medicine, Stanford Health Care and Stanford Children's Health. For more information, please visit the Office of Communications website at http://mednews.stanford.edu.