Stunning diversity of gut bacteria uncovered by new approach to gene sequencing devised at Stanford

The many microbes living in our intestines are far more diverse than once suspected, a new genomic technique reveals.

A collaboration between computer scientists and geneticists at Stanford has produced a novel technique for mapping the diversity of bacteria living in the human gut.

The new approach revealed a far more diverse community than the researchers had anticipated. “The bacteria are genetically much more heterogeneous than we thought,” said Michael Snyder, PhD, professor and chair of genetics.

Any two humans typically differ by about 1 in 1,000 DNA bases, whereas bacteria of the same species may differ by as many as 250 in 1,000, Snyder said. “I don’t think people realized just how much diversity there was. The complexity we found was astounding,” he said.
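To put those two figures side by side, the comparison can be expressed as a count of mismatched bases per 1,000 aligned positions. The short Python sketch below is purely illustrative: the function name `differences_per_thousand` and the toy sequences are assumptions made for this example, not anything taken from the study.

```python
def differences_per_thousand(seq_a: str, seq_b: str) -> float:
    """Count mismatched positions between two aligned, equal-length
    sequences and scale the count to differences per 1,000 bases."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 1000 * mismatches / len(seq_a)


# Toy sequences (hypothetical): 1 mismatch vs. 250 mismatches in 1,000 bases.
reference = "A" * 1000
human_like = "A" * 999 + "C"
bacteria_like = "A" * 750 + "C" * 250

human_rate = differences_per_thousand(reference, human_like)
bacterial_rate = differences_per_thousand(reference, bacteria_like)
fold_difference = bacterial_rate / human_rate
```

On these toy inputs the second pair differs 250 times as often as the first, which is the gap Snyder describes between two humans and two bacteria of the same species.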

In the past, researchers could only study bacteria that would grow in the lab. But the vast majority of bacterial species will not grow on traditional culture medium. As a result, the true diversity of bacteria — not only in the human gut but throughout the living world — has remained largely unexplored.

In recent years, a genomics approach has begun to reveal diverse communities of new bacterial species growing nearly everywhere biologists have looked. Modern gene sequencing has tantalized biologists with hints of bacterial worlds as biodiverse as any tropical rain forest. Yet the limitations of current technologies have created only a blurry picture and prevented researchers from seeing all that is there.

Of particular interest are the bacteria that live in our intestines. Some communities of bacterial species in the gut have been associated with good health, others with any of a long list of conditions — including obesity, Type 2 diabetes, bowel disease and liver disease. And some are outright pathogens that can sicken and even kill, such as certain strains of E. coli or the bacterium that causes cholera. Given their importance to human health, the ecological communities of bacteria that live inside us and on our skin have come under increasing scrutiny.

A Stanford team has overcome some of the limitations of current sequencing technology to create a sharper picture of the bacterial community, or microbiome, of the human gut. The team used new computational approaches and “long-read” DNA sequencing to reveal the diversity of bacteria in the gut microbiome of a single male human.

A paper describing their work was published online Dec. 14 in Nature Biotechnology. The lead author is Volodymyr Kuleshov, a doctoral student in computer science at Stanford. Snyder, the Stanford W. Ascherman, MD, FACS, Professor in Genetics, is co-senior author with professor of computer science Serafim Batzoglou, PhD.

Problem posed by short snippets of DNA

Current DNA sequencing technology looks at very short snippets of DNA sequences. If you are looking at just one genome — from a bacterium or a single person, for example — you can assemble the snippets into a whole genome, much as you might painstakingly assemble a jigsaw puzzle.

But when you are looking at snippets from a mass of different bacteria from the human gut, assembling those snippets is like trying to assemble 100 jigsaw puzzles from a pile of pieces from all 100 puzzles jumbled together, explained Snyder. Any two pieces could be from completely unrelated puzzles — analogous to different species of bacteria — while others could be from multiple copies of the same puzzle — analogous to the same species of bacteria.

If that sounds difficult, the real challenge is being able to tell apart the pieces from puzzles that are almost the same but not quite. And that’s what the researchers’ new technique does. “We assembled one whole genome from this big gemisch, which has never been done before,” said Snyder.

“We normally sequence 100 DNA bases off a 300-base fragment,” he said. “You just get snippets of information.” But using a new informatics approach, Snyder and Batzoglou’s team stitched together larger segments of the genome. “We have a sophisticated algorithm that lets us put together all these pieces — first assembling the snippets into longer, 10,000-base pieces, then the 10,000-base pieces into still-longer fragments, and then those into whole genomes,” Snyder said.
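The article does not spell out the team's algorithm, so the following Python sketch should be read only as a general illustration of how overlapping snippets can be stitched into longer pieces. It uses a simple greedy overlap merge, a textbook technique swapped in for the example, and the names `overlap_length` and `greedy_assemble`, the `min_overlap` parameter and the toy reads are all assumptions.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Return the length of the longest suffix of `left` that matches a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap
    until no pair overlaps by at least `min_overlap` bases."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the two fragments, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy reads (hypothetical) drawn from the sequence ATGCGTACGTTAGC.
reads = ["ATGCGTA", "CGTACGT", "ACGTTAGC"]
assembled = greedy_assemble(reads)
```

Fragments that share no sufficiently long overlap are left as separate pieces. That loosely mirrors the jigsaw problem: pieces from unrelated puzzles should not be forced together. A greedy merge like this one cannot, on its own, separate nearly identical strains, which is the harder problem the researchers set out to solve.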

Such long sequences of DNA can span hundreds or even thousands of genes that couldn’t be recovered from short-read sequencing; they can help classify bacteria and other organisms by how related they are to one another; and the long sequences also help identify rare bacteria that might be missed by current methods. “We could assemble either entire genomes or at least very, very large chunks of the genome,” said Snyder.

Great bacterial diversity

Being able to see such long sections of the genome means being able to distinguish not only different species of bacteria, but different strains of the same species. The team tested the technique on a standardized sample of known bacteria and then took it for a spin on the gut contents of a human male. The result revealed not only lots of species, but many different strains of the same species. One bacterial species, for example, included five separate strains — all from one person.

The consequences of having so many different strains are hard to predict, but some strains may be more or less likely to make people ill. For example, many strains of E. coli bacteria live harmlessly and even helpfully in the human gut, while others are lethal. Being able to tell one strain from another could help researchers determine which strains are dangerous and why.

Right now, researchers who want to study the virulence of a particular strain have to isolate that strain and then grow it in the lab. But some bacteria don’t grow easily in the lab. If researchers can study the genes that contribute to virulence directly in the mixture of bacteria from a human gut sample, they don’t need to isolate each strain and grow it in a pure culture. “When you assemble the whole genome, you have a better idea of what the pathogenic genes are. I think it’s going to be very, very powerful for understanding the genetic basis of pathogenesis,” said Snyder.

The new approach will make it easier to construct the evolutionary history of strains of infectious bacteria or viruses, such as Ebola. And the approach can be used in the field to study microbial diversity in healthy people and other animals, as well as in plants, water and soil. “When we put this together now, using these long reads, it’s like an IMAX movie,” Snyder said. “You can see the whole thing much more clearly than with what we do now, which is like an old black-and-white TV.”

Other Stanford-affiliated authors of the paper are postdoctoral scholars Chao Jiang, PhD, and Wenyu Zhou, PhD, and research associate Fereshteh Jahaniani, PhD.

This work was supported by the National Institutes of Health (grant 3U54DK102556).

Stanford’s Department of Genetics in the School of Medicine and the Department of Computer Science in the School of Engineering also supported the work.

Stanford Medicine integrates research, medical education and health care at its three institutions: Stanford University School of Medicine, Stanford Health Care (formerly Stanford Hospital & Clinics), and Lucile Packard Children's Hospital Stanford. For more information, please visit the Office of Communication & Public Affairs site at
