The epigenome controls gene expression across diverse cell types, and integrates genetic and environmental signals. A new reference map will help interpret the genetic basis for disease.
February 18, 2015 - By Krista Conger
With few exceptions, every cell in your body has the same genome, or DNA sequence. However, different cell types interpret that all-inclusive list of instructions in different ways, based on a variety of biochemical modifications to the DNA or its associated proteins.
These modifications, which often occur in the form of chemical tags such as methyl groups that latch on to DNA and its packaging proteins, control whether, when and how a gene is expressed. Collectively, these modifications are known as the epigenome, but much about how they work in the hundreds of diverse cell types and tissues in the human body has remained a mystery.
Now, researchers in a massive, international collaboration sponsored by the National Institutes of Health have published the first integrative analysis of human epigenomes derived from diverse adult and fetal cell types and tissues, ranging from stem cells and blood cells to brain, muscle and liver tissue. The results of the collaboration, called the Roadmap Epigenomics Project, were published Feb. 18 in a series of papers in Nature.
The papers characterize and compare epigenomes within and across the largest collection of reference human cell types and tissues to date. Anshul Kundaje, PhD, an assistant professor of genetics and of computer science at Stanford University, is a lead author of the flagship integrative analysis paper. He also co-authored another companion paper in Nature that describes a potentially novel role of the immune system in the development of Alzheimer’s disease.
Understanding the genome’s ‘dark matter’
“We are trying to identify the key dynamic properties of the epigenome that give rise to different cell types and disease states throughout the body,” Kundaje said. “These maps will be extremely powerful in our quest to understand what is happening in the non-coding portions of the human genome.”
Kundaje began the work while a research scientist at the Massachusetts Institute of Technology and the Broad Institute, and continued it after joining the Stanford faculty in September 2013. He is the primary computational analyst of the project and has led the analysis team responsible for mining patterns from thousands of epigenomic data sets generated by large-scale data production centers at institutions across the country and around the world.
Only about 1.5 percent of the genome comprises genes that are translated into proteins. The remainder, sometimes referred to as the “dark matter” of the genome, is now found to contain millions of regulatory regions that turn on and off in specific configurations in each cell type, thereby precisely controlling the expression of their target genes and giving rise to the diversity of cell types and cellular responses in the human body. The Roadmap Epigenomics Project provides one of the most comprehensive maps of these regulatory elements.
In 2003, the Human Genome Project released the first complete draft of the human genome reference sequence. Over the years, the sequence was analyzed to identify the precise locations of genes along chromosomes. Efforts were then focused on understanding the sequence of the DNA between the genes. It was necessary to understand how individual cells use this non-coding regulatory sequence information to generate the vast array of tissues that make up the body, and what actually goes wrong with these regulatory instructions when humans develop diseases.
The problem is somewhat like being handed a list of all the ingredients available in a well-stocked kitchen without any idea of how to combine them. Tossing a few of them together, willy-nilly, into a baking dish and popping it into the oven isn’t likely to yield anything edible. But with a well-written recipe telling you how much and when to mix together flour, sugar, eggs and butter, you can turn out a perfect cake or fantastic waffles.
The completion of the Human Genome Project gave biologists the list of ingredients to which every cell has access. The Roadmap Epigenomics Project outlines the recipes and shows how cells use these ingredients to generate their own special sauce. By comparing and contrasting these cellular recipes, researchers can begin to draw parallels among cell types and even predict which cells might be involved in specific traits and diseases.
For example, a beta cell of the pancreas, which senses and responds to changing blood sugar levels, must know when and how to churn out a perfect batch of insulin. This exquisite control requires the coordinated expression of several genes. Meanwhile, a skin cell of the epidermis specializes in making keratin to provide a protective layer against germs and melanin to give skin its color and protect it from the sun’s rays.
Cues for the cells are found in their epigenomic modifications, many of which affect proteins called histones, which keep DNA in the nucleus tightly bundled. Chemical tags called methyl or acetyl groups bind to histones to signal whether a regulatory region or gene is available for use or should be kept inaccessible. Other methyl groups attach to the DNA itself to serve as runway lights or “keep away” signals for proteins responsible for regulating gene expression. It’s a complex recipe that keeps each cell functioning perfectly.
Now researchers can begin to peek behind the curtain into the inner workings of the cell by mapping and comparing the exact patterns of histone modification, DNA accessibility and gene expression in each tissue.
“This is the first time we’ve been able to analyze and understand relationships among a massive collection of cell types in a data-driven manner,” said Kundaje. “Now we can begin to see how specific portions of the genome are being used jointly, or exclusively, in different tissues in health and disease.”
A clue to Alzheimer’s
Kundaje points out that although researchers have identified many disease-associated mutations, it’s not always clear in which cell types in the body the mutations manifest themselves. Now they can begin to match up the mutations with the epigenomic maps to identify cell types and tissues in which the mutations are likely to affect regulation of genes and be deleterious to cellular function. The analyses thus provide new candidate cell types on which to focus disease studies.
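The matching step described above can be illustrated with a toy sketch. It is not the consortium's method: the cell types, coordinates and variant positions below are invented for illustration, and a real analysis would use genome-wide annotation files and a statistical enrichment test rather than raw counts. The sketch simply counts how many disease-associated variants land inside regions marked as regulatory in each cell type, then ranks the cell types by that count.

```python
from collections import Counter

# Hypothetical regulatory regions per cell type: (chromosome, start, end),
# with the end coordinate excluded. These values are made up for illustration.
REGULATORY_REGIONS = {
    "immune_cell": [("chr1", 100, 200), ("chr2", 50, 120)],
    "neuron": [("chr1", 500, 650)],
    "liver_cell": [("chr3", 10, 90)],
}

# Hypothetical disease-associated variants: (chromosome, position).
VARIANTS = [("chr1", 150), ("chr2", 75), ("chr1", 600), ("chr4", 42)]


def count_overlaps(regions_by_cell_type, variants):
    """Count, per cell type, the variants that fall inside a regulatory region."""
    counts = Counter()
    for cell_type, regions in regions_by_cell_type.items():
        for chrom, pos in variants:
            if any(c == chrom and start <= pos < end for c, start, end in regions):
                counts[cell_type] += 1
    return counts


def rank_cell_types(regions_by_cell_type, variants):
    """Return (cell type, overlap count) pairs, highest count first."""
    counts = count_overlaps(regions_by_cell_type, variants)
    return sorted(
        ((cell_type, counts[cell_type]) for cell_type in regions_by_cell_type),
        key=lambda item: (-item[1], item[0]),
    )


ranking = rank_cell_types(REGULATORY_REGIONS, VARIANTS)
```

In this invented data set the cell type at the top of `ranking` is the one whose regulatory regions contain the most variants, which is the kind of signal that would suggest a candidate cell type for follow-up study.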
“For example,” said Kundaje, “our analysis provides convincing evidence that Alzheimer’s disease has a strong immune component that contributes to neurodegeneration, in which mutations strongly affect regulatory regions of cells responsible for removing potentially damaging plaques from the brain.”
The publication of the papers is a milestone in genetics research, according to Kundaje. But it’s just the beginning. The International Human Epigenome Consortium aims to decipher 1,000 human epigenomes within the next seven to 10 years, for example.
“Now we can begin to make many predictions about which cell types are manifesting these mutations,” said Kundaje. “Until now, it’s been very difficult to perform functional genomic experiments for disease studies because we didn’t know which cell types we should focus on. This will open the door to follow-up on many different diseases. It’s going to be a vast resource for researchers around the world.”
Other Stanford authors are postdoctoral scholars Shin Lin, MD, PhD, and Yiing Lin, MD, PhD.
The work was supported by the National Institutes of Health (grants RO1NS078839, R01HG004037, RC1HG005334, U01ES017155, U01ES017154, U01ES017166, U01ES017156, U01DA025956, F32HL110473, K99HL119617, 5R24HD000836, P30AG10161, R01AG15819, R01AG17917 and U01AG46152), the Swiss National Science Foundation and the Belfer Neurodegeneration Consortium.
Stanford Medicine integrates research, medical education and health care at its three institutions - Stanford University School of Medicine, Stanford Health Care (formerly Stanford Hospital & Clinics), and Lucile Packard Children's Hospital Stanford. For more information, please visit the Office of Communication & Public Affairs site at http://mednews.stanford.edu.