Health Research and Policy

Abstract

DATE: February 27, 2014
TIME: 1:15 - 3:00 pm
LOCATION: Medical School Office Building, Rm x303
TITLE: A Comprehensive Regulatory Annotation of the Human Genome
SPEAKER: Anshul Kundaje, Assistant Professor
Departments of Genetics and of Computer Science, Stanford

The Encyclopedia of DNA Elements (ENCODE) Consortium (1) and the Roadmap Epigenomics Project are using a variety of sequencing-based functional genomic assays to interrogate the human transcriptome, regulome and epigenome in diverse cellular contexts. I will begin with a gentle introduction to the diversity and scale of the datasets and a brief overview of robust statistical methods for adaptive detection of high-confidence signals from massive collections of noisy experimental data (2, 3). Next, I will present a variety of statistical and machine learning approaches to integrate these heterogeneous data and learn models of gene regulation. We have used ensemble learning approaches to dissect the complex combinations of interacting regulatory proteins that regulate different functional classes of genes (4). We have used hidden Markov models to learn combinatorial chromatin state maps across hundreds of human cell types, thereby deciphering the largest collection of lineage-specific distal regulatory elements in the human genome (5). We have used probabilistic topic models to exploit the dynamics of gene expression and the activity of regulatory elements across cell types and automatically learn long-range interactions between distal regulatory elements and their target genes. We have also developed a novel machine learning framework based on boosting algorithms to integrate the diverse sources of regulatory information and jointly learn predictive, context-specific models of transcriptional regulation across diverse human tissue lineages (6, 7). Collectively, our integrative analyses provide a comprehensive regulatory annotation of the human genome. We have used these regulatory annotations and models to significantly improve detection and interpretation of disease-associated genetic variants (8) and to understand regulation of disease-relevant gene sets.

Suggested readings:
ENCODE Project Consortium et al., An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414), 57-74 (Sep, 2012). http://www.ncbi.nlm.nih.gov/pubmed/22955616

M. B. Gerstein et al., Architecture of the human regulatory network derived from ENCODE data. Nature 489(7414), 91-100 (Sep 6, 2012). http://www.ncbi.nlm.nih.gov/pubmed/22955619

M. M. Hoffman et al., Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Research 41(2), 827-41 (Jan, 2013). http://www.ncbi.nlm.nih.gov/pubmed/23221638

A. Kundaje et al., A predictive model of the oxygen and heme regulatory network in yeast. PLoS Computational Biology 4(11), e1000224 (Nov, 2008). http://www.ncbi.nlm.nih.gov/pubmed/19008939
