Jonathan Pritchard grew up in England before moving to Pennsylvania during high school. He received his BSc in Biology and Mathematics from Penn State University in 1994 and his PhD in Biology from Stanford in 1998. He then moved to a postdoc in the Department of Statistics at Oxford University, followed by his first faculty job at the University of Chicago in 2001. He has been an Investigator of the Howard Hughes Medical Institute since 2008. Pritchard returned to Stanford University in 2013, where he is now a Professor in the Departments of Biology and Genetics.

Administrative Appointments

  • Investigator, Howard Hughes Medical Institute (2008 - Present)
  • Co-Director, Stanford's Center for Computational, Evolutionary and Human Genomics (2017 - Present)

Research & Scholarship

Current Research and Scholarly Interests

My group has expertise in the development of new statistical methods for genetic analysis and in their application to genomic data from humans and other organisms. We focus on questions relating to genetic variation and evolution: How does genetic variation impact phenotypic traits and evolution, both at the organismal and cellular level? What can we learn from genome sequences of modern and ancient humans about the relationships among human populations, and the nature of adaptation in these populations?

We often work on problems where there are no off-the-shelf statistical methods. Thus, an important part of our work is in developing appropriate statistical and computational approaches that can yield new insights into biological data. In the past, we have made important contributions to a variety of problems in human population genetics, including methods for complex trait mapping, inference of population structure and history, and studies of natural selection. We have a strong track record of producing user-friendly resources that are widely used in the community, and in applied data analysis to tackle important biological questions. Notably, our Structure algorithm and software package for inferring population structure from genetic data have received >30,000 total citations spread across several papers.

Since 2008, an important emphasis of my group has been understanding gene regulation, and in particular how genetic variation may impact regulation. Ultimately, we would like to be able to predict which noncoding variants in the genome are likely to have regulatory effects in any given cell type, and how these link to phenotypic variation and disease. My lab has been deeply involved in developing new computational methods to interpret various types of modern genomic assays and in linking these to genetic variation.

Secondly, we have had a major focus on understanding the genetic architecture of complex traits, and the implications for understanding evolution. We have argued that much--if not most--evolution in humans likely proceeds through a process that we call "polygenic adaptation" in which populations evolve through small allele frequency shifts at many loci.

We have also written extensively about conceptual models for understanding the genetic architecture of trait variation (Boyle et al., 2017). We have argued that the data are consistent with a model in which essentially every regulatory variant in disease-relevant cell types can affect risk, and proposed that most of these effects act through trans-regulatory networks. Testing this model is an ongoing focus of our work.


Graduate and Fellowship Programs

  • Biology (School of Humanities and Sciences) (PhD Program)
  • Biomedical Informatics (PhD Program)


All Publications

  • Remodeling the Specificity of an Endosomal CORVET Tether Underlies Formation of Regulated Secretory Vesicles in the Ciliate Tetrahymena thermophila CURRENT BIOLOGY Sparvoli, D., Richardson, E., Osakada, H., Lan, X., Iwamoto, M., Bowman, G. R., Kontur, C., Bourland, W. A., Lynn, D. H., Pritchard, J. K., Haraguchi, T., Dacks, J. B., Turkewitz, A. P. 2018; 28 (5): 697-+


    In the endocytic pathway of animals, two related complexes, called CORVET (class C core vacuole/endosome transport) and HOPS (homotypic fusion and protein sorting), act as both tethers and fusion factors for early and late endosomes, respectively. Mutations in CORVET or HOPS lead to trafficking defects and contribute to human disease, including immune dysfunction. HOPS and CORVET are conserved throughout eukaryotes, but remarkably, in the ciliate Tetrahymena thermophila, the HOPS-specific subunits are absent, while CORVET-specific subunits have proliferated. VPS8 (vacuolar protein sorting), a CORVET subunit, expanded to 6 paralogs in Tetrahymena. This expansion correlated with loss of HOPS within a ciliate subgroup, including the Oligohymenophorea, which contains Tetrahymena. As uncovered via forward genetics, a single VPS8 paralog in Tetrahymena (VPS8A) is required to synthesize prominent secretory granules called mucocysts. More specifically, Δvps8a cells fail to deliver a subset of cargo proteins to developing mucocysts, instead accumulating that cargo in vesicles also bearing the mucocyst-sorting receptor Sor4p. Surprisingly, although this transport step relies on CORVET, it does not appear to involve early endosomes. Instead, Vps8a associates with the late endosomal/lysosomal marker Rab7, indicating that target specificity switching occurred in CORVET subunits during the evolution of ciliates. Mucocysts belong to a markedly diverse and understudied class of protist secretory organelles called extrusomes. Our results underscore that biogenesis of mucocysts depends on endolysosomal trafficking, revealing parallels with invasive organelles in apicomplexan parasites and suggesting that a wide array of secretory adaptations in protists, like in animals, depend on mechanisms related to lysosome biogenesis.

    View details for DOI 10.1016/j.cub.2018.01.047

    View details for Web of Science ID 000426571100022

    View details for PubMedID 29478853

    View details for PubMedCentralID PMC5840023

  • Impact of regulatory variation across human iPSCs and differentiated cells GENOME RESEARCH Banovich, N. E., Li, Y. I., Raj, A., Ward, M. C., Greenside, P., Calderon, D., Tung, P., Burnett, J. E., Myrthil, M., Thomas, S. M., Burrows, C. K., Romero, I., Pavlovic, B. J., Kundaje, A., Pritchard, J. K., Gilad, Y. 2018; 28 (1): 122–31


    Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.

    View details for DOI 10.1101/gr.224436.117

    View details for Web of Science ID 000419132700011

    View details for PubMedID 29208628

    View details for PubMedCentralID PMC5749177

  • Annotation-free quantification of RNA splicing using LeafCutter NATURE GENETICS Li, Y. I., Knowles, D. A., Humphrey, J., Barbeira, A. N., Dickinson, S. P., Im, H., Pritchard, J. K. 2018; 50 (1): 151-+


    The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.

    View details for DOI 10.1038/s41588-017-0004-9

    View details for Web of Science ID 000423157400018

    View details for PubMedID 29229983

    View details for PubMedCentralID PMC5742080

  • Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment. Cell stem cell Yamamoto, R., Wilkinson, A. C., Ooehara, J., Lan, X., Lai, C. Y., Nakauchi, Y., Pritchard, J. K., Nakauchi, H. 2018; 22 (4): 600–607.e4


    Aging is linked to functional deterioration and hematological diseases. The hematopoietic system is maintained by hematopoietic stem cells (HSCs), and dysfunction within the HSC compartment is thought to be a key mechanism underlying age-related hematopoietic perturbations. Using single-cell transplantation assays with five blood-lineage analysis, we previously identified myeloid-restricted repopulating progenitors (MyRPs) within the phenotypic HSC compartment in young mice. Here, we determined the age-related functional changes to the HSC compartment using over 400 single-cell transplantation assays. Notably, MyRP frequency increased dramatically with age, while multipotent HSCs expanded modestly within the bone marrow. We also identified a subset of functional cells that were myeloid restricted in primary recipients but displayed multipotent (five blood-lineage) output in secondary recipients. We have termed this cell type latent-HSCs, which appear exclusive to the aged HSC compartment. These results question the traditional dogma of HSC aging and our current approaches to assay and define HSCs.

    View details for DOI 10.1016/j.stem.2018.03.013

    View details for PubMedID 29625072

  • Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Harpak, A., Lan, X., Gao, Z., Pritchard, J. K. 2017; 114 (48): 12779–84


    Gene conversion is the copying of a genetic sequence from a "donor" region to an "acceptor." In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role NAGC plays in various genetic diseases and the concerted evolution of gene families, the parameters that govern NAGC are not well characterized. Here, we survey duplicate gene families and identify converted tracts in 46% of them. These conversions reflect a large GC bias of NAGC. We develop a sequence evolution model that leverages substantially more information in duplicate sequences than used by previous methods and use it to estimate the parameters that govern NAGC in humans: a mean converted tract length of 250 bp and a probability of [Formula: see text] per generation for a nucleotide to be converted (an order of magnitude higher than the point mutation rate). Despite this high baseline rate, we show that NAGC slows down as duplicate sequences diverge, until an eventual "escape" of the sequences from its influence. As a result, NAGC has a small average effect on the sequence divergence of duplicates. This work improves our understanding of the NAGC mechanism and the role that it plays in the evolution of gene duplicates.

    View details for DOI 10.1073/pnas.1708151114

    View details for Web of Science ID 000416891600062

    View details for PubMedID 29138319

    View details for PubMedCentralID PMC5715747

  • Inferring Relevant Cell Types for Complex Traits by Using Single-Cell Gene Expression AMERICAN JOURNAL OF HUMAN GENETICS Calderon, D., Bhaskar, A., Knowles, D. A., Golan, D., Raj, T., Fu, A. Q., Pritchard, J. K. 2017; 101 (5): 686–99


    Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq. In this study, we demonstrated RolyPoly's accuracy through simulation and validated previously known tissue-trait associations. We discovered a significant association between microglia and late-onset Alzheimer disease and an association between schizophrenia and oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.

    View details for DOI 10.1016/j.ajhg.2017.09.009

    View details for Web of Science ID 000414251600003

    View details for PubMedID 29106824

    View details for PubMedCentralID PMC5673624

  • Rapid evolution of the human mutation spectrum ELIFE Harris, K., Pritchard, J. K. 2017; 6


    DNA is a remarkably precise medium for copying and storing biological information. This high fidelity results from the action of hundreds of genes involved in replication, proofreading, and damage repair. Evolutionary theory suggests that in such a system, selection has limited ability to remove genetic variants that change mutation rates by small amounts or in specific sequence contexts. Consistent with this, using SNV variation as a proxy for mutational input, we report here that mutational spectra differ substantially among species, human continental groups and even some closely related populations. Close examination of one signal, an increased TCC→TTC mutation rate in Europeans, indicates a burst of mutations from about 15,000 to 2000 years ago, perhaps due to the appearance, drift, and ultimate elimination of a genetic modifier of mutation rate. Our results suggest that mutation rates can evolve markedly over short evolutionary timescales and suggest the possibility of mapping mutational modifiers.

    View details for DOI 10.7554/eLife.24284

    View details for Web of Science ID 000401409000001

    View details for PubMedID 28440220

  • Tracing the peopling of the world through genomics. Nature Nielsen, R., Akey, J. M., Jakobsson, M., Pritchard, J. K., Tishkoff, S., Willerslev, E. 2017; 541 (7637): 302-310


    Advances in the sequencing and the analysis of the genomes of both modern and ancient peoples have facilitated a number of breakthroughs in our understanding of human evolutionary history. These include the discovery of interbreeding between anatomically modern humans and extinct hominins; the development of an increasingly detailed description of the complex dispersal of modern humans out of Africa and their population expansion worldwide; and the characterization of many of the genetic adaptions of humans to local environmental conditions. Our interpretation of the evolutionary history and adaptation of humans is being transformed by analyses of these new genomic data.

    View details for DOI 10.1038/nature21347

    View details for PubMedID 28102248

  • Batch effects and the effective design of single-cell gene expression studies SCIENTIFIC REPORTS Tung, P., Blischak, J. D., Hsiao, C. J., Knowles, D. A., Burnett, J. E., Pritchard, J. K., Gilad, Y. 2017; 7


    Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.

    View details for DOI 10.1038/srep39921

    View details for Web of Science ID 000391022000001

    View details for PubMedID 28045081

    View details for PubMedCentralID PMC5206706

  • Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans PLOS GENETICS Harpak, A., Bhaskar, A., Pritchard, J. K. 2016; 12 (12)


    The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. In summary, we show that variable mutation rates are key determinants of the SFS in humans.

    View details for DOI 10.1371/journal.pgen.1006489

    View details for Web of Science ID 000392138700034

    View details for PubMedID 27977673

    View details for PubMedCentralID PMC5157949

  • A Bibliometric History of the Journal GENETICS GENETICS Telis, N., Lehmann, B. V., Feldman, M. W., Pritchard, J. K. 2016; 204 (4): 1337-1342

    View details for DOI 10.1534/genetics.116.196964

    View details for Web of Science ID 000390765500004

    View details for PubMedID 27927899

    View details for PubMedCentralID PMC5161266

  • Detection of human adaptation during the past 2000 years. Science Field, Y., Boyle, E. A., Telis, N., Gao, Z., Gaulton, K. J., Golan, D., Yengo, L., Rocheleau, G., Froguel, P., McCarthy, M. I., Pritchard, J. K. 2016


    Detection of recent natural selection is a challenging problem in population genetics. Here we introduce the singleton density score (SDS), a method to infer very recent changes in allele frequencies from contemporary genome sequences. Applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past ~2000 to 3000 years. We see strong signals of selection at lactase and the major histocompatibility complex, and in favor of blond hair and blue eyes. For polygenic adaptation, we find that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we identify shifts associated with other complex traits, suggesting that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.

    View details for PubMedID 27738015

    View details for PubMedCentralID PMC5182071

  • Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nature genetics Corces, M. R., Buenrostro, J. D., Wu, B., Greenside, P. G., Chan, S. M., Koenig, J. L., Snyder, M. P., Pritchard, J. K., Kundaje, A., Greenleaf, W. J., Majeti, R., Chang, H. Y. 2016; 48 (10): 1193-1203


    We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.

    View details for DOI 10.1038/ng.3646

    View details for PubMedID 27526324

    View details for PubMedCentralID PMC5042844

  • Genetic variation in MHC proteins is associated with T cell receptor expression biases. Nature genetics Sharon, E., Sibener, L. V., Battle, A., Fraser, H. B., Garcia, K. C., Pritchard, J. K. 2016; 48 (9): 995-1002


    In each individual, a highly diverse T cell receptor (TCR) repertoire interacts with peptides presented by major histocompatibility complex (MHC) molecules. Despite extensive research, it remains controversial whether germline-encoded TCR-MHC contacts promote TCR-MHC specificity and, if so, whether differences exist in TCR V gene compatibilities with different MHC alleles. We applied expression quantitative trait locus (eQTL) mapping to test for associations between genetic variation and TCR V gene usage in a large human cohort. We report strong trans associations between variation in the MHC locus and TCR V gene usage. Fine-mapping of the association signals identifies specific amino acids from MHC genes that bias V gene usage, many of which contact or are spatially proximal to the TCR or peptide in the TCR-peptide-MHC complex. Hence, these MHC variants, several of which are linked to autoimmune diseases, can directly affect TCR-MHC interaction. These results provide the first examples of trans-QTL effects mediated by protein-protein interactions and are consistent with intrinsic TCR-MHC specificity.

    View details for DOI 10.1038/ng.3625

    View details for PubMedID 27479906

    View details for PubMedCentralID PMC5010864

  • Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice. Nature genetics Parker, C. C., Gopalakrishnan, S., Carbonetto, P., Gonzales, N. M., Leung, E., Park, Y. J., Aryee, E., Davis, J., Blizard, D. A., Ackert-Bicknell, C. L., Lionikas, A., Pritchard, J. K., Palmer, A. A. 2016; 48 (8): 919-926


    Although mice are the most widely used mammalian model organism, genetic studies have suffered from limited mapping resolution due to extensive linkage disequilibrium (LD) that is characteristic of crosses among inbred strains. Carworth Farms White (CFW) mice are a commercially available outbred mouse population that exhibit rapid LD decay in comparison to other available mouse populations. We performed a genome-wide association study (GWAS) of behavioral, physiological and gene expression phenotypes using 1,200 male CFW mice. We used genotyping by sequencing (GBS) to obtain genotypes at 92,734 SNPs. We also measured gene expression using RNA sequencing in three brain regions. Our study identified numerous behavioral, physiological and expression quantitative trait loci (QTLs). We integrated the behavioral QTL and eQTL results to implicate specific genes, including Azi2 in sensitivity to methamphetamine and Zmynd11 in anxiety-like behavior. The combination of CFW mice, GBS and RNA sequencing constitutes a powerful approach to GWAS in mice.

    View details for DOI 10.1038/ng.3609

    View details for PubMedID 27376237

    View details for PubMedCentralID PMC4963286

  • Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling. eLife Raj, A., Wang, S. H., Shim, H., Harpak, A., Li, Y. I., Engelmann, B., Stephens, M., Gilad, Y., Pritchard, J. K. 2016; 5


    Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry to detect protein expressed at low level, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.

    View details for DOI 10.7554/eLife.13328

    View details for PubMedID 27232982

    View details for PubMedCentralID PMC4940163

  • Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals SCIENCE Lan, X., Pritchard, J. K. 2016; 352 (6288): 1009-1013


    Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, whereas others focus on sharing of gene dosage. RNA-sequencing data from 46 human and 26 mouse tissues indicate that subfunctionalization of expression evolves slowly and is rare among duplicates that arose within the placental mammals, possibly because tandem duplicates are coregulated by shared genomic elements. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression levels of single-copy genes. Thus, dosage sharing of expression allows for the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.

    View details for DOI 10.1126/science.aad8411

    View details for Web of Science ID 000376147800053

    View details for PubMedID 27199432

  • RNA splicing is a primary link between genetic variation and disease SCIENCE Li, Y. I., van de Geijn, B., Raj, A., Knowles, D. A., Petti, A. A., Golan, D., Gilad, Y., Pritchard, J. K. 2016; 352 (6285): 600-604


    Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.

    View details for DOI 10.1126/science.aad9417

    View details for Web of Science ID 000374998600048

    View details for PubMedID 27126046

  • Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs. PLoS genetics Burrows, C. K., Banovich, N. E., Pavlovic, B. J., Patterson, K., Gallego Romero, I., Pritchard, J. K., Gilad, Y. 2016; 12 (1)


    The advent of induced pluripotent stem cells (iPSCs) revolutionized human genetics by allowing us to generate pluripotent cells from easily accessible somatic tissues. This technology can have immense implications for regenerative medicine, but iPSCs also represent a paradigm shift in the study of complex human phenotypes, including gene regulation and disease. Yet, an unresolved caveat of the iPSC model system is the extent to which reprogrammed iPSCs retain residual phenotypes from their precursor somatic cells. To directly address this issue, we used an effective study design to compare regulatory phenotypes between iPSCs derived from two types of commonly used somatic precursor cells. We find a remarkably small number of differences in DNA methylation and gene expression levels between iPSCs derived from different somatic precursors. Instead, we demonstrate genetic variation is associated with the majority of identifiable variation in DNA methylation and gene expression levels. We show that the cell type of origin only minimally affects gene expression levels and DNA methylation in iPSCs, and that genetic variation is the main driver of regulatory differences between iPSCs of different donors. Our findings suggest that studies using iPSCs should focus on additional individuals rather than clones from the same individual.

    View details for DOI 10.1371/journal.pgen.1005793

    View details for PubMedID 26812582

    View details for PubMedCentralID PMC4727884

  • Abundant contribution of short tandem repeats to gene expression variation in humans NATURE GENETICS Gymrek, M., Willems, T., Guilmatre, A., Zeng, H., Markus, B., Georgiev, S., Daly, M. J., Price, A. L., Pritchard, J. K., Sharp, A. J., Erlich, Y. 2016; 48 (1): 22-?

    View details for DOI 10.1038/ng.3461

    View details for Web of Science ID 000367255300009

  • Whole Genome Sequencing Identifies a Novel Factor Required for Secretory Granule Maturation in Tetrahymena thermophila. G3 (Bethesda, Md.) Kontur, C., Kumar, S., Lan, X., Pritchard, J. K., Turkewitz, A. P. 2016; 6 (8): 2505-2516


    Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded to a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies, in part, on ancestral lysosomal sorting machinery, but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wild-type copy of MMA1, and disrupting MMA1 in an otherwise wild-type strain phenocopies UC620. The product of MMA1, characterized as a CFP-tagged copy, is a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation.

    View details for DOI 10.1534/g3.116.028878

    View details for PubMedID 27317773

    View details for PubMedCentralID PMC4978903

  • WASP: allele-specific software for robust molecular quantitative trait locus discovery NATURE METHODS van de Geijn, B., McVicker, G., Gilad, Y., Pritchard, J. K. 2015; 12 (11): 1061-1063

    View details for DOI 10.1038/NMETH.3582

    View details for Web of Science ID 000364500900022

    View details for PubMedID 26366987

  • Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions CELL Grubert, F., Zaugg, J. B., Kasowski, M., Ursu, O., Spacek, D. V., Martin, A. R., Greenside, P., Srivas, R., Phanstiel, D. H., Pekowska, A., Heidari, N., Euskirchen, G., Huber, W., Pritchard, J. K., Bustamante, C. D., Steinmetz, L. M., Kundaje, A., Snyder, M. 2015; 162 (5): 1051-1065


    Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.

    View details for DOI 10.1016/j.cell.2015.07.048

    View details for Web of Science ID 000360589900015

    View details for PubMedID 26300125

    View details for PubMedCentralID PMC4556133

  • The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans SCIENCE Ardlie, K. G., DeLuca, D. S., Segre, A. V., Sullivan, T. J., Young, T. R., Gelfand, E. T., Trowbridge, C. A., Maller, J. B., Tukiainen, T., Lek, M., Ward, L. D., Kheradpour, P., Iriarte, B., Meng, Y., Palmer, C. D., Esko, T., Winckler, W., Hirschhorn, J. N., Kellis, M., MacArthur, D. G., Getz, G., Shabalin, A. A., Li, G., Zhou, Y., Nobel, A. B., Rusyn, I., Wright, F. A., Lappalainen, T., Ferreira, P. G., Ongen, H., Rivas, M. A., Battle, A., Mostafavi, S., Monlong, J., Sammeth, M., Mele, M., Reverter, F., Goldmann, J. M., Koller, D., Guigo, R., McCarthy, M. I., Dermitzakis, E. T., Gamazon, E. R., Im, H. K., Konkashbaev, A., Nicolae, D. L., Cox, N. J., Flutre, T., Wen, X., Stephens, M., Pritchard, J. K., Tu, Z., Zhang, B., Huang, T., Long, Q., Lin, L., Yang, J., Zhu, J., Liu, J., Brown, A., Mestichelli, B., Tidwell, D., Lo, E., Salvatore, M., Shad, S., Thomas, J. A., Lonsdale, J. T., Moser, M. T., Gillard, B. M., Karasik, E., Ramsey, K., Choi, C., Foster, B. A., Syron, J., Fleming, J., Magazine, H., Hasz, R., Walters, G. D., Bridge, J. P., Miklos, M., Sullivan, S., Barker, L. K., Traino, H. M., Mosavel, M., Siminoff, L. A., Valley, D. R., Rohrer, D. C., Jewell, S. D., Branton, P. A., Sobin, L. H., Barcus, M., Qi, L., McLean, J., Hariharan, P., Um, K. S., Wu, S., Tabor, D., Shive, C., Smith, A. M., Buia, S. A., Undale, A. H., Robinson, K. L., Roche, N., Valentino, K. M., Britton, A., Burges, R., Bradbury, D., Hambright, K. W., Seleski, J., Korzeniewski, G. E., Erickson, K., Marcus, Y., Tejada, J., Taherian, M., Lu, C., Basile, M., Mash, D. C., Volpi, S., Struewing, J. P., Temple, G. F., Boyer, J., Colantuoni, D., Little, R., Koester, S., Carithers, L. J., Moore, H. M., Guan, P., Compton, C., Sawyer, S. J., Demchok, J. P., Vaught, J. B., Rabiner, C. A., Lockhart, N. C., Ardlie, K. G., Getz, G., Wright, F. A., Kellis, M., Volpi, S., Dermitzakis, E. T. 2015; 348 (6235): 648-660
  • Reprogramming LCLs to iPSCs Results in Recovery of Donor-Specific Gene Expression Signature PLOS GENETICS Thomas, S. M., Kagan, C., Pavlovic, B. J., Burnett, J., Patterson, K., Pritchard, J. K., Gilad, Y. 2015; 11 (5)


    Renewable in vitro cell cultures, such as lymphoblastoid cell lines (LCLs), have facilitated studies that contributed to our understanding of genetic influence on human traits. However, the degree to which cell lines faithfully maintain differences in donor-specific phenotypes is still debated. We have previously reported that standard cell line maintenance practice results in a loss of donor-specific gene expression signatures in LCLs. An alternative to the LCL model is the induced pluripotent stem cell (iPSC) system, which carries the potential to model tissue-specific physiology through the use of differentiation protocols. Still, existing LCL banks represent an important source of starting material for iPSC generation, and it is possible that the disruptions in gene regulation associated with long-term LCL maintenance could persist through the reprogramming process. To address this concern, we studied the effect of reprogramming mature LCL cultures from six unrelated donors to iPSCs on the ensuing gene expression patterns within and between individuals. We show that the reprogramming process results in a recovery of donor-specific gene regulatory signatures, increasing the number of genes with a detectable donor effect by an order of magnitude. The proportion of variation in gene expression statistically attributed to donor increases from 6.9% in LCLs to 24.5% in iPSCs (P < 10^-15). Since environmental contributions are unlikely to be a source of individual variation in our system of highly passaged cultured cell lines, our observations suggest that the effect of genotype on gene regulation is more pronounced in iPSCs than in LCLs. Our findings indicate that iPSCs can be a powerful model system for studies of phenotypic variation across individuals in general, and the genetic association with variation in gene regulation in particular. We further conclude that LCLs are an appropriate starting material for iPSC generation.

    View details for DOI 10.1371/journal.pgen.1005216

    View details for Web of Science ID 000355305200032

    View details for PubMedID 25950834

  • Genomic variation. Impact of regulatory variation from RNA to protein. Science Battle, A., Khan, Z., Wang, S. H., Mitrano, A., Ford, M. J., Pritchard, J. K., Gilad, Y. 2015; 347 (6222): 664-667


    The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with eQTLs, ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.

    View details for DOI 10.1126/science.1260793

    View details for PubMedID 25657249

  • The Genetic and Mechanistic Basis for Variation in Gene Regulation PLOS GENETICS Pai, A. A., Pritchard, J. K., Gilad, Y. 2015; 11 (1)


    It is now well established that noncoding regulatory variants play a central role in the genetics of common diseases and in evolution. However, until recently, we have known little about the mechanisms by which most regulatory variants act. For instance, what types of functional elements in DNA, RNA, or proteins are most often affected by regulatory variants? Which stages of gene regulation are typically altered? How can we predict which variants are most likely to impact regulation in a given cell type? Recent studies, in many cases using quantitative trait loci (QTL)-mapping approaches in cell lines or tissue samples, have provided us with considerable insight into the properties of genetic loci that have regulatory roles. Such studies have uncovered novel biochemical regulatory interactions and led to the identification of previously unrecognized regulatory mechanisms. We have learned that genetic variation is often directly associated with variation in regulatory activities (namely, we can map regulatory QTLs, not just expression QTLs [eQTLs]), and we have taken the first steps towards understanding the causal order of regulatory events (for example, the role of pioneer transcription factors). Yet, in most cases, we still do not know how to interpret overlapping combinations of regulatory interactions, and we are still far from being able to predict how variation in regulatory mechanisms is propagated through a chain of interactions to eventually result in changes in gene expression profiles.

    View details for DOI 10.1371/journal.pgen.1004857

    View details for Web of Science ID 000349314600009

    View details for PubMedID 25569255

  • msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding. PloS one Raj, A., Shim, H., Gilad, Y., Pritchard, J. K., Stephens, M. 2015; 10 (9)


    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at

    View details for DOI 10.1371/journal.pone.0138030

    View details for PubMedID 26406244

  • Methylation QTLs Are Associated with Coordinated Changes in Transcription Factor Binding, Histone Modifications, and Gene Expression Levels PLOS GENETICS Banovich, N. E., Lan, X., McVicker, G., van de Geijn, B., Degner, J. F., Blischak, J. D., Roux, J., Pritchard, J. K., Gilad, Y. 2014; 10 (9)
  • fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets GENETICS Raj, A., Stephens, M., Pritchard, J. K. 2014; 197 (2): 573-U207
  • The deleterious mutation load is insensitive to recent population history. Nature genetics Simons, Y. B., Turchin, M. C., Pritchard, J. K., Sella, G. 2014; 46 (3): 220-224


    Human populations have undergone major changes in population size in the past 100,000 years, including recent rapid growth. How these demographic events have affected the burden of deleterious mutations in individuals and the frequencies of disease mutations in populations remains unclear. We use population genetic models to show that recent human demography has probably had little impact on the average burden of deleterious mutations. This prediction is supported by two exome sequence data sets showing that individuals of west African and European ancestry carry very similar burdens of damaging mutations. We further show that for many diseases, rare alleles are unlikely to contribute a large fraction of the heritable variation, and therefore the impact of recent growth is likely to be modest. However, for those diseases that have a direct impact on fitness, strongly deleterious rare mutations probably do have an important role, and recent growth will have increased their impact.

    View details for DOI 10.1038/ng.2896

    View details for PubMedID 24509481

  • The functional consequences of variation in transcription factor binding. PLoS genetics Cusanovich, D. A., Pavlovic, B., Pritchard, J. K., Gilad, Y. 2014; 10 (3)


    One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. Using this approach, we found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as "active enhancers."

    View details for DOI 10.1371/journal.pgen.1004226

    View details for PubMedID 24603674

  • The chromatin architectural proteins HMGD1 and H1 bind reciprocally and have opposite effects on chromatin structure and gene regulation BMC GENOMICS Nalabothula, N., McVicker, G., Maiorano, J., Martin, R., Pritchard, J. K., Fondufe-Mittendorf, Y. N. 2014; 15


    Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. While these proteins are almost certainly important for gene regulation they have been studied far less than the core histone proteins. Here we describe the genomic distributions and functional roles of two chromatin architectural proteins: histone H1 and the high mobility group protein HMGD1 in Drosophila S2 cells. Using ChIP-seq, biochemical and gene specific approaches, we find that HMGD1 binds to highly accessible regulatory chromatin and active promoters. In contrast, H1 is primarily associated with heterochromatic regions marked with repressive histone marks. We find that the ratio of HMGD1 to H1 binding is a better predictor of gene activity than either protein by itself, which suggests that reciprocal binding between these proteins is important for gene regulation. Using knockdown experiments, we show that HMGD1 and H1 affect the occupancy of the other protein, change nucleosome repeat length and modulate gene expression. Collectively, our data suggest that dynamic and mutually exclusive binding of H1 and HMGD1 to nucleosomes and their linker sequences may control the fluid chromatin structure that is required for transcriptional regulation. This study provides a framework to further study the interplay between chromatin architectural proteins and epigenetics in gene regulation.

    View details for DOI 10.1186/1471-2164-15-92

    View details for Web of Science ID 000332575900002

    View details for PubMedID 24484546

  • The effect of freeze-thaw cycles on gene expression levels in lymphoblastoid cell lines. PloS one Çaliskan, M., Pritchard, J. K., Ober, C., Gilad, Y. 2014; 9 (9)


    Epstein-Barr virus (EBV) transformed lymphoblastoid cell lines (LCLs) are a widely used renewable resource for functional genomic studies in humans. The ability to accumulate multidimensional data pertaining to the same individual cell lines, from complete genomic sequences to detailed gene regulatory profiles, further enhances the utility of LCLs as a model system. However, the extent to which LCLs are a faithful model system is relatively unknown. We have previously shown that gene expression profiles of newly established LCLs maintain a strong individual component. Here, we extend our study to investigate the effect of freeze-thaw cycles on gene expression patterns in mature LCLs, especially in the context of inter-individual variation in gene expression. We report a profound difference in the gene expression profiles of newly established and mature LCLs. Once newly established LCLs undergo a freeze-thaw cycle, the individual specific gene expression signatures become much less pronounced as the gene expression levels in LCLs from different individuals converge to a more uniform profile, which reflects a mature transformed B cell phenotype. We found that previously identified eQTLs are enriched among the relatively few genes whose regulations in mature LCLs maintain marked individual signatures. We thus conclude that while insight drawn from gene regulatory studies in mature LCLs may generally not be affected by the artificial nature of the LCL model system, many aspects of primary B cell biology cannot be observed and studied in mature LCL cultures.

    View details for DOI 10.1371/journal.pone.0107166

    View details for PubMedID 25192014

  • Epigenetic modifications are associated with inter-species gene expression variation in primates GENOME BIOLOGY Zhou, X., Cain, C. E., Myrthil, M., Lewellen, N., Michelini, K., Davenport, E. R., Stephens, M., Pritchard, J. K., Gilad, Y. 2014; 15 (12)
  • Primate Transcript and Protein Expression Levels Evolve Under Compensatory Selection Pressures SCIENCE Khan, Z., Ford, M. J., Cusanovich, D. A., Mitrano, A., Pritchard, J. K., Gilad, Y. 2013; 342 (6162): 1100-1104


    Changes in gene regulation have likely played an important role in the evolution of primates. Differences in messenger RNA (mRNA) expression levels across primates have often been documented; however, it is not yet known to what extent measurements of divergence in mRNA levels reflect divergence in protein expression levels, which are probably more important in determining phenotypic differences. We used high-resolution, quantitative mass spectrometry to collect protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines and compared them to transcript expression data from the same samples. We found dozens of genes with significant expression differences between species at the mRNA level yet little or no difference in protein expression. Overall, our data suggest that protein expression levels evolve under stronger evolutionary constraint than mRNA levels.

    View details for DOI 10.1126/science.1242379

    View details for Web of Science ID 000327518600059

    View details for PubMedID 24136357

  • Identification of Genetic Variants That Affect Histone Modifications in Human Cells SCIENCE McVicker, G., van de Geijn, B., Degner, J. F., Cain, C. E., Banovich, N. E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y., Pritchard, J. K. 2013; 342 (6159): 747-749


    Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines (LCLs). In many cases, the same variant is associated with quantitative changes in multiple histone marks and Pol II, as well as in deoxyribonuclease I sensitivity and nucleosome positioning. Transcription factor binding site polymorphisms are correlated overall with differences in local histone modification, and we identify specific transcription factors whose binding leads to histone modification in LCLs. Furthermore, variants that affect chromatin at distal regulatory sites frequently also direct changes in chromatin and gene expression at associated promoters.

    View details for DOI 10.1126/science.1242429

    View details for Web of Science ID 000326647600046

    View details for PubMedID 24136359