All Publications

  • Impact of the X Chromosome and sex on regulatory variation GENOME RESEARCH Kukurba, K. R., Parsana, P., Balliu, B., Smith, K. S., Zappala, Z., Knowles, D. A., Fave, M., Davis, J. R., Li, X., Zhu, X., Potash, J. B., Weissman, M. M., Shi, J., Kundaje, A., Levinson, D. F., Awadalla, P., Mostafavi, S., Battle, A., Montgomery, S. B. 2016; 26 (6): 768-777


    The X Chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. Improving our understanding of these differences offers to elucidate the molecular mechanisms underlying sex-specific traits and diseases. However, to date, most studies have either ignored the X Chromosome or had insufficient power to test for the sex-specific impact of genetic variation. By analyzing whole blood transcriptomes of 922 individuals, we have conducted the first large-scale, genome-wide analysis of the impact of both sex and genetic variation on patterns of gene expression, including comparison between the X Chromosome and autosomes. We identified a depletion of expression quantitative trait loci (eQTL) on the X Chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X Chromosome. To resolve the molecular mechanisms underlying such effects, we generated chromatin accessibility data through ATAC-sequencing to connect sex-specific chromatin accessibility to sex-specific patterns of expression and regulatory variation. As sex-specific regulatory variants discovered in our study can inform sex differences in heritable disease prevalence, we integrated our data with genome-wide association study data for multiple immune traits identifying several traits with significant sex biases in genetic susceptibilities. Together, our study provides genome-wide insight into how genetic variation, the X Chromosome, and sex shape human gene regulation and disease.

    View details for DOI 10.1101/gr.197897.115

    View details for Web of Science ID 000377090400005

    View details for PubMedID 27197214

  • Transcriptome Sequencing of a Large Human Family Identifies the Impact of Rare Noncoding Variants AMERICAN JOURNAL OF HUMAN GENETICS Li, X., Battle, A., Karczewski, K. J., Zappala, Z., Knowles, D. A., Smith, K. S., Kukurba, K. R., Wu, E., Simon, N., Montgomery, S. B. 2014; 95 (3): 245-256


    Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, for which potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family to eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotation to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.

    View details for DOI 10.1016/j.ajhg.2014.08.004

    View details for Web of Science ID 000341404100001

  • Path-scan: a reporting tool for identifying clinically actionable variants. Pacific Symposium on Biocomputing Daneshjou, R., Zappala, Z., Kukurba, K., Boyle, S. M., Ormond, K. E., Klein, T. E., Snyder, M., Bustamante, C. D., Altman, R. B., Montgomery, S. B. 2014; 19: 229-240


    The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of direct-to-consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar-designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes Project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar-designated pathogenic variant that required further evaluation.

    View details for PubMedID 24297550

  • GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs NUCLEIC ACIDS RESEARCH Kikin, O., Zappala, Z., D'Antonio, L., Bagga, P. S. 2008; 36: D141-D148


    G-quadruplex motifs in the RNA play significant roles in key cellular processes and human disease. While sequences capable of forming G-quadruplexes in the pre-mRNA are involved in regulation of polyadenylation and splicing events in mammalian transcripts, the G-quadruplex motifs in the UTRs may help regulate mRNA expression. GRSDB2 is a second-generation database containing information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in approximately 29 000 eukaryotic pre-mRNA sequences, many of which are alternatively processed. The data stored in the GRSDB2 is based on computational analysis of NCBI Entrez Gene entries with the help of an improved version of the QGRS Mapper program. The database allows complex queries with a wide variety of parameters, including Gene Ontology terms. The data is displayed in a variety of formats with several additional computational capabilities. We have also developed a new database, GRS_UTRdb, containing information on the composition and distribution patterns of putative QGRS in the 5'- and 3'-UTRs of eukaryotic mRNA sequences. The goal of these experiments has been to build freely accessible resources for exploring the role of G-quadruplex structure in regulation of gene expression at post-transcriptional level. The databases can be accessed at the G-Quadruplex Resource Site.

    View details for DOI 10.1093/nar/gkm982

    View details for Web of Science ID 000252545400026

    View details for PubMedID 18045785