Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants.
American journal of human genetics
2014; 95 (3): 245-256
Recent and rapid human population growth has led to an excess of rare genetic variants that are expected to contribute to an individual's genetic burden of disease risk. To date, much of the focus has been on rare protein-coding variants, whose potential impact can be estimated from the genetic code, but determining the impact of rare noncoding variants has been more challenging. To improve our understanding of such variants, we combined high-quality genome sequencing and RNA sequencing data from a 17-individual, three-generation family to contrast expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) within this family with eQTLs and sQTLs within a population sample. Using this design, we found that eQTLs and sQTLs with large effects in the family were enriched with rare regulatory and splicing variants (minor allele frequency < 0.01). They were also more likely to influence essential genes and genes involved in complex disease. In addition, we tested the capacity of diverse noncoding annotations to predict the impact of rare noncoding variants. We found that distance to the transcription start site, evolutionary constraint, and epigenetic annotation were considerably more informative for predicting the impact of rare variants than for predicting the impact of common variants. These results highlight that rare noncoding variants are important contributors to individual gene-expression profiles and further demonstrate a significant capability for genomic annotation to predict the impact of rare noncoding variants.
View details for DOI 10.1016/j.ajhg.2014.08.004
View details for PubMedID 25192044
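The core of an eQTL test regresses a gene's expression level on genotype dosage (0, 1 or 2 copies of the alternate allele). Below is a minimal sketch, assuming plain ordinary least squares on one variant and one gene. The names `fit_eqtl`, `EqtlResult` and `population_maf` are hypothetical and do not come from the paper. The paper's own pipeline is not described in this abstract, and the sketch ignores family relatedness and covariates.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EqtlResult:
    slope: float   # change in expression per copy of the alternate allele
    is_rare: bool  # True if the population MAF is below the rare-variant threshold


def fit_eqtl(
    dosages: Sequence[int],
    expression: Sequence[float],
    population_maf: float,
    rare_threshold: float = 0.01,
) -> EqtlResult:
    """Hypothetical helper: least-squares slope of expression on genotype dosage."""
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need matching genotype and expression values for >= 2 individuals")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Genotype sum of squares; zero means every individual has the same genotype.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample; no effect can be estimated")
    # Cross-product of centered genotype and centered expression.
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    # Rarity is judged from an external population frequency, not from the sample.
    return EqtlResult(slope=sxy / sxx, is_rare=population_maf < rare_threshold)
```

Rarity is taken from a population frequency passed in by the caller. In a 17-person family, any allele that is actually carried has an in-sample frequency of at least 1/34 (about 0.029), so the sample alone cannot show a minor allele frequency below 0.01. A real family analysis would also need to model relatedness, for example with a mixed model, which this sketch omits.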
Path-scan: a reporting tool for identifying clinically actionable variants.
Pacific Symposium on Biocomputing
2014; 19: 229-240
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of direct-to-consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar-designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants and then consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes Project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar-designated pathogenic variant that required further evaluation.
View details for PubMedID 24297550
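The filtering step described in this abstract can be sketched as follows. The sketch keeps a genome's variants only if they fall in an actionable gene and ClinVar labels them pathogenic, optionally dropping a set of excluded variants. It then reports the fraction of genomes with at least one hit. This is a minimal sketch under stated assumptions, not PATH-SCAN's implementation: the `Variant` type, the helpers `flag_actionable` and `fraction_flagged`, and the plain dictionary standing in for ClinVar are all hypothetical. The exact string "Pathogenic" is also an assumed label.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, List, Sequence, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)


def flag_actionable(
    variants: Sequence[Variant],
    actionable_genes: AbstractSet[str],
    clinvar_significance: Dict[VariantKey, str],
    excluded: AbstractSet[VariantKey] = frozenset(),
) -> List[Variant]:
    """Hypothetical helper: variants in actionable genes labeled 'Pathogenic'."""
    return [
        v for v in variants
        if v.gene in actionable_genes
        and v.key not in excluded
        and clinvar_significance.get(v.key) == "Pathogenic"
    ]


def fraction_flagged(
    genomes: Sequence[Sequence[Variant]],
    actionable_genes: AbstractSet[str],
    clinvar_significance: Dict[VariantKey, str],
    excluded: AbstractSet[VariantKey] = frozenset(),
) -> float:
    """Hypothetical helper: share of genomes carrying at least one flagged variant."""
    if not genomes:
        return 0.0
    hits = sum(
        1 for g in genomes
        if flag_actionable(g, actionable_genes, clinvar_significance, excluded)
    )
    return hits / len(genomes)
```

The `excluded` parameter mirrors the abstract's choice to set aside the most commonly seen variant before reporting the fraction of genomes with a pathogenic hit. Any coordinates passed to these functions in practice would come from a real VCF and a ClinVar release, not from this sketch.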
GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs
Nucleic Acids Research
2008; 36: D141-D148
G-quadruplex motifs in RNA play significant roles in key cellular processes and human disease. While sequences capable of forming G-quadruplexes in the pre-mRNA are involved in regulation of polyadenylation and splicing events in mammalian transcripts, the G-quadruplex motifs in the UTRs may help regulate mRNA expression. GRSDB2 is a second-generation database containing information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in approximately 29 000 eukaryotic pre-mRNA sequences, many of which are alternatively processed. The data stored in GRSDB2 are based on computational analysis of NCBI Entrez Gene entries with the help of an improved version of the QGRS Mapper program. The database allows complex queries with a wide variety of parameters, including Gene Ontology terms. The data are displayed in a variety of formats with several additional computational capabilities. We have also developed a new database, GRS_UTRdb, containing information on the composition and distribution patterns of putative QGRS in the 5'- and 3'-UTRs of eukaryotic mRNA sequences. The goal of these experiments has been to build freely accessible resources for exploring the role of G-quadruplex structure in regulation of gene expression at the post-transcriptional level. The databases can be accessed at the G-Quadruplex Resource Site at: http://bioinformatics.ramapo.edu/GQRS/.
View details for DOI 10.1093/nar/gkm982
View details for Web of Science ID 000252545400026
View details for PubMedID 18045785
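A putative QGRS is commonly described as four runs of guanines of equal length separated by three loops. Below is a simplified regular-expression sketch of that motif shape. It is not QGRS Mapper's algorithm: it does not compute a G-score or enumerate overlapping alternatives. The parameters `min_g` and `max_loop`, and the helper `find_qgrs`, are illustrative assumptions rather than the program's documented defaults.

```python
import re
from typing import List, Tuple


def find_qgrs(seq: str, min_g: int = 2, max_loop: int = 7) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: return (start, end, tract_length) for non-overlapping motifs."""
    # Group 1 captures the first G-run; the backreference \1 forces the other
    # three G-runs to have the same length. Loops are lazy, so short loops are
    # tried first, and each loop holds 1..max_loop nucleotides (DNA or RNA letters).
    loop = f"[ACGTUN]{{1,{max_loop}}}?"
    pattern = re.compile(f"(G{{{min_g},}}){loop}\\1{loop}\\1{loop}\\1")
    return [(m.start(), m.end(), len(m.group(1))) for m in pattern.finditer(seq.upper())]
```

For example, `find_qgrs("GGGAGGGAGGGAGGG")` reports a single motif spanning the whole sequence with G-runs of length 3. An RNA string such as `"UUGGAGGAGGAGGUU"` yields one motif with runs of length 2. Because `finditer` returns non-overlapping matches, this sketch reports at most one motif per region, whereas a mapper that scores alternatives would consider overlapping candidates.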