Doctor of Philosophy, Ludwig-Maximilians-Universität München (2011)
Jin Li, Postdoctoral Faculty Sponsor
Identifying genomic variation is a crucial step for unraveling the relationship between genotype and phenotype and can yield important insights into human diseases. Prevailing methods rely on cost-intensive whole-genome sequencing (WGS) or whole-exome sequencing (WES) approaches while the identification of genomic variants from often existing RNA sequencing (RNA-seq) data remains a challenge because of the intrinsic complexity in the transcriptome. Here, we present a highly accurate approach termed SNPiR to identify SNPs in RNA-seq data. We applied SNPiR to RNA-seq data of samples for which WGS and WES data are also available and achieved high specificity and sensitivity. Of the SNPs called from the RNA-seq data, >98% were also identified by WGS or WES. Over 70% of all expressed coding variants were identified from RNA-seq, and comparable numbers of exonic variants were identified in RNA-seq and WES. Despite our method's limitation in detecting variants in expressed regions only, our results demonstrate that SNPiR outperforms current state-of-the-art approaches for variant detection from RNA-seq data and offers a cost-effective and reliable alternative for SNP discovery.
View details for DOI 10.1016/j.ajhg.2013.08.008
View details for PubMedID 24075185
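The concordance figures quoted in the abstract above (the share of RNA-seq calls confirmed by WGS or WES, and the share of DNA-based variants recovered from RNA-seq) can be illustrated with a minimal sketch. This is not the SNPiR pipeline itself: the function name `concordance`, the tuple layout of a variant, and the toy call sets are all assumptions made for illustration.

```python
def concordance(rna_calls, dna_calls):
    """Compare two variant call sets.

    Returns (fraction of RNA-seq calls also present in the DNA set,
             fraction of DNA calls also present in the RNA-seq set).
    """
    rna, dna = set(rna_calls), set(dna_calls)
    shared = rna & dna
    confirmed = len(shared) / len(rna) if rna else 0.0
    recovered = len(shared) / len(dna) if dna else 0.0
    return confirmed, recovered


# Hypothetical toy data: (chromosome, 1-based position, ref allele, alt allele)
rna_snps = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),   # RNA-only call, not confirmed by DNA
]
dna_snps = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 7, "A", "C"),    # DNA-only call, missed in RNA-seq
    ("chr3", 12, "G", "T"),   # DNA-only call, missed in RNA-seq
]

confirmed, recovered = concordance(rna_snps, dna_snps)
print(f"confirmed by DNA: {confirmed:.2f}, recovered from RNA-seq: {recovered:.2f}")
```

For a comparison in the spirit of the abstract, the DNA call set would presumably first be restricted to expressed regions, since variants in unexpressed regions cannot be detected from RNA-seq.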
We show that RNA editing sites can be called with high confidence using RNA sequencing data from multiple samples across either individuals or species, without the need for matched genomic DNA sequence. We identified many previously unidentified editing sites in both humans and Drosophila; our results nearly double the known number of human protein recoding events. We also found that human genes harboring conserved editing sites within Alu repeats are enriched for neuronal functions.
View details for DOI 10.1038/NMETH.2330
View details for Web of Science ID 000314623900018
View details for PubMedID 23291724
We developed a computational framework to robustly identify RNA editing sites using transcriptome and genome deep-sequencing data from the same individual. As compared with previous methods, our approach identified a large number of Alu and non-Alu RNA editing sites with high specificity. We also found that editing of non-Alu sites appears to be dependent on nearby edited Alu sites, possibly through the locally formed double-stranded RNA structure.
View details for DOI 10.1038/NMETH.1982
View details for Web of Science ID 000304778500021
View details for PubMedID 22484847
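The core idea of the abstract above, comparing transcriptome and genome data from the same individual, can be sketched as follows. This is a simplified illustration and not the published framework: the `Site` record, the function `candidate_editing_sites`, and the coverage and read-count thresholds are hypothetical, and a real pipeline would add many further filters.

```python
from dataclasses import dataclass


@dataclass
class Site:
    chrom: str
    pos: int
    ref: str            # reference base
    dna_genotype: str   # genotype called from genomic DNA, e.g. "AA"
    rna_counts: dict    # base -> number of RNA-seq reads supporting it


def candidate_editing_sites(sites, min_cov=10, min_alt_reads=2):
    """Flag RNA-DNA mismatches at sites where the genome is homozygous reference.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    candidates = []
    for s in sites:
        if set(s.dna_genotype) != {s.ref}:
            continue  # a genomic variant explains the mismatch
        coverage = sum(s.rna_counts.values())
        if coverage < min_cov:
            continue  # too few RNA-seq reads to make a call
        for base, n in s.rna_counts.items():
            if base != s.ref and n >= min_alt_reads:
                candidates.append((s.chrom, s.pos, f"{s.ref}>{base}", n / coverage))
    return candidates


# Hypothetical toy input
sites = [
    Site("chr1", 100, "A", "AA", {"A": 15, "G": 5}),  # A>G mismatch, DNA is AA
    Site("chr1", 200, "A", "AG", {"A": 10, "G": 10}), # heterozygous SNP in DNA
    Site("chr2", 50, "A", "AA", {"A": 3, "G": 2}),    # coverage too low
    Site("chr2", 75, "C", "CC", {"C": 20}),           # no mismatch
]
calls = candidate_editing_sites(sites)
print(calls)
```

In this toy input only the first site is reported, because it is the only one with sufficient RNA-seq coverage, a non-reference base in the RNA, and a homozygous reference genotype in the DNA.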
Li et al. (Research Articles, 1 July 2011, p. 53; published online 19 May 2011) reported widespread differences between the RNA and DNA sequences of the same human cells, including all 12 possible mismatch types. Before accepting such a fundamental claim, a deeper analysis of the sequencing data is required to discern true differences between RNA and DNA from potential artifacts.
View details for DOI 10.1126/science.1210624
View details for Web of Science ID 000301531600026
View details for PubMedID 22422964
Small noncoding RNAs as well as folded RNA structures in genic regions are crucial for many cellular processes. They are involved in posttranscriptional gene regulation (microRNAs), RNA modification (small nucleolar RNAs), regulation of splicing, correct localization of proteins, and many other processes. In most cases, a distinct secondary structure of the molecule is necessary for its correct function. Hence, selection should act to retain the structure of the molecule, although the underlying sequence is allowed to vary. Here, we present the first genome-wide estimates of selective constraints in folded RNA molecules in the nuclear genomes of drosophilids and hominids. In comparison to putatively neutrally evolving sites, we observe substantially reduced rates of substitutions at paired and unpaired sites of folded molecules. We estimated evolutionary constraints to be in the ranges of (0.974,0.991) and (0.895,1.000) for paired nucleotides in drosophilids and hominids, respectively. These values are significantly higher than for constraints at nonsynonymous sites of protein-coding genes in both genera. Nonetheless, valleys of only moderately reduced fitness (s ≈ 10^-4) are sufficient to generate the observed fraction of nucleotide changes that are removed by purifying selection. In addition, a comparison of selective coefficients between drosophilids and hominids revealed significantly higher constraints in drosophilids, which can be attributed to the difference in long-term effective population size between these two groups of species. This difference is particularly apparent at the independently evolving (unpaired) sites.
View details for DOI 10.1093/molbev/msq343
View details for Web of Science ID 000288556200017
View details for PubMedID 21172832
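As a rough illustration of what a constraint value such as those quoted in the abstract above expresses, the sketch below uses a common definition: one minus the ratio of the substitution rate at the sites of interest to the rate at putatively neutral sites. Whether the paper used exactly this estimator is an assumption here, and the function name and the counts are hypothetical.

```python
def constraint(subs_focal, sites_focal, subs_neutral, sites_neutral):
    """Fraction of mutations removed by purifying selection.

    Assumed definition: C = 1 - (focal substitution rate / neutral substitution rate).
    """
    rate_focal = subs_focal / sites_focal
    rate_neutral = subs_neutral / sites_neutral
    return 1.0 - rate_focal / rate_neutral


# Hypothetical counts: 5 substitutions in 1,000 paired sites
# versus 100 substitutions in 1,000 putatively neutral sites.
c_paired = constraint(5, 1000, 100, 1000)
print(f"constraint at paired sites: {c_paired:.3f}")
```

With these made-up counts the paired sites evolve at one twentieth of the neutral rate, giving a constraint of about 0.95.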
The impact of the effective population size (Ne) on the efficacy of selection has been the focus of many theoretical and empirical studies in recent years. Yet, the effect of Ne on evolution under epistatic fitness interactions is not well understood. In this study, we compare selective constraints at independently evolving (unpaired) and coevolving (paired) sites in orthologous transfer RNA (tRNA) molecules for vertebrate and drosophilid species pairs of different Ne. We show that patterns of nucleotide variation for the two classes of sites are explained well by Kimura's one- and two-locus models of sequence evolution under mutational pressure. We find that constraints in orthologous tRNAs increase with increasing Ne of the investigated species pair. Thereby, the effect of Ne on the efficacy of selection is stronger at unpaired sites than at paired sites. Furthermore, we identify a "core" set of tRNAs with high structural similarity to tRNAs from all major kingdoms of life and a "peripheral" set with lower similarity. We observe that tRNAs in the former set are subject to higher constraints and less prone to the effect of Ne, whereas constraints in tRNAs of the latter set show a large influence of Ne. Finally, we are able to demonstrate that constraints are relaxed in X-linked drosophilid tRNAs compared with autosomal tRNAs and suggest that Ne is responsible for this difference. The observed effects of Ne are consistent with the hypothesis that evolution of most tRNAs is governed by slightly to moderately deleterious mutations (i.e., |Nes| ≤ 5).
View details for DOI 10.1093/gbe/evr057
View details for Web of Science ID 000295693200014
View details for PubMedID 21680889
microRNAs (miRNAs) are small non-coding RNAs with fundamental roles in the regulation of gene expression. miRNAs assemble with Argonaute (Ago) proteins to miRNA-protein complexes (miRNPs), which interact with distinct binding sites on mRNAs and regulate gene expression. Specific miRNAs are key regulators of tissue and organ development and it has been shown in mammals that miRNAs are also involved in the pathogenesis of many diseases including cancer. Here, we have characterized the miRNA expression profile of the developing murine genitourinary system. Using a computational approach, we have identified several miRNAs that are specific for the analyzed tissues or the developmental stage. Our comprehensive miRNA expression atlas of the developing genitourinary system forms an invaluable basis for further functional in vivo studies.
View details for DOI 10.1016/j.febslet.2010.09.050
View details for Web of Science ID 000283573100010
View details for PubMedID 20933514
Previous studies have shown that splicing efficiency, and thus maturation of pre-mRNA, depends on the correct folding of the RNA molecule into a secondary or higher order structure. When disrupted by a mutation, aberrant folding may result in a lower splicing efficiency. However, the structure can be restored by a second, compensatory mutation. Here, we present a logistic regression approach to analyze the evolutionary dynamics of RNA secondary structures. We apply our approach to a set of computationally predicted RNA secondary structures in vertebrate introns. Our results are consistent with the hypothesis of a negative influence of the physical distance between pairing nucleotides on the occurrence of covariations, as predicted by Kimura's model of compensatory evolution. We also confirm the hypothesis that longer local secondary structure elements (helices) can accommodate a larger number of covariations, wobbles, and mismatches. Furthermore, we find that wobbles and mismatches are more frequent in the middle of a helix, whereas covariations occur preferentially at the helix ends. The GC content is a major determinant of this pattern.
View details for DOI 10.1093/molbev/msn195
View details for Web of Science ID 000260152700023
View details for PubMedID 18775900
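The logistic regression idea in the abstract above can be sketched on a small scale: model the probability that a base pair shows a covariation as a function of the distance between the pairing nucleotides. This is only a toy illustration fitted by plain gradient ascent. The function `fit_logistic`, the single predictor, and the data are assumptions and do not reproduce the paper's model or its covariates.

```python
import math


def fit_logistic(xs, ys, lr=0.05, steps=2000):
    """Fit P(y=1) = sigmoid(b0 + b1 * (x - mean(x))) by gradient ascent
    on the log-likelihood. Returns (b0, b1)."""
    mean_x = sum(xs) / len(xs)
    centered = [x - mean_x for x in xs]
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(centered, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1


# Hypothetical data: distance between pairing nucleotides (arbitrary units)
# and whether a covariation was observed at that pair (1) or not (0).
distances = [1, 2, 3, 4, 5, 6, 7, 8]
covariation = [1, 1, 1, 0, 1, 0, 0, 0]

intercept, slope = fit_logistic(distances, covariation)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A negative slope in such a fit would correspond to the pattern described in the abstract, where covariations become less likely as the distance between pairing nucleotides grows. In practice a statistics library would be used instead of a hand-written optimizer.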