Honors & Awards

  • Chinese Government Award for Outstanding Self-financed Students Abroad, administered through the Consulate-General of P.R. China in Toronto (Mar. 2011)
  • L.W. Macpherson Microbiology Award, Department of Molecular Genetics, University of Toronto (Sep. 2010)
  • Open Fellowship, University of Toronto (Sep. 2010)
  • Jennifer Dorrington Award, Banting and Best Department of Medical Research, University of Toronto (Jul. 2010)
  • Dr. Roman Pakula Award, Department of Molecular Genetics, University of Toronto (Sep. 2008)
  • Open Fellowship, University of Toronto (Jul. 2008)
  • Yongling Liu Scholarship, Chinese Academy of Sciences (Sep. 2006)
  • Full Scholarship for Complex Systems Summer School, Santa Fe Institute (Aug. 2005)
  • 2nd prize in the National Post-Graduate Mathematical Contest in Modeling, The Contest Committee, China (Sep. 2004)

Professional Education

  • Doctor of Philosophy, University of Toronto (2011)

Journal Articles

  • miRNA regulatory variation in human evolution TRENDS IN GENETICS Li, J., Zhang, Z. 2013; 29 (2): 116-124


    Recent advancements have revealed a complex post-transcriptional regulatory network in humans involving miRNAs. However, the contribution of miRNAs to human evolution, especially interindividual variation associated with miRNAs, is only beginning to be studied. In this article, we illustrate the extent of variation in miRNA-mediated post-transcriptional regulation in humans. Based on evidence from recent studies, we argue that the evolution of post-transcriptional control may be adaptive, and that it not only complements the primary transcriptional regulation by transcription factors (TFs), but also diversifies gene expression phenotypes, thereby generating genetic novelty on which natural selection subsequently acts. Given that current evolutionary analyses and genotype-phenotype mapping are primarily focused on protein-coding genes and TF-mediated regulations, comprehensive examination of post-transcriptional variations should be included in future studies to add a new dimension to understanding of human phenotypic evolution.

    View details for DOI 10.1016/j.tig.2012.10.008

    View details for Web of Science ID 000314744400010

    View details for PubMedID 23128010

  • SH3 interactome conserves general function over specific form. Molecular systems biology Xin, X., Gfeller, D., Cheng, J., Tonikian, R., Sun, L., Guo, A., Lopez, L., Pavlenco, A., Akintobi, A., Zhang, Y., Rual, J., Currell, B., Seshagiri, S., Hao, T., Yang, X., Shen, Y. A., Salehi-Ashtiani, K., Li, J., Cheng, A. T., Bouamalay, D., Lugari, A., Hill, D. E., Grimes, M. L., Drubin, D. G., Grant, B. D., Vidal, M., Boone, C., Sidhu, S. S., Bader, G. D. 2013; 9: 652-?


    Src homology 3 (SH3) domains bind peptides to mediate protein-protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity using peptide phage display in a metazoan, the worm Caenorhabditis elegans, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid and compared it with the equivalent map for yeast. We found that the worm SH3 interactome resembles the analogous yeast network because it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain-mediated interactions are highly rewired. Our results suggest a model of network evolution where general function of the SH3 domain network is conserved over its specific form.

    View details for DOI 10.1038/msb.2013.9

    View details for PubMedID 23549480

  • Evidence for Positive Selection on a Number of MicroRNA Regulatory Interactions during Recent Human Evolution PLOS GENETICS Li, J., Liu, Y., Xin, X., Kim, T. S., Cabeza, E. A., Ren, J., Nielsen, R., Wrana, J. L., Zhang, Z. 2012; 8 (3)


    MicroRNA (miRNA)-mediated gene regulation is of critical functional importance in animals and is thought to be largely constrained during evolution. However, little is known regarding evolutionary changes of the miRNA network and their role in human evolution. Here we show that a number of miRNA binding sites display high levels of population differentiation in humans and thus are likely targets of local adaptation. In a subset we demonstrate that allelic differences modulate miRNA regulation in mammalian cells, including an interaction between miR-155 and TYRP1, an important melanosomal enzyme associated with human pigmentary differences. We identify alternate alleles of TYRP1 that induce or disrupt miR-155 regulation and demonstrate that these alleles are selected with different modes among human populations, causing a strong negative correlation between the frequency of miR-155 regulation of TYRP1 in human populations and their latitude of residence. We propose that local adaptation of microRNA regulation acts as a rheostat to optimize TYRP1 expression in response to differential UV radiation. Our findings illustrate the evolutionary plasticity of the microRNA regulatory network in recent human evolution.

    View details for DOI 10.1371/journal.pgen.1002578

    View details for Web of Science ID 000302254800053

    View details for PubMedID 22457636

  • PhenoM: a database of morphological phenotypes caused by mutation of essential genes in Saccharomyces cerevisiae NUCLEIC ACIDS RESEARCH Jin, K., Li, J., Vizeacoumar, F. S., Li, Z., Min, R., Zamparo, L., Vizeacoumar, F. J., Datti, A., Andrews, B., Boone, C., Zhang, Z. 2012; 40 (D1): D687-D694

    View details for DOI 10.1093/nar/gkr827

    View details for Web of Science ID 000298601300104

  • Systematic exploration of essential yeast gene function with temperature-sensitive mutants NATURE BIOTECHNOLOGY Li, Z., Vizeacoumar, F. J., Bahr, S., Li, J., Warringer, J., Vizeacoumar, F. S., Min, R., VanderSluis, B., Bellay, J., DeVit, M., Fleming, J. A., Stephens, A., Haase, J., Lin, Z., Baryshnikova, A., Lu, H., Yan, Z., Jin, K., Barker, S., Datti, A., Giaever, G., Nislow, C., Bulawa, C., Myers, C. L., Costanzo, M., Gingras, A., Zhang, Z., Blomberg, A., Bloom, K., Andrews, B., Boone, C. 2011; 29 (4): 361-U105


    Conditional temperature-sensitive (ts) mutations are valuable reagents for studying essential genes in the yeast Saccharomyces cerevisiae. We constructed 787 ts strains, covering 497 (~45%) of the 1,101 essential yeast genes, with ~30% of the genes represented by multiple alleles. All of the alleles are integrated into their native genomic locus in the S288C common reference strain and are linked to a kanMX selectable marker, allowing further genetic manipulation by synthetic genetic array (SGA)-based, high-throughput methods. We show two such manipulations: barcoding of 440 strains, which enables chemical-genetic suppression analysis, and the construction of arrays of strains carrying different fluorescent markers of subcellular structure, which enables quantitative analysis of phenotypes using high-content screening. Quantitative analysis of a GFP-tubulin marker identified roles for cohesin and condensin genes in spindle disassembly. This mutant collection should facilitate a wide range of systematic studies aimed at understanding the functions of essential genes.

    View details for DOI 10.1038/nbt.1832

    View details for Web of Science ID 000289284900023

    View details for PubMedID 21441928

  • The Cellular Robustness by Genetic Redundancy in Budding Yeast PLOS GENETICS Li, J., Yuan, Z., Zhang, Z. 2010; 6 (11)


    The frequent dispensability of duplicated genes in budding yeast is heralded as a hallmark of genetic robustness contributed by genetic redundancy. However, theoretical predictions suggest such backup by redundancy is evolutionarily unstable, and the extent of genetic robustness contributed from redundancy remains controversial. It is anticipated that, to achieve mutual buffering, the duplicated paralogs must at least share some functional overlap. However, counter-intuitively, several recent studies reported little functional redundancy between these buffering duplicates. The large yeast genetic interactions released recently allowed us to address these issues on a genome-wide scale. We herein characterized the synthetic genetic interactions for ~500 pairs of yeast duplicated genes originated from either whole-genome duplication (WGD) or small-scale duplication (SSD) events. We established that functional redundancy between duplicates is a pre-requisite and thus is highly predictive of their backup capacity. This observation was particularly pronounced with the use of a newly introduced metric in scoring functional overlap between paralogs on the basis of gene ontology annotations. Even though mutual buffering was observed to be prevalent among duplicated genes, we showed that the observed backup capacity is largely an evolutionarily transient state. The loss of backup capacity generally follows a neutral mode, with the buffering strength decreasing in proportion to divergence time, and the vast majority of the paralogs have already lost their backup capacity. These observations validated previous theoretic predictions about instability of genetic redundancy. However, departing from the general neutral mode, intriguingly, our analysis revealed the presence of natural selection in stabilizing functional overlap between SSD pairs. These selected pairs, both WGD and SSD, tend to have decelerated functional evolution, have higher propensities of co-clustering into the same protein complexes, and share common interacting partners. Our study revealed the general principles for the long-term retention of genetic redundancy.

    View details for DOI 10.1371/journal.pgen.1001187

    View details for Web of Science ID 000284587100003

    View details for PubMedID 21079672

  • Gene Expression Variability within and between Human Populations and Implications toward Disease Susceptibility PLOS COMPUTATIONAL BIOLOGY Li, J., Liu, Y., Kim, T., Min, R., Zhang, Z. 2010; 6 (8)


    Variations in gene expression level might lead to phenotypic diversity across individuals or populations. Although many human genes are found to have differential mRNA levels between populations, the extent of gene expression that could vary within and between populations largely remains elusive. To investigate the dynamic range of gene expression, we analyzed the expression variability of ~18,000 human genes across individuals within HapMap populations. Although ~20% of human genes show differentiated mRNA levels between populations, our results show that expression variability of most human genes in one population is not significantly deviant from another population, except for a small fraction that do show substantially higher expression variability in a particular population. By associating expression variability with sequence polymorphism, intriguingly, we found SNPs in the untranslated regions (5' and 3' UTRs) of these variable genes show consistently elevated population heterozygosity. We performed differential expression analysis on a genome-wide scale, and found substantially reduced expression variability for a large number of genes, prohibiting them from being differentially expressed between populations. Functional analysis revealed that genes with the greatest within-population expression variability are significantly enriched for chemokine signaling in HIV-1 infection, and for HIV-interacting proteins that control viral entry, replication, and propagation. This observation combined with the finding that known human HIV host factors show substantially elevated expression variability, collectively suggest that gene expression variability might explain differential HIV susceptibility across individuals.

    View details for DOI 10.1371/journal.pcbi.1000910

    View details for Web of Science ID 000281389500038

    View details for PubMedID 20865155

  • Revisiting the Contribution of cis-Elements to Expression Divergence between Duplicated Genes: The Role of Chromatin Structure MOLECULAR BIOLOGY AND EVOLUTION Li, J., Yuan, Z., Zhang, Z. 2010; 27 (7): 1461-1466


    Although divergence in expression is thought to be a hallmark of functional dispersal between paralogs postduplication, there is currently a limited understanding of the mechanisms underlying the necessary transcriptional alterations as recent studies have suggested that only a very small proportion of expression variation could be explained by transcriptional variation between paralogs. To further this understanding, we examined comprehensively curated regulatory interactions and genomewide nucleosome occupancy in budding yeast to specifically determine the contribution of cis-elements to expression divergence between extant duplicates. We found that divergence in activation by transcription factors plays a more important role in expression divergence of paralogs than previously appreciated; further, analysis of promoter chromatin structure demonstrated that differential nucleosome organization is coupled with divergent expression of paralogs. By incorporating information of cis-elements encoding transcriptional regulation and chromatin structure, we improved the fraction of expression variation that was previously shown to be explained based on known cis-transcriptional effects by approximately 3-fold. Taken together, our analysis highlights the importance of chromatin divergence involved in expression evolution between paralogs.

    View details for DOI 10.1093/molbev/msq041

    View details for Web of Science ID 000279872000001

    View details for PubMedID 20139146

  • Exploiting the determinants of stochastic gene expression in Saccharomyces cerevisiae for genome-wide prediction of expression noise PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Li, J., Min, R., Vizeacoumar, F. J., Jin, K., Xin, X., Zhang, Z. 2010; 107 (23): 10472-10477


    Gene regulation is a process with many steps allowing for stochastic biochemical reactions, which leads to expression noise, i.e., the cell-to-cell stochastic fluctuation in protein abundance. Such expression noise can give rise to drastically diverse phenotypes, even within isogenic cell populations. Although numerous biophysical approaches have been proposed to model the origin and propagation of expression noise in biological networks, these models essentially characterize the innate stochastic dynamics in gene regulation in a mechanistic way. In this work, by investigating expression noise in the context of yeast cellular networks, we place the biophysical formalism onto solid genetic ground. At the sequence level, we show that extremely noisy genes are highly conserved in their coding sequences. At the level of cellular networks where natural selection is manifested by the topological constraints, we show that genes with varying expression noise are modularly organized in the protein interaction network and are positioned orderly in the gene regulatory network. We demonstrate that these topological constraints are highly predictive of stochastic gene expression, with which we were able to confidently predict stochastic expression for more than 2,000 yeast genes whose expression noise was previously not known. We validated the predictions by high-content cell imaging. Our approach makes feasible genome-wide prediction of stochastic gene expression, and such predictability in turn suggests that expression noise is an evolvable genetic trait.

    View details for DOI 10.1073/pnas.0914302107

    View details for Web of Science ID 000278549300028

    View details for PubMedID 20489180

  • Evolution of an X-Linked Primate-Specific Micro RNA Cluster MOLECULAR BIOLOGY AND EVOLUTION Li, J., Liu, Y., Dong, D., Zhang, Z. 2010; 27 (3): 671-683


    Micro RNAs (miRNAs) are a class of small regulatory RNAs, which posttranscriptionally repress protein production of the targeted messenger RNAs (mRNAs). Accumulating evidence has suggested lineage-specific miRNAs have contributed to lineage-specific characteristics. However, the birth and death of these miRNAs, particularly in primates, largely remain unexplored. We herein characterized the evolutionary history of a newly discovered miRNA cluster on primate X-chromosome, spanning an approximately 33-kb region in human Xq27.3. The cluster consists of six distinct miRNAs, four of which are compactly organized in a 3-kb region belonging to a phylogenetic group distinct from the other two miRNAs. By comparing the genomic structure of this cluster in human with four other primates (chimpanzee, orangutan, rhesus macaque, and marmoset), we identified several previously uncovered miRNAs in these primates that share orthology with the human miRNAs. We found the entire miRNA cluster was well conserved among primate species but unidentifiable in other mammalian species (including mouse, rat, cat, dog, horse, cow, opossum, and platypus), suggesting that the formation of this cluster was after the primate-rodent split but before the emergence of New-World Monkey (represented by marmoset). Our analysis further revealed complex evolutionary dynamics on this locus, characterized by extensive duplication events. Phylogenetic analysis revealed birth and death of the miRNAs within this region, accompanied by rapid evolution, which highlighted their functional importance. These miRNAs are primarily expressed in primate epididymis, part of the male reproductive system. Our analysis showed that their predicted target mRNAs are significantly enriched for several functional classes relevant to epididymal physiology, such as morphogenesis of epithelium and tube development. Furthermore, several genes controlling sperm maturation and male fertility are confidently predicted to be their targets. Collectively, we argue these miRNAs might play an important role in epididymal morphogenesis and sperm maturation and in establishing primate-specific epididymal characteristics.

    View details for DOI 10.1093/molbev/msp284

    View details for Web of Science ID 000274786900016

    View details for PubMedID 19933172

  • Integrating high-throughput genetic interaction mapping and high-content screening to explore yeast spindle morphogenesis JOURNAL OF CELL BIOLOGY Vizeacoumar, F. J., van Dyk, N., Vizeacoumar, F. S., Cheung, V., Li, J., Sydorskyy, Y., Case, N., Li, Z., Datti, A., Nislow, C., Raught, B., Zhang, Z., Frey, B., Bloom, K., Boone, C., Andrews, B. J. 2010; 188 (1): 69-81


    We describe the application of a novel screening approach that combines automated yeast genetics, synthetic genetic array (SGA) analysis, and a high-content screening (HCS) system to examine mitotic spindle morphogenesis. We measured numerous spindle and cellular morphological parameters in thousands of single mutants and corresponding sensitized double mutants lacking genes known to be involved in spindle function. We focused on a subset of genes that appear to define a highly conserved mitotic spindle disassembly pathway, which is known to involve Ipl1p, the yeast aurora B kinase, as well as the cell cycle regulatory networks mitotic exit network (MEN) and fourteen early anaphase release (FEAR). We also dissected the function of the kinetochore protein Mcm21p, showing that sumoylation of Mcm21p regulates the enrichment of Ipl1p and other chromosomal passenger proteins to the spindle midzone to mediate spindle disassembly. Although we focused on spindle disassembly in a proof-of-principle study, our integrated HCS-SGA method can be applied to virtually any pathway, making it a powerful means for identifying specific cellular functions.

    View details for DOI 10.1083/jcb.200909013

    View details for Web of Science ID 000273507300009

    View details for PubMedID 20065090

  • A probabilistic framework to improve microrna target prediction by incorporating proteomics data. Journal of bioinformatics and computational biology Li, J., Min, R., Bonner, A., Zhang, Z. 2009; 7 (6): 955-972


    Due to the difficulties in identifying microRNA (miRNA) targets experimentally in a high-throughput manner, several computational approaches have been proposed. To this date, most leading algorithms are based on sequence information alone. However, there has been limited overlap between these predictions, implying high false-positive rates, which underlines the limitation of sequence-based approaches. Considering the repressive nature of miRNAs at the mRNA translational level, here we describe a probabilistic model to make predictions by combining sequence complementarity, miRNA expression level, and protein abundance. Our underlying assumption is that, given sequence complementarity between a miRNA and its putative mRNA targets, the miRNA expression level should be high and the protein abundance of the mRNA should be low. Having identified a set of confident predictions, we then built a second probabilistic model to trace back to the mRNA expression of the confident targets to investigate the mechanisms of the miRNA-mediated post-transcriptional regulation. Our results suggest that translational repression (which has no effect on mRNA level), instead of mRNA degradation, is the dominant mechanism in miRNA regulation. This observation explained the previously observed discordant correlation between mRNA expression and protein abundance.

    View details for PubMedID 20014473

  • Learned Random-Walk Kernels and Empirical-Map Kernels for Protein Sequence Classification JOURNAL OF COMPUTATIONAL BIOLOGY Min, R., Bonner, A., Li, J., Zhang, Z. 2009; 16 (3): 457-474


    Biological sequence classification (such as protein remote homology detection) solely based on sequence data is an important problem in computational biology, especially in the current genomics era, when large amounts of sequence data are becoming available. Support vector machines (SVMs) based on mismatch string kernels were previously applied to solve this problem, achieving reasonable success. However, they still perform poorly on difficult protein families. In this paper, we propose two approaches to solve the protein remote homology detection problem: one uses a convex combination of random-walk kernels to approximate the random-walk kernel with the optimal random steps, and the other constructs an empirical-map kernel using a profile kernel. Both resulting kernels make use of a large amount of pairwise sequence similarity information and unlabeled data, and have much better prediction performance than the best profile kernel directly derived from protein sequences. On a competitive Structural Classification Of Proteins (SCOP) benchmark dataset, the overall mean ROC(50) scores on 54 protein families we obtained using both approaches are above 0.90, significantly outperforming previously published results.

    View details for DOI 10.1089/cmb.2008.0031

    View details for Web of Science ID 000263770200004

    View details for PubMedID 19254184

  • Preferential regulation of duplicated genes by microRNAs in mammals GENOME BIOLOGY Li, J., Musso, G., Zhang, Z. 2008; 9 (8)


    Although recent advances have been made in identifying and analyzing instances of microRNA-mediated gene regulation, it remains unclear by what mechanisms attenuation of transcript expression through microRNAs becomes an integral part of post-transcriptional modification, and it is even less clear to what extent this process occurs for mammalian gene duplicates (paralogs). Specifically, while mammalian paralogs are known to overcome their initial complete functional redundancy through variation in regulation and expression, the potential involvement of microRNAs in this process has not been investigated. We comprehensively investigated the impact of microRNA-mediated post-transcriptional regulation on duplicated genes in human and mouse. Using predicted targets derived from several analysis methods, we report the following observations: microRNA targets are significantly enriched for duplicate genes, implying their roles in the differential regulation of paralogs; on average, duplicate microRNA target genes have longer 3' untranslated regions than singleton targets, and are regulated by more microRNA species, suggesting a more sophisticated mode of regulation; ancient duplicates were more likely to be regulated by microRNAs and, on average, have greater expression divergence than recent duplicates; and ancient duplicate genes share fewer ancestral microRNA regulators, and recent duplicate genes share more common regulating microRNAs. Collectively, these results demonstrate that microRNAs comprise an important element in evolving the regulatory patterns of mammalian paralogs. We further present an evolutionary model in which microRNAs not only adjust imbalanced dosage effects created by gene duplication, but also help maintain long-term buffering of the phenotypic consequences of gene deletion or ablation.

    View details for DOI 10.1186/gb-2008-9-8-r132

    View details for Web of Science ID 000259701400017

    View details for PubMedID 18727826

  • Identifying protein-protein interfacial residues in heterocomplexes using residue conservation scores INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES Li, J., Huang, D., Wang, B., Chen, P. 2006; 38 (3-5): 241-247


    Identifying protein-protein interfaces is crucial for structural biology. Because of the constraints in wet experiments, many computational methods have been proposed. Without knowing any information about the partner chains, a new method of predicting protein-protein interaction interface residues purely based on evolutionary information in heterocomplexes is proposed here. Unlike traditional approaches using multiple sequence alignment profiles to represent the conservation level for each residue, we make predictions based on the concept of residue conservation scores so that the dimension of the feature vector for each residue can be drastically reduced, at least 20 times less than conventional methods. Based on the representation approach, a simple linear discriminant function is used to make predictions, so the computational complexity of the whole prediction procedure can also be greatly decreased. By testing our approach on 69 heterocomplex chains, experimental results demonstrate the performance of our approach is indeed superior to current existing methods.

    View details for DOI 10.1016/j.ijbiomac.2006.02.024

    View details for Web of Science ID 000238334000012

    View details for PubMedID 16600360

  • Predicting protein interaction sites from residue spatial sequence profile and evolution rate FEBS LETTERS Wang, B., Chen, P., Huang, D. S., Li, J. J., Lok, T. M., Lyu, M. R. 2006; 580 (2): 380-384


    This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machines algorithm are constructed to predict whether a surface residue is a part of a protein-protein interface. The efficiency and the effectiveness of our proposed approach is verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

    View details for DOI 10.1016/j.febslet.2005.11.081

    View details for Web of Science ID 000234937400006

    View details for PubMedID 16376878

  • Network analysis of the protein chain tertiary structures of heterocomplexes PROTEIN AND PEPTIDE LETTERS Li, J. J., Huang, D. S., Lok, T. M., Lyu, M. R., Li, Y. X., Zhu, Y. P. 2006; 13 (4): 391-396


    In this paper, the tertiary structures of protein chains of heterocomplexes were mapped to 2D networks; based on the mapping approach, statistical properties of these networks were systematically studied. Firstly, our experimental results confirmed that the networks derived from protein structures possess small-world properties. Secondly, an interesting relationship between network average degree and the network size was discovered, which was quantified as an empirical function enabling us to estimate the number of residue contacts of the protein chains accurately. Thirdly, by analyzing the average clustering coefficient for nodes having the same degree in the network, it was found that the architectures of the networks and protein structures analyzed are hierarchically organized. Finally, network motifs were detected in the networks which are believed to determine the family or superfamily the networks belong to. The study of protein structures with the new perspective might shed some light on understanding the underlying laws of evolution, function and structures of proteins, and therefore would be complementary to other currently existing methods.

    View details for Web of Science ID 000237306400012

    View details for PubMedID 16712516

  • Characterizing human gene splice sites using evolved regular expressions Proceedings of IEEE International Joint Conference on Neural Networks Li, J., Huang, D., MacCallum, R., Wu, X.-R. 2005; 1: 493-498
