The Bejerano Lab studies genome function in human and related species. We are deeply interested in two broad questions: mapping genome sequence (variation) to phenotype (differences), and extracting specific genetic insights from deep sequencing measurements. We take a particular interest in gene cis-regulation. We use our joint affiliation to apply a combination of computational and experimental approaches: we collect large-scale experimental data, write computational analysis tools, and run them at scale to discover the most exciting testable hypotheses, which we then validate experimentally. We work in small teams, in house or with close collaborators, of experimentalists and computational tool users who interact directly with our computational tool builders. Please see our research tab for more.

Administrative Appointments

  • Member, Editorial Board, Gene (2007 - 2008)
  • Technical Advisory Board, Numenta (2008 - Present)

Honors & Awards

  • Rector Prize & Dean’s list for undergraduate achievements, Hebrew University (1993-1996)
  • Intel award for achievements, Hebrew University (1996)
  • Rector Prize & Dean’s list for graduate studies achievements, Hebrew University (1997-1999)
  • Rachel & Salim Banin scholarship, Hebrew University (1999)
  • Best paper by a young scientist award, RECOMB conference (1999)
  • Levi Eshkol graduate studies fellowship, Hebrew University (1999-2002)
  • Best paper by a young scientist award, RECOMB conference (2003)
  • Junior Faculty Grant, Edward Mallinckrodt, Jr. Foundation (2007-2010)
  • Tomorrow's Principal Investigator, Genome Technology Magazine (2008)
  • Alfred P. Sloan Fellow, Alfred P. Sloan Foundation (2008-2010)
  • Young Investigator Award, Human Frontier Science Program (2008-2011)
  • Searle Scholar, Searle Scholars Program (2008-2011)
  • Research Grant Award, Okawa Foundation (2008)
  • Fellow, David & Lucile Packard Foundation (2008-2013)
  • New Faculty Fellow, Microsoft Research (2009)

Professional Education

  • Ph.D., Hebrew University, Computer Science (2004)
  • B.Sc., Hebrew University, Physics, Mathematics, Computer Science (summa cum laude) (1997)

Research & Scholarship

Current Research and Scholarly Interests

The Bejerano Lab is currently focused on the following topics:

1. Genotype - Phenotype relationships in humans.
We are developing novel methods for linking human whole genome variation with human disease and trait variation. We apply these methods to multiple datasets in the contexts of prematurity, autism, heart disease and more [20, 29, 32, 34, 36, 38, 39, 43].

2. Genotype - Phenotype relationships between mammals.
We develop novel methods to link trait evolution in the mammalian tree to whole genome evolution in over a hundred species. Application of these methods allows us to shed new light on human genome function, on human disease and on human evolution [29, 34, 35]. See our "Forward Genomics" web server.

3. Extracting genetic knowledge from high throughput genomic assays.
High throughput genomic assays are most often used to make biochemical discoveries. We develop methods to extract genetic and developmental knowledge from these assays [27, 28, 31]. Through joint work with Sue McConnell we take special interest in the developing neocortex [29, 41]. Also see our popular GREAT web server for the cis-regulatory interpretation of high throughput genomic datasets.

4. Vertebrate transcription regulation.
Much of our work relies on our strong foundations in the study of vertebrate gene regulation [9-11, 14, 15, 18, 22, 25, 27, 29-33, 35, 38-42]. See our PRISM resource of predicted transcription factor functions and COMPLEX resource for predicted transcription factor dimers and complexes. Also see our zCNE resource of conserved non-coding (likely gene regulatory) sequences in the zebrafish genome.

5. Vertebrate genome evolution.
We are extremely well versed in human and vertebrate genome evolution [9-11, 14, 17, 18, 22, 23, 25, 26, 29, 33-35, 37, 39, 40]. Notably, we discovered ultraconservation and correctly postulated that many of these elements are developmental enhancers. We also showed that mammalian ultraconserved elements evolve under extreme purifying selection, and that they are almost never lost during mammalian evolution [9, 23, 25]. We also discovered the first developmental enhancers conserved between human and protostomes [33], attempted to group human conserved non-coding DNA into paralog families [10], and studied the co-option of mobile elements into cis-regulatory roles [18, 22, 26, 41].

6. Evolutionary Developmental Biology ("evo devo").
We have done work in the field of evolutionary developmental biology [29, 33-35, 43], including a first survey of developmental enhancers (including a penile spine/vibrissae enhancer) uniquely lost in humans [29], fueled by our deep interest in phenotype - genotype relationships.

[For links to the references and more, please see our lab's website]



All Publications

  • Chitayat syndrome: hyperphalangism, characteristic facies, hallux valgus and bronchomalacia results from a recurrent c.266A > G p.(Tyr89Cys) variant in the ERF gene JOURNAL OF MEDICAL GENETICS Balasubramanian, M., Lord, H., Levesque, S., Guturu, H., Thuriot, F., Sillon, G., Wenger, A. M., Sureka, D. L., Lester, T., Johnson, D. S., Bowen, J., Calhoun, A. R., Viskochil, D. H., Bejerano, G., Bernstein, J. A., Chitayat, D. 2017; 54 (3): 157-165
  • Systematic reanalysis of clinical exome data yields additional diagnoses: implications for providers GENETICS IN MEDICINE Wenger, A. M., Guturu, H., Bernstein, J. A., Bejerano, G. 2017; 19 (2): 209-214
  • M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nature genetics Jagadeesh, K. A., Wenger, A. M., Berger, M. J., Guturu, H., Stenson, P. D., Cooper, D. N., Bernstein, J. A., Bejerano, G. 2016


    Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in interpretation of the hundreds of rare, missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.

    View details for DOI 10.1038/ng.3703

    View details for PubMedID 27776117

  • TBR1 regulates autism risk genes in the developing neocortex. Genome research Notwell, J. H., Heavner, W. E., Darbandi, S. F., Katzman, S., McKenna, W. L., Ortiz-Londono, C. F., Tastad, D., Eckler, M. J., Rubenstein, J. L., McConnell, S. K., Chen, B., Bejerano, G. 2016; 26 (8): 1013-1022


    Exome sequencing studies have identified multiple genes harboring de novo loss-of-function (LoF) variants in individuals with autism spectrum disorders (ASD), including TBR1, a master regulator of cortical development. We performed ChIP-seq for TBR1 during mouse cortical neurogenesis and show that TBR1-bound regions are enriched adjacent to ASD genes. ASD genes were also enriched among genes that are differentially expressed in Tbr1 knockouts, which together with the ChIP-seq data, suggests direct transcriptional regulation. Of the nine ASD genes examined, seven were misexpressed in the cortices of Tbr1 knockout mice, including six with increased expression in the deep cortical layers. ASD genes with adjacent cortical TBR1 ChIP-seq peaks also showed unusually low levels of LoF mutations in a reference human population and among Icelanders. We then leveraged TBR1 binding to identify an appealing subset of candidate ASD genes. Our findings highlight a TBR1-regulated network of ASD genes in the developing neocortex that are relatively intolerant to LoF mutations, indicating that these genes may play critical roles in normal cortical development.

    View details for DOI 10.1101/gr.203612.115

    View details for PubMedID 27325115

  • Systematic reanalysis of clinical exome data yields additional diagnoses: implications for providers. Genetics in medicine Wenger, A. M., Guturu, H., Bernstein, J. A., Bejerano, G. 2016


    Clinical exome sequencing is nondiagnostic for about 75% of patients evaluated for a possible Mendelian disorder. We examined the ability of systematic reevaluation of exome data to establish additional diagnoses. The exome and phenotypic data of 40 individuals with previously nondiagnostic clinical exomes were reanalyzed with current software and literature. A definitive diagnosis was identified for 4 of 40 participants (10%). In these cases the causative variant is de novo and in a relevant autosomal-dominant disease gene. The literature to tie the causative genes to the participants' phenotypes was weak, nonexistent, or not readily located at the time of the initial clinical exome reports. At the time of diagnosis by reanalysis, the supporting literature was 1 to 3 years old. Approximately 250 gene-disease and 9,200 variant-disease associations are reported annually. This increase in information necessitates regular reevaluation of nondiagnostic exomes. To be practical, systematic reanalysis requires further automation and more up-to-date variant databases. To maximize the diagnostic yield of exome sequencing, providers should periodically request reanalysis of nondiagnostic exomes. Accordingly, policies regarding reanalysis should be weighed in combination with factors such as cost and turnaround time when selecting a clinical exome laboratory. Genet Med advance online publication 21 July 2016; doi:10.1038/gim.2016.88.

    View details for DOI 10.1038/gim.2016.88

    View details for PubMedID 27441994

  • "Reverse Genomics" Predicts Function of Human Conserved Noncoding Elements MOLECULAR BIOLOGY AND EVOLUTION Marcovitz, A., Jia, R., Bejerano, G. 2016; 33 (5): 1358-1369


    Evolutionary changes in cis-regulatory elements are thought to play a key role in morphological and physiological diversity across animals. Many conserved noncoding elements (CNEs) function as cis-regulatory elements, controlling gene expression levels in different biological contexts. However, determining specific associations between CNEs and related phenotypes is a challenging task. Here, we present a computational "reverse genomics" approach that predicts the phenotypic functions of human CNEs. We identify thousands of human CNEs that were lost in at least two independent mammalian lineages (IL-CNEs), and match their evolutionary profiles against a diverse set of phenotypes recently annotated across multiple mammalian species. We identify 2,759 compelling associations between human CNEs and a diverse set of mammalian phenotypes. We discuss multiple CNEs, including a predicted ear element near BMP7, a pelvic CNE in FBN1, a brain morphology element in UBE4B, and an aquatic adaptation forelimb CNE near EGR2, and provide a full list of our predictions. As more genomes are sequenced and more traits are annotated across species, we expect our method to facilitate the interpretation of noncoding mutations in human disease and expedite the discovery of individual CNEs that play key roles in human evolution and development.

    View details for DOI 10.1093/molbev/msw001

    View details for Web of Science ID 000374834900019

    View details for PubMedID 26744417

  • Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories. PLoS computational biology Guturu, H., Chinchali, S., Clarke, S. L., Bejerano, G. 2016; 12 (2)


    Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual's variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates "abnormal cardiac output" for a patient with a longstanding family history of heart disease, "decreased circulating sodium level" for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.

    View details for DOI 10.1371/journal.pcbi.1004711

    View details for PubMedID 26845687

  • Changes in the enhancer landscape during early placental development uncover a trophoblast invasion gene-enhancer network. Placenta Tuteja, G., Chung, T., Bejerano, G. 2016; 37: 45-55


    Trophoblast invasion establishes adequate blood flow between mother and fetus in early placental development. However, little is known about the cis-regulatory mechanisms underlying this important process. We aimed to identify enhancer elements that are active during trophoblast invasion, and build a trophoblast invasion gene-enhancer network. We carried out ChIP-Seq for an enhancer-associated mark (H3k27Ac) at two time points during early placental development in mouse. One time point when invasion is at its peak (e7.5) and another time point shortly afterwards (e9.5). We use computational analysis to identify putative enhancers, as well as the transcription factor binding sites within them, that are specific to the time point of trophoblast invasion. We compared read profiles at e7.5 and e9.5 to identify 1,977 e7.5-specific enhancers. Within a subset of e7.5-specific enhancers, we discovered a cell migration associated regulatory code, consisting of three transcription factor motifs: AP1, Ets, and Tcfap2. To validate differential expression of the transcription factors that bind these motifs, we performed RNA-Seq in the same context. Finally, we integrated these data with publicly available protein-protein interaction data and constructed a trophoblast invasion gene-enhancer network. The data we generated and analysis we carried out improves our understanding of the regulatory mechanisms of trophoblast invasion, by suggesting a transcriptional code exists in the enhancers of cell migration genes. Furthermore, the network we constructed highlights novel candidate genes that may be critical for trophoblast invasion.

    View details for DOI 10.1016/j.placenta.2015.11.001

    View details for PubMedID 26604129

  • Mx1 and Mx2 key antiviral proteins are surprisingly lost in toothed whales PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Braun, B. A., Marcovitz, A., Camp, J. G., Jia, R., Bejerano, G. 2015; 112 (26): 8036-8040


    Viral outbreaks in dolphins and other Delphinoidea family members warrant investigation into the integrity of the cetacean immune system. The dynamin-like GTPase genes Myxovirus 1 (Mx1) and Mx2 defend mammals against a broad range of viral infections. Loss of Mx1 function in human and mice enhances infectivity by multiple RNA and DNA viruses, including orthomyxoviruses (influenza A), paramyxoviruses (measles), and hepadnaviruses (hepatitis B), whereas loss of Mx2 function leads to decreased resistance to HIV-1 and other viruses. Here we show that both Mx1 and Mx2 have been rendered nonfunctional in Odontoceti cetaceans (toothed whales, including dolphins and orcas). We discovered multiple exon deletions, frameshift mutations, premature stop codons, and transcriptional evidence of decay in the coding sequence of both Mx1 and Mx2 in four species of Odontocetes. We trace the likely loss event for both proteins to soon after the divergence of Odontocetes and Mystocetes (baleen whales) ∼33-37 Mya. Our data raise intriguing questions as to what drove the loss of both Mx1 and Mx2 genes in the Odontoceti lineage, a double loss seen in none of 56 other mammalian genomes, and suggests a hitherto unappreciated fundamental genetic difference in the way these magnificent mammals respond to viral infections.

    View details for DOI 10.1073/pnas.1501844112

    View details for Web of Science ID 000357079400051

    View details for PubMedID 26080416

  • Characterization of TCF21 Downstream Target Regions Identifies a Transcriptional Network Linking Multiple Independent Coronary Artery Disease Loci PLOS GENETICS Sazonova, O., Zhao, Y., Nuernberg, S., Miller, C., Pjanic, M., Castano, V. G., Kim, J. B., Salfati, E. L., Kundaje, A. B., Bejerano, G., Assimes, T., Yang, X., Quertermous, T. 2015; 11 (5)

    View details for DOI 10.1371/journal.pgen.1005202

    View details for Web of Science ID 000355305200022

    View details for PubMedID 26020271

  • A family of transposable elements co-opted into developmental enhancers in the mouse neocortex NATURE COMMUNICATIONS Notwell, J. H., Chung, T., Heavner, W., Bejerano, G. 2015; 6: 6644


    The neocortex is a mammalian-specific structure that is responsible for higher functions such as cognition, emotion and perception. To gain insight into its evolution and the gene regulatory codes that pattern it, we studied the overlap of its active developmental enhancers with transposable element (TE) families and compared this overlap to uniformly shuffled enhancers. Here we show a striking enrichment of the MER130 repeat family among active enhancers in the mouse dorsal cerebral wall, which gives rise to the neocortex, at embryonic day 14.5. We show that MER130 instances preserve a common code of transcriptional regulatory logic, function as enhancers and are adjacent to critical neocortical genes. MER130, a nonautonomous interspersed TE, originates in the tetrapod or possibly Sarcopterygii ancestor, which far predates the appearance of the neocortex. Our results show that MER130 elements were recruited, likely through their common regulatory logic, as neocortical enhancers.

    View details for DOI 10.1038/ncomms7644

    View details for Web of Science ID 000353040900001

    View details for PubMedID 25806706

  • Microbiota modulate transcription in the intestinal epithelium without remodeling the accessible chromatin landscape. Genome research Camp, J. G., Frank, C. L., Lickwar, C. R., Guturu, H., Rube, T., Wenger, A. M., Chen, J., Bejerano, G., Crawford, G. E., Rawls, J. F. 2014; 24 (9): 1504-1516


    Microbiota regulate intestinal physiology by modifying host gene expression along the length of the intestine, but the underlying regulatory mechanisms remain unresolved. Transcriptional specificity occurs through interactions between transcription factors (TFs) and cis-regulatory regions (CRRs) characterized by nucleosome-depleted accessible chromatin. We profiled transcriptome and accessible chromatin landscapes in intestinal epithelial cells (IECs) from mice reared in the presence or absence of microbiota. We show that regional differences in gene transcription along the intestinal tract were accompanied by major alterations in chromatin accessibility. Surprisingly, we discovered that microbiota modify host gene transcription in IECs without significantly impacting the accessible chromatin landscape. Instead, microbiota regulation of host gene transcription might be achieved by differential expression of specific TFs and enrichment of their binding sites in nucleosome-depleted CRRs near target genes. Our results suggest that the chromatin landscape in IECs is preprogrammed by the host in a region-specific manner to permit responses to microbiota through binding of open CRRs by specific TFs.

    View details for DOI 10.1101/gr.165845.113

    View details for PubMedID 24963153

  • Automated discovery of tissue-targeting enhancers and transcription factors from binding motif and gene function data. PLoS computational biology Tuteja, G., Moreira, K. B., Chung, T., Chen, J., Wenger, A. M., Bejerano, G. 2014; 10 (1)


    Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.

    View details for DOI 10.1371/journal.pcbi.1003449

    View details for PubMedID 24499934

  • Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philosophical transactions of the Royal Society of London. Series B, Biological sciences Guturu, H., Doxey, A. C., Wenger, A. M., Bejerano, G. 2013; 368 (1632): 20130029


    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at

    View details for DOI 10.1098/rstb.2013.0029

    View details for PubMedID 24218641

  • A Penile Spine/Vibrissa Enhancer Sequence Is Missing in Modern and Extinct Humans but Is Retained in Multiple Primates with Penile Spines and Sensory Vibrissae PLOS ONE Reno, P. L., McLean, C. Y., Hines, J. E., Capellini, T. D., Bejerano, G., Kingsley, D. M. 2013; 8 (12)

    View details for DOI 10.1371/journal.pone.0084258

    View details for Web of Science ID 000328741900040

    View details for PubMedID 24367647

  • Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic acids research Hiller, M., Agarwal, S., Notwell, J. H., Parikh, R., Guturu, H., Wenger, A. M., Bejerano, G. 2013; 41 (15)


    Many important model organisms for biomedical and evolutionary research have sequenced genomes, but occupy a phylogenetically isolated position, evolutionarily distant from other sequenced genomes. This phylogenetic isolation is exemplified for zebrafish, a vertebrate model for cis-regulation, development and human disease, whose evolutionary distance to all other currently sequenced fish exceeds the distance between human and chicken. Such large distances make it difficult to align genomes and use them for comparative analysis beyond gene-focused questions. In particular, detecting conserved non-genic elements (CNEs) as promising cis-regulatory elements with biological importance is challenging. Here, we develop a general comparative genomics framework to align isolated genomes and to comprehensively detect CNEs. Our approach integrates highly sensitive and quality-controlled local alignments and uses alignment transitivity and ancestral reconstruction to bridge large evolutionary distances. We apply our framework to zebrafish and demonstrate substantially improved CNE detection and quality compared with previous sets. Our zebrafish CNE set comprises 54 533 CNEs, of which 11 792 (22%) are conserved to human or mouse. Our zebrafish CNEs are highly enriched in known enhancers and extend existing experimental (ChIP-Seq) sets. The same framework can now be applied to the isolated genomes of frog, amphioxus, Caenorhabditis elegans and many others.

    View details for DOI 10.1093/nar/gkt557

    View details for PubMedID 23814184

  • The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS genetics Wenger, A. M., Clarke, S. L., Notwell, J. H., Chung, T., Tuteja, G., Guturu, H., Schaar, B. T., Bejerano, G. 2013; 9 (8)


    Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.

    View details for DOI 10.1371/journal.pgen.1003728

    View details for PubMedID 24009522

  • PRISM offers a comprehensive genomic approach to transcription factor function prediction. Genome research Wenger, A. M., Clarke, S. L., Guturu, H., Chen, J., Schaar, B. T., McLean, C. Y., Bejerano, G. 2013; 23 (5): 889-904


    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.

    View details for DOI 10.1101/gr.139071.112

    View details for PubMedID 23382538

  • Enhancers: five essential questions NATURE REVIEWS GENETICS Pennacchio, L. A., Bickmore, W., Dean, A., Nobrega, M. A., Bejerano, G. 2013; 14 (4): 288-295


    It is estimated that the human genome contains hundreds of thousands of enhancers, so understanding these gene-regulatory elements is a crucial goal. Several fundamental questions need to be addressed about enhancers, such as how do we identify them all, how do they work, and how do they contribute to disease and evolution? Five prominent researchers in this field look at how much we know already and what needs to be done to answer these questions.

    View details for Web of Science ID 000316975300012

    View details for PubMedID 23503198

  • Evolutionary Biology for the 21st Century PLOS BIOLOGY Losos, J. B., Arnold, S. J., Bejerano, G., Brodie, E. D., Hibbett, D., Hoekstra, H. E., Mindell, D. P., Monteiro, A., Moritz, C., Orr, H. A., Petrov, D. A., Renner, S. S., Ricklefs, R. E., Soltis, P. S., Turner, T. L. 2013; 11 (1)

    View details for DOI 10.1371/journal.pbio.1001466

    View details for Web of Science ID 000314648700006

    View details for PubMedID 23319892

  • A penile spine/vibrissa enhancer sequence is missing in modern and extinct humans but is retained in multiple primates with penile spines and sensory vibrissae. PloS one Reno, P. L., McLean, C. Y., Hines, J. E., Capellini, T. D., Bejerano, G., Kingsley, D. M. 2013; 8 (12)


    Previous studies show that humans have a large genomic deletion downstream of the Androgen Receptor gene that eliminates an ancestral mammalian regulatory enhancer that drives expression in developing penile spines and sensory vibrissae. Here we use a combination of large-scale sequence analysis and PCR amplification to demonstrate that the penile spine/vibrissa enhancer is missing in all humans surveyed and in the Neandertal and Denisovan genomes, but is present in DNA samples of chimpanzees and bonobos, as well as in multiple other great apes and primates that maintain some form of penile integumentary appendage and facial vibrissae. These results further strengthen the association between the presence of the penile spine/vibrissa enhancer and the presence of penile spines and macro- or micro- vibrissae in non-human primates as well as show that loss of the enhancer is both a distinctive and characteristic feature of the human lineage.

    View details for DOI 10.1371/journal.pone.0084258

    View details for PubMedID 24367647

  • Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements. Philosophical transactions of the Royal Society of London. Series B, Biological sciences Guturu, H., Doxey, A. C., Wenger, A. M., Bejerano, G. 2013; 368 (1632): 20130029


    Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at

    View details for DOI 10.1098/rstb.2013.0029

    View details for PubMedID 24218641

  • PESNPdb: A comprehensive database of SNPs studied in association with pre-eclampsia PLACENTA Tuteja, G., Cheng, E., Papadakis, H., Bejerano, G. 2012; 33 (12): 1055-1057


    Pre-eclampsia is a pregnancy-specific disorder that can be life threatening for mother and child. Multiple studies have been carried out in an attempt to identify SNPs that contribute to the genetic susceptibility of the disease. Here we describe PESNPdb, a database aimed at centralizing SNP and study details investigated in association with pre-eclampsia. We also describe a Placenta Disorders ontology that utilizes information from PESNPdb. The main focus of PESNPdb is to help researchers study the genetic complexity of pre-eclampsia through a user-friendly interface that encourages community participation.

    View details for DOI 10.1016/j.placenta.2012.09.016

    View details for Web of Science ID 000312171900015

    View details for PubMedID 23084601

  • Hundreds of conserved non-coding genomic regions are independently lost in mammals NUCLEIC ACIDS RESEARCH Hiller, M., Schaar, B. T., Bejerano, G. 2012; 40 (22): 11463-11476


    Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals.

    View details for DOI 10.1093/nar/gks905

    View details for Web of Science ID 000313414800031

    View details for PubMedID 23042682

  • A "Forward Genomics" Approach Links Genotype to Phenotype using Independent Phenotypic Losses among Related Species CELL REPORTS Hiller, M., Schaar, B. T., Indjeian, V. B., Kingsley, D. M., Hagey, L. R., Bejerano, G. 2012; 2 (4): 817-823


    Genotype-phenotype mapping is hampered by countless genomic changes between species. We introduce a computational "forward genomics" strategy that, given only an independently lost phenotype and whole genomes, matches genomic and phenotypic loss patterns to associate specific genomic regions with this phenotype. We conducted genome-wide screens for two metabolic phenotypes. First, our approach correctly matches the inactivated Gulo gene exactly with the species that lost the ability to synthesize vitamin C. Second, we attribute naturally low biliary phospholipid levels in guinea pigs and horses to the inactivated phospholipid transporter Abcb4. Human ABCB4 mutations also result in low phospholipid levels but lead to severe liver disease, suggesting compensatory mechanisms in guinea pig and horse. Our simulation studies, counts of independent changes in existing phenotype surveys, and the forthcoming availability of many new genomes all suggest that forward genomics can be applied to many phenotypes, including those relevant for human evolution and disease.

    View details for DOI 10.1016/j.celrep.2012.08.032

    View details for Web of Science ID 000314455600014

    View details for PubMedID 23022484

  • Human Developmental Enhancers Conserved between Deuterostomes and Protostomes PLOS GENETICS Clarke, S. L., VanderMeer, J. E., Wenger, A. M., Schaar, B. T., Ahituv, N., Bejerano, G. 2012; 8 (8)


    The identification of homologies, whether morphological, molecular, or genetic, is fundamental to our understanding of common biological principles. Homologies bridging the great divide between deuterostomes and protostomes have served as the basis for current models of animal evolution and development. It is now appreciated that these two clades share a common developmental toolkit consisting of conserved transcription factors and signaling pathways. These patterning genes sometimes show common expression patterns and genetic interactions, suggesting the existence of similar or even conserved regulatory apparatus. However, previous studies have found no regulatory sequence conserved between deuterostomes and protostomes. Here we describe the first such enhancers, which we call bilaterian conserved regulatory elements (Bicores). Bicores show conservation of sequence and gene synteny. Sequence conservation of Bicores reflects conserved patterns of transcription factor binding sites. We predict that Bicores act as response elements to signaling pathways, and we show that Bicores are developmental enhancers that drive expression of transcriptional repressors in the vertebrate central nervous system. Although the small number of identified Bicores suggests extensive rewiring of cis-regulation between the protostome and deuterostome clades, additional Bicores may be revealed as our understanding of cis-regulatory logic and sample of bilaterian genomes continue to grow.

    View details for DOI 10.1371/journal.pgen.1002852

    View details for Web of Science ID 000308529300014

    View details for PubMedID 22876195

  • A novel 13 base pair insertion in the sonic hedgehog ZRS limb enhancer (ZRS/LMBR1) causes preaxial polydactyly with triphalangeal thumb HUMAN MUTATION Laurell, T., VanderMeer, J. E., Wenger, A. M., Grigelioniene, G., Nordenskjold, A., Arner, M., Ekblom, A. G., Bejerano, G., Ahituv, N., Nordgren, A. 2012; 33 (7): 1063-1066


    Mutations in the Sonic hedgehog limb enhancer, the zone of polarizing activity regulatory sequence (commonly called the ZRS, located within the gene LMBR1), cause limb malformations. In humans, three classes of mutations have been proposed based on the limb phenotype: single base changes throughout the region cause preaxial polydactyly (PPD), single base changes at one specific site cause Werner mesomelic syndrome, and large duplications cause polysyndactyly. This study presents a novel mutation: a small insertion. In a Swedish family with autosomal-dominant PPD, we found a 13 base pair insertion within the ZRS, NG_009240.1:g.106934_106935insTAAGGAAGTGATT (traditional nomenclature: ZRS603ins13). Computational transcription factor-binding site predictions suggest that this insertion creates new binding sites and a mouse enhancer assay shows that this insertion causes ectopic gene expression. This study is the first to discover a small insertion in an enhancer that causes a human limb malformation and suggests a potential mechanism that could explain the ectopic expression caused by this mutation.

    View details for DOI 10.1002/humu.22097

    View details for Web of Science ID 000304815100010

    View details for PubMedID 22495965

  • Coding exons function as tissue-specific enhancers of nearby genes GENOME RESEARCH Birnbaum, R. Y., Clowney, E. J., Agamy, O., Kim, M. J., Zhao, J., Yamanaka, T., Pappalardo, Z., Clarke, S. L., Wenger, A. M., Loan Nguyen, L., Gurrieri, F., Everman, D. B., Schwartz, C. E., Birk, O. S., Bejerano, G., Lomvardas, S., Ahituv, N. 2012; 22 (6): 1059-1068


    Enhancers are essential gene regulatory elements whose alteration can lead to morphological differences between species, developmental abnormalities, and human disease. Current strategies to identify enhancers focus primarily on noncoding sequences and tend to exclude protein coding sequences. Here, we analyzed 25 available ChIP-seq data sets that identify enhancers in an unbiased manner (H3K4me1, H3K27ac, and EP300) for peaks that overlap exons. We find that, on average, 7% of all ChIP-seq peaks overlap coding exons (after excluding peaks that overlap with first exons). By using mouse and zebrafish enhancer assays, we demonstrate that several of these exonic enhancer (eExons) candidates can function as enhancers of their neighboring genes and that the exonic sequence is necessary for enhancer activity. Using ChIP, 3C, and DNA FISH, we further show that one of these exonic limb enhancers, Dync1i1 exon 15, has active enhancer marks and physically interacts with Dlx5/6 promoter regions 900 kb away. In addition, its removal by chromosomal abnormalities in humans could cause split hand and foot malformation 1 (SHFM1), a disorder associated with DLX5/6. These results demonstrate that DNA sequences can have a dual function, operating as coding exons in one tissue and enhancers of nearby gene(s) in another tissue, suggesting that phenotypes resulting from coding mutations could be caused not only by protein alteration but also by disrupting the regulation of another gene.

    View details for DOI 10.1101/gr.133546.111

    View details for Web of Science ID 000304728100007

    View details for PubMedID 22442009

    View details for PubMedCentralID PMC3371700

  • Control of Pelvic Girdle Development by Genes of the Pbx Family and Emx2 DEVELOPMENTAL DYNAMICS Capellini, T. D., Handschuh, K., Quintana, L., Ferretti, E., Di Giacomo, G., Fantini, S., Vaccari, G., Clarke, S. L., Wenger, A. M., Bejerano, G., Sharpe, J., Zappavigna, V., Selleri, L. 2011; 240 (5): 1173-1189


    Genes expressed in the somatopleuric mesoderm, the embryonic domain giving rise to the vertebrate pelvis, appear important for pelvic girdle formation. Among such genes, Pbx family members and Emx2 were found to genetically interact in hindlimb and pectoral girdle formation. Here, we generated compound mutant embryos carrying combinations of mutated alleles for Pbx1, Pbx2, and Pbx3, as well as Pbx1 and Emx2, to examine potential genetic interactions during pelvic development. Indeed, Pbx genes share overlapping functions and Pbx1 and Emx2 genetically interact in pelvic formation. We show that, in compound Pbx1;Pbx2 and Pbx1;Emx2 mutants, pelvic mesenchymal condensation is markedly perturbed, indicative of an upstream control by these homeoproteins. We establish that expression of Tbx15, Prrx1, and Pax1, among other genes involved in the specification and development of select pelvic structures, is altered in our compound mutants. Lastly, we identify potential Pbx1-Emx2-regulated enhancers for Tbx15, Prrx1, and Pax1, using bioinformatics analyses.

    View details for DOI 10.1002/dvdy.22617

    View details for Web of Science ID 000289942300023

    View details for PubMedID 21455939

  • Human-specific loss of regulatory DNA and the evolution of human-specific traits NATURE McLean, C. Y., Reno, P. L., Pollen, A. A., Bassan, A. I., Capellini, T. D., Guenther, C., Indjeian, V. B., Lim, X., Menke, D. B., Schaar, B. T., Wenger, A. M., Bejerano, G., Kingsley, D. M. 2011; 471 (7337): 216-219


    Humans differ from other animals in many aspects of anatomy, physiology, and behaviour; however, the genotypic basis of most human-specific traits remains unknown. Recent whole-genome comparisons have made it possible to identify genes with elevated rates of amino acid change or divergent expression in humans, and non-coding sequences with accelerated base pair changes. Regulatory alterations may be particularly likely to produce phenotypic effects while preserving viability, and are known to underlie interesting evolutionary differences in other species. Here we identify molecular events particularly likely to produce significant regulatory changes in humans: complete deletion of sequences otherwise highly conserved between chimpanzees and other mammals. We confirm 510 such deletions in humans, which fall almost exclusively in non-coding regions and are enriched near genes involved in steroid hormone signalling and neural function. One deletion removes a sensory vibrissae and penile spine enhancer from the human androgen receptor (AR) gene, a molecular change correlated with anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. Another deletion removes a forebrain subventricular zone enhancer near the tumour suppressor gene growth arrest and DNA-damage-inducible, gamma (GADD45G), a loss correlated with expansion of specific brain regions in humans. Deletions of tissue-specific enhancers may thus accompany both loss and gain traits in the human lineage, and provide specific examples of the kinds of regulatory alterations and inactivation events long proposed to have an important role in human evolutionary divergence.

    View details for DOI 10.1038/nature09774

    View details for Web of Science ID 000288170200037

    View details for PubMedID 21390129

  • Noninvasive Monitoring of Placenta-Specific Transgene Expression by Bioluminescence Imaging PLOS ONE Fan, X., Ren, P., Dhal, S., Bejerano, G., Goodman, S. B., Druzin, M. L., Gambhir, S. S., Nayak, N. R. 2011; 6 (1)


    Placental dysfunction underlies numerous complications of pregnancy. A major obstacle to understanding the roles of potential mediators of placental pathology has been the absence of suitable methods for tissue-specific gene manipulation and sensitive assays for studying gene functions in the placentas of intact animals. We describe a sensitive and noninvasive method of repetitively tracking placenta-specific gene expression throughout pregnancy using lentivirus-mediated transduction of optical reporter genes in mouse blastocysts. Zona-free blastocysts were incubated with lentivirus expressing firefly luciferase (Fluc) and Tomato fluorescent fusion protein for trophectoderm-specific infection and transplanted into day 3 pseudopregnant recipients (GD3). Animals were examined for Fluc expression by live bioluminescence imaging (BLI) at different points during pregnancy, and the placentas were examined for Tomato expression in different cell types on GD18. In another set of experiments, blastocysts with maximum photon fluxes in the range of 2.0E+4 to 6.0E+4 p/s/cm²/sr were transferred. Fluc expression was detectable in all surrogate dams by day 5 of pregnancy by live imaging, and the signal increased dramatically thereafter each day until GD12, reaching a peak at GD16 and maintaining that level through GD18. All of the placentas, but none of the fetuses, analyzed on GD18 by BLI showed different degrees of Fluc expression. However, only placentas of dams transferred with selected blastocysts showed uniform photon distribution with no significant variability of photon intensity among placentas of the same litter. Tomato expression in the placentas was limited to only trophoblast cell lineages. These results, for the first time, demonstrate the feasibility of selecting lentivirally-transduced blastocysts for uniform gene expression in all placentas of the same litter and early detection and quantitative analysis of gene expression throughout pregnancy by live BLI. This method may be useful for a wide range of applications involving trophoblast-specific gene manipulations in utero.

    View details for DOI 10.1371/journal.pone.0016348

    View details for Web of Science ID 000286522300037

    View details for PubMedID 21283713

    View details for PubMedCentralID PMC3025029

  • Human-specific loss of an androgen receptor enhancer is associated with the loss of vibrissae and penile spines 80th Annual Meeting of the American Association of Physical Anthropologists Reno, P. L., McLean, C. Y., Pollen, A. A., Bejerano, G., Kingsley, D. M. WILEY-BLACKWELL. 2011: 252–252
  • Endangered Species Hold Clues to Human Evolution JOURNAL OF HEREDITY Lowe, C. B., Bejerano, G., Salama, S. R., Haussler, D. 2010; 101 (4): 437-447


    We report that 18 conserved, and by extension functional, elements in the human genome are the result of retroposon insertions that are evolving under purifying selection in mammals. We show evidence that 1 of the 18 elements regulates the expression of ASXL3 during development by encoding an alternatively spliced exon that causes nonsense-mediated decay of the transcript. The retroposon that gave rise to these functional elements was quickly inactivated in the mammalian ancestor, and all traces of it have been lost due to neutral decay. However, the tuatara has maintained a near-ancestral version of this retroposon in its extant genome, which allows us to connect the 18 human elements to the evolutionary events that created them. We propose that conservation efforts over more than 100 years may not have only prevented the tuatara from going extinct but could have preserved our ability to understand the evolutionary history of functional elements in the human genome. Through simulations, we argue that species with historically low population sizes are more likely to harbor ancient mobile elements for long periods of time and in near-ancestral states, making these species indispensable in understanding the evolutionary origin of functional elements in the human genome.

    View details for DOI 10.1093/jhered/esq016

    View details for Web of Science ID 000279430300005

    View details for PubMedID 20332163

  • GREAT improves functional interpretation of cis-regulatory regions NATURE BIOTECHNOLOGY McLean, C. Y., Bristor, D., Hiller, M., Clarke, S. L., Schaar, B. T., Lowe, C. B., Wenger, A. M., Bejerano, G. 2010; 28 (5): 495-U155


    We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to analyze the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome. Whereas previous methods took into account only binding proximal to genes, GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. GREAT incorporates annotations from 20 ontologies and is available as a web application. Applying GREAT to data sets from chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) of multiple transcription-associated factors, including SRF, NRSF, GABP, Stat3 and p300 in different developmental contexts, we recover many functions of these factors that are missed by existing gene-based tools, and we generate testable hypotheses. The utility of GREAT is not limited to ChIP-seq, as it could also be applied to open chromatin, localized epigenomic markers and similar functional data sets, as well as comparative genomics sets.

    View details for DOI 10.1038/nbt.1630

    View details for Web of Science ID 000277452700030

    View details for PubMedID 20436461
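
    The binomial test mentioned in the GREAT abstract above can be illustrated with a minimal sketch. It assumes that the fraction of the genome falling in the regulatory domains of genes carrying a given annotation has already been computed; the function name and parameters are hypothetical and are not taken from the GREAT code base.

```python
from math import comb


def binomial_upper_tail(n_regions: int, n_hits: int, genome_fraction: float) -> float:
    """Return P(X >= n_hits) for X ~ Binomial(n_regions, genome_fraction).

    n_regions       -- number of input genomic regions (e.g. ChIP-seq peaks)
    n_hits          -- how many of them land in the annotated regulatory domains
    genome_fraction -- assumed precomputed fraction of the genome covered by
                       those domains (hypothetical input for this sketch)
    """
    if not 0.0 <= genome_fraction <= 1.0:
        raise ValueError("genome_fraction must lie in [0, 1]")
    if not 0 <= n_hits <= n_regions:
        raise ValueError("n_hits must lie in [0, n_regions]")
    p = genome_fraction
    # Sum the binomial probability mass from n_hits up to n_regions.
    return float(sum(
        comb(n_regions, k) * p**k * (1.0 - p) ** (n_regions - k)
        for k in range(n_hits, n_regions + 1)
    ))


if __name__ == "__main__":
    # Toy numbers, for illustration only.
    print(binomial_upper_tail(n_regions=100, n_hits=12, genome_fraction=0.03))
```

    A small p-value here means the input regions hit the annotated domains more often than their genomic coverage alone would predict. Because the count is over regions and not over genes, distal binding sites contribute to the statistic.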

  • Dispensability of mammalian DNA GENOME RESEARCH McLean, C., Bejerano, G. 2008; 18 (11): 1743-1751


    In the lab, the cis-regulatory network seems to exhibit great functional redundancy. Many experiments testing enhancer activity of neighboring cis-regulatory elements show largely overlapping expression domains. Of recent interest, mice in which cis-regulatory ultraconserved elements were knocked out showed no obvious phenotype, further suggesting functional redundancy. Here, we present a global evolutionary analysis of mammalian conserved nonexonic elements (CNEs), and find strong evidence to the contrary. Given a set of CNEs conserved between several mammals, we characterize functional dispensability as the propensity for the ancestral element to be lost in mammalian species internal to the spanned species tree. We show that ultraconserved-like elements are over 300-fold less likely than neutral DNA to have been lost during rodent evolution. In fact, many thousands of noncoding loci under purifying selection display near uniform indispensability during mammalian evolution, largely irrespective of nucleotide conservation level. These findings suggest that many genomic noncoding elements possess functions that contribute noticeably to organism fitness in naturally evolving populations.

    View details for DOI 10.1101/gr.080184.108

    View details for Web of Science ID 000260536100007

    View details for PubMedID 18832441

  • Human genome ultraconserved elements are ultraselected SCIENCE Katzman, S., Kern, A. D., Bejerano, G., Fewell, G., Fulton, L., Wilson, R. K., Salama, S. R., Haussler, D. 2007; 317 (5840): 915-915


    Ultraconserved elements in the human genome are defined as stretches of at least 200 base pairs of DNA that match identically with corresponding regions in the mouse and rat genomes. Most ultraconserved elements are noncoding and have been evolutionarily conserved since mammal and bird ancestors diverged over 300 million years ago. The reason for this extreme conservation remains a mystery. It has been speculated that they are mutational cold spots or regions where every site is under weak but still detectable negative selection. However, analysis of the derived allele frequency spectrum shows that these regions are in fact under negative selection that is much stronger than that in protein coding genes.

    View details for DOI 10.1126/science.1142430

    View details for Web of Science ID 000248780200030

    View details for PubMedID 17702936
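
    The definition in the abstract above (stretches of at least 200 base pairs that match identically across species) can be illustrated with a short sketch. It assumes the sequences have already been aligned into equal-length strings that use '-' for gaps; the function name and the toy input are hypothetical and do not reproduce the pipeline used in the paper.

```python
def identical_runs(aligned: list[str], min_len: int = 200) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges of ungapped, identical runs.

    aligned -- equal-length aligned sequences ('-' marks a gap), assumed to be
               in a consistent letter case
    min_len -- minimum run length to report (200 in the abstract's definition)
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None
    # Iterate one column past the end so a run reaching the last column is closed.
    for i in range(length + 1):
        same = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if same:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    # Toy alignment with a tiny threshold so the output is easy to inspect.
    print(identical_runs(["ACGTAC", "ACGAAC", "ACG-AC"], min_len=2))
```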

  • Comparative genomic analysis using the UCSC genome browser. Methods in molecular biology (Clifton, N.J.) Karolchik, D., Bejerano, G., Hinrichs, A. S., Kuhn, R. M., Miller, W., Rosenbloom, K. R., Zweig, A. S., Haussler, D., Kent, W. J. 2007; 395: 17-34


    Comparative analysis of DNA sequence from multiple species can provide insights into the function and evolutionary processes that shape genomes. The University of California Santa Cruz (UCSC) Genome Bioinformatics group has developed several tools and methodologies in its study of comparative genomics, many of which have been incorporated into the UCSC Genome Browser, an easy-to-use online tool for browsing genomic data and aligned annotation "tracks" in a single window. The comparative genomics annotations in the browser include pairwise alignments, which aid in the identification of orthologous regions between species, and conservation tracks that show measures of evolutionary conservation among sets of multiply aligned species, highlighting regions of the genome that may be functionally important. A related tool, the UCSC Table Browser, provides a simple interface for querying, analyzing, and downloading the data underlying the Genome Browser annotation tracks. Here, we describe a procedure for examining a genomic region of interest in the Genome Browser, analyzing characteristics of the region, filtering the data, and downloading data sets for further study.

    View details for PubMedID 17993665

  • Thousands of human mobile element fragments undergo strong purifying selection near developmental genes Proc. Nat'l Acad. Sci. USA C.B. Lowe, G. Bejerano, D. Haussler 2007; 104 (19): 8005-8010
  • Branch and bound computation of exact p-values BIOINFORMATICS Bejerano, G. 2006; 22 (17): 2158-2159


    P-value computation is often used in bioinformatics to quantify the surprise, or significance, associated with a given observation. An implementation is provided that computes the exact p-value associated with any observed sample, against a null multinomial distribution, using the likelihood-ratio statistic. The efficient branch and bound code, far exceeding the full enumeration implemented by commercial packages, is especially useful with small sample, sparse data and rare events, common scenarios in bioinformatics, where approximations are often inaccurate and inappropriate. This code base can also be adapted to compute exact p-values of other statistics in diverse sampling scenarios. Freely available at

    View details for DOI 10.1093/bioinformatics/btl357

    View details for Web of Science ID 000240433100015

    View details for PubMedID 16895926

  • Identification and classification of conserved RNA secondary structures in the human genome PLOS COMPUTATIONAL BIOLOGY Pedersen, J. S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh, K., Lander, E. S., Kent, J., Miller, W., Haussler, D. 2006; 2 (4): 251-262


    The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3'UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.

    View details for DOI 10.1371/journal.pcbi.0020033

    View details for Web of Science ID 000239493800005

    View details for PubMedID 16628248

  • The UCSC Genome Browser Database: update 2006 NUCLEIC ACIDS RESEARCH Hinrichs, A. S., Karolchik, D., Baertsch, R., Barber, G. P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T. S., Harte, R. A., Hsu, F., Hillman-Jackson, J., Kuhn, R. M., Pedersen, J. S., Pohl, A., Raney, B. J., Rosenbloom, K. R., Siepel, A., Smith, K. E., Sugnet, C. W., Sultan-Qurraie, A., Thomas, D. J., Trumbower, H., Weber, R. J., Weirauch, M., Zweig, A. S., Haussler, D., Kent, W. J. 2006; 34: D590-D598


    The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at

    View details for DOI 10.1093/nar/gkj144

    View details for Web of Science ID 000239307700126

    View details for PubMedID 16381938

  • A Distal Enhancer and an Ultraconserved Exon are Derived From a Novel Retroposon Nature G. Bejerano, C.B. Lowe, N. Ahituv, B. King, A. Siepel, S.R. Salama, E.M. Rubin, W.J. Kent, D. Haussler 2006; 441 (7089): 87-90
  • Forces Shaping the Fastest Evolving Regions in the Human Genome PLoS Genetics K.S. Pollard, S.R. Salama, B. King, A.D. Kern, T. Dreszer, S. Katzman, A. Siepel, J.S. Pedersen, G. Bejerano, R. Baertsch, K.R. Rosenbloom, J. Kent, D. Haussler 2006; 2 (10): e168
  • Computational screening of conserved genomic DNA in search of functional noncoding elements NATURE METHODS Bejerano, G., Siepel, A. C., Kent, W. J., Haussler, D. 2005; 2 (7): 535-545

    View details for Web of Science ID 000230165700018

    View details for PubMedID 16170870

  • Ultraconserved elements in insect genomes: A highly conserved intronic sequence implicated in the control of homothorax mRNA splicing GENOME RESEARCH Glazov, E. A., Pheasant, M., McGraw, E. A., Bejerano, G., Mattick, J. S. 2005; 15 (6): 800-808


    Recently, we identified a large number of ultraconserved (uc) sequences in noncoding regions of human, mouse, and rat genomes that appear to be essential for vertebrate and amniote ontogeny. Here, we used similar methods to identify ultraconserved genomic regions between the insect species Drosophila melanogaster and Drosophila pseudoobscura, as well as the more distantly related Anopheles gambiae. As with vertebrates, ultraconserved sequences in insects appear to occur primarily in intergenic and intronic sequences, and at intron-exon junctions. The sequences are significantly associated with genes encoding developmental regulators and transcription factors, but are less frequent and are smaller in size than in vertebrates. The longest identical, nongapped orthologous match between the three genomes was found within the homothorax (hth) gene. This sequence spans an internal exon-intron junction, with the majority located within the intron, and is predicted to form a highly stable stem-loop RNA structure. Real-time quantitative PCR analysis of different hth splice isoforms and Northern blotting showed that the conserved element is associated with a high incidence of intron retention in hth pre-mRNA, suggesting that the conserved intronic element is critically important in the post-transcriptional regulation of hth expression in Diptera.

    View details for DOI 10.1101/gr.3545105

    View details for Web of Science ID 000229623100005

    View details for PubMedID 15899965

  • Evolutionarily Conserved Elements in Vertebrate, Fly, Worm, and Yeast Genomes Genome Research A. Siepel, G. Bejerano, J.S. Pedersen, A. Hinrichs, M. Hou, K. Rosenbloom, H. Clawson, J. Spieth, L.W. Hillier, S. Richards, G.M. Weinstock, R.K. Wilson, R.A. Gibbs, W.J. Kent, W. Miller, D. Haussler 2005; 15 (8): 1034-1050
  • Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution NATURE Hillier, L. W., Miller, W., Birney, E., Warren, W., Hardison, R. C., Ponting, C. P., Bork, P., Burt, D. W., Groenen, M. A., Delany, M. E., Dodgson, J. B., Chinwalla, A. T., Cliften, P. F., Clifton, S. W., Delehaunty, K. D., Fronick, C., Fulton, R. S., Graves, T. A., Kremitzki, C., Layman, D., Magrini, V., McPherson, J. D., Miner, T. L., Minx, P., Nash, W. E., Nhan, M. N., Nelson, J. O., Oddy, L. G., Pohl, C. S., Randall-Maher, J., Smith, S. M., Wallis, J. W., Yang, S. P., Romanov, M. N., Rondelli, C. M., Paton, B., Smith, J., Morrice, D., Daniels, L., Tempest, H. G., Robertson, L., Masabanda, J. S., Griffin, D. K., Vignal, A., Fillon, V., Jacobbson, L., Kerje, S., Andersson, L., Crooijmans, R. P., Aerts, J., van der Poel, J. J., Ellegren, H., Caldwell, R. B., Hubbard, S. J., Grafham, D. V., Kierzek, A. M., McLaren, S. R., Overton, I. M., Arakawa, H., Beattie, K. J., Bezzubov, Y., Boardman, P. E., Bonfield, J. K., Croning, M. D., Davies, R. M., Francis, M. D., Humphray, S. J., Scott, C. E., Taylor, R. G., Tickle, C., Brown, W. R., Rogers, J., Buerstedde, J. M., Wilson, S. A., Stubbs, L., Ovcharenko, I., Gordon, L., Lucas, S., Miller, M. M., Inoko, H., Shiina, T., Kaufman, J., Salomonsen, J., Skjoedt, K., Wong, G. K., Wang, J., Liu, B., Wang, J., Yu, J., Yang, H. M., Nefedov, M., Koriabine, M., deJong, P. J., Goodstadt, L., Webber, C., Dickens, N. J., Letunic, I., Suyama, M., Torrents, D., von Mering, C., Zdobnov, E. M., Makova, K., Nekrutenko, A., Elnitski, L., Eswara, P., King, D. C., Yang, S., Tyekucheva, S., Radakrishnan, A., Harris, R. S., Chiaromonte, F., Taylor, J., He, J. B., Rijnkels, M., Griffiths-Jones, S., Ureta-Vidal, A., Hoffman, M. M., Severin, J., Searle, S. M., Law, A. S., Speed, D., Waddington, D., Cheng, Z., Tuzun, E., Eichler, E., Bao, Z. R., Flicek, P., Shteynberg, D. D., Brent, M. R., Bye, J. M., Huckle, E. J., Chatterji, S., Dewey, C., Pachter, L., Kouranov, A., Mourelatos, Z., Hatzigeorgiou, A. G., Paterson, A. H., Ivarie, R., Brandstrom, M., Axelsson, E., Backstrom, N., Berlin, S., Webster, M. T., Pourquie, O., Reymond, A., Ucla, C., Antonarakis, S. E., Long, M. Y., Emerson, J. J., Betran, E., Dupanloup, I., Kaessmann, H., Hinrichs, A. S., Bejerano, G., Furey, T. S., Harte, R. A., Raney, B., Siepel, A., Kent, W. J., Haussler, D., Eyras, E., Castelo, R., Abril, J. F., Castellano, S., Camara, F., Parra, G., Guigo, R., Bourque, G., Tesler, G., Pevzner, P. A., Smit, A., Fulton, L. A., Mardis, E. R., Wilson, R. K. 2004; 432 (7018): 695-716


    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

    View details for DOI 10.1038/nature03154

    View details for Web of Science ID 000225597200038

    View details for PubMedID 15592404

  • Into the heart of darkness: large-scale clustering of human non-coding DNA. Bioinformatics Bejerano, G., Haussler, D., Blanchette, M. 2004; 20: i40-8


    It is currently believed that the human genome contains about twice as much non-coding functional regions as it does protein-coding genes, yet our understanding of these regions is very limited. We examine the intersection between syntenically conserved sequences in the human, mouse and rat genomes, and sequence similarities within the human genome itself, in search of families of non-protein-coding elements. For this purpose we develop a graph theoretic clustering algorithm, akin to the highly successful methods used in elucidating protein sequence family relationships. The algorithm is applied to a highly filtered set of about 700 000 human-rodent evolutionarily conserved regions, not resembling any known coding sequence, which encompasses 3.7% of the human genome. From these, we obtain roughly 12 000 non-singleton clusters, dense in significant sequence similarities. Further analysis of genomic location, evidence of transcription and RNA secondary structure reveals many clusters to be significantly homogeneous in one or more characteristics. This subset of the highly conserved non-protein-coding elements in the human genome thus contains rich family-like structures, which merit in-depth analysis. Supplementary material to this work is available at

    View details for PubMedID 15262779

  • Ultraconserved elements in the human genome SCIENCE Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W. J., Mattick, J. S., Haussler, D. 2004; 304 (5675): 1321-1325


    There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.

    View details for DOI 10.1126/science.1098119

    View details for Web of Science ID 000221669600054

    View details for PubMedID 15131266

  • Algorithms for variable length Markov chain modeling BIOINFORMATICS Bejerano, G. 2004; 20 (5): 788-U729


    We present a general purpose implementation of variable length Markov models. Contrary to fixed order Markov models, these models are not restricted to a predefined uniform depth. Rather, by examining the training data, a model is constructed that fits higher order Markov dependencies where such contexts exist, while using lower order Markov dependencies elsewhere. As both theoretical and experimental results show, these models are capable of capturing rich signals from a modest amount of training data, without the use of hidden states. The source code is freely available at

    View details for DOI 10.1093/bioinformatics/btg489

    View details for Web of Science ID 000220485300025

    View details for PubMedID 14751999

  • Efficient exact p-value computation for small sample, sparse and surprising categorical data J. Computational Biology G. Bejerano, N. Friedman, N. Tishby 2004; 11 (5): 867-886
  • Extremely conserved non-coding sequences in vertebrate genomes 4th International Conference on Bioinformatics of Genome Regulation and Structure (BGRS 2004) Makunin, I., Stephen, S., Pheasant, M., Bejerano, G., Kent, W. J., Haussler, D., Mattick, J. S. RUSSIAN ACAD SCI SIBERIAN BRANCH. 2004: 138–140
  • Extremely conserved non-coding sequences in the vertebrate genomes Proceedings of 4th International Conference on Bioinformatics of Genome Regulation and Structure I.V. Makunin, S. Stephen, M. Pheasant, G. Bejerano, W.J. Kent, D. Haussler, J.S. Mattick 2004; BGRS
  • Sequencing and comparative analysis of the chicken genome Nature International Chicken Genome Sequencing Consortium 2004; 432 (7018): 695-716
  • Discriminative feature selection via multiclass variable memory Markov model EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING Slonim, N., Bejerano, G., Fine, S., Tishby, N. 2003; 2003 (2): 93-102
  • A system for computer music generation by learning and improvisation in a particular style IEEE Computer J. O. Lartillot, S. Dubnov, G. Assayag, G. Bejerano 2003; 36 (10): 73-80
  • Efficient exact p-value computation and applications to biosequence analysis Proceedings of the 7th annual international conference on research in computational molecular biology G. Bejerano 2003; RECOMB
  • Discriminative feature selection via multiclass variable memory Markov models EURASIP J. Applied Signal Processing N. Slonim, G. Bejerano, S. Fine, N. Tishby 2003; 2: 93-102
  • Discriminative feature selection via multiclass variable memory Markov models Proceedings of 19th International Conference on Machine Learning N. Slonim, G. Bejerano, S. Fine, N. Tishby 2002; ICML
  • Novel small RNA-encoding genes in the intergenic regions of Escherichia coli CURRENT BIOLOGY Argaman, L., Hershberg, R., Vogel, J., Bejerano, G., Wagner, E. G., Margalit, H., Altuvia, S. 2001; 11 (12): 941-950


    Small, untranslated RNA molecules were identified initially in bacteria, but examples can be found in all kingdoms of life. These RNAs carry out diverse functions, and many of them are regulators of gene expression. Genes encoding small, untranslated RNAs are difficult to detect experimentally or to predict by traditional sequence analysis approaches. Thus, in spite of the rising recognition that such RNAs may play key roles in bacterial physiology, many of the small RNAs known to date were discovered fortuitously. To search the Escherichia coli genome sequence for genes encoding small RNAs, we developed a computational strategy employing transcription signals and genomic features of the known small RNA-encoding genes. The search, for which we used rather restrictive criteria, has led to the prediction of 24 putative sRNA-encoding genes, of which 23 were tested experimentally. Here we report on the discovery of 14 genes encoding novel small RNAs in E. coli and their expression patterns under a variety of physiological conditions. Most of the newly discovered RNAs are abundant. Interestingly, the expression level of a significant number of these RNAs increases upon entry into stationary phase. Based on our results, we conclude that small RNAs are much more widespread than previously imagined and that these versatile molecules may play important roles in the fine-tuning of cell responses to changing environments.

    View details for Web of Science ID 000169612900018

    View details for PubMedID 11448770

  • Variations on probabilistic suffix trees: statistical modeling and prediction of protein families BIOINFORMATICS Bejerano, G., Yona, G. 2001; 17 (1): 23-43


    We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences do not need to be aligned, nor is delineation of domain boundaries required. The method is automatic, and can be applied, without assuming any preliminary biological information, with surprising success. Basic biological considerations such as amino acid background probabilities, and amino acids substitution probabilities can be incorporated to improve performance. The PST can serve as a predictive tool for protein sequence classification, and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects much more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model that is trained from a multiple alignment of the input sequences, while being much faster.

    View details for Web of Science ID 000167241500005

    View details for PubMedID 11222260

  • Unsupervised sequence segmentation by a mixture of switching variable memory Markov sources Proceedings of 18th International Conference on Machine Learning Y. Seldin, G. Bejerano, N. Tishby 2001; ICML
  • PromEC: An updated database of Escherichia coli mRNA promoters with experimentally identified transcriptional start sites NUCLEIC ACIDS RESEARCH Hershberg, R., Bejerano, G., Santos-Zavaleta, A., Margalit, H. 2001; 29 (1): 277-277


    PromEC is an updated compilation of Escherichia coli mRNA promoter sequences. It includes documentation on the location of experimentally identified mRNA transcriptional start sites on the E. coli chromosome, as well as the actual sequences in the promoter region. The database was updated as of July 2000 and includes 472 entries. PromEC is accessible at il/marg/promec

    View details for Web of Science ID 000166360300075

    View details for PubMedID 11125111

  • A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites, 1st Workshop on Algorithms in Bioinformatics Lecture Notes in Computer Science Y. Barash, G. Bejerano, N. Friedman 2001; WABI (2149): 278-293
  • Novel small RNA-encoding genes in Escherichia coli Current Biology L. Argaman, R. Hershberg, J. Vogel, G. Bejerano, G. Wagner, H. Margalit, S. Altuvia 2001; 11 (12): 941-950
  • Automated modeling of musical style Proceedings of the International Computer Music Conference O. Lartillot, S. Dubnov, G. Assayag, G. Bejerano 2001; ICMC
  • A variable memory Markovian modeling approach to unsupervised sequence segmentation Proceedings of 33rd Symposium on the Interface of Computing Science and Statistics Y. Seldin, G. Bejerano, N. Tishby 2001; INTERFACE
  • Optimal amnesic probabilistic automata, or, how to learn and classify proteins in linear time and space Proceedings of the 4th annual international conference on research in computational molecular biology A. Apostolico, G. Bejerano 2000; RECOMB
  • Optimal amnesic probabilistic automata, or, how to learn and classify proteins in linear time and space J. Computational Biology A. Apostolico, G. Bejerano 2000; 7 (3-4): 381-393
  • Modeling protein families using probabilistic suffix trees, Proceedings of the 3rd annual international conference on research in computational molecular biology RECOMB G. Bejerano, G. Yona 1999