Bio

Honors & Awards


  • Army Breast Cancer Research Fellowship, Department of Defense (1997-1998)
  • Cold Spring Harbor Fellowship, Cold Spring Harbor Laboratory (1996-1997)
  • Prize Studentship, The Wellcome Trust (1991-1994)
  • John Buckley Entrance Scholarship for Science, Manchester University (1988-1991)

Professional Education


  • B.Sc., Manchester University, Genetics (1991)
  • Ph.D., Manchester University, Molecular Biology (1994)

Research & Scholarship

Current Research and Scholarly Interests


1. Evolution and the Adaptive Landscape

When yeast are evolved under various selective pressures in a chemostat, mutations that arise and provide an adaptive advantage will expand within the population. We are using high throughput sequencing to determine the identity of such mutations, as well as to understand the dynamics of the mutations within the populations, and the interactions between the mutations (such as epistasis).

2. Genome Annotation by Transcriptome Sequencing

The set of genes in a sequenced genome has typically been defined using various prediction criteria (such as ORFs capable of encoding a protein > 100 amino acids), coupled with experimental data, such as transposon mutagenesis and EST sequencing. The availability of high throughput sequencing now allows full transcriptome sequencing to better annotate the transcribed regions of the genome, and we are applying this to various yeasts.
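The ORF-length criterion mentioned above can be sketched as follows. This is a minimal illustration only, not the annotation pipeline used for any of the genomes described here: the function name `find_orfs` and the `min_aa` parameter are hypothetical, only the forward strand is scanned, and introns are ignored.

```python
# Illustrative sketch: naive forward-strand ORF scan with a length threshold.
# The names find_orfs and min_aa are assumptions made for this example.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end) index pairs of ORFs encoding more than min_aa residues.

    start is the index of the ATG; end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop, ATG included
                if n_aa > min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs
```

A threshold like this discards short genuine genes and keeps long spurious ORFs, which is one reason transcriptome sequencing is useful as independent evidence.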

Publications

Journal Articles


  • Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS genetics Kvitek, D. J., Sherlock, G. 2013; 9 (11)

    Abstract

    Molecular signaling networks are ubiquitous across life and likely evolved to allow organisms to sense and respond to environmental change in dynamic environments. Few examples exist regarding the dispensability of signaling networks, and it remains unclear whether they are an essential feature of a highly adapted biological system. Here, we show that signaling network function carries a fitness cost in yeast evolving in a constant environment. We performed whole-genome, whole-population Illumina sequencing on replicate evolution experiments and find the major theme of adaptive evolution in a constant environment is the disruption of signaling networks responsible for regulating the response to environmental perturbations. Over half of all identified mutations occurred in three major signaling networks that regulate growth control: glucose signaling, Ras/cAMP/PKA and HOG. This results in a loss of environmental sensitivity that is reproducible across experiments. However, adaptive clones show reduced viability under starvation conditions, demonstrating an evolutionary tradeoff. These mutations are beneficial in an environment with a constant and predictable nutrient supply, likely because they result in constitutive growth, but reduce fitness in an environment where nutrient supply is not constant. Our results are a clear example of the myopic nature of evolution: a loss of environmental sensitivity in a constant environment is adaptive in the short term, but maladaptive should the environment change.

    View details for DOI 10.1371/journal.pgen.1003972

    View details for PubMedID 24278038

  • Recurrent Rearrangement during Adaptive Evolution in an Interspecific Yeast Hybrid Suggests a Model for Rapid Introgression PLOS GENETICS Dunn, B., Paulish, T., Stanbery, A., Piotrowski, J., Koniges, G., Kroll, E., Louis, E. J., Liti, G., Sherlock, G., Rosenzweig, F. 2013; 9 (3)

    Abstract

    Genome rearrangements are associated with eukaryotic evolutionary processes ranging from tumorigenesis to speciation. Rearrangements are especially common following interspecific hybridization, and some of these could be expected to have strong selective value. To test this expectation we created de novo interspecific yeast hybrids between two diverged but largely syntenic Saccharomyces species, S. cerevisiae and S. uvarum, then experimentally evolved them under continuous ammonium limitation. We discovered that a characteristic interspecific genome rearrangement arose multiple times in independently evolved populations. We uncovered nine different breakpoints, all occurring in a narrow ~1-kb region of chromosome 14, and all producing an "interspecific fusion junction" within the MEP2 gene coding sequence, such that the 5' portion derives from S. cerevisiae and the 3' portion derives from S. uvarum. In most cases the rearrangements altered both chromosomes, resulting in what can be considered to be an introgression of a several-kb region of S. uvarum into an otherwise intact S. cerevisiae chromosome 14, while the homeologous S. uvarum chromosome 14 experienced an interspecific reciprocal translocation at the same breakpoint within MEP2, yielding a chimaeric chromosome; these events result in the presence in the cell of two MEP2 fusion genes having identical breakpoints. Given that MEP2 encodes a high-affinity ammonium permease, that MEP2 fusion genes arise repeatedly under ammonium-limitation, and that three independent evolved isolates carrying MEP2 fusion genes are each more fit than their common ancestor, the novel MEP2 fusion genes are very likely adaptive under ammonium limitation. Our results suggest that, when homoploid hybrids form, the admixture of two genomes enables swift and otherwise unavailable evolutionary innovations. Furthermore, the architecture of the MEP2 rearrangement suggests a model for rapid introgression, a phenomenon seen in numerous eukaryotic phyla, that does not require repeated backcrossing to one of the parental species.

    View details for DOI 10.1371/journal.pgen.1003366

    View details for Web of Science ID 000316866700042

    View details for PubMedID 23555283

  • Hunger Artists: Yeast Adapted to Carbon Limitation Show Trade-Offs under Carbon Sufficiency PLOS GENETICS Wenger, J. W., Piotrowski, J., Nagarajan, S., Chiotti, K., Sherlock, G., Rosenzweig, F. 2011; 7 (8)

    Abstract

    As organisms adaptively evolve to a new environment, selection results in the improvement of certain traits, bringing about an increase in fitness. Trade-offs may result from this process if function in other traits is reduced in alternative environments either by the adaptive mutations themselves or by the accumulation of neutral mutations elsewhere in the genome. Though the cost of adaptation has long been a fundamental premise in evolutionary biology, the existence of and molecular basis for trade-offs in alternative environments are not well-established. Here, we show that yeast evolved under aerobic glucose limitation show surprisingly few trade-offs when cultured in other carbon-limited environments, under either aerobic or anaerobic conditions. However, while adaptive clones consistently outperform their common ancestor under carbon limiting conditions, in some cases they perform less well than their ancestor in aerobic, carbon-rich environments, indicating that trade-offs can appear when resources are non-limiting. To more deeply understand how adaptation to one condition affects performance in others, we determined steady-state transcript abundance of adaptive clones grown under diverse conditions and performed whole-genome sequencing to identify mutations that distinguish them from one another and from their common ancestor. We identified mutations in genes involved in glucose sensing, signaling, and transport, which, when considered in the context of the expression data, help explain their adaptation to carbon poor environments. However, different sets of mutations in each independently evolved clone indicate that multiple mutational paths lead to the adaptive phenotype. We conclude that yeasts that evolve high fitness under one resource-limiting condition also become more fit under other resource-limiting conditions, but may pay a fitness cost when those same resources are abundant.

    View details for DOI 10.1371/journal.pgen.1002202

    View details for Web of Science ID 000294297000006

    View details for PubMedID 21829391

  • Reciprocal Sign Epistasis between Frequently Experimentally Evolved Adaptive Mutations Causes a Rugged Fitness Landscape PLOS GENETICS Kvitek, D. J., Sherlock, G. 2011; 7 (4)

    Abstract

    The fitness landscape captures the relationship between genotype and evolutionary fitness and is a pervasive metaphor used to describe the possible evolutionary trajectories of adaptation. However, little is known about the actual shape of fitness landscapes, including whether valleys of low fitness create local fitness optima, acting as barriers to adaptive change. Here we provide evidence of a rugged molecular fitness landscape arising during an evolution experiment in an asexual population of Saccharomyces cerevisiae. We identify the mutations that arose during the evolution using whole-genome sequencing and use competitive fitness assays to describe the mutations individually responsible for adaptation. In addition, we find that a fitness valley between two adaptive mutations in the genes MTH1 and HXT6/HXT7 is caused by reciprocal sign epistasis, where the fitness cost of the double mutant prohibits the two mutations from being selected in the same genetic background. The constraint enforced by reciprocal sign epistasis causes the mutations to remain mutually exclusive during the experiment, even though adaptive mutations in these two genes occur several times in independent lineages during the experiment. Our results show that epistasis plays a key role during adaptation and that inter-genic interactions can act as barriers between adaptive solutions. These results also provide a new interpretation on the classic Dobzhansky-Muller model of reproductive isolation and display some surprising parallels with mutations in genes often associated with tumors.

    View details for DOI 10.1371/journal.pgen.1002056

    View details for Web of Science ID 000289977000039

    View details for PubMedID 21552329

  • Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae NATURE GENETICS Kao, K. C., Sherlock, G. 2008; 40 (12): 1499-1504

    Abstract

    The classical model of adaptive evolution in an asexual population postulates that each adaptive clone is derived from the one preceding it. However, experimental evidence has suggested more complex dynamics, with theory predicting the fixation probability of a beneficial mutation as dependent on the mutation rate, population size and the mutation's selection coefficient. Clonal interference has been demonstrated in viruses and bacteria but not in a eukaryote, and a detailed molecular characterization is lacking. Here we use three different fluorescent markers to visualize the dynamics of asexually evolving yeast populations. For each adaptive clone within one of our evolving populations, we identified the underlying mutations, monitored their population frequencies and used microarrays to characterize changes in the transcriptome. These results represent the most detailed molecular characterization of experimental evolution to date and provide direct experimental evidence supporting both the clonal interference and the multiple mutation models.

    View details for DOI 10.1038/ng.280

    View details for Web of Science ID 000261215900030

    View details for PubMedID 19029899

  • Reconstruction of the genome origins and evolution of the hybrid lager yeast Saccharomyces pastorianus GENOME RESEARCH Dunn, B., Sherlock, G. 2008; 18 (10): 1610-1623

    Abstract

    Inter-specific hybridization leading to abrupt speciation is a well-known, common mechanism in angiosperm evolution; only recently, however, have similar hybridization and speciation mechanisms been documented to occur frequently among the closely related group of sensu stricto Saccharomyces yeasts. The economically important lager beer yeast Saccharomyces pastorianus is such a hybrid, formed by the union of Saccharomyces cerevisiae and Saccharomyces bayanus-related yeasts; efforts to understand its complex genome, searching for both biological and brewing-related insights, have been underway since its hybrid nature was first discovered. It had been generally thought that a single hybridization event resulted in a unique S. pastorianus species, but it has been recently postulated that there have been two or more hybridization events. Here, we show that there may have been two independent origins of S. pastorianus strains, and that each independent group--defined by characteristic genome rearrangements, copy number variations, ploidy differences, and DNA sequence polymorphisms--is correlated with specific breweries and/or geographic locations. Finally, by reconstructing common ancestral genomes via array-CGH data analysis and by comparing representative DNA sequences of the S. pastorianus strains with those of many different S. cerevisiae isolates, we have determined that the most likely S. cerevisiae ancestral parent for each of the independent S. pastorianus groups was an ale yeast, with different, but closely related ale strains contributing to each group's parentage.

    View details for DOI 10.1101/gr.076075.108

    View details for Web of Science ID 000259700800008

    View details for PubMedID 18787083

  • PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools. Nucleic acids research Hu, J. C., Sherlock, G., Siegele, D. A., Aleksander, S. A., Ball, C. A., Demeter, J., Gouni, S., Holland, T. A., Karp, P. D., Lewis, J. E., Liles, N. M., McIntosh, B. K., Mi, H., Muruganujan, A., Wymore, F., Thomas, P. D. 2014; 42 (1): D677-84

    Abstract

    PortEco (http://porteco.org) aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a 'virtual' model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.

    View details for DOI 10.1093/nar/gkt1203

    View details for PubMedID 24285306

  • The Candida Genome Database: The new homology information page highlights protein similarity and phylogeny. Nucleic acids research Binkley, J., Arnaud, M. B., Inglis, D. O., Skrzypek, M. S., Shah, P., Wymore, F., Binkley, G., Miyasato, S. R., Simison, M., Sherlock, G. 2014; 42 (1): D711-6

    Abstract

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is a freely available online resource that provides gene, protein and sequence information for multiple Candida species, along with web-based tools for accessing, analyzing and exploring these data. The goal of CGD is to facilitate and accelerate research into Candida pathogenesis and biology. The CGD Web site is organized around Locus pages, which display information collected about individual genes. Locus pages have multiple tabs for accessing different types of information; the default Summary tab provides an overview of the gene name, aliases, phenotype and Gene Ontology curation, whereas other tabs display more in-depth information, including protein product details for coding genes, notes on changes to the sequence or structure of the gene and a comprehensive reference list. Here, in this update to previous NAR Database articles featuring CGD, we describe a new tab that we have added to the Locus page, entitled the Homology Information tab, which displays phylogeny and gene similarity information for each locus.

    View details for DOI 10.1093/nar/gkt1046

    View details for PubMedID 24185697

  • The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations. Nucleic acids research Cerqueira, G. C., Arnaud, M. B., Inglis, D. O., Skrzypek, M. S., Binkley, G., Simison, M., Miyasato, S. R., Binkley, J., Orvis, J., Shah, P., Wymore, F., Sherlock, G., Wortman, J. R. 2014; 42 (1): D705-10

    Abstract

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.

    View details for DOI 10.1093/nar/gkt1029

    View details for PubMedID 24194595

  • Identification of cell cycle-regulated genes periodically expressed in U2OS cells and their regulation by FOXM1 and E2F transcription factors MOLECULAR BIOLOGY OF THE CELL Grant, G. D., Brooks, L., Zhang, X., Mahoney, J. M., Martyanov, V., Wood, T. A., Sherlock, G., Cheng, C., Whitfield, M. L. 2013; 24 (23): 3634-3650

    Abstract

    We identify the cell cycle-regulated mRNA transcripts genome-wide in the osteosarcoma-derived U2OS cell line. This results in 2140 transcripts mapping to 1871 unique cell cycle-regulated genes that show periodic oscillations across multiple synchronous cell cycles. We identify genomic loci bound by the G2/M transcription factor FOXM1 by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and associate these with cell cycle-regulated genes. FOXM1 is bound to cell cycle-regulated genes with peak expression in both S phase and G2/M phases. We show that ChIP-seq genomic loci are responsive to FOXM1 using a real-time luciferase assay in live cells, showing that FOXM1 strongly activates promoters of G2/M phase genes and weakly activates those induced in S phase. Analysis of ChIP-seq data from a panel of cell cycle transcription factors (E2F1, E2F4, E2F6, and GABPA) from the Encyclopedia of DNA Elements and ChIP-seq data for the DREAM complex finds that a set of core cell cycle genes regulated in both U2OS and HeLa cells are bound by multiple cell cycle transcription factors. These data identify the cell cycle-regulated genes in a second cancer-derived cell line and provide a comprehensive picture of the transcriptional regulatory systems controlling periodic gene expression in the human cell division cycle.

    View details for DOI 10.1091/mbc.E13-05-0264

    View details for Web of Science ID 000328125100005

    View details for PubMedID 24109597

  • Ras Signaling Gets Fine-Tuned: Regulation of Multiple Pathogenic Traits of Candida albicans EUKARYOTIC CELL Inglis, D. O., Sherlock, G. 2013; 12 (10): 1316-1325

    Abstract

    Candida albicans is an opportunistic fungal pathogen that can cause disseminated infection in patients with indwelling catheters or other implanted medical devices. A common resident of the human microbiome, C. albicans responds to environmental signals, such as cell contact with catheter materials and exposure to serum or CO2, by triggering the expression of a variety of traits, some of which are known to contribute to its pathogenic lifestyle. Such traits include adhesion, biofilm formation, filamentation, white-to-opaque (W-O) switching, and two recently described phenotypes, finger and tentacle formation. Under distinct sets of environmental conditions and in specific cell types (mating type-like a [MTLa]/alpha cells, MTL homozygotes, or daughter cells), C. albicans utilizes (or reutilizes) a single signal transduction pathway, the Ras pathway, to affect these phenotypes. Ras1, Cyr1, Tpk2, and Pde2, the proteins of the Ras signaling pathway, are the only nontranscriptional regulatory proteins that are known to be essential for regulating all of these processes. How does C. albicans utilize this one pathway to regulate all of these phenotypes? The regulation of distinct and yet related processes by a single, evolutionarily conserved pathway is accomplished through the use of downstream transcription factors that are active under specific environmental conditions and in different cell types. In this minireview, we discuss the role of Ras signaling pathway components and Ras pathway-regulated transcription factors as well as the transcriptional regulatory networks that fine-tune gene expression in diverse biological contexts to generate specific phenotypes that impact the virulence of C. albicans.

    View details for DOI 10.1128/EC.00094-13

    View details for Web of Science ID 000324861400001

    View details for PubMedID 23913542

  • Comparative metabolic footprinting of a large number of commercial wine yeast strains in Chardonnay fermentations. FEMS yeast research Richter, C. L., Dunn, B., Sherlock, G., Pugh, T. 2013; 13 (4): 394-410

    Abstract

    Wine has been made for thousands of years. In modern times, as the importance of yeast as an ingredient in winemaking became better appreciated, companies worldwide have collected and marketed specific yeast strains for enhancing positive and minimizing negative attributes in wine. It is generally believed that each yeast strain contributes uniquely to fermentation performance and wine style because of its genetic background; however, the impact of metabolic diversity among wine yeasts on aroma compound production has not been extensively studied. We characterized the metabolic footprints of 69 different commercial wine yeast strains in triplicate fermentations of identical Chardonnay juice, by measuring 29 primary and secondary metabolites; we additionally measured seven attributes of fermentation performance of these strains. We identified up to 1000-fold differences between strains for some of the metabolites and observed large differences in fermentation performance, suggesting significant metabolic diversity. These differences represent potential selective markers for the strains that may be important to the wine industry. Analysis of these metabolic traits further builds on the known genomic diversity of these strains and provides new insights into their genetic and metabolic relatedness.

    View details for DOI 10.1111/1567-1364.12046

    View details for PubMedID 23528123

  • Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure. Genome biology Muzzey, D., Schwartz, K., Weissman, J. S., Sherlock, G. 2013; 14 (9): R97

    Abstract

    Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution.

    View details for DOI 10.1186/gb-2013-14-9-r97

    View details for PubMedID 24025428

  • Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae. BMC microbiology Inglis, D. O., Binkley, J., Skrzypek, M. S., Arnaud, M. B., Cerqueira, G. C., Shah, P., Wymore, F., Wortman, J. R., Sherlock, G. 2013; 13: 91

    Abstract

    BACKGROUND: Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. RESULTS: We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. CONCLUSIONS: This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites.

    View details for DOI 10.1186/1471-2180-13-91

    View details for PubMedID 23617571

  • Improved Gene Ontology Annotation for Biofilm Formation, Filamentous Growth, and Phenotypic Switching in Candida albicans EUKARYOTIC CELL Inglis, D. O., Skrzypek, M. S., Arnaud, M. B., Binkley, J., Shah, P., Wymore, F., Sherlock, G. 2013; 12 (1): 101-108

    Abstract

    The opportunistic fungal pathogen Candida albicans is a significant medical threat, especially for immunocompromised patients. Experimental research has focused on specific areas of C. albicans biology, with the goal of understanding the multiple factors that contribute to its pathogenic potential. Some of these factors include cell adhesion, invasive or filamentous growth, and the formation of drug-resistant biofilms. The Gene Ontology (GO) (www.geneontology.org) is a standardized vocabulary that the Candida Genome Database (CGD) (www.candidagenome.org) and other groups use to describe the functions of gene products. To improve the breadth and accuracy of pathogenicity-related gene product descriptions and to facilitate the description of as yet uncharacterized but potentially pathogenicity-related genes in Candida species, CGD undertook a three-part project: first, the addition of terms to the biological process branch of the GO to improve the description of fungus-related processes; second, manual recuration of gene product annotations in CGD to use the improved GO vocabulary; and third, computational ortholog-based transfer of GO annotations from experimentally characterized gene products, using these new terms, to uncharacterized orthologs in other Candida species. Through genome annotation and analysis, we identified candidate pathogenicity genes in seven non-C. albicans Candida species and in one additional C. albicans strain, WO-1. We also defined a set of C. albicans genes at the intersection of biofilm formation, filamentous growth, pathogenesis, and phenotypic switching of this opportunistic fungal pathogen, which provides a compelling list of candidates for further experimentation.

    View details for DOI 10.1128/EC.00238-12

    View details for Web of Science ID 000313061100010

    View details for PubMedID 23143685

  • Phenotypic and genotypic convergences are influenced by historical contingency and environment in yeast. Evolution; international journal of organic evolution Spor, A., Kvitek, D. J., Nidelet, T., Martin, J., Legrand, J., Dillmann, C., Bourgais, A., de Vienne, D., Sherlock, G., Sicard, D. 2013

    Abstract

    Different organisms have independently and recurrently evolved similar phenotypic traits at different points throughout history. This phenotypic convergence may be caused by genotypic convergence and in addition, constrained by historical contingency. To investigate how convergence may be driven by selection in a particular environment and constrained by history, we analyzed nine life-history traits and four metabolic traits during an experimental evolution of six yeast strains in four different environments. In each of the environments, the population converged towards a different multivariate phenotype. However, the evolution of most traits, including fitness components, was constrained by history. Phenotypic convergence was partly associated with the selection of mutations in genes involved in the same pathway. By further investigating the convergence in one gene, BMH1, mutated in 20% of the evolved populations, we show that both the history and the environment influenced the types of mutations (missense/nonsense), their location within the gene itself, as well as their effects on multiple traits. However, these effects could not be easily predicted from ancestors' phylogeny or past-selection. Combined, our data highlight the role of pleiotropy and epistasis in shaping a rugged fitness landscape.

    View details for DOI 10.1111/evo.12302

    View details for PubMedID 24164389

  • Turbidostat culture of Saccharomyces cerevisiae W303-1A under selective pressure elicited by ethanol selects for mutations in SSD1 and UTH1 FEMS YEAST RESEARCH Avrahami-Moyal, L., Engelberg, D., Wenger, J. W., Sherlock, G., Braun, S. 2012; 12 (5): 521-533

    Abstract

    We investigated the genetic causes of ethanol tolerance by characterizing mutations selected in Saccharomyces cerevisiae W303-1A under the selective pressure of ethanol. W303-1A was subjected to three rounds of turbidostat culture in a medium supplemented with increasing amounts of ethanol. By the end of selection, the growth rate of the culture had increased from 0.029 to 0.32 h⁻¹. Unlike the progenitor strain, all yeast cells isolated from this population were able to form colonies on medium supplemented with 7% ethanol within 6 days, our definition of ethanol tolerance. Several clones selected from all three stages of selection were able to form dense colonies within 2 days on solid medium supplemented with 9% ethanol. We sequenced the whole genomes of six clones and identified mutations responsible for ethanol tolerance. Thirteen additional clones were tested for the presence of similar mutations. In 15 of 19 tolerant clones, the stop codon in ssd1-d was replaced with an amino acid-encoding codon. Three other clones contained one of two mutations in UTH1, and one clone did not contain mutations in either SSD1 or UTH1. We showed that the mutations in SSD1 and UTH1 increased tolerance of the cell wall to zymolyase and conclude that stability of the cell wall is a major factor in increased tolerance to ethanol.

    View details for DOI 10.1111/j.1567-1364.2012.00803.x

    View details for Web of Science ID 000306189600003

    View details for PubMedID 22443114

  • APJ1 and GRE3 Homologs Work in Concert to Allow Growth in Xylose in a Natural Saccharomyces sensu stricto Hybrid Yeast GENETICS Schwartz, K., Wenger, J. W., Dunn, B., Sherlock, G. 2012; 191 (2): 621-U504

    Abstract

    Creating Saccharomyces yeasts capable of efficient fermentation of pentoses such as xylose remains a key challenge in the production of ethanol from lignocellulosic biomass. Metabolic engineering of industrial Saccharomyces cerevisiae strains has yielded xylose-fermenting strains, but these strains have not yet achieved industrial viability due largely to xylose fermentation being prohibitively slower than that of glucose. Recently, it has been shown that naturally occurring xylose-utilizing Saccharomyces species exist. Uncovering the genetic architecture of such strains will shed further light on xylose metabolism, suggesting additional engineering approaches or possibly even enabling the development of xylose-fermenting yeasts that are not genetically modified. We previously identified a hybrid yeast strain, the genome of which is largely Saccharomyces uvarum, which has the ability to grow on xylose as the sole carbon source. To circumvent the sterility of this hybrid strain, we developed a novel method to genetically characterize its xylose-utilization phenotype, using a tetraploid intermediate, followed by bulk segregant analysis in conjunction with high-throughput sequencing. We found that this strain's growth in xylose is governed by at least two genetic loci, within which we identified the responsible genes: one locus contains a known xylose-pathway gene, a novel homolog of the aldo-keto reductase gene GRE3, while a second locus contains a homolog of APJ1, which encodes a putative chaperone not previously connected to xylose metabolism. Our work demonstrates that the power of sequencing combined with bulk segregant analysis can also be applied to a nongenetically tractable hybrid strain that contains a complex, polygenic trait, and identifies new avenues for metabolic engineering as well as for construction of nongenetically modified xylose-fermenting strains.

    View details for DOI 10.1534/genetics.112.140053

    View details for Web of Science ID 000308999300020

    View details for PubMedID 22426884

  • Analysis of the Saccharomyces cerevisiae pan-genome reveals a pool of copy number variants distributed in diverse yeast strains from differing industrial environments GENOME RESEARCH Dunn, B., Richter, C., Kvitek, D. J., Pugh, T., Sherlock, G. 2012; 22 (5): 908-924

    Abstract

    Although the budding yeast Saccharomyces cerevisiae is arguably one of the most well-studied organisms on earth, the genome-wide variation within this species--i.e., its "pan-genome"--has been less explored. We created a multispecies microarray platform containing probes covering the genomes of several Saccharomyces species: S. cerevisiae, including regions not found in the standard laboratory S288c strain, as well as the mitochondrial and 2-µm circle genomes--plus S. paradoxus, S. mikatae, S. kudriavzevii, S. uvarum, S. kluyveri, and S. castellii. We performed array-Comparative Genomic Hybridization (aCGH) on 83 different S. cerevisiae strains collected across a wide range of habitats; of these, 69 were commercial wine strains, while the remaining 14 were from a diverse set of other industrial and natural environments. We observed interspecific hybridization events, introgression events, and pervasive copy number variation (CNV) in all but a few of the strains. These CNVs were distributed throughout the strains such that they did not produce any clear phylogeny, suggesting extensive mating in both industrial and wild strains. To validate our results and to determine whether apparently similar introgressions and CNVs were identical by descent or recurrent, we also performed whole-genome sequencing on nine of these strains. These data may help pinpoint genomic regions involved in adaptation to different industrial milieus, as well as shed light on the course of domestication of S. cerevisiae.

    View details for DOI 10.1101/gr.130310.111

    View details for Web of Science ID 000303369600010

    View details for PubMedID 22369888

  • Different selective pressures lead to different genomic outcomes as newly-formed hybrid yeasts evolve BMC EVOLUTIONARY BIOLOGY Piotrowski, J. S., Nagarajan, S., Kroll, E., Stanbery, A., Chiotti, K. E., Kruckeberg, A. L., Dunn, B., Sherlock, G., Rosenzweig, F. 2012; 12

    Abstract

    Interspecific hybridization occurs in every eukaryotic kingdom. While hybrid progeny are frequently at a selective disadvantage, in some instances their increased genome size and complexity may result in greater stress resistance than their ancestors, which can be adaptively advantageous at the edges of their ancestors' ranges. While this phenomenon has been repeatedly documented in the field, the response of hybrid populations to long-term selection has not often been explored in the lab. To fill this knowledge gap we crossed the two most distantly related members of the Saccharomyces sensu stricto group, S. cerevisiae and S. uvarum, and established a mixed population of homoploid and aneuploid hybrids to study how different types of selection impact hybrid genome structure. As temperature was raised incrementally from 31°C to 46.5°C over 500 generations of continuous culture, selection favored loss of the S. uvarum genome, although the kinetics of genome loss differed among independent replicates. Temperature-selected isolates exhibited greater inherent and induced thermal tolerance than parental species and founding hybrids, and also exhibited ethanol resistance. In contrast, as exogenous ethanol was increased from 0% to 14% over 500 generations of continuous culture, selection favored euploid S. cerevisiae x S. uvarum hybrids. Ethanol-selected isolates were more ethanol tolerant than S. uvarum and one of the founding hybrids, but did not exhibit resistance to temperature stress. Relative to parental and founding hybrids, temperature-selected strains showed heritable differences in cell wall structure in the forms of increased resistance to zymolyase digestion and Micafungin, which targets cell wall biosynthesis. This is the first study to show experimentally that the genomic fate of newly-formed interspecific hybrids depends on the type of selection they encounter during the course of evolution, underscoring the importance of the ecological theatre in determining the outcome of the evolutionary play.

    View details for DOI 10.1186/1471-2148-12-46

    View details for Web of Science ID 000305180500001

    View details for PubMedID 22471618

  • The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata NUCLEIC ACIDS RESEARCH Inglis, D. O., Arnaud, M. B., Binkley, J., Shah, P., Skrzypek, M. S., Wymore, F., Binkley, G., Miyasato, S. R., Simison, M., Sherlock, G. 2012; 40 (D1): D667-D674

    View details for DOI 10.1093/nar/gkr945

    View details for Web of Science ID 000298601300101

  • The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources NUCLEIC ACIDS RESEARCH Arnaud, M. B., Cerqueira, G. C., Inglis, D. O., Skrzypek, M. S., Binkley, J., Chibucos, M. C., Crabtree, J., Howarth, C., Orvis, J., Shah, P., Wymore, F., Binkley, G., Miyasato, S. R., Simison, M., Sherlock, G., Wortman, J. R. 2012; 40 (D1): D653-D659

    View details for DOI 10.1093/nar/gkr875

    View details for Web of Science ID 000298601300099

  • The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata. Nucleic acids research Inglis, D. O., Arnaud, M. B., Binkley, J., Shah, P., Skrzypek, M. S., Wymore, F., Binkley, G., Miyasato, S. R., Simison, M., Sherlock, G. 2012; 40 (Database issue): D667-74

    Abstract

    The Candida Genome Database (CGD, http://www.candidagenome.org/) is an internet-based resource that provides centralized access to genomic sequence data and manually curated functional information about genes and proteins of the fungal pathogen Candida albicans and other Candida species. As the scope of Candida research, and the number of sequenced strains and related species, has grown in recent years, the need for expanded genomic resources has also grown. To answer this need, CGD has expanded beyond storing data solely for C. albicans, now integrating data from multiple species. Herein we describe the incorporation of this multispecies information, which includes curated gene information and the reference sequence for C. glabrata, as well as orthology relationships that interconnect Locus Summary pages, allowing easy navigation between genes of C. albicans and C. glabrata. These orthology relationships are also used to predict GO annotations of their products. We have also added protein information pages that display domains, structural information and physicochemical properties; bibliographic pages highlighting important topic areas in Candida biology; and a laboratory strain lineage page that describes the lineage of commonly used laboratory strains. All of these data are freely available at http://www.candidagenome.org/. We welcome feedback from the research community at candida-curator@lists.stanford.edu.

    View details for DOI 10.1093/nar/gkr945

    View details for PubMedID 22064862

  • The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources. Nucleic acids research Arnaud, M. B., Cerqueira, G. C., Inglis, D. O., Skrzypek, M. S., Binkley, J., Chibucos, M. C., Crabtree, J., Howarth, C., Orvis, J., Shah, P., Wymore, F., Binkley, G., Miyasato, S. R., Simison, M., Sherlock, G., Wortman, J. R. 2012; 40 (Database issue): D653-9

    Abstract

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available, web-based resource for researchers studying fungi of the genus Aspergillus, which includes organisms of clinical, agricultural and industrial importance. AspGD curators have now completed comprehensive review of the entire published literature about Aspergillus nidulans and Aspergillus fumigatus, and this annotation is provided with streamlined, ortholog-based navigation of the multispecies information. AspGD facilitates comparative genomics by providing a full-featured genomics viewer, as well as matched and standardized sets of genomic information for the sequenced aspergilli. AspGD also provides resources to foster interaction and dissemination of community information and resources. We welcome and encourage feedback at aspergillus-curator@lists.stanford.edu.

    View details for DOI 10.1093/nar/gkr875

    View details for PubMedID 22080559

  • GC-Content Normalization for RNA-Seq Data BMC BIOINFORMATICS Risso, D., Schwartz, K., Sherlock, G., Dudoit, S. 2011; 12

    Abstract

    Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.

    View details for DOI 10.1186/1471-2105-12-480

    View details for Web of Science ID 000302434000001

    View details for PubMedID 22177264

  • DNA methylation profiling reveals novel biomarkers and important roles for DNA methyltransferases in prostate cancer GENOME RESEARCH Kobayashi, Y., Absher, D. M., Gulzar, Z. G., Young, S. R., McKenney, J. K., Peehl, D. M., Brooks, J. D., Myers, R. M., Sherlock, G. 2011; 21 (7): 1017-1027

    Abstract

    Candidate gene-based studies have identified a handful of aberrant CpG DNA methylation events in prostate cancer. However, DNA methylation profiles have not been compared on a large scale between prostate tumor and normal prostate, and the mechanisms behind these alterations are unknown. In this study, we quantitatively profiled 95 primary prostate tumors and 86 benign adjacent prostate tissue samples for their DNA methylation levels at 26,333 CpGs representing 14,104 gene promoters by using the Illumina HumanMethylation27 platform. A 2-class Significance Analysis of this data set revealed 5912 CpG sites with increased DNA methylation and 2151 CpG sites with decreased DNA methylation in tumors (FDR < 0.8%). Prediction Analysis of this data set identified 87 CpGs that are the most predictive diagnostic methylation biomarkers of prostate cancer. By integrating available clinical follow-up data, we also identified 69 prognostic DNA methylation alterations that correlate with biochemical recurrence of the tumor. To identify the mechanisms responsible for these genome-wide DNA methylation alterations, we measured the gene expression levels of several DNA methyltransferases (DNMTs) and their interacting proteins by TaqMan qPCR and observed increased expression of DNMT3A2, DNMT3B, and EZH2 in tumors. Subsequent transient transfection assays in cultured primary prostate cells revealed that DNMT3B1 and DNMT3B2 overexpression resulted in increased methylation of a substantial subset of CpG sites that showed tumor-specific increased methylation.

    View details for DOI 10.1101/gr.119487.110

    View details for Web of Science ID 000292298000003

    View details for PubMedID 21521786

  • Integrated genomic analyses of ovarian carcinoma NATURE Bell, D., Berchuck, A., Birrer, M., Chien, J., Cramer, D. W., Dao, F., Dhir, R., Disaia, P., Gabra, H., Glenn, P., Godwin, A. K., Gross, J., Hartmann, L., Huang, M., Huntsman, D. G., Iacocca, M., Imielinski, M., Kalloger, S., Karlan, B. Y., Levine, D. A., Mills, G. B., Morrison, C., Mutch, D., Olvera, N., Orsulic, S., Park, K., Petrelli, N., Rabeno, B., Rader, J. S., Sikic, B. I., Smith-McCune, K., Sood, A. K., Bowtell, D., Penny, R., Testa, J. R., Chang, K., Dinh, H. H., Drummond, J. A., Fowler, G., Gunaratne, P., Hawes, A. C., Kovar, C. L., Lewis, L. R., Morgan, M. B., Newsham, I. F., Santibanez, J., Reid, J. G., Trevino, L. R., Wu, Y., Wang, M., Muzny, D. M., Wheeler, D. A., Gibbs, R. A., Getz, G., Lawrence, M. S., Cibulskis, K., Sivachenko, A. Y., Sougnez, C., Voet, D., Wilkinson, J., Bloom, T., Ardlie, K., Fennell, T., Baldwin, J., Gabriel, S., Lander, E. S., Ding, L., Fulton, R. S., Koboldt, D. C., McLellan, M. D., Wylie, T., Walker, J., O'Laughlin, M., Dooling, D. J., Fulton, L., Abbott, R., Dees, N. D., Zhang, Q., Kandoth, C., Wendl, M., Schierding, W., Shen, D., Harris, C. C., Schmidt, H., Kalicki, J., Delehaunty, K. D., Fronick, C. C., Demeter, R., Cook, L., Wallis, J. W., Lin, L., Magrini, V. J., Hodges, J. S., Eldred, J. M., Smith, S. M., Pohl, C. S., Vandin, F., Raphael, B. J., Weinstock, G. M., Mardis, R., Wilson, R. K., Meyerson, M., Winckler, W., Getz, G., Verhaak, R. G., Carter, S. L., Mermel, C. H., Saksena, G., Nguyen, H., Onofrio, R. C., Lawrence, M. S., Hubbard, D., Gupta, S., Crenshaw, A., Ramos, A. H., Ardlie, K., Chin, L., Protopopov, A., Zhang, J., Kim, T. M., Perna, I., Xiao, Y., Zhang, H., Ren, G., Sathiamoorthy, N., Park, R. W., Lee, E., Park, P. J., Kucherlapati, R., Absher, D. M., Waite, L., Sherlock, G., Brooks, J. D., Li, J. Z., Xu, J., Myers, R. M., Laird, P. W., Cope, L., Herman, J. G., Shen, H., Weisenberger, D. J., Noushmehr, H., Pan, F., Triche, T., Berman, B. P., Van den Berg, D. J., Buckley, J., Baylin, S. B., Spellman, P. T., Purdom, E., Neuvial, P., Bengtsson, H., Jakkula, L. R., Durinck, S., Han, J., Dorton, S., Marr, H., Choi, Y. G., Wang, V., Wang, N. J., Ngai, J., Conboy, J. G., Parvin, B., Feiler, H. S., Speed, T. P., Gray, J. W., Levine, D. A., Socci, N. D., Liang, Y., Taylor, B. S., Schultz, N., Borsu, L., Lash, A. E., Brennan, C., Viale, A., Sander, C., Ladanyi, M., Hoadley, K. A., Meng, S., Du, Y., Shi, Y., Li, L., Turman, Y. J., Zang, D., Helms, E. B., Balu, S., Zhou, X., Wu, J., Topal, M. D., Hayes, D. N., Perou, C. M., Getz, G., Voet, D., Saksena, G., Zhang, J., Zhang, H., Wu, C. J., Shukla, S., Cibulskis, K., Lawrence, M. S., Sivachenko, A., Jing, R., Park, R. W., Liu, Y., Park, P. J., Noble, M., Chin, L., Carter, H., Kim, D., Karchin, R., Spellman, P. T., Purdom, E., Neuvial, P., Bengtsson, H., Durinck, S., Han, J., Korkola, J. E., Heiser, L. M., Cho, R. J., Hu, Z., Parvin, B., Speed, T. P., Gray, J. W., Schultz, N., Cerami, E., Taylor, B. S., Olshen, A., Reva, B., Antipin, Y., Shen, R., Mankoo, P., Sheridan, R., Ciriello, G., Chang, W. K., Bernanke, J. A., Borsu, L., Levine, D. A., Ladanyi, M., Sander, C., Haussler, D., Benz, C. C., Stuart, J. M., Benz, S. C., Sanborn, J. Z., Vaske, C. J., Zhu, J., Szeto, C., Scott, G. K., Yau, C., Hoadley, K. A., Du, Y., Balu, S., Hayes, D. N., Perou, C. M., Wilkerson, M. D., Zhang, N., Akbani, R., Baggerly, K. A., Yung, W. K., Mills, G. B., Weinstein, J. N., Penny, R., Shelton, T., Grimm, D., Hatfield, M., Morris, S., Yena, P., Rhodes, P., Sherman, M., Paulauskis, J., Millis, S., Kahn, A., Greene, J. M., Sfeir, R., Jensen, M. A., Chen, J., Whitmore, J., Alonso, S., Jordan, J., Chu, A., Zhang, J., Barker, A., Compton, C., Eley, G., Ferguson, M., Fielding, P., Gerhard, D. S., Myles, R., Schaefer, C., Shaw, K. R., Vaught, J., Vockley, J. B., Good, P. J., Guyer, M. S., Ozenberger, B., Peterson, J., Thomson, E. 2011; 474 (7353): 609-615

    Abstract

    A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients' lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology.

    View details for DOI 10.1038/nature10166

    View details for Web of Science ID 000292204300032

    View details for PubMedID 21720365

  • Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads BMC GENOMICS Martin, J., Bruno, V. M., Fang, Z., Meng, X., Blow, M., Zhang, T., Sherlock, G., Snyder, M., Wang, Z. 2010; 11

    Abstract

    Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied. Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics. These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.

    View details for DOI 10.1186/1471-2164-11-663

    View details for Web of Science ID 000285303000001

    View details for PubMedID 21106091

  • Annotare--a tool for annotating high-throughput biomedical investigations and resulting data BIOINFORMATICS Shankar, R., Parkinson, H., Burdett, T., Hastings, E., Liu, J., Miller, M., Srinivasa, R., White, J., Brazma, A., Sherlock, G., Stoeckert, C. J., Ball, C. A. 2010; 26 (19): 2470-2471

    Abstract

    Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows.

    View details for DOI 10.1093/bioinformatics/btq462

    View details for Web of Science ID 000282170000021

    View details for PubMedID 20733062

  • Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq GENOME RESEARCH Bruno, V. M., Wang, Z., Marjani, S. L., Euskirchen, G. M., Martin, J., Sherlock, G., Snyder, M. 2010; 20 (10): 1451-1458

    Abstract

    Candida albicans is the major invasive fungal pathogen of humans, causing diseases ranging from superficial mucosal infections to disseminated, systemic infections that are often life-threatening. We have used massively parallel high-throughput sequencing of cDNA (RNA-seq) to generate a high-resolution map of the C. albicans transcriptome under several different environmental conditions. We have quantitatively determined all of the regions that are transcribed under these different conditions, and have identified 602 novel transcriptionally active regions (TARs) and numerous novel introns that are not represented in the current genome annotation. Interestingly, the expression of many of these TARs is regulated in a condition-specific manner. This comprehensive transcriptome analysis significantly enhances the current genome annotation of C. albicans, a necessary framework for a complete understanding of the molecular mechanisms of pathogenesis for this important eukaryotic pathogen.

    View details for DOI 10.1101/gr.109553.110

    View details for Web of Science ID 000282375000015

    View details for PubMedID 20810668

  • Microarray karyotyping of maltose-fermenting Saccharomyces yeasts with differing maltotriose utilization profiles reveals copy number variation in genes involved in maltose and maltotriose utilization JOURNAL OF APPLIED MICROBIOLOGY Duval, E. H., Alves, S. L., Dunn, B., Sherlock, G., Stambuk, B. U. 2010; 109 (1): 248-259

    Abstract

    We performed an analysis of maltotriose utilization by 52 Saccharomyces yeast strains able to ferment maltose efficiently and correlated the observed phenotypes with differences in the copy number of genes possibly involved in maltotriose utilization by yeast cells. The analysis of maltose and maltotriose utilization by laboratory and industrial strains of the species Saccharomyces cerevisiae and Saccharomyces pastorianus (a natural S. cerevisiae/Saccharomyces bayanus hybrid) was carried out using microscale liquid cultivation, as well as in aerobic batch cultures. All strains utilize maltose efficiently as a carbon source, but three different phenotypes were observed for maltotriose utilization: efficient growth, slow/delayed growth and no growth. Through microarray karyotyping and pulsed-field gel electrophoresis blots, we analysed the copy number and localization of several maltose-related genes in selected S. cerevisiae strains. While most strains lacked the MPH2 and MPH3 transporter genes, almost all strains analysed had the AGT1 gene and increased copy number of MALx1 permeases. Our results showed that S. pastorianus yeast strains utilized maltotriose more efficiently than S. cerevisiae strains and highlighted the importance of the AGT1 gene for efficient maltotriose utilization by S. cerevisiae yeasts. Our results revealed new maltotriose utilization phenotypes, contributing to a better understanding of the metabolism of this carbon source for improved fermentation by Saccharomyces yeasts.

    View details for DOI 10.1111/j.1365-2672.2009.04656.x

    View details for Web of Science ID 000278674300024

    View details for PubMedID 20070441

  • A Genome-Wide Analysis Reveals No Nuclear Dobzhansky-Muller Pairs of Determinants of Speciation between S. cerevisiae and S. paradoxus, but Suggests More Complex Incompatibilities PLOS GENETICS Kao, K. C., Schwartz, K., Sherlock, G. 2010; 6 (7)

    Abstract

    The Dobzhansky-Muller (D-M) model of speciation by genic incompatibility is widely accepted as the primary cause of interspecific postzygotic isolation. Since the introduction of this model, there have been theoretical and experimental data supporting the existence of such incompatibilities. However, speciation genes have been largely elusive, with only a handful of candidate genes identified in a few organisms. The Saccharomyces sensu stricto yeasts, which have small genomes and can mate interspecifically to produce sterile hybrids, are thus an ideal model for studying postzygotic isolation. Among them, only a single D-M pair, comprising a mitochondrially targeted product of a nuclear gene and a mitochondrially encoded locus, has been found. Thus far, no D-M pair of nuclear genes has been identified between any sensu stricto yeasts. We report here the first detailed genome-wide analysis of rare meiotic products from an otherwise sterile hybrid and show that no classic D-M pairs of speciation genes exist between the nuclear genomes of the closely related yeasts S. cerevisiae and S. paradoxus. Instead, our analyses suggest that more complex interactions, likely involving multiple loci having weak effects, may be responsible for their post-zygotic separation. The lack of a nuclear encoded classic D-M pair between these two yeasts, yet the existence of multiple loci that may each exert a small effect through complex interactions suggests that initial speciation events might not always be mediated by D-M pairs. An alternative explanation may be that the accumulation of polymorphisms leads to gamete inviability due to the activities of anti-recombination mechanisms and/or incompatibilities between the species' transcriptional and metabolic networks, with no single pair at least initially being responsible for the incompatibility. After such a speciation event, it is possible that one or more D-M pairs might subsequently arise following isolation.

    View details for DOI 10.1371/journal.pgen.1001038

    View details for Web of Science ID 000280512700034

    View details for PubMedID 20686707

  • TB database 2010: Overview and update TUBERCULOSIS Galagan, J. E., Sisk, P., Stolte, C., Weiner, B., Koehrsen, M., Wymore, F., Reddy, T. B., Zucker, J. D., Engels, R., Gellesch, M., Hubble, J., Jin, H., Larson, L., Mao, M., Nitzberg, M., White, J., Zachariah, Z. K., Sherlock, G., Ball, C. A., Schoolnik, G. K. 2010; 90 (4): 225-235

    Abstract

    The Tuberculosis Database (TBDB) is an online database providing integrated access to genome sequence, expression data and literature curation for TB. TBDB currently houses genome assemblies for numerous strains of Mycobacterium tuberculosis (MTB) as well as assemblies for over 20 strains related to MTB and useful for comparative analysis. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives, including over 3000 MTB microarrays, 95 RT-PCR datasets, 2700 microarrays for human and mouse TB related experiments, and 260 arrays for Streptomyces coelicolor. To enable wide use of these data, TBDB provides a suite of tools for searching, browsing, analyzing, and downloading the data. We provide here an overview of TBDB focusing on recent data releases and enhancements. In particular, we describe the recent release of a Global Genetic Diversity dataset for TB, support for short-read re-sequencing data, new tools for exploring gene expression data in the context of gene regulation, and the integration of a metabolic network reconstruction and BioCyc with TBDB. By integrating a wide range of genomic data with tools for their use, TBDB is a unique platform for both basic science research in TB, as well as research into the discovery and development of TB drugs, vaccines and biomarkers.

    View details for DOI 10.1016/j.tube.2010.03.010

    View details for Web of Science ID 000280233900002

    View details for PubMedID 20488753

  • Bulk Segregant Analysis by High-Throughput Sequencing Reveals a Novel Xylose Utilization Gene from Saccharomyces cerevisiae PLOS GENETICS Wenger, J. W., Schwartz, K., Sherlock, G. 2010; 6 (5)

    Abstract

    Fermentation of xylose is a fundamental requirement for the efficient production of ethanol from lignocellulosic biomass sources. Although they aggressively ferment hexoses, it has long been thought that native Saccharomyces cerevisiae strains cannot grow fermentatively or non-fermentatively on xylose. Population surveys have uncovered a few naturally occurring strains that are weakly xylose-positive, and some S. cerevisiae have been genetically engineered to ferment xylose, but no strain, either natural or engineered, has yet been reported to ferment xylose as efficiently as glucose. Here, we used a medium-throughput screen to identify Saccharomyces strains that can increase in optical density when xylose is presented as the sole carbon source. We identified 38 strains that have this xylose utilization phenotype, including strains of S. cerevisiae, other sensu stricto members, and hybrids between them. All the S. cerevisiae xylose-utilizing strains we identified are wine yeasts, and for those that could produce meiotic progeny, the xylose phenotype segregates as a single gene trait. We mapped this gene by Bulk Segregant Analysis (BSA) using tiling microarrays and high-throughput sequencing. The gene is a putative xylitol dehydrogenase, which we name XDH1, and is located in the subtelomeric region of the right end of chromosome XV in a region not present in the S288c reference genome. We further characterized the xylose phenotype by performing gene expression microarrays and by genetically dissecting the endogenous Saccharomyces xylose pathway. We have demonstrated that natural S. cerevisiae yeasts are capable of utilizing xylose as the sole carbon source, characterized the genetic basis for this trait as well as the endogenous xylose utilization pathway, and demonstrated the feasibility of BSA using high-throughput sequencing.

    View details for DOI 10.1371/journal.pgen.1000942

    View details for Web of Science ID 000278557300012

    View details for PubMedID 20485559

  • The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community NUCLEIC ACIDS RESEARCH Arnaud, M. B., Chibucos, M. C., Costanzo, M. C., Crabtree, J., Inglis, D. O., Lotia, A., Orvis, J., Shah, P., Skrzypek, M. S., Binkley, G., Miyasato, S. R., Wortman, J. R., Sherlock, G. 2010; 38: D420-D427

    Abstract

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

    View details for DOI 10.1093/nar/gkp751

    View details for Web of Science ID 000276399100066

    View details for PubMedID 19773420

  • New tools at the Candida Genome Database: biochemical pathways and full-text literature search NUCLEIC ACIDS RESEARCH Skrzypek, M. S., Arnaud, M. B., Costanzo, M. C., Inglis, D. O., Shah, P., Binkley, G., Miyasato, S. R., Sherlock, G. 2010; 38: D428-D432

    Abstract

    The Candida Genome Database (CGD, http://www.candidagenome.org/) provides online access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. Herein, we describe two recently added features, Candida Biochemical Pathways and the Textpresso full-text literature search tool. The Biochemical Pathways tool provides visualization of metabolic pathways and analysis tools that facilitate interpretation of experimental data, including results of large-scale experiments, in the context of Candida metabolism. Textpresso for Candida allows searching through the full-text of Candida-specific literature, including clinical and epidemiological studies.

    View details for DOI 10.1093/nar/gkp836

    View details for Web of Science ID 000276399100067

    View details for PubMedID 19808938

  • Industrial fuel ethanol yeasts contain adaptive copy number changes in genes involved in vitamin B1 and B6 biosynthesis GENOME RESEARCH Stambuk, B. U., Dunn, B., Alves, S. L., Duval, E. H., Sherlock, G. 2009; 19 (12): 2271-2278

    Abstract

    Fuel ethanol is now a global energy commodity that is competitive with gasoline. Using microarray-based comparative genome hybridization (aCGH), we have determined gene copy number variations (CNVs) common to five industrially important fuel ethanol Saccharomyces cerevisiae strains responsible for the production of billions of gallons of fuel ethanol per year from sugarcane. These strains have significant amplifications of the telomeric SNO and SNZ genes, which are involved in the biosynthesis of vitamins B6 (pyridoxine) and B1 (thiamin). We show that increased copy number of these genes confers the ability to grow more efficiently under the repressing effects of thiamin, especially in medium lacking pyridoxine and with high sugar concentrations. These genetic changes have likely been adaptive and selected for in the industrial environment, and may be required for the efficient utilization of biomass-derived sugars from other renewable feedstocks.

    View details for DOI 10.1101/gr.094276.109

    View details for Web of Science ID 000272273400011

    View details for PubMedID 19897511

  • Gene Ontology and the annotation of pathogen genomes: the case of Candida albicans TRENDS IN MICROBIOLOGY Arnaud, M. B., Costanzo, M. C., Shah, P., Skrzypek, M. S., Sherlock, G. 2009; 17 (7): 295-303

    Abstract

    The Gene Ontology (GO) is a structured controlled vocabulary developed to describe the roles and locations of gene products in a consistent manner and in a way that can be shared across organisms. The unicellular fungus Candida albicans is similar in many ways to the model organism Saccharomyces cerevisiae but, as both a commensal and a pathogen of humans, differs greatly in its lifestyle. With an expanding at-risk population of immunosuppressed patients, increased use of invasive medical procedures, the increasing prevalence of drug resistance and the emergence of additional Candida species as serious pathogens, it has never been more crucial to improve our understanding of Candida biology to guide the development of better treatments. In this brief review, we examine the importance of GO in the annotation of C. albicans gene products, with a focus on those involved in pathogenesis. We also discuss how sequence information combined with GO facilitates the transfer of knowledge across related species and the challenges and opportunities that such an approach presents.

    View details for DOI 10.1016/j.tim.2009.04.007

    View details for Web of Science ID 000268616600006

    View details for PubMedID 19577928

  • Evolution of pathogenicity and sexual reproduction in eight Candida genomes NATURE Butler, G., Rasmussen, M. D., Lin, M. F., Santos, M. A., Sakthikumar, S., Munro, C. A., Rheinbay, E., Grabherr, M., Forche, A., Reedy, J. L., Agrafioti, I., Arnaud, M. B., Bates, S., Brown, A. J., Brunke, S., Costanzo, M. C., Fitzpatrick, D. A., De Groot, P. W., Harris, D., Hoyer, L. L., Hube, B., Klis, F. M., Kodira, C., Lennard, N., Logue, M. E., Martin, R., Neiman, A. M., Nikolaou, E., Quail, M. A., Quinn, J., Santos, M. C., Schmitzberger, F. F., Sherlock, G., Shah, P., Silverstein, K. A., Skrzypek, M. S., Soll, D., Staggs, R., Stansfield, I., Stumpf, M. P., Sudbery, P. E., Srikantha, T., Zeng, Q., Berman, J., Berriman, M., Heitman, J., Gow, N. A., Lorenz, M. C., Birren, B. W., Kellis, M., Cuomo, C. A. 2009; 459 (7247): 657-662

    Abstract

    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.

    View details for DOI 10.1038/nature08064

    View details for Web of Science ID 000266608600034

    View details for PubMedID 19465905

  • Implementation of GenePattern within the Stanford Microarray Database NUCLEIC ACIDS RESEARCH Hubble, J., Demeter, J., Jin, H., Mao, M., Nitzberg, M., Reddy, T. B., Wymore, F., Zachariah, K., Sherlock, G., Ball, C. A. 2009; 37: D898-D901

    Abstract

    Hundreds of researchers across the world use the Stanford Microarray Database (SMD; http://smd.stanford.edu/) to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.

    View details for DOI 10.1093/nar/gkn786

    View details for Web of Science ID 000261906200157

    View details for PubMedID 18953035

  • TB database: an integrated platform for tuberculosis research NUCLEIC ACIDS RESEARCH Reddy, T. B., Riley, R., Wymore, F., Montgomery, P., DeCaprio, D., Engels, R., Gellesch, M., Hubble, J., Jen, D., Jin, H., Koehrsen, M., Larson, L., Mao, M., Nitzberg, M., Sisk, P., Stolte, C., Weiner, B., White, J., Zachariah, Z. K., Sherlock, G., Galagan, J. E., Ball, C. A., Schoolnik, G. K. 2009; 37: D499-D508

    Abstract

    The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB (http://www.tbdb.org/) provides a unique discovery platform for TB research.

    View details for DOI 10.1093/nar/gkn652

    View details for Web of Science ID 000261906200090

    View details for PubMedID 18835847

  • Novel Low Abundance and Transient RNAs in Yeast Revealed by Tiling Microarrays and Ultra High-Throughput Sequencing Are Not Conserved Across Closely Related Yeast Species PLOS GENETICS Lee, A., Hansen, K. D., Bullard, J., Dudoit, S., Sherlock, G. 2008; 4 (12)

    Abstract

    A complete description of the transcriptome of an organism is crucial for a comprehensive understanding of how it functions and how its transcriptional networks are controlled, and may provide insights into the organism's evolution. Despite the status of Saccharomyces cerevisiae as arguably the most well-studied model eukaryote, we still do not have a full catalog or understanding of all its genes. In order to interrogate the transcriptome of S. cerevisiae for low abundance or rapidly turned over transcripts, we deleted elements of the RNA degradation machinery with the goal of preferentially increasing the relative abundance of such transcripts. We then used high-resolution tiling microarrays and ultra high-throughput sequencing (UHTS) to identify, map, and validate unannotated transcripts that are more abundant in the RNA degradation mutants relative to wild-type cells. We identified 365 currently unannotated transcripts, the majority presumably representing low abundance or short-lived RNAs, of which 185 are previously unknown and unique to this study. It is likely that many of these are cryptic unstable transcripts (CUTs), which are rapidly degraded and whose function(s) within the cell are still unclear, while others may be novel functional transcripts. Of the 185 transcripts we identified as novel to our study, greater than 80 percent come from regions of the genome that have lower conservation scores amongst closely related yeast species than 85 percent of the verified ORFs in S. cerevisiae. Such regions of the genome have typically been less well-studied, and by definition transcripts from these regions will distinguish S. cerevisiae from these closely related species.

    View details for DOI 10.1371/journal.pgen.1000299

    View details for Web of Science ID 000263667900014

    View details for PubMedID 19096707

  • Changes to NIH Grant System May Backfire SCIENCE Karp, P. D., Sherlock, G., Gerlt, J. A., Sim, I., Paulsen, I., Babbitt, P. C., Laderoute, K., Hunter, L., Sternberg, P., Wooley, J., Bourne, P. E. 2008; 322 (5905): 1187-1188

    View details for Web of Science ID 000261033400017

    View details for PubMedID 19023064

  • Comprehensive genomic characterization defines human glioblastoma genes and core pathways NATURE Chin, L., Meyerson, M., Aldape, K., Bigner, D., Mikkelsen, T., VandenBerg, S., Kahn, A., Penny, R., Ferguson, M. L., Gerhard, D. S., Getz, G., Brennan, C., Taylor, B. S., Winckler, W., Park, P., Ladanyi, M., Hoadley, K. A., Verhaak, R. G., Hayes, D. N., Spellman, P. T., Absher, D., Weir, B. A., Ding, L., Wheeler, D., Lawrence, M. S., Cibulskis, K., Mardis, E., Zhang, J., Wilson, R. K., Donehower, L., Wheeler, D. A., Purdom, E., Wallis, J., Laird, P. W., Herman, J. G., Schuebel, K. E., Weisenberger, D. J., Baylin, S. B., Schultz, N., Yao, J., Wiedemeyer, R., Weinstein, J., Sander, C., Gibbs, R. A., Gray, J., Kucherlapati, R., Lander, E. S., Myers, R. M., Perou, C. M., McLendon, R., Friedman, A., Van Meir, E. G., Brat, D. J., Mastrogianakis, G. M., Olson, J. J., Lehman, N., Yung, W. K., Bogler, O., Berger, M., Prados, M., Muzny, D., Morgan, M., Scherer, S., Sabo, A., Nazareth, L., Lewis, L., Hall, O., Zhu, Y., Ren, Y., Alvi, O., Yao, J., Hawes, A., Jhangiani, S., Fowler, G., San Lucas, A., Kovar, C., Cree, A., Dinh, H., Santibanez, J., Joshi, V., Gonzalez-Garay, M. L., Miller, C. A., Milosavljevic, A., Sougnez, C., Fennell, T., Mahan, S., Wilkinson, J., Ziaugra, L., Onofrio, R., Bloom, T., Nicol, R., Ardlie, K., Baldwin, J., Gabriel, S., Fulton, R. S., McLellan, M. D., Larson, D. E., Shi, X., Abbott, R., Fulton, L., Chen, K., Koboldt, D. C., Wendl, M. C., Meyer, R., Tang, Y., Lin, L., Osborne, J. R., Dunford-Shore, B. H., Miner, T. L., Delehaunty, K., Markovic, C., Swift, G., Courtney, W., Pohl, C., Abbott, S., Hawkins, A., Leong, S., Haipek, C., Schmidt, H., Wiechert, M., Vickery, T., Scott, S., Dooling, D. J., Chinwalla, A., Weinstock, G. M., O'Kelly, M., Robinson, J., Alexe, G., Beroukhim, R., Carter, S., Chiang, D., Gould, J., Gupta, S., Korn, J., Mermel, C., Mesirov, J., Monti, S., Nguyen, H., Parkin, M., Reich, M., Stransky, N., Garraway, L., Golub, T., Protopopov, A., Perna, I., Aronson, S., Sathiamoorthy, N., Ren, G., Kim, H., Kong, S. W., Xiao, Y., Kohane, I. S., Seidman, J., Cope, L., Pan, F., Van Den Berg, D., van Neste, L., Yi, J. M., Li, J. Z., Southwick, A., Brady, S., Aggarwal, A., Chung, T., Sherlock, G., Brooks, J. D., Jakkula, L. R., Lapuk, A. V., Marr, H., Dorton, S., Choi, Y. G., Han, J., Ray, A., Wang, V., Durinck, S., Robinson, M., Wang, N. J., Vranizan, K., Peng, V., Van Name, E., Fontenay, G. V., Ngai, J., Conboy, J. G., Parvin, B., Feiler, H. S., Speed, T. P., Socci, N. D., Olshen, A., Lash, A., Reva, B., Antipin, Y., Stukalov, A., Gross, B., Cerami, E., Wang, W. Q., Qin, L., Seshan, V. E., Villafania, L., Cavatore, M., Borsu, L., Viale, A., Gerald, W., Topal, M. D., Qi, Y., Balu, S., Shi, Y., Wu, G., Bittner, M., Shelton, T., Lenkiewicz, E., Morris, S., Beasley, D., Sanders, S., Sfeir, R., Chen, J., Nassau, D., Feng, L., Hickey, E., Schaefer, C., Madhavan, S., Buetow, K., Barker, A., Vockley, J., Compton, C., Vaught, J., Fielding, P., Collins, F., Good, P., Guyer, M., Ozenberger, B., Peterson, J., Thomson, E. 2008; 455 (7216): 1061-1068

    Abstract

    Human cancer cells typically harbour multiple chromosomal aberrations, nucleotide substitutions and epigenetic modifications that drive malignant transformation. The Cancer Genome Atlas (TCGA) pilot project aims to assess the value of large-scale multi-dimensional analysis of these molecular characteristics in human cancer and to provide the data rapidly to the research community. Here we report the interim integrative analysis of DNA copy number, gene expression and DNA methylation aberrations in 206 glioblastomas--the most common type of adult brain cancer--and nucleotide sequence aberrations in 91 of the 206 glioblastomas. This analysis provides new insights into the roles of ERBB2, NF1 and TP53, uncovers frequent mutations of the phosphatidylinositol-3-OH kinase regulatory subunit gene PIK3R1, and provides a network view of the pathways altered in the development of glioblastoma. Furthermore, integration of mutation, DNA methylation and clinical treatment data reveals a link between MGMT promoter methylation and a hypermutator phenotype consequent to mismatch repair deficiency in treated glioblastomas, an observation with potential clinical implications. Together, these findings establish the feasibility and power of TCGA, demonstrating that it can rapidly expand knowledge of the molecular basis of cancer.

    View details for DOI 10.1038/nature07385

    View details for Web of Science ID 000260252600035

    View details for PubMedID 18772890

  • Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE) NATURE BIOTECHNOLOGY Deutsch, E. W., Ball, C. A., Berman, J. J., Bova, G. S., Brazma, A., Bumgarner, R. E., Campbell, D., Causton, H. C., Christiansen, J. H., Daian, F., Dauga, D., Davidson, D. R., Gimenez, G., Goo, Y. A., Grimmond, S., Henrich, T., Herrmann, B. G., Johnson, M. H., Korb, M., Mills, J. C., Oudes, A. J., Parkinson, H. E., Pascal, L. E., Pollet, N. I., Quackenbush, J., Ramialison, M., Ringwald, M., Salgado, D., Sansone, S., Sherlock, G., Stoeckert, C. J., Swedlow, J., Taylor, R. C., Walashek, L., Warford, A., Wilkinson, D. G., Zhou, Y., Zon, L. I., Liu, A. Y., True, L. D. 2008; 26 (3): 305-312

    Abstract

    One purpose of the biomedical literature is to report results in sufficient detail that the methods of data collection and analysis can be independently replicated and verified. Here we present reporting guidelines for gene expression localization experiments: the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). MISFISHIE is modeled after the Minimum Information About a Microarray Experiment (MIAME) specification for microarray experiments. Both guidelines define what information should be reported without dictating a format for encoding that information. MISFISHIE describes six types of information to be provided for each experiment: experimental design, biomaterials and treatments, reporters, staining, imaging data and image characterizations. This specification has benefited the consortium within which it was developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.

    View details for DOI 10.1038/nbt1391

    View details for Web of Science ID 000254123400023

    View details for PubMedID 18327244

  • Isolation and molecular characterization of cancer stem cells in MMTV-Wnt-1 murine breast tumors STEM CELLS Cho, R. W., Wang, X., Diehn, M., Shedden, K., Chen, G. Y., Sherlock, G., Gurney, A., Lewicki, J., Clarke, M. F. 2008; 26 (2): 364-371

    Abstract

    In human breast cancers, a phenotypically distinct minority population of tumorigenic (TG) cancer cells (sometimes referred to as cancer stem cells) drives tumor growth when transplanted into immunodeficient mice. Our objective was to identify a mouse model of breast cancer stem cells that could have relevance to the study of human breast cancer. To do so, we used breast tumors of the mouse mammary tumor virus (MMTV)-Wnt-1 mice. MMTV-Wnt-1 breast tumors were harvested, dissociated into single-cell suspensions, and sorted by flow cytometry on Thy1, CD24, and CD45. Sorted cells were then injected into recipient background FVB/NJ female syngeneic mice. In six of seven tumors examined, Thy1+CD24+ cancer cells, which constituted approximately 1%-4% of tumor cells, were highly enriched for cells capable of regenerating new tumors compared with cells of the tumor that did not fit this profile ("not-Thy1+CD24+"). Resultant tumors had a phenotypic diversity similar to that of the original tumor and behaved in a similar manner when passaged. Microarray analysis comparing Thy1+CD24+ tumor cells to not-Thy1+CD24+ cells identified a list of differentially expressed genes. Orthologs of these differentially expressed genes predicted survival of human breast cancer patients from two different study groups. These studies suggest that there is a cancer stem cell compartment in the MMTV-Wnt-1 murine breast tumor and that there is a clinical utility of this model for the study of cancer stem cells.

    View details for DOI 10.1634/stemcells.2007-0440

    View details for Web of Science ID 000253372600008

    View details for PubMedID 17975224

  • The XBabelPhish MAGE-ML and XML translator BMC BIOINFORMATICS Maier, D., Wymore, F., Sherlock, G., Ball, C. A. 2008; 9

    Abstract

    MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large - too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML. We have developed XBabelPhish - an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML. XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.

    View details for DOI 10.1186/1471-2105-9-28

    View details for Web of Science ID 000253159700001

    View details for PubMedID 18205924

  • The Stanford Tissue Microarray Database NUCLEIC ACIDS RESEARCH Marinelli, R. J., Montgomery, K., Liu, C. L., Shah, N. H., Prapong, W., Nitzberg, M., Zachariah, Z. K., Sherlock, G. J., Natkunam, Y., West, R. B., van de Rijn, M., Brown, P. O., Ball, C. A. 2008; 36: D871-D877

    Abstract

    The Stanford Tissue Microarray Database (TMAD; http://tma.stanford.edu) is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.

    View details for DOI 10.1093/nar/gkm861

    View details for Web of Science ID 000252545400154

    View details for PubMedID 17989087

  • OntologyWidget - a reusable, embeddable widget for easily locating ontology terms BMC BIOINFORMATICS Beauheim, C. C., Wymore, F., Nitzberg, M., Zachariah, Z. K., Jin, H., Skene, J. H., Ball, C. A., Sherlock, G. 2007; 8

    Abstract

    Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form. We have produced a tool, OntologyWidget, which allows users to rapidly search for and browse ontology terms. OntologyWidget can easily be embedded in other web-based applications. OntologyWidget is written using AJAX (Asynchronous JavaScript and XML) and has two related elements. The first is a dynamic auto-complete ontology search feature. As a user enters characters into the search box, the appropriate ontology is queried remotely for terms that match the typed-in text, and the query results populate a drop-down list with all potential matches. Upon selection of a term from the list, the user can locate this term within a generic and dynamic ontology browser, which comprises the second element of the tool. The ontology browser shows the paths from a selected term to the root as well as parent/child tree hierarchies. We have implemented web services at the Stanford Microarray Database (SMD), which provide the OntologyWidget with access to over 40 ontologies from the Open Biological Ontology (OBO) website. Each ontology is updated weekly. Adopters of the OntologyWidget can either use SMD's web services, or elect to rely on their own. Deploying the OntologyWidget can be accomplished in three simple steps: (1) install Apache Tomcat on one's web server, (2) download and install the OntologyWidget servlet stub that provides access to the SMD ontology web services, and (3) create an html (HyperText Markup Language) file that refers to the OntologyWidget using a simple, well-defined format. We have developed OntologyWidget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple html description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website, as well as several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from http://smd.stanford.edu/ontologyWidget/.

    View details for DOI 10.1186/1471-2105-8-338

    View details for Web of Science ID 000250989100001

    View details for PubMedID 17854506
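
    The two elements described in the abstract above (matching typed text against ontology terms, then showing the path from a selected term to the root) can be illustrated with a minimal sketch. This is not the OntologyWidget code, which is JavaScript backed by web services; the function names, the tiny in-memory ontology, and the single-parent assumption are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical miniature ontology: each term maps to its parent (None marks the root).
# Real ontologies may give a term several parents; one parent is assumed here for brevity.
PARENT_OF: Dict[str, Optional[str]] = {
    "biological_process": None,
    "cell cycle": "biological_process",
    "mitotic cell cycle": "cell cycle",
    "cell cycle checkpoint": "cell cycle",
    "metabolic process": "biological_process",
}


def suggest_terms(typed: str, terms: Iterable[str], limit: int = 10) -> List[str]:
    """Return up to `limit` terms that contain the typed text, ignoring case."""
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = sorted(term for term in terms if needle in term.lower())
    return matches[:limit]


def path_to_root(term: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Walk parent links from `term` up to the root and return the visited terms."""
    path = [term]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path
```

    In a deployed widget the matching would run on the server for each keystroke; the sketch keeps everything local so that the matching and path-walking logic can be read in isolation.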

  • The prognostic role of a gene signature from tumorigenic breast-cancer cells. NEW ENGLAND JOURNAL OF MEDICINE Liu, R., Wang, X., Chen, G. Y., Dalerba, P., Gurney, A., Hoey, T., Sherlock, G., Lewicki, J., Shedden, K., Clarke, M. F. 2007; 356 (3): 217-226

    Abstract

    Breast cancers contain a minority population of cancer cells characterized by CD44 expression but low or undetectable levels of CD24 (CD44+CD24-/low) that have higher tumorigenic capacity than other subtypes of cancer cells. We compared the gene-expression profile of CD44+CD24-/low tumorigenic breast-cancer cells with that of normal breast epithelium. Differentially expressed genes were used to generate a 186-gene "invasiveness" gene signature (IGS), which was evaluated for its association with overall survival and metastasis-free survival in patients with breast cancer or other types of cancer. There was a significant association between the IGS and both overall and metastasis-free survival (P<0.001, for both) in patients with breast cancer, which was independent of established clinical and pathological variables. When combined with the prognostic criteria of the National Institutes of Health, the IGS was used to stratify patients with high-risk early breast cancer into prognostic categories (good or poor); among patients with a good prognosis, the 10-year rate of metastasis-free survival was 81%, and among those with a poor prognosis, it was 57%. The IGS was also associated with the prognosis in medulloblastoma (P=0.004), lung cancer (P=0.03), and prostate cancer (P=0.01). The prognostic power of the IGS was increased when combined with the wound-response (WR) signature. The IGS is strongly associated with metastasis-free survival and overall survival for four different types of tumors. This genetic signature of tumorigenic breast-cancer cells was even more strongly associated with clinical outcomes when combined with the WR signature in breast cancer.

    View details for Web of Science ID 000243488100004

    View details for PubMedID 17229949

  • The Stanford Microarray Database: implementation of new analysis tools and open source release of software NUCLEIC ACIDS RESEARCH Demeter, J., Beauheim, C., Gollub, J., Hernandez-Boussard, T., Jin, H., Maier, D., Matese, J. C., Nitzberg, M., Wymore, F., Zachariah, Z. K., Brown, P. O., Sherlock, G., Ball, C. A. 2007; 35: D766-D770

    Abstract

    The Stanford Microarray Database (SMD; http://smd.stanford.edu/) is a research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, and is MIAME-supportive and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD. In this article, we describe several data analysis tools implemented in SMD and we discuss features of our software release.

    View details for DOI 10.1093/nar/gkl1019

    View details for Web of Science ID 000243494600151

    View details for PubMedID 17182626

  • Sequence resources at the Candida genome database NUCLEIC ACIDS RESEARCH Arnaud, M. B., Costanzo, M. C., Skrzypek, M. S., Shah, P., Binkley, G., Lane, C., Miyasato, S. R., Sherlock, G. 2007; 35: D452-D456

    Abstract

    The Candida Genome Database (CGD, http://www.candidagenome.org/) contains a curated collection of genomic information and community resources for researchers who are interested in the molecular biology of the opportunistic pathogen Candida albicans. With the recent release of a new assembly of the C.albicans genome, Assembly 20, C.albicans genomics has entered a new era. Although the C.albicans genome assembly continues to undergo refinement, multiple assemblies and gene nomenclatures will remain in widespread use by the research community. CGD has now taken on the responsibility of maintaining the most up-to-date version of the genome sequence by providing the data from this new assembly alongside the data from the previous assemblies, as well as any future corrections and refinements. In this database update, we describe the sequence information available for C.albicans, the sequence information contained in CGD, and the tools for sequence retrieval, analysis and comparison that CGD provides. CGD is freely accessible at http://www.candidagenome.org/ and CGD curators may be contacted by email at candida-curator@genome.stanford.edu.

    View details for DOI 10.1093/nar/gkl899

    View details for Web of Science ID 000243494600092

    View details for PubMedID 17090582

  • A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB BMC BIOINFORMATICS Rayner, T. F., Rocca-Serra, P., Spellman, P. T., Causton, H. C., Farne, A., Holloway, E., Irizarry, R. A., Liu, J., Maier, D. S., Miller, M., Petersen, K., Quackenbush, J., Sherlock, G., Stoeckert, C. J., White, J., Whetzel, P. L., Wymore, F., Parkinson, H., Sarkans, U., Ball, C. A., Brazma, A. 2006; 7

    Abstract

    Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.

    View details for DOI 10.1186/1471-2105-7-489

    View details for Web of Science ID 000242642800001

    View details for PubMedID 17087822

  • Cell cycle - Complex evolution NATURE Sherlock, G. 2006; 443 (7111): 513-?

    View details for DOI 10.1038/443513a

    View details for Web of Science ID 000240988200026

    View details for PubMedID 17024077

  • The Candida Genome Database: Facilitating research on Candida albicans molecular biology FEMS YEAST RESEARCH Costanzo, M. C., Arnaud, M. B., Skrzypek, M. S., Binkley, G., Lane, C., Miyasato, S. R., Sherlock, G. 2006; 6 (5): 671-684

    Abstract

    The Candida Genome Database (CGD; http://www.candidagenome.org) is a resource for information about the Candida albicans genomic sequence and the molecular biology of its encoded gene products. CGD collects and organizes data from the biological literature concerning C. albicans, and provides tools for viewing, searching, analysing, and downloading these data. CGD also serves as an organizing centre for the C. albicans research community, providing a gene-name registry, contact information, and research community news. This article describes the information contained in CGD and how to access it, either from the perspective of a bench scientist interested in the function of one or a few genes, or from the perspective of a biologist or bioinformatician interpreting large-scale functional genomic datasets.

    View details for DOI 10.1111/j.1567-1364.2006.000074.x

    View details for Web of Science ID 000239004600001

    View details for PubMedID 16879419

  • Radiation-induced effects on gene expression: An in vivo study on breast cancer RADIOTHERAPY AND ONCOLOGY Helland, A., Johnsen, H., Froyland, C., Landmark, H. B., Saetersdal, A. B., Holmen, M. M., Gjertsen, T., Nesland, J. M., Ottestad, W., Jeffrey, S. S., Ottestad, L. O., Rodningen, O. K., Sherlock, G., Borresen-Dale, A. 2006; 80 (2): 230-235

    Abstract

    Breast cancer is diagnosed worldwide in approximately one million women annually and radiation therapy is an integral part of treatment. The purpose of this study was to investigate the molecular basis underlying response to radiotherapy in breast cancer tissue. Tumour biopsies were sampled before radiation and after 10 treatments (of 2 Gray (Gy) each) from 19 patients with breast cancer receiving radiation therapy. Gene expression microarray analyses were performed to identify in vivo radiation-responsive genes in tumours from patients diagnosed with breast cancer. The mutation status of the TP53 gene was determined by using direct sequencing. Several genes involved in cell cycle regulation and DNA repair were found to be significantly induced by radiation treatment. Mutations were found in the TP53 gene in 39% of the tumours and the gene expression profiles observed seemed to be influenced by the TP53 mutation status.

    View details for DOI 10.1016/j.radonc.2006.07.007

    View details for Web of Science ID 000240882300018

    View details for PubMedID 16890317

  • Development of the Minimum Information Specification for in situ Hybridization and Immunohistochemistry Experiments (MISFISHIE) OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY Deutsch, E. W., Ball, C. A., Bova, G. S., Brazma, A., Bumgarner, R. E., Campbell, D., Causton, H. C., Christiansen, J., Davidson, D., Eichner, L. J., Goo, Y. A., Grimmond, S., Henrich, T., Johnson, M. H., Korb, M., Mills, J. C., Oudes, A., Parkinson, H. E., Pascal, L. E., Quackenbush, J., Ramialison, M., Ringwald, M., Sansone, S., Sherlock, G., Stoeckert, C. J., Swedlow, J., Taylor, R. C., Walashek, L., Zhou, Y., Liu, A. Y., True, L. D. 2006; 10 (2): 205-208

    Abstract

    We describe the creation process of the Minimum Information Specification for In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE). Modeled after the existing minimum information specification for microarray data, we created a new specification for gene expression localization experiments, initially to facilitate data sharing within a consortium. After successful use within the consortium, the specification was circulated to members of the wider biomedical research community for comment and refinement. After a period of acquiring many new suggested requirements, it was necessary to enter a final phase of excluding those requirements that were deemed inappropriate as a minimum requirement for all experiments. The full specification will soon be published as a version 1.0 proposal to the community, upon which a more full discussion must take place so that the final specification may be achieved with the involvement of the whole community.

    View details for Web of Science ID 000240210900017

    View details for PubMedID 16901227

  • Top-down standards will not serve systems biology NATURE Quackenbush, J. 2006; 440 (7080): 24-24

    View details for DOI 10.1038/440024a

    View details for Web of Science ID 000235685700017

    View details for PubMedID 16511469

  • The Stanford Microarray Database: a user's guide. Methods in molecular biology (Clifton, N.J.) Gollub, J., Ball, C. A., Sherlock, G. 2006; 338: 191-208

    Abstract

    The Stanford Microarray Database (SMD) is a DNA microarray research database that provides a large amount of data for public use. This chapter describes the use of the primary tools for searching, browsing, retrieving, and analyzing data available for SMD. With this introduction, researchers and students will be able to examine and analyze a large body of gene expression and other experiments. Additional tools for depositing, annotating, sharing, and analyzing data, available only to registered users, are also described. SMD is available for installation as a local database.

    View details for PubMedID 16888360

  • Global analysis of gene function in yeast by quantitative phenotypic profiling MOLECULAR SYSTEMS BIOLOGY Brown, J. A., Sherlock, G., Myers, C. L., Burrows, N. M., Deng, C., Wu, H. I., McCann, K. E., Troyanskaya, O. G., Brown, J. M. 2006; 2

    Abstract

    We present a method for the global analysis of the function of genes in budding yeast based on hierarchical clustering of the quantitative sensitivity profiles of the 4756 strains with individual homozygous deletion of nonessential genes to a broad range of cytotoxic or cytostatic agents. This method is superior to other global methods of identifying the function of genes involved in the various DNA repair and damage checkpoint pathways as well as other interrogated functions. Analysis of the phenotypic profiles of the 51 diverse treatments places a total of 860 genes of unknown function in clusters with genes of known function. We demonstrate that this can not only identify the function of unknown genes but can also suggest the mechanism of action of the agents used. This method will be useful when used alone and in conjunction with other global approaches to identify gene function in yeast.

    View details for DOI 10.1038/msb4100043

    View details for Web of Science ID 000243245400005

    View details for PubMedID 16738548

  • Wrestling with SUMO and bio-ontologies. Nature biotechnology Stoeckert, C., Ball, C., Brazma, A., Brinkman, R., Causton, H., Fan, L., Fostel, J., Fragoso, G., Heiskanen, M., Holstege, F., Morrison, N., Parkinson, H., Quackenbush, J., Rocca-Serra, P., Sansone, S. A., Sarkans, U., Sherlock, G., Stevens, R., Taylor, C., Taylor, R., Whetzel, P., White, J. 2006; 24 (1): 21-2; author reply 23

    View details for PubMedID 16404382

  • Clustering microarray data DNA MICROARRAYS, PART B: DATABASES AND STATISTICS Gollub, J., Sherlock, G. 2006; 411: 194-?

    Abstract

    Even a simple, small-scale, microarray experiment generates thousands to millions of data points. Clearly, spreadsheets or plotting programs do not suffice for analysis of such large volumes of data, and comprehensive analysis requires systematic methods for selection and organization of data. This chapter focuses on the concepts and algorithms of hierarchical clustering and the most commonly employed methods of partitioning or organizing microarray data, and freely available software that implements these algorithms.

    View details for DOI 10.1016/S0076-6879(06)11010-1

    View details for Web of Science ID 000244506300010

    View details for PubMedID 16939791

  • Storage and retrieval of microarray data and open source microarray database software MOLECULAR BIOTECHNOLOGY Sherlock, G., Ball, C. A. 2005; 30 (3): 239-251

    Abstract

    Microarray technology has been widely adopted by researchers who use both home-made microarrays and microarrays purchased from commercial vendors. Associated with the adoption of this technology has been a deluge of complex data, both from the microarrays themselves, and also in the form of associated metadata, such as gene annotation information, the properties and treatment of biological samples, and the data transformation and analysis steps taken downstream. In addition, standards for annotation and data exchange have been proposed, and are now being adopted by journals and funding agencies alike. The coupling of large quantities of complex data with extensive and complex standards requires all but the most small-scale of microarray users to have access to a robust and scalable database with various tools. In this review, we discuss some of the desirable properties of such a database, and look at the features of several freely available alternatives.

    View details for Web of Science ID 000230547300006

    View details for PubMedID 15988049

  • A human-curated annotation of the Candida albicans genome PLOS GENETICS Braun, B. R., Hoog, M. V., d'Enfert, C., Martchenko, M., Dungan, J., Kuo, A., Inglis, D. O., Uhl, M. A., Hogues, H., Berriman, M., Lorenz, M., Levitin, A., Oberholzer, U., Bachewich, C., Harcus, D., Marcil, A., Dignard, D., Iouk, T., Zito, R., Frangeul, L., Tekaia, F., Rutherford, K., Wang, E., Munro, C. A., Bates, S., Gow, N. A., Hoyer, L. L., Kohler, G., Morschhauser, J., Newport, G., Znaidi, S., Raymond, M., Turcotte, B., Sherlock, G., Costanzo, M., Ihmels, J., Berman, J., Sanglard, D., Agabian, N., Mitchell, A. P., Johnson, A. D., Whiteway, M., Nantel, A. 2005; 1 (1): 36-57

    Abstract

    Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.

    View details for DOI 10.1371/journal.pgen.0010001

    View details for Web of Science ID 000234295900006

    View details for PubMedID 16103911

  • Of fish and chips NATURE METHODS Sherlock, G. 2005; 2 (5): 329-330

    View details for Web of Science ID 000228790200008

    View details for PubMedID 15846357

  • Microarray karyotyping of commercial wine yeast strains reveals shared, as well as unique, genomic signatures BMC GENOMICS Dunn, B., Levine, R. P., Sherlock, G. 2005; 6

    Abstract

    Genetic differences between yeast strains used in wine-making may account for some of the variation seen in their fermentation properties and may also produce differing sensory characteristics in the final wine product itself. To investigate this, we have determined genomic differences among several Saccharomyces cerevisiae wine strains by using a "microarray karyotyping" (also known as "array-CGH" or "aCGH") technique. We have studied four commonly used commercial wine yeast strains, assaying three independent isolates from each strain. All four wine strains showed common differences with respect to the laboratory S. cerevisiae strain S288C, some of which may be specific to commercial wine yeasts. We observed very little intra-strain variation; i.e., the genomic karyotypes of different commercial isolates of the same strain looked very similar, although an exception to this was seen among the Montrachet isolates. A moderate amount of inter-strain genomic variation between the four wine strains was observed, mostly in the form of depletions or amplifications of single genes; these differences allowed unique identification of each strain. Many of the inter-strain differences appear to be in transporter genes, especially hexose transporters (HXT genes), metal ion sensors/transporters (CUP1, ZRT1, ENA genes), members of the major facilitator superfamily, and in genes involved in drug response (PDR3, SNQ1, QDR1, RDS1, AYT1, YAR068W). We therefore used halo assays to investigate the response of these strains to three different fungicidal drugs (cycloheximide, clotrimazole, sulfomethuron methyl). Strains with fewer copies of the CUP1 loci showed hypersensitivity to sulfomethuron methyl. Microarray karyotyping is a useful tool for analyzing the genome structures of wine yeasts. Despite only small to moderate variations in gene copy numbers between different wine yeast strains and within different isolates of a given strain, there was enough variation to allow unique identification of strains; additionally, some of the variation correlated with drug sensitivity. The relatively small number of differences seen by microarray karyotyping between the strains suggests that the differences in fermentative and organoleptic properties ascribed to these different strains may arise from a small number of genetic changes, making it possible to test whether the observed differences do indeed confer different sensory properties in the finished wine.

    View details for DOI 10.1186/1471-2164-6-53

    View details for Web of Science ID 000228998600001

    View details for PubMedID 15833139

  • The Stanford Microarray Database accommodates additional microarray platforms and data formats NUCLEIC ACIDS RESEARCH Ball, C. A., Awad, I. A., Demeter, J., Gollub, J., Hebert, J. M., Hernandez-Boussard, T., Jin, H., Matese, J. C., Nitzberg, M., Wymore, F., Zachariah, Z. K., Brown, P. O., Sherlock, G. 2005; 33: D580-D582

    Abstract

    The Stanford Microarray Database (SMD) (http://smd.stanford.edu) is a research tool for hundreds of Stanford researchers and their collaborators. In addition, SMD functions as a resource for the entire biological research community by providing unrestricted access to microarray data published by SMD users and by disseminating its source code. In addition to storing GenePix (Axon Instruments) and ScanAlyze output from spotted microarrays, SMD has recently added the ability to store, retrieve, display and analyze the complete raw data produced by several additional microarray platforms and image analysis software packages, so that we can also now accept data from Affymetrix GeneChips (MAS5/GCOS or dChip), Agilent Catalog or Custom arrays (using Agilent's Feature Extraction software) or data created by SpotReader (Niles Scientific). We have implemented software that allows us to accept MAGE-ML documents from array manufacturers and to submit MIAME-compliant data in MAGE-ML format directly to ArrayExpress and GEO, greatly increasing the ease with which data from SMD can be published adhering to accepted standards and also increasing the accessibility of published microarray data to the general public. We have introduced a new tool to facilitate data sharing among our users, so that datasets can be shared during, before or after the completion of data analysis. The latest version of the source code for the complete database package was released in November 2004 (http://smd.stanford.edu/download/), allowing researchers around the world to deploy their own installations of SMD.

    View details for Web of Science ID 000226524300119

    View details for PubMedID 15608265

  • The Candida Genome Database (CGD), a community resource for Candida albicans gene and protein information NUCLEIC ACIDS RESEARCH Arnaud, M. B., Costanzo, M. C., Skrzypek, M. S., Binkley, G., Lane, C., Miyasato, S. R., Sherlock, G. 2005; 33: D358-D363

    Abstract

    The Candida Genome Database (CGD) is a new database that contains genomic information about the opportunistic fungal pathogen Candida albicans. CGD is a public resource for the research community that is interested in the molecular biology of this fungus. CGD curators are in the process of combing the scientific literature to collect all C.albicans gene names and aliases; to assign gene ontology terms that describe the molecular function, biological process, and subcellular localization of each gene product; to annotate mutant phenotypes; and to summarize the function and biological context of each gene product in free-text description lines. CGD also provides community resources, including a reservation system for gene names and a colleague registry through which Candida researchers can share contact information and research interests. CGD is publicly funded (by NIH grant R01 DE15873-01 from the NIDCR) and is freely available at http://www.candidagenome.org/.

    View details for DOI 10.1093/nar/gki003

    View details for Web of Science ID 000226524300074

    View details for PubMedID 15608216

  • GO::TermFinder - open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes BIOINFORMATICS Boyle, E. I., Weng, S. A., Gollub, J., Jin, H., Botstein, D., Cherry, J. M., Sherlock, G. 2004; 20 (18): 3710-3715

    Abstract

    GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/.

    View details for DOI 10.1093/bioinformatics/bth456

    View details for Web of Science ID 000225786600064

    View details for PubMedID 15297299
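
    The abstract above mentions calculating the statistical significance of each annotation for a list of genes. A common way to do this kind of enrichment test is a one-sided hypergeometric tail probability; the sketch below shows that calculation in Python under that assumption. It does not reproduce the GO::TermFinder Perl API, it omits multiple-testing correction, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, observed: int) -> float:
    """Probability of seeing at least `observed` annotated genes in the sample.

    population: total number of genes in the background set
    annotated:  number of background genes annotated to the term
    sample:     number of genes in the list being tested
    observed:   number of genes in the list annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for observed, observed + 1, ..., upper.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(observed, upper + 1)
    )
    return tail / total
```

    A small p-value suggests that the term is annotated to more genes in the list than would be expected by chance. Because many terms are tested at once, a correction for multiple hypotheses would normally be applied to these values before drawing conclusions.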

  • An open letter on microarray data from the MGED Society MICROBIOLOGY-SGM Ball, C., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J. C., Parkinson, H., Quackenbush, J., Ringwald, M., Sansone, S. A., Sherlock, G., Spellman, P., Stoeckert, C., Tateno, Y., Taylor, R., White, J., Winegarden, N. 2004; 150: 3522-3524

    View details for DOI 10.1099/mic.0.27637-0

    View details for Web of Science ID 000225372700003

    View details for PubMedID 15528642

  • Caryoscope: An Open Source Java application for viewing microarray data in a genomic context BMC BIOINFORMATICS Awad, I. A., Rees, C. A., Hernandez-Boussard, T., Ball, C. A., Sherlock, G. 2004; 5

    Abstract

    Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context. We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript. Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context.

    View details for DOI 10.1186/1471-2105-5-151

    View details for Web of Science ID 000225769900002

    View details for PubMedID 15488149

  • GeneXplorer: an interactive web application for microarray data visualization and analysis BMC BIOINFORMATICS Rees, C. A., Demeter, J., Matese, J. C., Botstein, D., Sherlock, G. 2004; 5

    Abstract

    When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application containing many of the features of some of the existing standalone software for the visualization of clustered microarray data. We present GeneXplorer, a web application for interactive microarray data visualization and analysis in a web environment. GeneXplorer allows users to browse a microarray dataset in an intuitive fashion. It provides simple access to microarray data over the Internet and uses only HTML and JavaScript to display graphic and annotation information. It provides radar and zoom views of the data, allows display of the nearest neighbors to a gene expression vector based on their Pearson correlations and provides the ability to search gene annotation fields. The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.

    View details for DOI 10.1186/1471-2105-5-141

    View details for Web of Science ID 000224940600001

    View details for PubMedID 15458579
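
    The abstract above describes displaying the nearest neighbors of a gene expression vector based on Pearson correlation. A minimal sketch of that idea follows; it is not taken from the GeneXplorer source (which is distributed as a Perl package), it assumes complete expression vectors of equal length with no missing values, and the function names and the dictionary layout are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def nearest_neighbors(
    query: str, profiles: Dict[str, Sequence[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k genes whose profiles correlate most strongly with the query gene."""
    scored = [
        (name, pearson(profiles[query], vector))
        for name, vector in profiles.items()
        if name != query
    ]
    # Highest correlation first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

    Real microarray data usually contain missing values, so a practical version would compute each correlation over only the positions where both genes have measurements.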

  • Submission of microarray data to public repositories. PLoS biology Ball, C. A., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J. C., Parkinson, H., Quackenbush, J., Ringwald, M., Sansone, S., Sherlock, G., Spellman, P., Stoeckert, C., Tateno, Y., Taylor, R., White, J., Winegarden, N. 2004; 2 (9): E317-?

    View details for PubMedID 15340489

  • Funding high-throughput data sharing NATURE BIOTECHNOLOGY Ball, C. A., Sherlock, G., Brazma, A. 2004; 22 (9): 1179-1183

    View details for DOI 10.1038/nbt0904-1179

    View details for Web of Science ID 000223653400040

    View details for PubMedID 15340487

  • Standards for microarray data: an open letter. Environmental health perspectives Ball, C., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J. C., Parkinson, H., Quackenbush, J., Ringwald, M., Sansone, S., Sherlock, G., Spellman, P., Stoeckert, C., Tateno, Y., Taylor, R., White, J., Winegarden, N. 2004; 112 (12): A666-7

    View details for PubMedID 15345376

  • STARTing to recycle NATURE GENETICS Sherlock, G. 2004; 36 (8): 795-796

    View details for DOI 10.1038/ng0804-795

    View details for Web of Science ID 000222974000010

    View details for PubMedID 15284848

  • Final words: cell age and cell cycle are unlinked TRENDS IN BIOTECHNOLOGY Spellman, P. T., Sherlock, G. 2004; 22 (6): 277-278

    Abstract

    Cooper has a simple belief: that the cell cycle is connected to age and size. Furthermore, as a result of this connection in his mind he believes that there are no possible manipulations that can operate on a batch culture to synchronize cells within the cell cycle, such that those cells can undergo a semblance of a normal cell cycle. His formulation of this argument is as a 'fundamental law', the law of conservation of cell-age order (LCCAO). The first part of this law - 'there is no batch treatment of the culture that can lead to an alteration of the cell-age order' - can probably be proved true, in the mathematical sense, and certainly makes intuitive sense. Unfortunately the corollaries of this law are rather suspect, drawing inferences from cell age to cell size to the cell cycle.

    View details for Web of Science ID 000222301000006

    View details for PubMedID 15158055

  • Reply: whole-culture synchronization - effective tools for cell cycle studies TRENDS IN BIOTECHNOLOGY Spellman, P. T., Sherlock, G. 2004; 22 (6): 270-273

    Abstract

    Studies of gene expression during the eukaryotic cell cycle in whole-culture synchronized cultures have been published using many methodologies. These procedures alter the state of the cell cycle for a population of cells, rather than purifying a population of cells that are in the same state. Criticism of these methods (e.g. see Cooper, this issue, pp. 266-269) suggests that these studies are flawed, and posits that such methodologies cannot be used to study the cell cycle because they alter the size and age distributions of the cultures. We believe that whole-culture cell cycle studies work even though they alter the size and age distributions: these cells still progress through the cell cycle and although we do not suggest that the methods are perfect, we will explain how these microarray studies have successfully identified cell cycle regulated genes and why these results are biologically meaningful.

    View details for Web of Science ID 000222301000004

    View details for PubMedID 15158053

  • The Longhorn Array Database (LAD): An open-source, MIAME compliant implementation of the Stanford Microarray database (SMD) BMC BIOINFORMATICS Killion, P. J., Sherlock, G., Iyer, V. R. 2003; 4

    Abstract

    The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at http://www.longhornarraydatabase.org. Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data.

    View details for Web of Science ID 000185003900001

    View details for PubMedID 12930545

  • Microarray databases: storage and retrieval of microarray data. Methods in molecular biology (Clifton, N.J.) Sherlock, G., Ball, C. A. 2003; 224: 235-248

    View details for PubMedID 12710676

  • The Stanford Microarray Database: data access and quality assessment tools NUCLEIC ACIDS RESEARCH Gollub, J., Ball, C. A., Binkley, G., Demeter, J., Finkelstein, D. B., Hebert, J. M., Hernandez-Boussard, T., Jin, H., Kaloper, M., Matese, J. C., Schroeder, M., Brown, P. O., Botstein, D., Sherlock, G. 2003; 31 (1): 94-96

    Abstract

    The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and for data analysis.

    View details for DOI 10.1093/nar/gkg078

    View details for Web of Science ID 000181079700020

    View details for PubMedID 12519956

  • SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data NUCLEIC ACIDS RESEARCH Diehn, M., Sherlock, G., Binkley, G., Jin, H., Matese, J. C., Hernandez-Boussard, T., Rees, C. A., Cherry, J. M., Botstein, D., Brown, P. O., Alizadeh, A. A. 2003; 31 (1): 219-223

    Abstract

    The explosion in the number of functional genomic datasets generated with tools such as DNA microarrays has created a critical need for resources that facilitate the interpretation of large-scale biological data. SOURCE is a web-based database that brings together information from a broad range of resources, and provides it in a manner particularly useful for genome-scale analyses. SOURCE's GeneReports include aliases, chromosomal location, functional descriptions, GeneOntology annotations, gene expression data, and links to external databases. We curate published microarray gene expression datasets and allow users to rapidly identify sets of co-regulated genes across a variety of tissues and a large number of conditions using a simple and intuitive interface. SOURCE provides content both in gene and cDNA clone-centric pages, and thus simplifies analysis of datasets generated using cDNA microarrays. SOURCE is continuously updated and contains the most recent and accurate information available for human, mouse, and rat genes. By allowing dynamic linking to individual gene or clone reports, SOURCE facilitates browsing of large genomic datasets. Finally, SOURCE's batch interface allows rapid extraction of data for thousands of genes or clones at once and thus facilitates statistical analyses such as assessing the enrichment of functional attributes within clusters of genes. SOURCE is available at http://source.stanford.edu.

    View details for DOI 10.1093/nar/gkg014

    View details for Web of Science ID 000181079700050

    View details for PubMedID 12519986

  • Standards for Microarray data SCIENCE Ball, C. A., Sherlock, G., Parkinson, H., Rocca-Serra, P., Brooksbank, C., Causton, H. C., Cavalieri, D., Gaasterland, T., Hingamp, P., Holstege, F., Ringwald, M., Spellman, P., Stoeckert, C. J., Stewart, J. E., Taylor, R., Brazma, A., Quackenbush, J. 2002; 298 (5593): 539-539

    View details for Web of Science ID 000178634800016

    View details for PubMedID 12387284

  • Identification of genes periodically expressed in the human cell cycle and their expression in tumors MOLECULAR BIOLOGY OF THE CELL Whitfield, M. L., Sherlock, G., Saldanha, A. J., Murray, J. I., Ball, C. A., Alexander, K. E., Matese, J. C., Perou, C. M., Hurt, M. M., Brown, P. O., Botstein, D. 2002; 13 (6): 1977-2000

    Abstract

    The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays. Transcripts of >850 genes showed periodic variation during the cell cycle. Hierarchical clustering of the expression patterns revealed coexpressed groups of previously well-characterized genes involved in essential cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion along with genes of uncharacterized function. Most of the genes whose expression had previously been reported to correlate with the proliferative state of tumors were found herein also to be periodically expressed during the HeLa cell cycle. However, some of the genes periodically expressed in the HeLa cell cycle do not have a consistent correlation with tumor proliferation. Cell cycle-regulated transcripts of genes involved in fundamental processes such as DNA replication and chromosome segregation seem to be more highly expressed in proliferative tumors simply because they contain more cycling cells. The data in this report provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery. The full dataset is available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.

    View details for DOI 10.1091/mbc.02-02-0030

    View details for Web of Science ID 000176418800016

    View details for PubMedID 12058064

  • Molecular characterisation of soft tissue tumours: a gene expression study LANCET Nielsen, T. O., West, R. B., Linn, S. C., Alter, O., Knowling, M. A., O'Connell, J. X., Zhu, S., Fero, M., Sherlock, G., Pollack, J. R., Brown, P. O., Botstein, D., van de Rijn, M. 2002; 359 (9314): 1301-1307

    Abstract

    Soft-tissue tumours are derived from mesenchymal cells such as fibroblasts, muscle cells, or adipocytes, but for many such tumours the histogenesis is controversial. We aimed to start molecular characterisation of these rare neoplasms and to do a genome-wide search for new diagnostic markers. We analysed gene-expression patterns of 41 soft-tissue tumours with spotted cDNA microarrays. After removal of errors introduced by use of different microarray batches, the expression patterns of 5520 genes that were well defined were used to separate tumours into discrete groups by hierarchical clustering and singular value decomposition. Synovial sarcomas, gastrointestinal stromal tumours, neural tumours, and a subset of the leiomyosarcomas, showed strikingly distinct gene-expression patterns. Other tumour categories--malignant fibrous histiocytoma, liposarcoma, and the remaining leiomyosarcomas--shared molecular profiles that were not predicted by histological features or immunohistochemistry. Strong expression of known genes, such as KIT in gastrointestinal stromal tumours, was noted within gene sets that distinguished the different sarcomas. However, many uncharacterised genes also contributed to the distinction between tumour types. These results suggest a new method for classification of soft-tissue tumours, which could improve on the method based on histological findings. Large numbers of uncharacterised genes contributed to distinctions between the tumours, and some of these could be useful markers for diagnosis, have prognostic significance, or prove possible targets for treatment.

    View details for Web of Science ID 000174989700013

    View details for PubMedID 11965276

  • Exploratory screening of genes and clusters from microarray experiments STATISTICA SINICA Tibshirani, R., Hastie, T., Narasimhan, B., Eisen, M., Sherlock, G., Brown, P., Botstein, D. 2002; 12 (1): 47-59
  • Design and implementation of microarray gene expression markup language (MAGE-ML) GENOME BIOLOGY Spellman, P. T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart, D., Sherlock, G., Ball, C., Lepage, M., Swiatek, M., Marks, W. L., Goncalves, J., Markel, S., Iordan, D., Shojatalab, M., Pizarro, A., White, J., Hubley, R., Deutsch, E., Senger, M., Aronow, B. J., Robinson, A., Bassett, D., Stoeckert, C. J., Brazma, A. 2002; 3 (9)

    Abstract

    Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. Only when data can be easily exchanged will the entire biological community be able to derive the full benefit from such microarray studies. To this end we have developed three key ingredients towards standardizing the storage and exchange of microarray data. First, we have created a minimal information for the annotation of a microarray experiment (MIAME)-compliant conceptualization of microarray experiments modeled using the unified modeling language (UML) named MAGE-OM (microarray gene expression object model). Second, we have translated MAGE-OM into an XML-based data format, MAGE-ML, to facilitate the exchange of data. Third, some of us are now using MAGE (or its progenitors) in data production settings. Finally, we have developed a freely available software tool kit (MAGE-STK) that eases the integration of MAGE-ML into end users' systems. MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.

    View details for Web of Science ID 000207581400013

    View details for PubMedID 12225585

  • Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) NUCLEIC ACIDS RESEARCH Dwight, S. S., Harris, M. A., Dolinski, K., Ball, C. A., Binkley, G., Christie, K. R., Fisk, D. G., Issel-Tarver, L., Schroeder, M., Sherlock, G., Sethuraman, A., Weng, S., Botstein, D., Cherry, J. M. 2002; 30 (1): 69-72

    Abstract

    The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S.cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at http://www.geneontology.org. SGD gene associations to GO can be found by visiting our site at http://genome-www.stanford.edu/Saccharomyces/.

    View details for Web of Science ID 000173077100017

    View details for PubMedID 11752257

  • Minimum information about a microarray experiment (MIAME) - toward standards for microarray data NATURE GENETICS Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., Gaasterland, T., Glenisson, P., Holstege, F. C., Kim, I. F., Markowitz, V., Matese, J. C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J., Vingron, M. 2001; 29 (4): 365-371

    Abstract

    Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.

    View details for Web of Science ID 000172507500006

    View details for PubMedID 11726920

  • Analysis of large-scale gene expression data. Briefings in bioinformatics Sherlock, G. 2001; 2 (4): 350-362

    Abstract

    DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation, to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.

    View details for PubMedID 11808747

  • Creating the gene ontology resource: Design and implementation GENOME RESEARCH Ashburner, M., Ball, C. A., Blake, J. A., Butler, H., Cherry, J. M., Corradi, J., Dolinski, K., Eppig, J. T., Harris, M., Hill, D. P., Lewis, S., Marshall, B., Mungall, C., Reiser, L., Rhee, S., Richardson, J. E., Richter, J., Ringwald, M., Rubin, G. M., Sherlock, G., Yoon, J. 2001; 11 (8): 1425-1433

    Abstract

    The exponential growth in the volume of accessible biological information has generated a confusion of voices surrounding the annotation of molecular information about genes and their products. The Gene Ontology (GO) project seeks to provide a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism. This work includes building three extensive ontologies to describe molecular function, biological process, and cellular component, and providing a community database resource that supports the use of these ontologies. The GO Consortium was initiated by scientists associated with three model organism databases: SGD, the Saccharomyces Genome database; FlyBase, the Drosophila genome database; and MGD/GXD, the Mouse Genome Informatics databases. Additional model organism database groups are joining the project. Each of these model organism information systems is annotating genes and gene products using GO vocabulary terms and incorporating these annotations into their respective model organism databases. Each database contributes its annotation files to a shared GO data resource accessible to the public at http://www.geneontology.org/. The GO site can be used by the community both to recover the GO vocabularies and to access the annotated gene product data sets from the model organism databases. The GO Consortium supports the development of the GO database resource and provides tools enabling curators and researchers to query and manipulate the vocabularies. We believe that the shared development of this molecular annotation resource will contribute to the unification of biological information.

    View details for Web of Science ID 000170263900015

    View details for PubMedID 11483584

  • Missing value estimation methods for DNA microarrays BIOINFORMATICS Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., Altman, R. B. 2001; 17 (6): 520-525

    Abstract

    Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1-20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.

    View details for Web of Science ID 000169404700005

    View details for PubMedID 11395428

  • The Stanford Microarray Database NUCLEIC ACIDS RESEARCH Sherlock, G., Hernandez-Boussard, T., Kasarskis, A., Binkley, G., Matese, J. C., Dwight, S. S., Kaloper, M., Weng, S., Jin, H., Ball, C. A., Eisen, M. B., Spellman, P. T., Brown, P. O., Botstein, D., Cherry, J. M. 2001; 29 (1): 152-155

    Abstract

    The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77-80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73-76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10-14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45-48] and can be accessed at http://genome-www.stanford.edu/microarray.

    View details for Web of Science ID 000166360300039

    View details for PubMedID 11125075

  • Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data NUCLEIC ACIDS RESEARCH Ball, C. A., Jin, H., Sherlock, G., Weng, S., Matese, J. C., Andrada, R., Binkley, G., Dolinski, K., Dwight, S. S., Harris, M. A., Issel-Tarver, L., Schroeder, R., Botstein, D., Cherry, J. M. 2001; 29 (1): 80-81

    Abstract

    Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at genome-www.stanford.edu/Saccharomyces/.

    View details for Web of Science ID 000166360300019

    View details for PubMedID 11125055

  • A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Salama, N., Guillemin, K., McDaniel, T. K., Sherlock, G., Tompkins, L., Falkow, S. 2000; 97 (26): 14668-14673

    Abstract

    Helicobacter pylori colonizes the stomach of half of the world's population, causing a wide spectrum of disease ranging from asymptomatic gastritis to ulcers to gastric cancer. Although the basis for these diverse clinical outcomes is not understood, more severe disease is associated with strains harboring a pathogenicity island. To characterize the genetic diversity of more and less virulent strains, we examined the genomic content of 15 H. pylori clinical isolates by using a whole genome H. pylori DNA microarray. We found that a full 22% of H. pylori genes are dispensable in one or more strains, thus defining a minimal functional core of 1281 H. pylori genes. While the core genes encode most metabolic and cellular processes, the strain-specific genes include genes unique to H. pylori, restriction modification genes, transposases, and genes encoding cell surface proteins, which may aid the bacteria under specific circumstances during their long-term infection of genetically diverse hosts. We observed distinct patterns of the strain-specific gene distribution along the chromosome, which may result from different mechanisms of gene acquisition and loss. Among the strain-specific genes, we have found a class of candidate virulence genes identified by their coinheritance with the pathogenicity island.

    View details for Web of Science ID 000165993700121

    View details for PubMedID 11121067

  • Gene Ontology: tool for the unification of biology NATURE GENETICS Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., Sherlock, G. 2000; 25 (1): 25-29

    View details for Web of Science ID 000086884000011

    View details for PubMedID 10802651

  • Analysis of large-scale gene expression data CURRENT OPINION IN IMMUNOLOGY Sherlock, G. 2000; 12 (2): 201-205

    Abstract

    The advent of cDNA and oligonucleotide microarray technologies has led to a paradigm shift in biological investigation, such that the bottleneck in research is shifting from data generation to data analysis. Hierarchical clustering, divisive clustering, self-organizing maps and k-means clustering have all been recently used to make sense of this mass of data.

    View details for Web of Science ID 000085786300012

    View details for PubMedID 10712947

  • Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling NATURE Alizadeh, A. A., Eisen, M. B., Davis, R. E., Ma, C., Lossos, I. S., Rosenwald, A., Boldrick, J. G., Sabet, H., Tran, T., Yu, X., Powell, J. I., Yang, L. M., Marti, G. E., Moore, T., Hudson, J., Lu, L. S., Lewis, D. B., Tibshirani, R., Sherlock, G., Chan, W. C., Greiner, T. C., Weisenburger, D. D., Armitage, J. O., Warnke, R., Levy, R., Wilson, W., Grever, M. R., Byrd, J. C., Botstein, D., Brown, P. O., Staudt, L. M. 2000; 403 (6769): 503-511

    Abstract

    Diffuse large B-cell lymphoma (DLBCL), the most common subtype of non-Hodgkin's lymphoma, is clinically heterogeneous: 40% of patients respond well to current therapy and have prolonged survival, whereas the remainder succumb to the disease. We proposed that this variability in natural history reflects unrecognized molecular heterogeneity in the tumours. Using DNA microarrays, we have conducted a systematic characterization of gene expression in B-cell malignancies. Here we show that there is diversity in gene expression among the tumours of DLBCL patients, apparently reflecting the variation in tumour proliferation rate, host response and differentiation state of the tumour. We identified two molecularly distinct forms of DLBCL which had gene expression patterns indicative of different stages of B-cell differentiation. One type expressed genes characteristic of germinal centre B cells ('germinal centre B-like DLBCL'); the second type expressed genes normally induced during in vitro activation of peripheral blood B cells ('activated B-like DLBCL'). Patients with germinal centre B-like DLBCL had a significantly better overall survival than those with activated B-like DLBCL. The molecular classification of tumours on the basis of gene expression can thus identify previously undetected and clinically significant subtypes of cancer.

    View details for Web of Science ID 000085227300039

    View details for PubMedID 10676951

  • Integrating functional genomic information into the Saccharomyces genome database NUCLEIC ACIDS RESEARCH Ball, C. A., Dolinski, K., Dwight, S. S., Harris, M. A., Issel-Tarver, L., Kasarskis, A., Scafe, C. R., Sherlock, G., Binkley, G., Jin, H., Kaloper, M., Orr, S. D., Schroeder, M., Weng, S., Zhu, Y., Botstein, D., Cherry, J. M. 2000; 28 (1): 77-80

    Abstract

    The Saccharomyces Genome Database (SGD) stores and organizes information about the nearly 6200 genes in the yeast genome. The information is organized around the 'locus page' and directs users to the detailed information they seek. SGD is endeavoring to integrate the existing information about yeast genes with the large volume of data generated by functional analyses that are beginning to appear in the literature and on web sites. New features will include searches of systematic analyses and Gene Summary Paragraphs that succinctly review the literature for each gene. In addition to current information, such as gene product and phenotype descriptions, the new locus page will also describe a gene product's cellular process, function and localization using a controlled vocabulary developed in collaboration with two other model organism databases. We describe these developments in SGD through the newly reorganized locus page. The SGD is accessible via the WWW at http://genome-www.stanford.edu/Saccharomyces/

    View details for Web of Science ID 000084896300020

    View details for PubMedID 10592186

  • Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure NUCLEIC ACIDS RESEARCH Chervitz, S. A., Hester, E. T., Ball, C. A., Dolinski, K., Dwight, S. S., Harris, M. A., Juvik, G., Malekian, A., Roberts, S., Roe, T., Scafe, C., Schroeder, M., Sherlock, G., Weng, S., Zhu, Y., Cherry, J. M., Botstein, D. 1999; 27 (1): 74-78

    Abstract

    The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. The latest protein structure and comparison tools available at SGD are presented here. With the completion of the yeast sequence and the Caenorhabditis elegans sequence soon to follow, comparison of proteins from complete eukaryotic proteomes will be an extremely powerful way to learn more about a particular protein's structure, its function, and its relationships with other proteins. SGD can be accessed through the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/

    View details for Web of Science ID 000077983000017

    View details for PubMedID 9847146

  • Comparison of the complete protein sets of worm and yeast: Orthology and divergence SCIENCE Chervitz, S. A., Aravind, L., Sherlock, G., Ball, C. A., Koonin, E. V., Dwight, S. S., Harris, M. A., Dolinski, K., Mohr, S., Smith, T., Weng, S., Cherry, J. M., Botstein, D. 1998; 282 (5396): 2022-2028

    Abstract

    Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.

    View details for Web of Science ID 000077467100036

    View details for PubMedID 9851918

  • Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization MOLECULAR BIOLOGY OF THE CELL Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., Futcher, B. 1998; 9 (12): 3273-3297

    Abstract

    We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures synchronized by three independent methods: alpha factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant. Using periodicity and correlation algorithms, we identified 800 genes that meet an objective minimum criterion for cell cycle regulation. In separate experiments, designed to examine the effects of inducing either the G1 cyclin Cln3p or the B-type cyclin Clb2p, we found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins. Furthermore, we analyzed our set of cell cycle-regulated genes for known and new promoter elements and show that several known elements (or variations thereof) contain information predictive of cell cycle regulation. A full description and complete data sets are available at http://cellcycle-www.stanford.edu

    View details for Web of Science ID 000077388600003

    View details for PubMedID 9843569

  • Molecular cloning and analysis of CDC28 and cyclin homologs from the human fungal pathogen Candida albicans MOLECULAR GENERAL GENETICS Sherlock, G., Bahman, A. M., Mahal, A., Shieh, J. C., Ferreira, M., Rosamond, J. 1994; 245 (6): 716-723

    Abstract

    In the budding yeast Saccharomyces cerevisiae, progress of the cell cycle beyond the major control point in G1 phase, termed START, requires activation of the evolutionarily conserved Cdc28 protein kinase by direct association with G1 cyclins. We have used a conditional lethal mutation in CDC28 of S. cerevisiae to clone a functional homologue from the human fungal pathogen Candida albicans. The protein sequence, deduced from the nucleotide sequence, is 79% identical to that of S. cerevisiae Cdc28 and as such is the most closely related protein yet identified. We have also isolated from C. albicans two genes encoding putative G1 cyclins, by their ability to rescue a conditional G1 cyclin defect in S. cerevisiae; one of these genes encodes a protein of 697 amino acids and is identical to the product of the previously described CCN1 gene. The second gene codes for a protein of 465 residues, which has significant homology to S. cerevisiae Cln3. These data suggest that the events and regulatory mechanisms operating at START are highly conserved between these two organisms.

    View details for Web of Science ID A1994QA10400006

    View details for PubMedID 7830719

  • Starting to cycle: G1 controls regulating cell division in budding yeast JOURNAL OF GENERAL MICROBIOLOGY Sherlock, G., Rosamond, J. 1993; 139: 2531-2541

    Abstract

    In Saccharomyces cerevisiae, START has been shown to comprise a series of tightly regulated reactions by which the cellular environment is assessed and, under appropriate conditions, cells are committed to a further round of mitotic division. The key effector of START is the product of the CDC28 gene and the mechanisms by which the protein kinase activity of this gene product is regulated at START are well characterized. This is in contrast to the events which follow p34CDC28 activation and the way in which progress to S phase is achieved, which are less clear. We suggest two possible models to describe the regulation of these events. Firstly, it is conceivable that the only post-START targets of the p34CDC28/G1 cyclin kinase complex are components of the SBF and DSC1 transcription factors. This would require that either SBF or DSC1 regulates CDC4 function either directly by activating the transcription of CDC4 itself or else indirectly by activating the transcription of a mediator of CDC4 function in a manner analogous to the way in which the control of CDC7 function may be mediated by transcriptional regulation of DBF4 (Jackson et al., 1993). Potential regulatory effectors of CDC4 function include SCM4, which suppresses cdc4 mutations in an allele-specific manner (Smith et al., 1992) or its homologue HFS1 (J. Hartley & J. Rosamond, unpublished). This possibility is supported by the finding that CDC4 has no upstream SCB or MCB elements, whereas SCM4 and HFS1 have either an exact or close match to the SCB. This model would further require that genes needed for bud emergence and spindle pole body duplication are also subject to transcriptional regulation by DSC1 or SBF. An alternative model is that the p34CDC28/G1 cyclin complexes have several targets post-START, one being DSC1 and the others being as yet unidentified components of the pathways leading to CDC4 function, spindle pole body duplication and bud emergence. This model could account for the functional redundancy observed amongst the G1 cyclins, with the various cyclins providing substrate specificity for the kinase complex. We suggest that a complex containing Cln3 protein is primarily responsible for, and acts most efficiently on, the targets containing Swi6 protein (SBF and DSC1), with complexes containing other G1 cyclins (Cln1 and/or Cln2 proteins) principally involved in activating the other pathways. However, there must be overlap in the function of these complexes, with each cyclin able to substitute for some or all of the functions when necessary, albeit with differing efficiencies. This hypothesis is supported by several observations. (ABSTRACT TRUNCATED AT 400 WORDS)

    View details for Web of Science ID A1993MH54500001

    View details for PubMedID 8277239

Conference Proceedings


  • The underlying principles of scientific publication. Ball, C. A., Sherlock, G., Parkinson, H., Rocca-Serra, P., Brooksbank, C., Causton, H. C., Cavalieri, D., Gaasterland, T., Hingamp, P., Holstege, F., Ringwald, M., Spellman, P., Stoeckert, C. J., Stewart, J. E., Taylor, R., Brazma, A., Quackenbush, J. 2002: 1409-?

    View details for PubMedID 12424109
