Current Research and Scholarly Interests
The Cherry lab is involved in identifying, validating and integrating scientific information into encyclopedic databases essential for investigation as well as scientific education. Published results of scientific experimentation are a foundation of our understanding of the natural world and provide motivation for new experiments. The combination of in-depth understanding reported in the literature with computational analyses is an essential ingredient of modern biological research. Mastery of the volumes of published literature requires comprehensive databases that provide the facts and underlying experimental data in publicly accessible ways. Curation, extraction and sorting of factual experimental data from peer-reviewed journal articles are necessary to acquire these data from their source. Large quantitative datasets from global studies extend our knowledge of genes, their products and their interactions. Integrating quantitative datasets with curated, focused experimental results creates unique comprehensive databases. My group creates such essential databases and makes them available to scientists and educators seeking to understand experimental results and to teach scientific knowledge.
The exploration of the genes and other important elements of a genome involves the use of previous results to aid the design of experiments that explore, for example, gene regulation, protein function, and the interaction of these processes. New technologies are being applied to the determination of many molecular interactions of the components of chromosomes and the specific controls for the generation of the many cell types that create an organism from a single set of chromosomes. These methods create very large datasets that cannot be appreciated without computational methods and access to databases of scientific results.
The Cherry lab specializes in designing and managing a public database of information for the budding yeast Saccharomyces cerevisiae and has recently begun applying its expertise to human genomic information. Our current projects address three areas of research: engineering for the design of databases and software for the effective integration of complex experimental results; defining standards for eukaryotic genomic data that measure reliability and quality; and developing vocabularies that enhance communication between researchers and between computational resources. This research involves the collection and standardization of experimental results, the integration of detailed descriptions of these data into complex biological models, the application of flexible search and retrieval tools, and the distribution of the integrated information to accelerate discovery.
Three major bioinformatics resources funded by the National Institutes of Health are provided by the lab. The Saccharomyces Genome Database project is the foremost database on a single organism. It is the archetype of all such databases because of its high quality, rich design, completeness, ease of use, and facilitation of scientific discovery. The Gene Ontology Consortium invented a structured vocabulary for the specification and description of the functions of genes, their involvement in biological processes and their location within subcellular complexes and components. This innovative knowledgebase has unified biological nomenclature and is crucial for the analysis of biological results. The ENCODE Data Coordination Center provides an essential component for the analysis and use of large-scale studies of the human genome. Our work specifies the accurate and complete submission of human genomic experimental results, verifies the data quality, specifies and compiles the experimental details of each dataset, integrates data with existing human genome databases, and distributes these results with their analyses via a portal that serves the diverse biomedical research community of skilled bioinformaticists, biologists, and educators.