Doctor of Philosophy, University of Washington (2010)
Michael Snyder, Postdoctoral Faculty Sponsor
The ability of a protein to carry out a given function results from fundamental physicochemical properties that include the protein's structure, mechanism of action, and thermodynamic stability. Traditional approaches to study these properties have typically required the direct measurement of the property of interest, oftentimes a laborious undertaking. Although protein properties can be probed by mutagenesis, this approach has been limited by its low throughput. Recent technological developments have enabled the rapid quantification of a protein's function, such as binding to a ligand, for numerous variants of that protein. Here, we measure the ability of 47,000 variants of a WW domain to bind to a peptide ligand and use these functional measurements to identify stabilizing mutations without directly assaying stability. Our approach is rooted in the well-established concept that protein function is closely related to stability. Protein function is generally reduced by destabilizing mutations, but this decrease can be rescued by stabilizing mutations. Based on this observation, we introduce partner potentiation, a metric that uses this rescue ability to identify stabilizing mutations, and identify 15 candidate stabilizing mutations in the WW domain. We tested six candidates by thermal denaturation and found two highly stabilizing mutations, one more stabilizing than any previously known mutation. Thus, physicochemical properties such as stability are latent within these large-scale protein functional data and can be revealed by systematic analysis. This approach should allow other protein properties to be discovered.
View details for DOI 10.1073/pnas.1209751109
View details for Web of Science ID 000310515800030
View details for PubMedID 23035249
Measuring the consequences of mutation in proteins is critical to understanding their function. These measurements are essential in such applications as protein engineering, drug development, protein design and genome sequence analysis. Recently, high-throughput sequencing has been coupled to assays of protein activity, enabling the analysis of large numbers of mutations in parallel. We present Enrich, a tool for analyzing such deep mutational scanning data. Enrich identifies all unique variants (mutants) of a protein in high-throughput sequencing datasets and can correct for sequencing errors using overlapping paired-end reads. Enrich uses the frequency of each variant before and after selection to calculate an enrichment ratio, which is used to estimate fitness. Enrich provides an interactive interface to guide users. It generates user-accessible output for downstream analyses as well as several visualizations of the effects of mutation on function, thereby allowing the user to rapidly quantify and comprehend sequence-function relationships.Enrich is implemented in Python and is available under a FreeBSD license at http://depts.washington.edu/sfields/software/enrich/. Enrich includes detailed documentation as well as a small example firstname.lastname@example.org; email@example.comSupplementary data is available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btr577
View details for Web of Science ID 000297860000017
View details for PubMedID 22006916
Analysis of protein mutants is an effective means to understand their function. Protein display is an approach that allows large numbers of mutants of a protein to be selected based on their activity, but only a handful with maximal activity have been traditionally identified for subsequent functional analysis. However, the recent application of high-throughput sequencing (HTS) to protein display and selection has enabled simultaneous assessment of the function of hundreds of thousands of mutants that span the activity range from high to low. Such deep mutational scanning approaches are rapid and inexpensive with the potential for broad utility. In this review, we discuss the emergence of deep mutational scanning, the challenges associated with its use and some of its exciting applications.
View details for DOI 10.1016/j.tibtech.2011.04.003
View details for Web of Science ID 000294943400003
View details for PubMedID 21561674
We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
View details for DOI 10.1038/nmeth.1492
View details for Web of Science ID 000281429200020
View details for PubMedID 20711194
Experimental evolution of microbial populations provides a unique opportunity to study evolutionary adaptation in response to controlled selective pressures. However, until recently it has been difficult to identify the precise genetic changes underlying adaptation at a genome-wide scale. New DNA sequencing technologies now allow the genome of parental and evolved strains of microorganisms to be rapidly determined.We sequenced >93.5% of the genome of a laboratory-evolved strain of the yeast Saccharomyces cerevisiae and its ancestor at >28x depth. Both single nucleotide polymorphisms and copy number amplifications were found, with specific gains over array-based methodologies previously used to analyze these genomes. Applying a segmentation algorithm to quantify structural changes, we determined the approximate genomic boundaries of a 5x gene amplification. These boundaries guided the recovery of breakpoint sequences, which provide insights into the nature of a complex genomic rearrangement.This study suggests that whole-genome sequencing can provide a rapid approach to uncover the genetic basis of evolutionary adaptations, with further applications in the study of laboratory selections and mutagenesis screens. In addition, we show how single-end, short read sequencing data can provide detailed information about structural rearrangements, and generate predictions about the genomic features and processes that underlie genome plasticity.
View details for DOI 10.1186/1471-2164-11-88
View details for Web of Science ID 000275291200001
View details for PubMedID 20128923
The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions and a software package, StarryNite, processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point.We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data.By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
View details for DOI 10.1186/1471-2105-7-275
View details for Web of Science ID 000238984400001
View details for PubMedID 16740163