Doctor of Philosophy, Yale University (2008)
Michael Snyder, Postdoctoral Faculty Sponsor
Combined immunodeficiency with multiple intestinal atresias (CID-MIA) is a rare hereditary disease characterized by intestinal obstructions and profound immune defects.We sought to determine the underlying genetic causes of CID-MIA by analyzing the exomic sequences of 5 patients and their healthy direct relatives from 5 unrelated families.We performed whole-exome sequencing on 5 patients with CID-MIA and 10 healthy direct family members belonging to 5 unrelated families with CID-MIA. We also performed targeted Sanger sequencing for the candidate gene tetratricopeptide repeat domain 7A (TTC7A) on 3 additional patients with CID-MIA.Through analysis and comparison of the exomic sequence of the subjects from these 5 families, we identified biallelic damaging mutations in the TTC7A gene, for a total of 7 distinct mutations. Targeted TTC7A gene sequencing in 3 additional unrelated patients with CID-MIA revealed biallelic deleterious mutations in 2 of them, as well as an aberrant splice product in the third patient. Staining of normal thymus showed that the TTC7A protein is expressed in thymic epithelial cells, as well as in thymocytes. Moreover, severe lymphoid depletion was observed in the thymus and peripheral lymphoid tissues from 2 patients with CID-MIA.We identified deleterious mutations of the TTC7A gene in 8 unrelated patients with CID-MIA and demonstrated that the TTC7A protein is expressed in the thymus. Our results strongly suggest that TTC7A gene defects cause CID-MIA.
View details for DOI 10.1016/j.jaci.2013.06.013
View details for Web of Science ID 000323612000018
View details for PubMedID 23830146
Personalized medicine is expected to benefit from combining genomic information with regular monitoring of physiological states by multiple high-throughput methods. Here, we present an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2 diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions. Extremely high-coverage genomic and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism. This study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
View details for DOI 10.1016/j.cell.2012.02.009
View details for Web of Science ID 000301889500023
View details for PubMedID 22424236
Whole exome sequencing by high-throughput sequencing of target-enriched genomic DNA (exome-seq) has become common in basic and translational research as a means of interrogating the interpretable part of the human genome at relatively low cost. We present a comparison of three major commercial exome sequencing platforms from Agilent, Illumina and Nimblegen applied to the same human blood sample. Our results suggest that the Nimblegen platform, which is the only one to use high-density overlapping baits, covers fewer genomic regions than the other platforms but requires the least amount of sequencing to sensitively detect small variants. Agilent and Illumina are able to detect a greater total number of variants with additional sequencing. Illumina captures untranslated regions, which are not targeted by the Nimblegen and Agilent platforms. We also compare exome sequencing and whole genome sequencing (WGS) of the same sample, demonstrating that exome sequencing can detect additional small variants missed by WGS.
View details for DOI 10.1038/nbt.1975
View details for Web of Science ID 000296273000017
View details for PubMedID 21947028
We report progress assembling the parts list for chromosome 17 and illustrate the various processes that we have developed to integrate available data from diverse genomic and proteomic knowledge bases. As primary resources, we have used GPMDB, neXtProt, PeptideAtlas, Human Protein Atlas (HPA), and GeneCards. All sites share the common resource of Ensembl for the genome modeling information. We have defined the chromosome 17 parts list with the following information: 1169 protein-coding genes, the numbers of proteins confidently identified by various experimental approaches as documented in GPMDB, neXtProt, PeptideAtlas, and HPA, examples of typical data sets obtained by RNASeq and proteomic studies of epithelial derived tumor cell lines (disease proteome) and a normal proteome (peripheral mononuclear cells), reported evidence of post-translational modifications, and examples of alternative splice variants (ASVs). We have constructed a list of the 59 "missing" proteins as well as 201 proteins that have inconclusive mass spectrometric (MS) identifications. In this report we have defined a process to establish a baseline for the incorporation of new evidence on protein identification and characterization as well as related information from transcriptome analyses. This initial list of "missing" proteins that will guide the selection of appropriate samples for discovery studies as well as antibody reagents. Also we have illustrated the significant diversity of protein variants (including post-translational modifications, PTMs) using regions on chromosome 17 that contain important oncogenes. We emphasize the need for mandated deposition of proteomics data in public databases, the further development of improved PTM, ASV, and single nucleotide variant (SNV) databases, and the construction of Web sites that can integrate and regularly update such information. In addition, we describe the distribution of both clustered and scattered sets of protein families on the chromosome. Since chromosome 17 is rich in cancer-associated genes, we have focused the clustering of cancer-associated genes in such genomic regions and have used the ERBB2 amplicon as an example of the value of a proteogenomic approach in which one integrates transcriptomic with proteomic information and captures evidence of coexpression through coordinated regulation.
View details for DOI 10.1021/pr300985j
View details for Web of Science ID 000313156300007
View details for PubMedID 23259914
The rapid development of high-throughput technologies and computational frameworks enables the examination of biological systems in unprecedented detail. The ability to study biological phenomena at omics levels in turn is expected to lead to significant advances in personalized and precision medicine. Patients can be treated according to their own molecular characteristics. Individual omes as well as the integrated profiles of multiple omes, such as the genome, the epigenome, the transcriptome, the proteome, the metabolome, the antibodyome, and other omics information are expected to be valuable for health monitoring, preventative measures, and precision medicine. Moreover, omics technologies have the potential to transform medicine from traditional symptom-oriented diagnosis and treatment of diseases toward disease prevention and early diagnostics. We discuss here the advances and challenges in systems biology-powered personalized medicine at its current stage, as well as a prospective view of future personalized health care at the end of this review.
View details for DOI 10.1002/wsbm.1198
View details for Web of Science ID 000312736200005
View details for PubMedID 23184638
This unit describes methods for targeted enrichment of the exon-coding portions of the genome using Agilent SureSelect Human All Exon 50 Mb and Roche Nimblegen SeqCap EZ Exome platforms. Each platform targets and enriches a large overlapping portion of the greater human exome. The protocols here describe the biochemical procedures used to enrich exomic DNA with each platform, including recommended modifications to the manufacturers' protocols. In addition, a brief description of the sequencing protocol and estimation of the needed amount of sequencing for each platform is included. Finally, a detailed analytical pipeline for processing the subsequent data is described. These protocols focus specifically on human exome sequencing platforms, but can be applied with some modification to other organisms and targeted enrichment approaches.
View details for DOI 10.1002/0471142727.mb0712s102
View details for PubMedID 23547016
Systems biology is actively transforming the field of modern health care from symptom-based disease diagnosis and treatment to precision medicine in which patients are treated based on their individual characteristics. Development of high-throughput technologies such as high-throughout sequencing and mass spectrometry has enabled scientists and clinicians to examine genomes, transcriptomes, proteomes, metabolomes, and other omics information in unprecedented detail. The combined 'omics' information leads to a global profiling of health and disease, and provides new approaches for personalized health monitoring and preventative medicine. In this article, we review the efforts of systems biology in personalized medicine in the past 2 years, and discuss in detail achievements and concerns, as well as highlights and hurdles for future personalized health care.
View details for DOI 10.1016/j.coph.2012.07.011
View details for Web of Science ID 000310478800017
View details for PubMedID 22858243
Characterized by ventricular dilatation, systolic dysfunction, and progressive heart failure, dilated cardiomyopathy (DCM) is the most common form of cardiomyopathy in patients. DCM is the most common diagnosis leading to heart transplantation and places a significant burden on healthcare worldwide. The advent of induced pluripotent stem cells (iPSCs) offers an exceptional opportunity for creating disease-specific cellular models, investigating underlying mechanisms, and optimizing therapy. Here, we generated cardiomyocytes from iPSCs derived from patients in a DCM family carrying a point mutation (R173W) in the gene encoding sarcomeric protein cardiac troponin T. Compared to control healthy individuals in the same family cohort, cardiomyocytes derived from iPSCs from DCM patients exhibited altered regulation of calcium ion (Ca(2+)), decreased contractility, and abnormal distribution of sarcomeric ?-actinin. When stimulated with a ?-adrenergic agonist, DCM iPSC-derived cardiomyocytes showed characteristics of cellular stress such as reduced beating rates, compromised contraction, and a greater number of cells with abnormal sarcomeric ?-actinin distribution. Treatment with ?-adrenergic blockers or overexpression of sarcoplasmic reticulum Ca(2+) adenosine triphosphatase (Serca2a) improved the function of iPSC-derived cardiomyocytes from DCM patients. Thus, iPSC-derived cardiomyocytes from DCM patients recapitulate to some extent the morphological and functional phenotypes of DCM and may serve as a useful platform for exploring disease mechanisms and for drug screening.
View details for DOI 10.1126/scitranslmed.3003552
View details for Web of Science ID 000303045900004
View details for PubMedID 22517884
Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ?76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ?3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.
View details for DOI 10.1038/nbt.2065
View details for PubMedID 22178993
Protein kinases are key regulators of cellular processes. In spite of considerable effort, a full understanding of the pathways they participate in remains elusive. We globally investigated the proteins that interact with the majority of yeast protein kinases using protein microarrays. Eighty-five kinases were purified and used to probe yeast proteome microarrays. One-thousand-twenty-three interactions were identified, and the vast majority were novel. Coimmunoprecipitation experiments indicate that many of these interactions occurred in vivo. Many novel links of kinases to previously distinct cellular pathways were discovered. For example, the well-studied Kss1 filamentous pathway was found to bind components of diverse cellular pathways, such as those of the stress response pathway and the Ccr4-Not transcriptional/translational regulatory complex; genetic tests revealed that these different components operate in the filamentation pathway in vivo. Overall, our results indicate that kinases operate in a highly interconnected network that coordinates many activities of the proteome. Our results further demonstrate that protein microarrays uncover a diverse set of interactions not observed previously.
View details for DOI 10.1101/gad.1998811
View details for Web of Science ID 000289062700010
View details for PubMedID 21460040
Our understanding of biological processes as well as human diseases has improved greatly thanks to studies on model organisms such as yeast. The power of scientific approaches with yeast lies in its relatively simple genome, its facile classical and molecular genetics, as well as the evolutionary conservation of many basic biological mechanisms. However, even in this simple model organism, systems biology studies, especially proteomic studies had been an intimidating task. During the past decade, powerful high-throughput technologies in proteomic research have been developed for yeast including protein microarray technology. The protein microarray technology allows the interrogation of protein-protein, protein-DNA, protein-small molecule interaction networks as well as post-translational modification networks in a large-scale, high-throughput manner. With this technology, many groundbreaking findings have been established in studies with the budding yeast Saccharomyces cerevisiae, most of which could have been unachievable with traditional approaches. Discovery of these networks has profound impact on explicating biological processes with a proteomic point of view, which may lead to a better understanding of normal biological phenomena as well as various human diseases.
View details for DOI 10.1016/j.jprot.2010.08.003
View details for Web of Science ID 000283903000008
View details for PubMedID 20728591
Our understanding of human disease and potential therapeutics is improving rapidly. In order to take advantage of these developments it is important to be able to identify disease markers. Many new high-throughput genomics and proteomics technologies are being implemented to identify candidate disease markers. These technologies include protein microarrays, next-generation DNA sequencing and mass spectrometry platforms. Such methods are particularly important for elucidating the repertoire of molecular markers in the genome, transcriptome, proteome and metabolome of patients with diseases such as cancer, autoimmune diseases, and viral infections, resulting from the disruption of many biological pathways. These new technologies have identified many potential disease markers. These markers are expected to be valuable to achieve the promise of truly personalized medicine.
View details for DOI 10.3233/DMA-2010-0707
View details for Web of Science ID 000279321200003
View details for PubMedID 20534906