Academic Appointments
- Assistant Professor, Biomedical Data Science
- Assistant Professor (By courtesy), Computer Science
- Assistant Professor (By courtesy), Electrical Engineering
- Member, Bio-X
Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.
View details for DOI 10.1038/s41588-018-0295-5
View details for PubMedID 30478442
The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about clinically relevant variants in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.
View details for DOI 10.1002/humu.23644
View details for PubMedID 30311373
View details for DOI 10.1038/s41746-018-0067-8
View details for Web of Science ID 000449685400001
Malaria parasites (Plasmodium spp.) and related apicomplexan pathogens contain a nonphotosynthetic plastid called the apicoplast. Derived from an unusual secondary eukaryote-eukaryote endosymbiosis, the apicoplast is a fascinating organelle whose function and biogenesis rely on a complex amalgamation of bacterial and algal pathways. Because these pathways are distinct from the human host, the apicoplast is an excellent source of novel antimalarial targets. Despite its biomedical importance and evolutionary significance, the absence of a reliable apicoplast proteome has limited most studies to the handful of pathways identified by homology to bacteria or primary chloroplasts, precluding our ability to study the most novel apicoplast pathways. Here, we combine proximity biotinylation-based proteomics (BioID) and a new machine learning algorithm to generate a high-confidence apicoplast proteome consisting of 346 proteins. Critically, the high accuracy of this proteome significantly outperforms previous prediction-based methods and extends beyond other BioID studies of unique parasite compartments. Half of identified proteins have unknown function, and 77% are predicted to be important for normal blood-stage growth. We validate the apicoplast localization of a subset of novel proteins and show that an ATP-binding cassette protein ABCF1 is essential for blood-stage survival and plays a previously unknown role in apicoplast biogenesis. These findings indicate critical organellar functions for newly discovered apicoplast proteins. The apicoplast proteome will be an important resource for elucidating unique pathways derived from secondary endosymbiosis and prioritizing antimalarial drug targets.
View details for DOI 10.1371/journal.pbio.2005895
View details for PubMedID 30212465
View details for DOI 10.1038/d41586-018-05707-8
View details for Web of Science ID 000439059800025
View details for PubMedID 30018439
Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.
View details for DOI 10.1038/s41467-018-04608-8
View details for Web of Science ID 000433541700002
View details for PubMedID 29849030
View details for PubMedCentralID PMC5976774
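A minimal sketch of the contrastive idea described in the cPCA abstract above. It assumes the usual formulation in which the contrastive directions are the top eigenvectors of the target covariance minus a weight times the background covariance. The function name `contrastive_pca` and the parameters `alpha` and `n_components` are illustrative choices for this sketch, not the API of the publicly available implementation.

```python
import numpy as np


def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Project `target` onto directions with high target variance
    and comparatively low background variance.

    target, background: arrays of shape (n_samples, n_features).
    alpha: assumed contrast weight; alpha=0 reduces to ordinary PCA on target.
    """
    # Center each dataset on its own mean.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)

    # Empirical covariance matrices, each of shape (n_features, n_features).
    cov_target = target_c.T @ target_c / (len(target_c) - 1)
    cov_background = background_c.T @ background_c / (len(background_c) - 1)

    # The contrast matrix is symmetric, so a symmetric eigensolver applies.
    eigvals, eigvecs = np.linalg.eigh(cov_target - alpha * cov_background)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]

    return target_c @ components, components
```

In practice the weight `alpha` controls how strongly background variance is penalized, so it is usually worth trying several values and comparing the resulting projections.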
Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embeddings opens up a fruitful intersection between machine learning and quantitative social science.
View details for DOI 10.1073/pnas.1720347115
View details for Web of Science ID 000430191900008
View details for PubMedID 29615513
View details for PubMedCentralID PMC5910851
Obesity-associated insulin resistance plays a central role in type 2 diabetes. As such, tyrosine phosphatases that dephosphorylate the insulin receptor (IR) are potential therapeutic targets. The low-molecular-weight protein tyrosine phosphatase (LMPTP) is a proposed IR phosphatase, yet its role in insulin signaling in vivo has not been defined. Here we show that global and liver-specific LMPTP deletion protects mice from high-fat diet-induced diabetes without affecting body weight. To examine the role of the catalytic activity of LMPTP, we developed a small-molecule inhibitor with a novel uncompetitive mechanism, a unique binding site at the opening of the catalytic pocket, and an exquisite selectivity over other phosphatases. This inhibitor is orally bioavailable, and it increases liver IR phosphorylation in vivo and reverses high-fat diet-induced diabetes. Our findings suggest that LMPTP is a key promoter of insulin resistance and that LMPTP inhibitors would be beneficial for treating type 2 diabetes.
View details for DOI 10.1038/nchembio.2344
View details for Web of Science ID 000401419300015
View details for PubMedID 28346406
View details for DOI 10.1038/nmeth.4190
View details for PubMedID 28245214
As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.
View details for DOI 10.1038/ncomms13293
View details for PubMedID 27796292
View details for PubMedCentralID PMC5095512
Searches for a light sterile neutrino have been performed independently by the MINOS and the Daya Bay experiments using the muon (anti)neutrino and electron antineutrino disappearance channels, respectively. In this Letter, results from both experiments are combined with those from the Bugey-3 reactor neutrino experiment to constrain oscillations into light sterile neutrinos. The three experiments are sensitive to complementary regions of parameter space, enabling the combined analysis to probe regions allowed by the Liquid Scintillator Neutrino Detector (LSND) and MiniBooNE experiments in a minimally extended four-neutrino flavor framework. Stringent limits on sin^{2}2θ_{μe} are set over 6 orders of magnitude in the sterile mass-squared splitting Δm_{41}^{2}. The sterile-neutrino mixing phase space allowed by the LSND and MiniBooNE experiments is excluded for Δm_{41}^{2}<0.8 eV^{2} at 95% CL_{s}.
View details for DOI 10.1103/PhysRevLett.117.151801
View details for Web of Science ID 000385341900006
View details for PubMedID 27768356
The use of programmed electrical signals to influence biological events has been a widely accepted clinical methodology for neurostimulation. An optimal biocompatible platform for neural activation efficiently transfers electrical signals across the electrode-cell interface and also incorporates large-area neural guidance conduits. Inherently conducting polymers (ICPs) have emerged as frontrunners as soft biocompatible alternatives to traditionally used metal electrodes, which are highly invasive and elicit tissue damage over long-term implantation. However, fabrication techniques for the ICPs suffer a major bottleneck, which limits their usability and medical translation. Herein, we report that these limitations can be overcome using colloidal chemistry to fabricate multimodal conducting polymer nanoparticles. Furthermore, we demonstrate that these polymer nanoparticles can be precisely assembled into large-area linear conduits using surface chemistry. Finally, we validate that this platform can act as guidance conduits for neurostimulation, whereby the presence of electrical current induces remarkable dendritic axonal sprouting of cells.
View details for DOI 10.1021/nn506607x
View details for Web of Science ID 000349940500072
View details for PubMedID 25623615
To report endovascular repair with the chimney technique of type B aortic dissection involving a right-sided aortic arch (RAA). Two hypertensive men aged 48 and 42 years with symptoms of aortic dissection resistant to medical therapy underwent emergent thoracic endovascular aortic repair with the chimney technique to extend the proximal landing zones. Both patients had right-sided arches with mirror image branching. One patient required a bare metal chimney stent to maintain perfusion to the right subclavian artery, while the other patient had a chimney stent to revascularize the right common carotid artery. Short-term follow-up (1 year and 1 month, respectively) showed that there was positive aortic remodeling, and the chimney stents were patent. Chimney TEVAR seems safe and effective for Stanford type B dissection in patients having RAA with mirror image branching and no sufficient proximal fixation zone.
View details for Web of Science ID 000320074100005
View details for PubMedID 23731297
Transdifferentiation of fibroblasts to endothelial cells (ECs) may provide a novel therapeutic avenue for diseases, including ischemia and fibrosis. Here, we demonstrate that human fibroblasts can be transdifferentiated into functional ECs by using only 2 factors, Oct4 and Klf4, under inductive signaling conditions. To determine whether human fibroblasts could be converted into ECs by transient expression of pluripotency factors, human neonatal fibroblasts were transduced with lentiviruses encoding Oct4 and Klf4 in the presence of soluble factors that promote the induction of an endothelial program. After 28 days, clusters of induced endothelial (iEnd) cells appeared and were isolated for further propagation and subsequent characterization. The iEnd cells resembled primary human ECs in their transcriptional signature by expressing endothelial phenotypic markers, such as CD31, vascular endothelial-cadherin, and von Willebrand Factor. Furthermore, the iEnd cells could incorporate acetylated low-density lipoprotein and form vascular structures in vitro and in vivo. When injected into the ischemic limb of mice, the iEnd cells engrafted, increased capillary density, and enhanced tissue perfusion. During the transdifferentiation process, the endogenous pluripotency network was not activated, suggesting that this process bypassed a pluripotent intermediate step. Pluripotent factor-induced transdifferentiation can be successfully applied for generating functional autologous ECs for therapeutic applications.
View details for DOI 10.1161/ATVBAHA.112.301167
View details for Web of Science ID 000319119500038
View details for PubMedID 23520160
The tight association between nitrogen status and pathogenesis has been broadly documented in plant-pathogen interactions. However, the interface between primary metabolism and disease responses remains largely unclear. Here, we show that knockout of a single amino acid transporter, LYSINE HISTIDINE TRANSPORTER1 (LHT1), is sufficient for Arabidopsis thaliana plants to confer a broad spectrum of disease resistance in a salicylic acid-dependent manner. We found that redox fine-tuning in photosynthetic cells was causally linked to the lht1 mutant-associated phenotypes. Furthermore, the enhanced resistance in lht1 could be attributed to a specific deficiency of its main physiological substrate, Gln, and not to a general nitrogen deficiency. Thus, by enabling nitrogen metabolism to moderate the cellular redox status, a plant primary metabolite, Gln, plays a crucial role in plant disease resistance.
View details for DOI 10.1105/tpc.110.079392
View details for Web of Science ID 000285576500025
View details for PubMedID 21097712
View details for PubMedCentralID PMC3015111
This article presents the proceedings of a symposium held at the meeting of the International Society for Biomedical Research on Alcoholism (ISBRA) in Mannheim, Germany, in October 2004. Chronic alcoholism follows a fluctuating course, which provides a naturalistic experiment in vulnerability, resilience, and recovery of human neural systems in response to presence, absence, and history of the neurotoxic effects of alcoholism. Alcohol dependence is a progressive chronic disease that is associated with changes in neuroanatomy, neurophysiology, neural gene expression, psychology, and behavior. Specifically, alcohol dependence is characterized by a neuropsychological profile of mild to moderate impairment in executive functions, visuospatial abilities, and postural stability, together with relative sparing of declarative memory, language skills, and primary motor and perceptual abilities. Recovery from alcoholism is associated with a partial reversal of CNS deficits that occur in alcoholism. The reversal of deficits during recovery from alcoholism indicates that brain structure is capable of repair and restructuring in response to insult in adulthood. Indirect support of this repair model derives from studies of selective neuropsychological processes, structural and functional neuroimaging studies, and preclinical studies on degeneration and regeneration during the development of alcohol dependence and recovery from dependence. Genetics and brain regional specificity contribute to unique changes in neuropsychology and neuroanatomy in alcoholism and recovery. This symposium includes state-of-the-art presentations on changes that occur during active alcoholism as well as those that may occur during recovery-abstinence from alcohol dependence. Included are human neuroimaging and neuropsychological assessments, changes in human brain gene expression, allelic combinations of genes associated with alcohol dependence, and preclinical studies investigating mechanisms of alcohol-induced neurotoxicity and neuroprogenitor cell expansion during recovery from alcohol dependence.
View details for DOI 10.1097/01.alc.0000175013.50644.61
View details for Web of Science ID 000231767900018
View details for PubMedID 16156047
Potentially fatal physiologic and metabolic derangements can occur in response to bacterial infection in animals and man. Recently it has been shown that alterations in the levels of circulating cytokines such as IL-6 and TNF-alpha occur shortly after bacterial challenge. To understand better the role of IL-6 in inflammation, we investigated the effects of in vivo anti-mouse IL-6 antibody treatment in a mouse model of septic shock. Rat anti-mouse IL-6 neutralizing mAb was produced from splenocytes of an animal immunized with mouse rIL-6. This mAb, MP5-20F3, was a very potent and specific antagonist of mouse IL-6 in vitro bioactivity, demonstrated using the NFS60 myelomonocytic and KD83 plasmacytoma target cell lines, and also immunoprecipitated radiolabeled IL-6. Anti-IL-6 mAb pretreatment of mice subsequently challenged with lethal doses of i.p. Escherichia coli or i.v. TNF-alpha protected mice from death caused by these treatments. Pretreatment of E. coli-challenged mice with anti-IL-6 led to an increase in serum TNF bioactivity, in comparison to isotype control antibody, implicating IL-6 as a negative modulator of TNF in vivo. Anti-TNF-alpha treatment of mice challenged i.p. with live E. coli resulted in a 70% decrease in serum IL-6 levels, determined by immunoenzymetric assay, compared to control antibody, thereby supporting a role for TNF-alpha as a positive regulator of IL-6 levels. We conclude that IL-6 is a mediator in lethal E. coli infection, and suggest that antagonists of IL-6 may be beneficial therapeutically in life-threatening bacterial infection.
View details for Web of Science ID A1990EP04100033
View details for PubMedID 2124237