Russ Altman, Doctoral Dissertation Advisor (AC)
Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining both phenotyping methods improves power to detect genetic associations. We also show that GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.
View details for DOI 10.1016/j.ajhg.2020.03.007
View details for PubMedID 32275883
The small molecule Retro-2 prevents ricin toxicity through a poorly-defined mechanism of action (MOA), which involves halting retrograde vesicle transport to the endoplasmic reticulum (ER). CRISPRi genetic interaction analysis revealed Retro-2 activity resembles disruption of the transmembrane domain recognition complex (TRC) pathway, which mediates post-translational ER-targeting and insertion of tail-anchored (TA) proteins, including SNAREs required for retrograde transport. Cell-based and in vitro assays show that Retro-2 blocks delivery of newly-synthesized TA-proteins to the ER-targeting factor ASNA1 (TRC40). An ASNA1 point mutant identified using CRISPR-mediated mutagenesis abolishes both the cytoprotective effect of Retro-2 against ricin and its inhibitory effect on ASNA1-mediated ER-targeting. Together, our work explains how Retro-2 prevents retrograde trafficking of toxins by inhibiting TA-protein targeting, describes a general CRISPR strategy for predicting the MOA of small molecules, and paves the way for drugging the TRC pathway to treat broad classes of viruses known to be inhibited by Retro-2.
View details for DOI 10.7554/eLife.48434
View details for PubMedID 31674906
Social media has been identified as a promising potential source of information for pharmacovigilance. The adoption of social media data has been hindered by the massive and noisy nature of the data. Initial attempts to use social media data have relied on exact text matches to drugs of interest, and therefore suffer from the gap between formal drug lexicons and the informal nature of social media. The Reddit comment archive represents an ideal corpus for bridging this gap. We trained a word embedding model, RedMed, to facilitate the identification and retrieval of health entities from Reddit data. We compare the performance of our model trained on a consumer-generated corpus against publicly available models trained on expert-generated corpora. Our automated classification pipeline achieves an accuracy of 0.88 and a specificity of > 0.9 across four different term classes. Of all drug mentions, an average of 79% (±0.5%) were exact matches to a generic or trademark drug name, 14% (±0.5%) were misspellings, 6.4% (±0.3%) were synonyms, and 0.13% (±0.05%) were pill marks. We find that our system captures an additional 20% of mentions; these would have been missed by approaches that rely solely on exact string matches. We provide a lexicon of misspellings and synonyms for 2,978 drugs and a word embedding model trained on a health-oriented subset of Reddit.
View details for DOI 10.1016/j.jbi.2019.103307
View details for PubMedID 31627020
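The "additional 20% of mentions" figure in the RedMed abstract above can be checked against the reported breakdown of drug mentions. The following is a minimal arithmetic sketch using only the percentages stated in the abstract; the dictionary keys and the function name are illustrative and are not taken from the RedMed code.

```python
# Reported share of drug mentions by match type (percent), as stated in the abstract.
# The key names are illustrative labels, not identifiers from the RedMed project.
shares = {
    "exact": 79.0,        # exact match to a generic or trademark name
    "misspelling": 14.0,
    "synonym": 6.4,
    "pill_mark": 0.13,
}


def non_exact_share(shares_by_type):
    """Sum the shares of every mention type other than exact string matches."""
    return sum(value for kind, value in shares_by_type.items() if kind != "exact")


missed_by_exact_matching = non_exact_share(shares)
print(f"{missed_by_exact_matching:.2f}% of mentions are not exact matches")
```

Misspellings, synonyms, and pill marks together come to roughly 20% of mentions, which is the portion an approach based solely on exact string matches would not retrieve.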
Pharmacogenomics (PGx) decision support and return of results is an active area of precision medicine. One challenge of implementing PGx is extracting genomic variants and assigning haplotypes in order to apply prescribing recommendations and information from CPIC, FDA, PharmGKB, etc. PharmCAT (1) extracts variants specified in guidelines from a genetic dataset derived from sequencing or genotyping technologies; (2) infers haplotypes and diplotypes; and (3) generates a report containing genotype/diplotype-based annotations and guideline recommendations. We describe PharmCAT and a pilot validation project comparing results for 1000 Genomes sequences of Coriell samples with corresponding Genetic Testing Reference Materials Coordination Program (GeT-RM) sample characterization. PharmCAT was highly concordant with the GeT-RM data. PharmCAT is available on GitHub to evaluate, test and report results back to the community. As precision medicine becomes more prevalent, our ability to consistently, accurately, and clearly define and report PGx annotations and prescribing recommendations is critical.
View details for DOI 10.1002/cpt.1568
View details for PubMedID 31306493
Electronic communication is becoming increasingly popular worldwide, as evidenced by its widespread and rapidly growing use. In medicine, however, it remains a novel approach for reaching out to patients, yet it has the potential to further improve current health care. Electronic platforms could support therapy adherence and communication between physicians and patients. The power of social media, as well as of other electronic devices, can improve adherence, as evidenced by the development of the app bant. Additionally, systematic analysis of social media content by Screenome can identify health events not always captured by regular health care. By better identifying these health events, we can improve the current health care system, because we will be able to better tailor care to patients' needs. All these techniques are a valuable component of modern health care and will help us move into a future of increasingly digital health care.
View details for DOI 10.1111/cts.12687
View details for PubMedID 31392837
Summary: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests, and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities. Availability and implementation: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.
View details for PubMedID 30520965
The field of pharmacogenomics is an area of great potential for near-term human health impacts from the big genomic data revolution. Pharmacogenomics research momentum is building with numerous hypotheses currently being investigated through the integration of molecular profiles of different cell lines and large genomic data sets containing information on cellular and human responses to therapies. Additionally, the results of previous pharmacogenetic research efforts have been formulated into clinical guidelines that are beginning to impact how healthcare is conducted on the level of the individual patient. This trend will only continue with the recent release of new datasets containing linked genotype and electronic medical record data. This review discusses key resources available for pharmacogenomics and pharmacogenetics research and highlights recent work within the field.
View details for PubMedID 29635477
Protein-truncating variants can have profound effects on gene function and are critical for clinical genome interpretation and generating therapeutic hypotheses, but their relevance to medical phenotypes has not been systematically assessed. Here, we characterize the effect of 18,228 protein-truncating variants across 135 phenotypes from the UK Biobank and find 27 associations between medical phenotypes and protein-truncating variants in genes outside the major histocompatibility complex. We perform phenome-wide analyses and directly measure the effect in homozygous carriers, commonly referred to as "human knockouts," across medical phenotypes for genes implicated as being protective against disease or associated with at least one phenotype in our study. We find several genes with strong pleiotropic or non-additive effects. Our results illustrate the importance of protein-truncating variants in a variety of diseases.
View details for PubMedID 29691392
The introduction of frameshift indels by genome editing has emerged as a powerful technique to study the functions of uncharacterized genes in cell lines and model organisms. Such mutations should lead to mRNA degradation owing to nonsense-mediated mRNA decay or the production of severely truncated proteins. Here, we show that frameshift indels engineered by genome editing can also lead to skipping of "multiple of three nucleotides" exons. Such splicing events result in in-frame mRNA that may encode fully or partially functional proteins. We also characterize a segregating nonsense variant (rs2273865) located in a "multiple of three nucleotides" exon of LGALS8 that increases exon skipping in human erythroblast samples. Our results highlight the potentially frequent contribution of exonic splicing regulatory elements and are important for the interpretation of negative results in genome editing experiments. Moreover, they may contribute to a better annotation of loss-of-function mutations in the human genome.
View details for DOI 10.1371/journal.pone.0178700
View details for PubMedID 28570605
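The point about "multiple of three nucleotides" exons in the abstract above comes down to reading-frame arithmetic: removing an exon whose length is divisible by three leaves the downstream codons in the same frame. The sketch below illustrates that reasoning only. The exon lengths are hypothetical, the function names are invented for this example, and it is not the authors' analysis method.

```python
def downstream_offset(exon_lengths, skipped=None):
    """Return the reading-frame offset (0, 1, or 2) at the start of the last exon.

    exon_lengths lists coding exon lengths in nucleotides, in transcript order.
    skipped is the index of an exon omitted from the mRNA, or None if none is skipped.
    """
    upstream = sum(
        length
        for index, length in enumerate(exon_lengths[:-1])
        if index != skipped
    )
    return upstream % 3


def skipping_preserves_frame(exon_lengths, skipped):
    """True if skipping the given exon leaves the last exon in its original frame."""
    return downstream_offset(exon_lengths) == downstream_offset(exon_lengths, skipped)


# Hypothetical exon lengths, chosen for illustration only.
symmetric = [120, 93, 150]    # middle exon is 93 nt, a multiple of three
asymmetric = [120, 94, 150]   # middle exon is 94 nt, not a multiple of three

print(skipping_preserves_frame(symmetric, skipped=1))
print(skipping_preserves_frame(asymmetric, skipped=1))
```

Under these assumptions, skipping the 93-nucleotide exon keeps the downstream sequence in frame, so the transcript may still encode a fully or partially functional protein, whereas skipping the 94-nucleotide exon shifts the frame.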