Bio

Current Role at Stanford


Data mining, meta-analysis, and databasing in immunoinformatics; design and development of data extraction programs for biomedical data, including semantic encoding; multiple roles in the Human Immunology Project initiative, such as development of metadata standards and automated flow cytometry analysis. Next-generation sequence analysis, including Amazon cloud deployments. Prior projects focused on predicting adverse reactions to pharmaceuticals and on predicting premature birth using a meta-analysis approach.

Education & Certifications


  • PhD, McGill University, Biology (1994)
  • MBA, University of Phoenix, Technology Management (1998)

Projects


  • Human Immunology Project Consortium, Stanford University School of Medicine (9/1/2011)

    Location

    Stanford, CA

  • ImmPort database 2.0, NIAID (10/2/2012)

    ImmPort provides reference and experiment data for immunologists, along with advanced information technology support in the production, analysis, archiving, and exchange of scientific data for the diverse community of life science researchers supported by NIAID/DAIT.

    Location

    Stanford, CA

Professional

Professional Interests


ontologies, data mining, knowledge representation, computational biology

Publications

Journal Articles


  • Predicting Adverse Drug Reactions Using Publicly Available PubChem BioAssay Data CLINICAL PHARMACOLOGY & THERAPEUTICS Pouliot, Y., Chiang, A. P., Butte, A. J. 2011; 90 (1): 90-99

    Abstract

    Adverse drug reactions (ADRs) can have severe consequences, and therefore the ability to predict ADRs prior to market introduction of a drug is desirable. Computational approaches applied to preclinical data could be one way to inform drug labeling and marketing with respect to potential ADRs. Based on the premise that some of the molecular actors of ADRs involve interactions that are detectable in large, and increasingly public, compound screening campaigns, we generated logistic regression models that correlate postmarketing ADRs with screening data from the PubChem BioAssay database. These models analyze ADRs at the level of organ systems, using the system organ classes (SOCs). Of the 19 SOCs under consideration, nine were found to be significantly correlated with preclinical screening data. With regard to six of the eight established drugs for which we could retropredict SOC-specific ADRs, prior knowledge was found that supports these predictions. We conclude this paper by predicting that SOC-specific ADRs will be associated with three unapproved or recently introduced drugs.

    View details for DOI 10.1038/clpt.2011.81

    View details for Web of Science ID 000291853800018

    View details for PubMedID 21613989

  • SmartSearch: automated recommendations using librarian expertise and the National Center for Biotechnology Information's Entrez Programming Utilities JOURNAL OF THE MEDICAL LIBRARY ASSOCIATION Steinberg, R. M., Zwies, R., Yates, C., Stave, C., Pouliot, Y., Heilemann, H. A. 2010; 98 (2): 171-175

    View details for DOI 10.3163/1536-5050.98.2.012

    View details for Web of Science ID 000277447300012

    View details for PubMedID 20428285

  • Translational bioinformatics in the cloud: an affordable alternative. Genome medicine Dudley, J. T., Pouliot, Y., Chen, R., Morgan, A. A., Butte, A. J. 2010; 2 (8): 51-?

    Abstract

    With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine.

    View details for DOI 10.1186/gm172

    View details for PubMedID 20691073

  • A survey of orphan enzyme activities BMC BIOINFORMATICS Pouliot, Y., Karp, P. D. 2007; 8

    Abstract

    Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature. We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles. This survey points toward significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.

    View details for DOI 10.1186/1471-2105-8-244

    View details for Web of Science ID 000248738900001

    View details for PubMedID 17623104

  • BioWarehouse: a bioinformatics database warehouse toolkit BMC BIOINFORMATICS Lee, T. J., Pouliot, Y., Wagner, V., Gupta, P., Stringer-Calvert, D. W., Tenenbaum, J. D., Karp, P. D. 2006; 7

    Abstract

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

    View details for DOI 10.1186/1471-2105-7-170

    View details for Web of Science ID 000236972500001

    View details for PubMedID 16556315

  • DIAN: A novel algorithm for genome ontological classification GENOME RESEARCH Pouliot, Y., Gao, J., Su, Q. J., Liu, G. Z., Ling, X. F. 2001; 11 (10): 1766-1779

    Abstract

    Faced with the determination of many completely sequenced genomes, computational biology is now faced with the challenge of interpreting the significance of these data sets. A multiplicity of data-related problems impedes this goal: Biological annotations associated with raw data are often not normalized, and the data themselves are often poorly interrelated and their interpretation unclear. All of these problems make interpretation of genomic databases increasingly difficult. With the current explosion of sequences now available from the human genome as well as from model organisms, the importance of sorting this vast amount of conceptually unstructured source data into a limited universe of genes, proteins, functions, structures, and pathways has become a bottleneck for the field. To address this problem, we have developed a method of interrelating data sources by applying a novel method of associating biological objects to ontologies. We have developed an intelligent knowledge-based algorithm, to support biological knowledge mapping, and, in particular, to facilitate the interpretation of genomic data. In this respect, the method makes it possible to inventory genomes by collapsing multiple types of annotations and normalizing them to various ontologies. By relying on a conceptual view of the genome, researchers can now easily navigate the human genome in a biologically intuitive, scientifically accurate manner.

    View details for Web of Science ID 000171456000019

    View details for PubMedID 11591654

  • DEVELOPMENTAL REGULATION OF M-CADHERIN IN THE TERMINAL DIFFERENTIATION OF SKELETAL MYOBLASTS DEVELOPMENTAL DYNAMICS Pouliot, Y., Gravel, M., Holland, P. C. 1994; 200 (4): 305-312

    Abstract

    Cadherins form a large family of membrane glycoproteins which mediate homophilic calcium-dependent cell adhesion. They are thought to mediate the initial calcium-dependent cell adhesion which precedes the plasma membrane fusion of skeletal myoblasts. Two cadherin subtypes are known to be expressed in mammalian skeletal myoblasts: muscle cadherin (M-cadherin) and neural cadherin (N-cadherin). In the present study we demonstrate that 1) the expression of M- and N-cadherin is differentially regulated during myoblast differentiation in vitro, 2) the expression of M-cadherin but not N-cadherin is inhibited by 5-bromo-2'-deoxyuridine (BUdR), an agent which selectively inhibits skeletal myoblast differentiation, and 3) fusion and differentiation-competent rat L6 myoblasts do not express detectable levels of N-cadherin mRNA. In vivo, M-cadherin mRNA was detectable exclusively in skeletal muscle. M-cadherin mRNA levels peaked during the secondary myogenic wave in rat hindlimb muscle, becoming barely detectable in 1-week-old and adult rats. These observations indicate that M-cadherin is unique in two ways: It is the first cadherin to be included in the family of skeletal muscle-specific genes, and it shows peak levels of expression in developing skeletal muscle tissue. Taken together, these results suggest that M-cadherin plays an important role in skeletal myogenesis.

    View details for Web of Science ID A1994PD77400004

    View details for PubMedID 7994077

  • PHYLOGENETIC ANALYSIS OF THE CADHERIN SUPERFAMILY BIOESSAYS Pouliot, Y. 1992; 14 (11): 743-748

    Abstract

    Cadherins are a multigene family of proteins which mediate homophilic calcium-dependent cell adhesion and are thought to play an important role in morphogenesis by mediating specific intercellular adhesion. Different lines of experimental evidence have recently indicated that the site responsible for mediating adhesive interactions is localized to the first extracellular domain of cadherin. Based upon an analysis of the sequence of this domain, I show that cadherins can be classified into three groups with distinct structural features. Furthermore, using this sequence information a phylogenetic tree relating the known cadherins was assembled. This is the first such tree to be published for the cadherins. One cadherin subtype, neural cadherin (N-cadherin), shows very little sequence divergence between species, whereas all other cadherin subtypes show more substantial divergence, suggesting that selective pressure upon this domain may be greater for N-cadherin than for other cadherins. Phylogenetic analysis also suggests that the gene duplications which established the main branches leading to the different cadherin subtypes occurred very early in their history. These duplications set the stage for the diversified superfamily we now observe.

    View details for Web of Science ID A1992KA82400003

    View details for PubMedID 1365887

  • EFFICIENT RECOGNITION OF IMMUNOGLOBULIN DOMAINS FROM AMINO-ACID-SEQUENCES USING A NEURAL NETWORK COMPUTER APPLICATIONS IN THE BIOSCIENCES Bengio, Y., Pouliot, Y. 1990; 6 (4): 319-324

    Abstract

    A neural network was trained using back propagation to recognize immunoglobulin domains from amino acid sequences. The program was designed to identify proteins exhibiting such domains with minimal rates of false positives and false negatives. The National Biomedical Research Foundation NEW protein sequences database was scanned to evaluate the performance of the program in recognizing mouse immunoglobulin sequences. The program correctly recognized 55 out of 56 mouse immunoglobulin sequences, corresponding to a recognition efficiency of 98.2% with an overall false positive rate of 7.3%. These data demonstrate that neural network-based search programs are well suited to search for sequences characterized by only a few well-conserved subsequences.

    View details for Web of Science ID A1990EG25500002

    View details for PubMedID 2257492

  • DEVELOPMENTAL REGULATION OF A CADHERIN DURING THE DIFFERENTIATION OF SKELETAL MYOBLASTS DEVELOPMENTAL BIOLOGY Pouliot, Y., Holland, P. C., Blaschuk, O. W. 1990; 141 (2): 292-298

    Abstract

    Cadherins are a family of integral membrane glycoproteins which mediate calcium-dependent intercellular adhesion in vertebrate species. Here we present evidence that fusion-competent rat L6 myoblasts express a cadherin (Mr 127 kDa). The levels of this cadherin were found to be developmentally regulated. Maximal levels were expressed prior to fusion. The increase in cadherin levels observed during differentiation was prevented by the differentiation inhibitor, 5-bromo-2'-deoxyuridine. L6 myoblasts grown in the presence of anti-cadherin antibodies exhibited an altered morphology in comparison to control cultures, coupled with decreased myoblast fusion. These data indicate that the developmental regulation of cadherin is part of the program of terminal differentiation of skeletal myoblasts, and that cadherins are involved in the process of myoblast fusion.

    View details for Web of Science ID A1990EB19300008

    View details for PubMedID 2210038

  • IDENTIFICATION OF A CADHERIN CELL-ADHESION RECOGNITION SEQUENCE DEVELOPMENTAL BIOLOGY Blaschuk, O. W., Sullivan, R., David, S., Pouliot, Y. 1990; 139 (1): 227-229

    Abstract

    The molecular mechanisms by which the cadherins interact with one another to promote cell adhesion have not been elucidated. In particular, the amino acid sequences of the cadherin cell adhesion recognition sites have not been determined. Here we demonstrate that synthetic peptides containing the sequence HAV, which is common to all of the cadherins, inhibit two processes (compaction of eight-cell-stage mouse embryos and rat neurite outgrowth on astrocytes) that are known to be mediated by cadherins. The data suggest that the tripeptide HAV is a component of a cadherin cell adhesion recognition sequence.

    View details for Web of Science ID A1990DB02400018

    View details for PubMedID 2328837

  • IDENTIFICATION OF A CONSERVED REGION COMMON TO CADHERINS AND INFLUENZA STRAIN A HEMAGGLUTININS JOURNAL OF MOLECULAR BIOLOGY Blaschuk, O. W., Pouliot, Y., Holland, P. C. 1990; 211 (4): 679-682

    Abstract

    Cadherins are a family of integral membrane glycoproteins that mediate homophilic, calcium-dependent cell adhesion in vertebrate species. The primary structures of six members of the cadherin family have recently been determined. The extracellular portion of these proteins is composed of five domains, the first of which is the most highly conserved among cadherins. Previous searches of protein sequence databases have revealed little or no sequence homology between cadherins and other proteins. Here we report that the first extracellular domain of cadherins exhibits substantial sequence homology with the amino termini of influenza strain A hemagglutinins. These regions of sequence homology have been shown to be functionally important in both cadherins and hemagglutinins. Our observations suggest that a functional domain of cadherins is conserved among other proteins.

    View details for Web of Science ID A1990CU14200003

    View details for PubMedID 2313692

  • EBV IG-LIKE DOMAINS NATURE Cashman, N. R., Pouliot, Y. 1990; 343 (6256): 319-319

    View details for Web of Science ID A1990CK91800037

    View details for PubMedID 2153934

  • DYSTROPHIN IS EXPRESSED IN MDX SKELETAL-MUSCLE FIBERS AFTER NORMAL MYOBLAST IMPLANTATION AMERICAN JOURNAL OF PATHOLOGY Karpati, G., Pouliot, Y., Zubrzycka-Gaarn, E., Carpenter, S., Ray, P. N., Worton, R. G., Holland, P. 1989; 135 (1): 27-32

    Abstract

    In mdx mice, the dystrophin gene of the X chromosome is defective and, as a result, immunoreactive dystrophin is undetectable in all muscle fibers of all animals of this highly inbred strain. This study showed that implantation of suspensions of clonal cultures of normal human myoblasts into different regions of quadriceps muscles of 6-to-10-day-old mdx mice or 60-day-old mdx mice (whose muscles have been crushed 4 days before implantation) results in the appearance of scattered fiber segments containing microscopically demonstrable immunoreactive dystrophin. In the animals that received the normal myoblast implantation in the prenecrotic stage of the disease (6 to 10 days of age), the dystrophin-positive fiber segments (demonstrated at ages 35, 45, and 60 days) escaped necrosis. This was determined by the absence of the characteristic chains of central nuclei, a reliable marker of prior necrosis in mdx muscle fibers. By heavy labeling of the nuclear DNA of the transplantable human myoblasts with H3-thymidine during culturing, and by sequential performance of an immunocytochemical staining for dystrophin and autoradiography on the same sections, some dystrophin-positive fiber segments were shown to contain radiolabeled myonuclei. It was concluded that nondystrophic myoblasts fused with host muscle fibers to form mosaic muscle fibers in which the normal dystrophin gene of the implanted myoblasts was expressed. This approach may be employed for the mitigation of the deleterious consequences of a gene defect in recessively inherited human muscle diseases such as Duchenne dystrophy.

    View details for Web of Science ID A1989AP09500005

    View details for PubMedID 2672825

  • EXPRESSION OF IMMUNOREACTIVE MAJOR HISTOCOMPATIBILITY COMPLEX PRODUCTS IN HUMAN SKELETAL-MUSCLES ANNALS OF NEUROLOGY Karpati, G., Pouliot, Y., Carpenter, S. 1988; 23 (1): 64-72

    Abstract

    Immunoreactive class 1 and class 2 major histocompatibility complex gene products (MHCP) and beta 2 microglobulin (beta 2 MG) were demonstrated by microscopic immunocytochemistry in cryostat sections of skeletal muscle biopsies of 67 patients with various neuromuscular diseases. Diagnoses included normal muscle, chronic partial denervation, Duchenne dystrophy, polymyositis, dermatomyositis, inclusion body myositis, and miscellaneous neuromuscular diseases. Normal mature muscle fibers did not express MHCP, but blood vessels showed both class 1 and 2 MHCP and beta 2 MG. Regenerating muscle fibers showed consistent sarcolemmal class 1 MHCP expression irrespective of the disease. In polymyositis, the majority of extrafusal muscle fibers of most patients showed strong sarcolemmal class 1 MHCP expression. In dermatomyositis, muscle fibers situated either in perifascicular or in randomly clustered distribution revealed strong class 1 MHCP reactivity. In inclusion body myositis, scattered small clusters of muscle fibers were positive for class 1 MHCP. In polymyositis and inclusion body myositis, particularly strong class 1 MHCP expression was invariably seen in nonnecrotic muscle fibers partially invaded by lymphocytes whose cytotoxic effects are believed to be class 1 MHCP restricted. Factors or agents that trigger class 1 MHCP expression are presumed also to sensitize lymphocytes to muscle fibers in these diseases, but their identity remains obscure at this time. In dermatomyositis, the expression of MHCP in perifascicular muscle fibers and in areas of capillary loss may represent the triggering of MHCP expression by a nonspecific cellular stress reaction, in this case probably low-grade ischemia.

    View details for Web of Science ID A1988L843200009

    View details for PubMedID 3278673
