Assistant Professor, Institute for Immunity, Transplantation and Infection (2014 - Present)
Findings from clinical and biological studies are often not reproducible when tested in independent cohorts. Due to the testing of a large number of hypotheses and relatively small sample sizes, results from whole-genome expression studies in particular are often not reproducible. Compared to single-study analysis, gene expression meta-analysis can improve reproducibility by integrating data from multiple studies. However, there are multiple choices in designing and carrying out a meta-analysis. Yet, clear guidelines on best practices are scarce. Here, we hypothesized that studying subsets of very large meta-analyses would allow for systematic identification of best practices to improve reproducibility. We therefore constructed three very large gene expression meta-analyses from clinical samples, and then examined meta-analyses of subsets of the datasets (all combinations of datasets with up to N/2 samples and K/2 datasets) compared to a 'silver standard' of differentially expressed genes found in the entire cohort. We tested three random-effects meta-analysis models using this procedure. We showed relatively greater reproducibility with more-stringent effect size thresholds with relaxed significance thresholds; relatively lower reproducibility when imposing extraneous constraints on residual heterogeneity; and an underestimation of actual false positive rate by Benjamini-Hochberg correction. In addition, multivariate regression showed that the accuracy of a meta-analysis increased significantly with more included datasets even when controlling for sample size.
View details for DOI 10.1093/nar/gkw797
View details for PubMedID 27634930
View details for PubMedCentralID PMC5224496
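The abstract above compares three random-effects meta-analysis models without naming them. As a rough illustration of what such a model computes, the sketch below pools per-study effect sizes with the DerSimonian-Laird estimator, a common random-effects method; choosing it here is an assumption, not a statement about which models the study tested. The function name and the input values are hypothetical.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effect sizes with the DerSimonian-Laird estimator.

    effects   -- one effect size per study (e.g. Hedges' g for a gene)
    variances -- the within-study variance of each effect size
    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments estimate of between-study variance, floored at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical effect sizes for one gene across three datasets.
pooled, se, tau2 = dersimonian_laird([0.8, 0.3, 0.6], [0.04, 0.09, 0.05])
```

Because the between-study variance is added to every study's variance, large and small datasets are weighted more evenly than in a fixed-effect analysis, which is one reason the number of included datasets can matter independently of total sample size.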
Improved diagnostics for acute infections could decrease morbidity and mortality by increasing early antibiotics for patients with bacterial infections and reducing unnecessary antibiotics for patients without bacterial infections. Several groups have used gene expression microarrays to build classifiers for acute infections, but these have been hampered by the size of the gene sets, use of overfit models, or lack of independent validation. We used multicohort analysis to derive a set of seven genes for robust discrimination of bacterial and viral infections, which we then validated in 30 independent cohorts. We next used our previously published 11-gene Sepsis MetaScore together with the new bacterial/viral classifier to build an integrated antibiotics decision model. In a pooled analysis of 1057 samples from 20 cohorts (excluding infants), the integrated antibiotics decision model had a sensitivity and specificity for bacterial infections of 94.0 and 59.8%, respectively (negative likelihood ratio, 0.10). Prospective clinical validation will be needed before these findings are implemented for patient care.
View details for DOI 10.1126/scitranslmed.aaf7165
View details for PubMedID 27384347
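The negative likelihood ratio quoted in the abstract above follows from the reported sensitivity and specificity by the standard definition, LR- = (1 - sensitivity) / specificity. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR-: how much a negative result lowers the odds of disease."""
    return (1.0 - sensitivity) / specificity

def positive_likelihood_ratio(sensitivity, specificity):
    """LR+: how much a positive result raises the odds of disease."""
    return sensitivity / (1.0 - specificity)

# Values reported for the integrated antibiotics decision model.
lr_neg = negative_likelihood_ratio(0.940, 0.598)
print(round(lr_neg, 2))  # 0.1, matching the reported 0.10
```

A low LR- is what makes a test useful for ruling out bacterial infection even when specificity is modest.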
Active pulmonary tuberculosis is difficult to diagnose and treatment response is difficult to effectively monitor. A WHO consensus statement has called for new non-sputum diagnostics. The aim of this study was to use an integrated multicohort analysis of samples from publicly available datasets to derive a diagnostic gene set in the peripheral blood of patients with active tuberculosis. We searched two public gene expression microarray repositories and retained datasets that examined clinical cohorts of active pulmonary tuberculosis infection in whole blood. We compared gene expression in patients with either latent tuberculosis or other diseases versus patients with active tuberculosis using our validated multicohort analysis framework. Three datasets were used as discovery datasets and meta-analytical methods were used to assess gene effects in these cohorts. We then validated the diagnostic capacity of the three-gene set in the remaining 11 datasets. A total of 14 datasets containing 2572 samples from 10 countries from both adult and paediatric patients were included in the analysis. Of these, three datasets (N=1023) were used to discover a set of three genes (GBP5, DUSP3, and KLF2) that are highly diagnostic for active tuberculosis. We validated the diagnostic power of the three-gene set to separate active tuberculosis from healthy controls (global area under the ROC curve (AUC) 0·90 [95% CI 0·85-0·95]), latent tuberculosis (0·88 [0·84-0·92]), and other diseases (0·84 [0·80-0·95]) in eight independent datasets composed of both children and adults from ten countries. Expression of the three-gene set was not confounded by HIV infection status, bacterial drug resistance, or BCG vaccination.
Furthermore, in four additional cohorts, we showed that the tuberculosis score declined during treatment of patients with active tuberculosis. Overall, our integrated multicohort analysis yielded a three-gene set in whole blood that is robustly diagnostic for active tuberculosis, that was validated in multiple independent cohorts, and that has potential clinical application for diagnosis and monitoring treatment response. Prospective laboratory validation will be required before it can be used in a clinical setting. Funding: National Institute of Allergy and Infectious Diseases, National Library of Medicine, the Stanford Child Health Research Institute, the Society for University Surgeons, and the Bill and Melinda Gates Foundation.
View details for DOI 10.1016/S2213-2600(16)00048-5
View details for PubMedID 26907218
Respiratory viral infections are a significant burden to healthcare worldwide. Many whole genome expression profiles have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections. First, we identified a common host signature across different respiratory viral infections that could distinguish (1) individuals with viral infections from healthy controls and from those with bacterial infections, and (2) symptomatic from asymptomatic subjects prior to symptom onset in challenge studies. Second, we identified an influenza-specific host response signature that (1) could distinguish influenza-infected samples from those with bacterial and other respiratory viral infections, (2) was a diagnostic and prognostic marker in influenza-pneumonia patients and influenza challenge studies, and (3) was predictive of response to influenza vaccine. Our results have applications in the diagnosis, prognosis, and identification of drug targets in viral infections.
View details for DOI 10.1016/j.immuni.2015.11.003
View details for Web of Science ID 000366846600022
View details for PubMedID 26682989
View details for PubMedCentralID PMC4684904
Lung cancer remains the most common cause of cancer-related death worldwide and it continues to lack effective treatment. The increasingly large and diverse public databases of lung cancer gene expression constitute a rich source of candidate oncogenic drivers and therapeutic targets. To define novel targets for lung adenocarcinoma, we conducted a large-scale meta-analysis of genes specifically overexpressed in adenocarcinoma. We identified an 11-gene signature that was overexpressed consistently in adenocarcinoma specimens relative to normal lung tissue. Six genes in this signature were specifically overexpressed in adenocarcinoma relative to other subtypes of non-small cell lung cancer (NSCLC). Among these genes was the little studied protein tyrosine kinase PTK7. Immunohistochemical analysis confirmed that PTK7 is highly expressed in primary adenocarcinoma patient samples. RNA interference-mediated attenuation of PTK7 decreased cell viability and increased apoptosis in a subset of adenocarcinoma cell lines. Further, loss of PTK7 activated the MKK7-JNK stress response pathway and impaired tumor growth in xenotransplantation assays. Our work defines PTK7 as a highly and specifically expressed gene in adenocarcinoma and a potential therapeutic target in this subset of NSCLC. Cancer Res; 74(10); 2892-902. ©2014 AACR.
View details for DOI 10.1158/0008-5472.CAN-13-2775
View details for PubMedID 24654231
Using meta-analysis of eight independent transplant datasets (236 graft biopsy samples) from four organs, we identified a common rejection module (CRM) consisting of 11 genes that were significantly overexpressed in acute rejection (AR) across all transplanted organs. The CRM genes could diagnose AR with high specificity and sensitivity in three additional independent cohorts (794 samples). In another two independent cohorts (151 renal transplant biopsies), the CRM genes correlated with the extent of graft injury and predicted future injury to a graft using protocol biopsies. Inferred drug mechanisms from the literature suggested that two FDA-approved drugs (atorvastatin and dasatinib), approved for nontransplant indications, could regulate specific CRM genes and reduce the number of graft-infiltrating cells during AR. We treated mice with HLA-mismatched mouse cardiac transplant with atorvastatin and dasatinib and showed reduction of the CRM genes, significant reduction of graft-infiltrating cells, and extended graft survival. We further validated the beneficial effect of atorvastatin on graft survival by retrospective analysis of electronic medical records of a single-center cohort of 2,515 renal transplant patients followed for up to 22 yr. In conclusion, we identified a CRM in transplantation that provides new opportunities for diagnosis, drug repositioning, and rational drug design.
View details for DOI 10.1084/jem.20122709
View details for PubMedID 24127489
View details for PubMedCentralID PMC3804941
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
View details for DOI 10.1371/journal.pcbi.1002375
View details for Web of Science ID 000300729900019
View details for PubMedID 22383865
We describe cell type-specific significance analysis of microarrays (csSAM) for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell-type frequencies. First, we validated csSAM with predesigned mixtures and then applied it to whole-blood gene expression datasets from stable post-transplant kidney transplant recipients and those experiencing acute transplant rejection, which revealed hundreds of differentially expressed genes that were otherwise undetectable.
View details for DOI 10.1038/NMETH.1439
View details for Web of Science ID 000276150600017
View details for PubMedID 20208531
A common challenge in the analysis of genomics data is trying to understand the underlying phenomenon in the context of all complex interactions taking place on various signaling pathways. A statistical approach using various models is universally used to identify the most relevant pathways in a given experiment. Here, we show that the existing pathway analysis methods fail to take into consideration important biological aspects and may provide incorrect results in certain situations. By using a systems biology approach, we developed an impact analysis that includes the classical statistics but also considers other crucial factors such as the magnitude of each gene's expression change, their type and position in the given pathways, their interactions, etc. The impact analysis attempts a deeper level of statistical analysis, informed by more pathway-specific biology than the existing techniques. On several illustrative data sets, the classical analysis produces both false positives and false negatives, while the impact analysis provides biologically meaningful results. This analysis method has been implemented as a Web-based tool, Pathway-Express, freely available as part of the Onto-Tools (http://vortex.cs.wayne.edu).
View details for DOI 10.1101/gr.6202607
View details for Web of Science ID 000249869200015
View details for PubMedID 17785539
Findings from several studies support the conclusion that spermatozoa contain a complex repertoire of mRNAs. Even though these mRNAs are thought to provide an insight into past events of spermatogenesis, their complexity and function have yet to be established. Our aim was to determine whether we could use spermatozoal mRNAs to generate a genetic fingerprint of normal fertile men. We used a suite of microarrays containing 27016 unique expressed sequence tags (ESTs) to investigate cDNAs from a pool of 19 testes, cDNAs from a pool of nine individual ejaculate spermatozoal mRNAs, and cDNAs constructed from spermatozoal mRNAs from a single ejaculate. We also used ontological data mining to determine the function of the genes identified in each EST profile. The cDNAs from the testes, pooled ejaculate, and single ejaculate hybridised to 7157, 3281, and 2780 ESTs, respectively. The testicular population contained all of the ESTs identified by the cDNAs from the pooled and individual ejaculate. The pooled ejaculate population contained all but four ESTs identified from the individual ejaculate. A subset of the spermatozoal mRNAs was associated with embryo development. The microarray data from testes and spermatozoa (pooled and individual) were concordant, supporting the view that a spermatozoal mRNA fingerprint can be obtained from normal fertile men. Thus, profiling can be used to monitor past events, that is, gene expression of spermatogenesis. Moreover, the data suggest that, in addition to delivering the paternal genome, spermatozoa provide the zygote with a unique suite of paternal mRNAs. Ejaculate spermatozoa can now be used as a non-invasive proxy for investigations of testis-specific infertility.
View details for Web of Science ID 000177933000019
View details for PubMedID 12241836
Celiac disease (CeD) provides an opportunity to study autoimmunity and the transition in immune cells as dietary gluten induces small intestinal lesions. Seventy-three celiac disease patients on a long-term, gluten-free diet ingested a known amount of gluten daily for 6 weeks. A peripheral blood sample and intestinal biopsy specimens were taken before and 6 weeks after initiating the gluten challenge. Biopsy results were reported on a continuous numeric scale that measured the villus-height-to-crypt-depth ratio to quantify gluten-induced intestinal injury. Pooled B and T cells were isolated from whole blood, and RNA was analyzed by DNA microarray looking for changes in peripheral B- and T-cell gene expression that correlated with changes in villus height to crypt depth, as patients maintained a relatively healthy intestinal mucosa or deteriorated in the face of a gluten challenge. Gluten-dependent intestinal damage from baseline to 6 weeks varied widely across all patients, ranging from no change to extensive damage. Genes differentially expressed in B cells correlated strongly with the extent of intestinal damage. A relative increase in B-cell gene expression correlated with a lack of sensitivity to gluten whereas their relative decrease correlated with gluten-induced mucosal injury. A core B-cell gene module, representing a subset of B-cell genes analyzed, accounted for the correlation with intestinal injury. Genes comprising the core B-cell module showed a net increase in expression from baseline to 6 weeks in patients with little to no intestinal damage, suggesting that these individuals may have mounted a B-cell immune response to maintain mucosal homeostasis and circumvent inflammation. DNA microarray data were deposited at the GEO repository (accession number: GSE87629; available: https://www.ncbi.nlm.nih.gov/geo/).
View details for DOI 10.1016/j.jcmgh.2017.01.011
View details for PubMedID 28508029
Neonates are at increased risk for developing sepsis, but this population often exhibits ambiguous clinical signs that complicate the diagnosis of infection. No biomarker has yet shown enough diagnostic accuracy to rule out sepsis at the time of clinical suspicion. We show that a gene-expression-based signature is an accurate objective measure of the risk of sepsis in a neonate or preterm infant, and it substantially improves diagnostic accuracy over that of commonly used laboratory-based testing. Implementation might decrease inappropriate antibiotic use. Neonatal sepsis can have devastating consequences, but accurate diagnosis is difficult. As a result, up to 200 neonates with suspected sepsis are treated with empiric antibiotics for every 1 case of microbiologically confirmed sepsis. These unnecessary antibiotics enhance bacterial antibiotic resistance, increase economic costs, and alter gut microbiota composition. We recently reported an 11-gene diagnostic test for sepsis (Sepsis MetaScore) based on host whole-blood gene expression in children and adults, but this test has not been evaluated in neonates. We identified existing gene expression microarray-based cohorts of neonates with sepsis. We then tested the accuracy of the Sepsis MetaScore both alone and in combination with standard diagnostic laboratory tests in diagnosing sepsis. We found 3 cohorts with a total of 213 samples from control neonates and neonates with sepsis. The Sepsis MetaScore had an area under the receiver operating characteristic curve of 0.92-0.93 in all 3 cohorts. We also found that, as a diagnostic test for sepsis, it outperformed standard laboratory measurements alone and, when used in combination with another test(s), resulted in a significant net reclassification index (0.3-0.69) in 5 of 6 comparisons.
The mean point estimates for sensitivity and specificity were 95% and 60%, respectively, which, if confirmed prospectively and applied in a high-risk cohort, could reduce inappropriate antibiotic usage substantially. The Sepsis MetaScore had excellent diagnostic accuracy across 3 separate cohorts of neonates from 3 different countries. Further prospective targeted study will be needed before clinical application.
View details for DOI 10.1093/jpids/pix021
View details for PubMedID 28419265
We describe a new library generation method, Machine-based Identification of Molecules Inside Characterized Space (MIMICS), that generates sets of molecules inspired by a text-based input. MIMICS-generated libraries were found to preserve distributions of properties while simultaneously increasing structural diversity. Newly identified MIMICS-generated compounds were found to be bioactive as inhibitors of specific components of the unfolded protein response (UPR) and the VEGFR2 pathway in cell-based assays, thus confirming the applicability of this methodology toward drug design applications. Wider application of MIMICS could facilitate the efficient utilization of chemical space.
View details for DOI 10.1021/acs.jcim.6b00754
View details for Web of Science ID 000400204900023
View details for PubMedID 28257191
Rationale: The relevance of animal models to human diseases is an area of intense scientific debate. The degree to which mouse models of lung injury recapitulate human lung injury has never been assessed. Integrating data from both human and animal expression studies allows for increased statistical power and identification of conserved differential gene expression across organisms and conditions. Objectives: Comprehensive integration of gene expression data in experimental ALI in rodents compared to humans. Methods: We performed two separate gene expression multi-cohort analyses to determine differential gene expression in experimental animal and human lung injury. We used correlational and pathway analyses combined with external in vitro gene expression data to identify both potential drivers of underlying inflammation and therapeutic drug candidates. Main Results: We identified 21 animal lung tissue datasets and 3 human lung injury BAL datasets. We show that the meta-signatures of animal and human experimental ALI are significantly correlated despite these widely varying experimental conditions. The gene expression changes among mice and rats across diverse injury models (ozone, VILI, LPS) are significantly correlated with human models of lung injury (Pearson r 0.33-0.45, P<1e-16). Neutrophil signatures are enriched in both animal and human lung injury. Predicted therapeutic targets, peptide ligand signatures, and pathway analyses are also all highly overlapping. Conclusions: Gene expression changes are similar in animal and human experimental ALI, and provide several physiologic and therapeutic insights to the disease.
View details for DOI 10.1165/rcmb.2016-0395OC
View details for PubMedID 28324666
KRAS mutated tumours represent a large fraction of human cancers, but the vast majority remains refractory to current clinical therapies. Thus, a deeper understanding of the molecular mechanisms triggered by KRAS oncogene may yield alternative therapeutic strategies. Here we report the identification of a common transcriptional signature across mutant KRAS cancers of distinct tissue origin that includes the transcription factor FOSL1. High FOSL1 expression identifies mutant KRAS lung and pancreatic cancer patients with the worst survival outcome. Furthermore, FOSL1 genetic inhibition is detrimental to both KRAS-driven tumour types. Mechanistically, FOSL1 links the KRAS oncogene to components of the mitotic machinery, a pathway previously postulated to function orthogonally to oncogenic KRAS. FOSL1 targets include AURKA, whose inhibition impairs viability of mutant KRAS cells. Lastly, combination of AURKA and MEK inhibitors induces a deleterious effect on mutant KRAS cells. Our findings unveil KRAS downstream effectors that provide opportunities to treat KRAS-driven cancers.
View details for DOI 10.1038/ncomms14294
View details for PubMedID 28220783
View details for PubMedCentralID PMC5321758
In response to a need for better sepsis diagnostics, several new gene expression classifiers have been recently published, including the 11-gene "Sepsis MetaScore," the "FAIM3-to-PLAC8" ratio, and the Septicyte Lab. We performed a systematic search for publicly available gene expression data in sepsis and tested each gene expression classifier in all included datasets. We also created a public repository of sepsis gene expression data to encourage their future reuse. We searched National Institutes of Health Gene Expression Omnibus and EBI ArrayExpress for human gene expression microarray datasets. We also included the Glue Grant trauma gene expression cohorts. We selected clinical, time-matched, whole blood studies of sepsis and acute infections as compared to healthy and/or noninfectious inflammation patients. We identified 39 datasets composed of 3,241 samples from 2,604 patients. All data were renormalized from raw data, when available, using consistent methods. Mean validation areas under the receiver operating characteristic curve for discriminating septic patients from patients with noninfectious inflammation for the Sepsis MetaScore, the FAIM3-to-PLAC8 ratio, and the Septicyte Lab were 0.82 (range, 0.73-0.89), 0.78 (range, 0.49-0.96), and 0.73 (range, 0.44-0.90), respectively. Paired-sample t tests of validation datasets showed no significant differences in area under the receiver operating characteristic curves. Mean validation area under the receiver operating characteristic curves for discriminating infected patients from healthy controls for the Sepsis MetaScore, FAIM3-to-PLAC8 ratio, and Septicyte Lab were 0.97 (range, 0.85-1.0), 0.94 (range, 0.65-1.0), and 0.71 (range, 0.24-1.0), respectively.
There were few significant differences in any diagnostics due to pathogen type. The three diagnostics do not show significant differences in overall ability to distinguish noninfectious systemic inflammatory response syndrome from sepsis, though the performance in some datasets was low (area under the receiver operating characteristic curve, < 0.7) for the FAIM3-to-PLAC8 ratio and Septicyte Lab. The Septicyte Lab also demonstrated significantly worse performance in discriminating infections as compared to healthy controls. Overall, public gene expression data are a useful tool for benchmarking gene expression diagnostics.
View details for DOI 10.1097/CCM.0000000000002021
View details for Web of Science ID 000390619000001
View details for PubMedID 27681387
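The benchmarking in the abstract above rests on the area under the receiver operating characteristic curve (AUC) computed per dataset. As a hedged illustration only, the sketch below computes an AUC from classifier scores using its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the study.

```python
def auroc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    pairs. Quadratic in sample size, which is fine for small cohorts.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical diagnostic scores for septic patients and controls.
septic = [2.1, 3.4, 1.9, 2.8]
controls = [1.2, 2.0, 0.7]
auc = auroc(septic, controls)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the ranges reported above should be read.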
Systemic sclerosis (SSc) is a rare autoimmune disease with the highest case-fatality rate of all connective tissue diseases. Current efforts to determine patient response to a given treatment using the modified Rodnan skin score (mRSS) are complicated by interclinician variability, confounding, and the time required between sequential mRSS measurements to observe meaningful change. There is an unmet critical need for an objective metric of SSc disease severity. Here, we performed an integrated, multicohort analysis of SSc transcriptome data across 7 datasets from 6 centers composed of 515 samples. Using 158 skin samples from SSc patients and healthy controls recruited at 2 centers as a discovery cohort, we identified a 415-gene expression signature specific for SSc, and validated its ability to distinguish SSc patients from healthy controls in an additional 357 skin samples from 5 independent cohorts. Next, we defined the SSc skin severity score (4S). In every SSc cohort of skin biopsy samples analyzed in our study, 4S correlated significantly with mRSS, allowing objective quantification of SSc disease severity. Using transcriptome data from the largest longitudinal trial of SSc patients to date, we showed that 4S allowed us to objectively monitor individual SSc patients over time, as (a) the change in 4S of a patient is significantly correlated with change in the mRSS, and (b) the change in 4S at 12 months of treatment could predict the change in mRSS at 24 months. Our results suggest that 4S could be used to distinguish treatment responders from nonresponders prior to mRSS change. Our results demonstrate the potential clinical utility of a novel robust molecular signature and a computational approach to SSc disease severity quantification.
View details for DOI 10.1172/jci.insight.89073
View details for PubMedID 28018971
View details for PubMedCentralID PMC5161207
Cell death and release of proinflammatory mediators contribute to mortality during sepsis. Specifically, caspase-11-dependent cell death contributes to pathology and decreases in survival time in sepsis models. Priming of the host cell, through TLR4 and interferon receptors, induces caspase-11 expression, and cytosolic LPS directly stimulates caspase-11 activation, promoting the release of proinflammatory cytokines through pyroptosis and caspase-1 activation. Using a CRISPR-Cas9-mediated genome-wide screen, we identified novel mediators of caspase-11-dependent cell death. We found a complement-related peptidase, carboxypeptidase B1 (Cpb1), to be required for caspase-11 gene expression and subsequent caspase-11-dependent cell death. Cpb1 modifies a cleavage product of C3, which binds to and activates C3aR, and then modulates innate immune signaling. We find the Cpb1-C3-C3aR pathway induces caspase-11 expression through amplification of MAPK activity downstream of TLR4 and Ifnar activation, and mediates severity of LPS-induced sepsis (endotoxemia) and disease outcome in mice. We show C3aR is required for up-regulation of caspase-11 orthologues, caspase-4 and -5, in primary human macrophages during inflammation and that c3aR1 and caspase-5 transcripts are highly expressed in patients with severe sepsis; thus, suggesting that these pathways are important in human sepsis. Our results highlight a novel role for complement and the Cpb1-C3-C3aR pathway in proinflammatory signaling, caspase-11 cell death, and sepsis severity.
View details for PubMedID 27697835
View details for PubMedCentralID PMC5068231
Pancreatic ductal adenocarcinoma (PDAC) is a lethal form of cancer with few therapeutic options. We found that levels of the lysine methyltransferase SMYD2 (SET and MYND domain 2) are elevated in PDAC and that genetic and pharmacological inhibition of SMYD2 restricts PDAC growth. We further identified the stress response kinase MAPKAPK3 (MK3) as a new physiologic substrate of SMYD2 in PDAC cells. Inhibition of MAPKAPK3 impedes PDAC growth, identifying a potential new kinase target in PDAC. Finally, we show that inhibition of SMYD2 cooperates with standard chemotherapy to treat PDAC cells and tumors. These findings uncover a pivotal role for SMYD2 in promoting pancreatic cancer.
View details for DOI 10.1101/gad.275529.115
View details for PubMedID 26988419
The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, as well as underscore the classically recognized inflammatory state that underlies this disease.
View details for PubMedID 27896981
A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.
View details for PubMedID 27896970
View details for PubMedCentralID PMC5167529
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.
View details for DOI 10.1093/jamia/ocv048
View details for PubMedID 26112029
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal human cancers and shows resistance to any therapeutic strategy used. Here we tested small-molecule inhibitors targeting chromatin regulators as possible therapeutic agents in PDAC. We show that JQ1, an inhibitor of the bromodomain and extraterminal (BET) family of proteins, suppresses PDAC development in mice by inhibiting both MYC activity and inflammatory signals. The histone deacetylase (HDAC) inhibitor SAHA synergizes with JQ1 to augment cell death and more potently suppress advanced PDAC. Finally, using a CRISPR-Cas9-based method for gene editing directly in the mouse adult pancreas, we show that de-repression of p57 (also known as KIP2 or CDKN1C) upon combined BET and HDAC inhibition is required for the induction of combination therapy-induced cell death in PDAC. SAHA is approved for human use, and molecules similar to JQ1 are being tested in clinical trials. Thus, these studies identify a promising epigenetic-based therapeutic strategy that may be rapidly implemented in fatal human tumors.
View details for DOI 10.1038/nm.3952
View details for PubMedID 26390243
View details for Web of Science ID 000346211801253
Deregulation of lysine methylation signalling has emerged as a common aetiological factor in cancer pathogenesis, with inhibitors of several histone lysine methyltransferases (KMTs) being developed as chemotherapeutics. The largely cytoplasmic KMT SMYD3 (SET and MYND domain containing protein 3) is overexpressed in numerous human tumours. However, the molecular mechanism by which SMYD3 regulates cancer pathways and its relationship to tumorigenesis in vivo are largely unknown. Here we show that methylation of MAP3K2 by SMYD3 increases MAP kinase signalling and promotes the formation of Ras-driven carcinomas. Using mouse models for pancreatic ductal adenocarcinoma and lung adenocarcinoma, we found that abrogating SMYD3 catalytic activity inhibits tumour development in response to oncogenic Ras. We used protein array technology to identify the MAP3K2 kinase as a target of SMYD3. In cancer cell lines, SMYD3-mediated methylation of MAP3K2 at lysine 260 potentiates activation of the Ras/Raf/MEK/ERK signalling module and SMYD3 depletion synergizes with a MEK inhibitor to block Ras-driven tumorigenesis. Finally, the PP2A phosphatase complex, a key negative regulator of the MAP kinase pathway, binds to MAP3K2 and this interaction is blocked by methylation. Together, our results elucidate a new role for lysine methylation in integrating cytoplasmic kinase-signalling cascades and establish a pivotal role for SMYD3 in the regulation of oncogenic Ras signalling.
View details for DOI 10.1038/nature13320
View details for PubMedID 24847881
We propose and discuss a method for doing gene expression meta-analysis (multiple datasets) across multiplex measurement modalities measuring the expression of many genes simultaneously (e.g. microarrays and RNAseq) using external control samples and a method of heterogeneity detection to identify and filter on comparable gene expression measurements. We demonstrate this approach on publicly available gene expression datasets from samples of medulloblastoma and normal cerebellar tissue and identify some potential new targets in the treatment of medulloblastoma.
View details for PubMedID 24297537
Neurodegenerative diseases share common pathologic features including neuroinflammation, mitochondrial dysfunction and protein aggregation, suggesting common underlying mechanisms of neurodegeneration. We undertook a meta-analysis of public gene expression data for neurodegenerative diseases to identify a common transcriptional signature of neurodegeneration. Using 1,270 post-mortem central nervous system tissue samples from 13 patient cohorts covering four neurodegenerative diseases, we identified 243 differentially expressed genes, which were similarly dysregulated in 15 additional patient cohorts of 205 samples including seven neurodegenerative diseases. This gene signature correlated with histologic disease severity. Metallothioneins featured prominently among differentially expressed genes, and functional pathway analysis identified specific convergent themes of dysregulation. MetaCore network analyses revealed various novel candidate hub genes (e.g. STAU2). Genes associated with M1-polarized macrophages and reactive astrocytes were strongly enriched in the meta-analysis data. Evaluation of genes enriched in neurons revealed 70 down-regulated genes, over half not previously associated with neurodegeneration. Comparison with aging brain data (3 patient cohorts, 221 samples) revealed 53 of these to be unique to neurodegenerative disease, many of which are strong candidates to be important in neuropathogenesis (e.g. NDN, NAP1L2). ENCODE ChIP-seq analysis predicted common upstream transcriptional regulators not associated with normal aging (REST, RBBP5, SIN3A, SP2, YY1, ZNF143, IKZF1). Finally, we removed genes common to neurodegeneration from disease-specific gene signatures, revealing uniquely robust immune response and JAK-STAT signaling in amyotrophic lateral sclerosis. Our results implicate pervasive bioenergetic deficits, M1-type microglial activation and gliosis as unifying themes of neurodegeneration, and identify numerous novel genes associated with neurodegenerative processes.
View details for DOI 10.1186/s40478-014-0093-y
View details for PubMedID 25187168
View details for PubMedCentralID PMC4167139
Small cell lung cancer (SCLC) is an aggressive neuroendocrine subtype of lung cancer with high mortality. We used a systematic drug repositioning bioinformatics approach querying a large compendium of gene expression profiles to identify candidate U.S. Food and Drug Administration (FDA)-approved drugs to treat SCLC. We found that tricyclic antidepressants and related molecules potently induce apoptosis in both chemonaïve and chemoresistant SCLC cells in culture, in mouse and human SCLC tumors transplanted into immunocompromised mice, and in endogenous tumors from a mouse model for human SCLC. The candidate drugs activate stress pathways and induce cell death in SCLC cells, at least in part by disrupting autocrine survival signals involving neurotransmitters and their G protein-coupled receptors. The candidate drugs inhibit the growth of other neuroendocrine tumors, including pancreatic neuroendocrine tumors and Merkel cell carcinoma. These experiments identify novel targeted strategies that can be rapidly evaluated in patients with neuroendocrine tumors through the repurposing of approved drugs. Our work shows the power of bioinformatics-based drug approaches to rapidly repurpose FDA-approved drugs and identifies a novel class of molecules to treat patients with SCLC, a cancer for which no effective novel systemic treatments have been identified in several decades. In addition, our experiments highlight the importance of novel autocrine mechanisms in promoting the growth of neuroendocrine tumor cells.
View details for DOI 10.1158/2159-8290.CD-13-0183
View details for Web of Science ID 000328257500023
View details for PubMedID 24078773
View details for PubMedCentralID PMC3864571
Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.
View details for DOI 10.1158/0008-5472.CAN-12-1097
View details for Web of Science ID 000311141300012
View details for PubMedID 22962265
Monitoring of renal graft status through peripheral blood (PB) rather than invasive biopsy is important as it will lessen the risk of infection and other stresses, while reducing the costs of rejection diagnosis. Blood gene biomarker panels were discovered by microarrays at a single center and subsequently validated and cross-validated by QPCR in the NIH SNSO1 randomized study from 12 US pediatric transplant programs. A total of 367 unique human PB samples, each paired with a graft biopsy for centralized, blinded phenotype classification, were analyzed (115 acute rejection (AR), 180 stable and 72 other causes of graft injury). Of the differentially expressed genes by microarray, Q-PCR analysis of a five gene-set (DUSP1, PBEF1, PSEN1, MAPK9 and NKTR) classified AR with high accuracy. A logistic regression model was built on independent training-set (n = 47) and validated on independent test-set (n = 198) samples, discriminating AR from STA with 91% sensitivity and 94% specificity and AR from all other non-AR phenotypes with 91% sensitivity and 90% specificity. The 5-gene set can diagnose AR, potentially avoiding the need for invasive renal biopsy. These data support the conduct of a prospective study to validate the clinical predictive utility of this diagnostic tool.
View details for DOI 10.1111/j.1600-6143.2012.04253.x
View details for Web of Science ID 000309180000018
View details for PubMedID 23009139
Chronic allograft injury (CAI) results from a humoral response to mismatches in immunogenic epitopes between the donor and recipient. Although alloantibodies against HLA antigens contribute to the pathogenesis of CAI, alloantibodies against non-HLA antigens likely contribute as well. Here, we used high-density protein arrays to identify non-HLA antibodies in CAI and subsequently validated a subset in a cohort of 172 serum samples collected serially post-transplantation. There were 38 de novo non-HLA antibodies that significantly associated with the development of CAI (P<0.01) on protocol post-transplant biopsies, with enrichment of their corresponding antigens in the renal cortex. Baseline levels of preformed antibodies to MIG (also called CXCL9), ITAC (also called CXCL11), IFN-γ, and glial-derived neurotrophic factor positively correlated with histologic injury at 24 months. Measuring levels of these four antibodies could help clinicians predict the development of CAI with >80% sensitivity and 100% specificity. In conclusion, pretransplant serum levels of a defined panel of alloantibodies targeting non-HLA immunogenic antigens associate with histologic CAI in the post-transplant period. Validation in a larger, prospective transplant cohort may lead to a noninvasive method to predict and monitor for CAI.
View details for DOI 10.1681/ASN.2011060596
View details for Web of Science ID 000302333300022
View details for PubMedID 22302197
IgG commonly co-exists with IgA in the glomerular mesangium of patients with IgA nephropathy (IgAN) with unclear clinical relevance. Autoantibody (autoAb) biomarkers to detect and track progression of IgAN are an unmet clinical need. The objective of the study was to identify IgA-specific autoAbs specific to IgAN. High-density protein microarrays were used to evaluate IgG autoAbs in the serum of IgAN patients (n = 22) and controls (n = 10). Clinical parameters, including annual GFR and urine protein measurements, were collected on all patients over 5 years. Bioinformatic data analysis was performed to select targets for further validation by immunohistochemistry (IHC). One hundred seventeen (1.4%) specific antibodies were increased in IgAN. Among the most significant were the autoAb to the Ig family of proteins. IgAN-specific autoAbs (approximately 50%) were mounted against proteins predominantly expressed in glomeruli and tubules, and selected candidates were verified by IHC. Receiver operating characteristic analysis of our study demonstrated that IgG autoAb levels (matriline 2, ubiquitin-conjugating enzyme E2W, DEAD box protein, and protein kinase D1) might be used in combination with 24-hour proteinuria to improve prediction of the progression of IgAN (area under the curve = 0.86, P = 0.02). IgAN is associated with elevated IgG autoAbs to multiple proteins in the kidney. This first analysis of the repertoire of autoAbs in IgAN identifies novel, immunogenic protein targets that are highly expressed in the kidney glomerulus and tubules that may bear relevance in the pathogenesis and progression of IgAN.
View details for DOI 10.2215/CJN.04600511
View details for Web of Science ID 000297948900009
View details for PubMedID 22157707
View details for PubMedCentralID PMC3255376
The degree of progressive chronic histological damage is associated with long-term renal allograft survival. In order to identify promising molecular targets for timely intervention, we examined renal allograft protocol and indication biopsies from 120 low-risk pediatric and adolescent recipients by whole-genome microarray expression profiling. In data-driven analysis, we found a highly regulated pattern of adaptive and innate immune gene expression that correlated with established or ongoing histological chronic injury, and also with development of future chronic histological damage, even in histologically pristine kidneys. Hence, histologically unrecognized immunological injury at a molecular level sets the stage for the development of chronic tissue injury, while the same molecular response is accentuated during established and worsening chronic allograft damage. Irrespective of the hypothesized immune or nonimmune trigger for chronic allograft injury, a highly orchestrated regulation of innate and adaptive immune responses was found in the graft at the molecular level. This occurred months before histologic lesions appear, and quantitatively below the diagnostic threshold of classic T-cell or antibody-mediated rejection. Thus, measurement of specific immune gene expression in protocol biopsies may be warranted to predict the development of subsequent chronic injury in histologically quiescent grafts and as a means to titrate immunosuppressive therapy.
View details for DOI 10.1038/ki.2011.245
View details for Web of Science ID 000297541900014
View details for PubMedID 21881554
Technological advances in molecular and in silico research have enabled significant progress towards personalized transplantation medicine. It is now possible to conduct comprehensive biomarker development studies of transplant organ pathologies, correlating genomic, transcriptomic and proteomic information from donor and recipient with clinical and histological phenotypes. Translation of these advances to the clinical setting will allow assessment of an individual patient's risk of allograft damage or accommodation. Transplantation biomarkers are needed for active monitoring of immunosuppression, to reduce patient morbidity, and to improve long-term allograft function and life expectancy. Here, we highlight recent pre- and post-transplantation biomarkers of acute and chronic allograft damage or adaptation, focusing on peripheral blood-based methodologies for non-invasive application. We then critically discuss current findings with respect to their future application in routine clinical transplantation medicine. Complement-system-associated SNPs present potential biomarkers that may be used to indicate the baseline risk for allograft damage prior to transplantation. The detection of antibodies against novel, non-HLA, MICA antigens, and the expression of cytokine genes and proteins and cytotoxicity-related genes have been correlated with allograft damage and are potential post-transplantation biomarkers indicating allograft damage at the molecular level, although these do not have clinical relevance yet. Several multi-gene expression-based biomarker panels have been identified that accurately predicted graft accommodation in liver transplant recipients and may be developed into a predictive biomarker assay.
View details for DOI 10.1186/gm253
View details for PubMedID 21658299
Combining the results of studies using highly parallelized measurements of gene expression such as microarrays and RNAseq offers unique challenges in meta-analysis. Motivated by a need for a deeper understanding of organ transplant rejection, we combine the data from five separate studies to compare acute rejection versus stability after solid organ transplantation, and use this data to examine approaches to multiplex meta-analysis. We demonstrate that a commonly used parametric effect size estimate approach and a commonly used non-parametric method give very different results in prioritizing genes. The parametric method providing a meta effect estimate was superior at ranking genes based on our gold-standard of identifying immune response genes in the transplant rejection datasets. Different methods of multiplex analysis can give substantially different results. The method which is best for any given application will likely depend on the particular domain, and it remains for future work to see if any one method is consistently better at identifying important biological signal across gene expression experiments.
View details for DOI 10.1186/1471-2105-11-S9-S6
View details for Web of Science ID 000290218700006
View details for PubMedID 21044364
View details for PubMedCentralID PMC2967747
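As a rough illustration of what a parametric effect-size meta-analysis of a single gene can look like, the sketch below computes a standardized mean difference per study and pools the studies with inverse-variance weights. The abstract above does not name the estimator or the pooling model, so the choice of Hedges' g, the fixed-effect pooling, and the function names `hedges_g` and `pool_fixed_effect` are all assumptions made for illustration, not the method used in the paper.

```python
import math
from statistics import mean, variance


def hedges_g(cases, controls):
    """Return (g, var_g) for one gene in one study.

    g is the bias-corrected standardized mean difference (assumed estimator);
    var_g is its approximate sampling variance.
    """
    n1, n2 = len(cases), len(controls)
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / df
    )
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))
    return j * d, j * j * var_d


def pool_fixed_effect(effects):
    """Pool (effect, variance) pairs with inverse-variance weights.

    Returns (pooled_effect, standard_error). Fixed-effect pooling is an
    assumption here; a random-effects model would add a between-study term.
    """
    weights = [1 / v for _, v in effects]
    pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))
```

Genes could then be ranked by the pooled effect divided by its standard error, which is one plausible reading of "ranking genes" by a meta effect estimate.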
The gene expression changes produced by moderate hypothermia are not fully known, but appear to differ in important ways from those produced by heat shock. We examined the gene expression changes produced by moderate hypothermia and tested the hypothesis that rewarming after hypothermia approximates a heat-shock response. Six sets of human HepG2 hepatocytes were subjected to moderate hypothermia (31 degrees C for 16 h), a conventional in vitro heat shock (43 degrees C for 30 min) or control conditions (37 degrees C), then harvested immediately or allowed to recover for 3 h at 37 degrees C. Expression analysis was performed with Affymetrix U133A gene chips, using analysis of variance-based techniques. Moderate hypothermia led to distinct time-dependent expression changes, as did heat shock. Hypothermia initially caused statistically significant, greater than or equal to twofold changes in expression (relative to controls) of 409 sequences (143 increased and 266 decreased), whereas heat shock affected 71 (35 increased and 36 decreased). After 3 h of recovery, 192 sequences (83 increased, 109 decreased) were affected by hypothermia and 231 (146 increased, 85 decreased) by heat shock. Expression of many heat shock proteins was decreased by hypothermia but significantly increased after rewarming. A comparison of sequences affected by thermal stress without regard to the magnitude of change revealed that the overlap between heat and cold stress was greater after 3 h of recovery than immediately following thermal stress. Thus, while some overlap occurs (particularly after rewarming), moderate hypothermia produces extensive, time-dependent gene expression changes in HepG2 cells that differ in important ways from those induced by heat shock.
View details for DOI 10.1007/s12192-010-0181-2
View details for Web of Science ID 000280781800021
View details for PubMedID 20526826
View details for Web of Science ID 000275921701073
We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes - all linked to over 5,500 published articles. We anticipate NetPath to become a consolidated resource for human signaling pathways that should enable systems biology approaches.
View details for DOI 10.1186/gb-2010-11-1-r3
View details for Web of Science ID 000276433600011
View details for PubMedID 20067622
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here is able to take into consideration the hierarchical structure of the Gene Ontology (GO) and can weight differently GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine such schemes were previously used in other problem domains, while six of them are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Out of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent of them, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
View details for DOI 10.1109/TCBB.2008.29
View details for Web of Science ID 000274063600008
View details for PubMedID 20150671
In the last decade, microarray technology has revolutionized biological research by allowing the screening of tens of thousands of genes simultaneously. This article reviews recent studies in organ transplantation using microarrays and highlights the issues that should be addressed in order to use microarrays in diagnosis of rejection. Microarrays have been useful in identifying potential biomarkers for chronic rejection in peripheral blood mononuclear cells, novel pathways for induction of tolerance, and genes involved in protecting the graft from the host immune system. Microarray analysis of peripheral blood mononuclear cells from chronic antibody-mediated rejection has identified potential noninvasive biomarkers. In a recent study, correlation of pathogenesis-based transcripts with histopathologic lesions is a promising step towards inclusion of microarrays in clinics for organ transplants. Despite promising results in diagnosis of histopathologic lesions using microarrays, the low dynamic range of microarrays and large measured expression changes within the probes for the same gene continue to cast doubts on their readiness for diagnosis of rejection. More studies must be performed to resolve these issues. Dominating expression of globin genes in whole blood poses another challenge for identification of noninvasive biomarkers. In addition, studies are also needed to demonstrate effects of different immunosuppression therapies and their outcomes.
View details for DOI 10.1097/MOT.0b013e32831e13d0
View details for Web of Science ID 000264312900007
View details for PubMedID 19337144
Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe. We describe a novel signaling pathway impact analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance P-value, which combines the enrichment and perturbation P-values. We illustrate the capabilities of the novel method on four real datasets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods. SPIA was implemented as an R package available at http://vortex.cs.wayne.edu/ontoexpress/
View details for DOI 10.1093/bioinformatics/btn577
View details for Web of Science ID 000261996400012
View details for PubMedID 18990722
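The abstract above says that the enrichment and perturbation P-values are combined into a global P-value but does not give the combination rule. One standard way to combine two independent P-values uses the distribution of their product: if both are uniform on [0, 1] under the null hypothesis, the probability that the product is at most c equals c - c ln(c). The sketch below assumes that product rule; the function name `combine_two_pvalues` is hypothetical and is not taken from the SPIA package.

```python
import math


def combine_two_pvalues(p_enrich, p_pert):
    """Combine two independent P-values via the distribution of their product.

    Assumes both inputs are uniform on [0, 1] under the null hypothesis, so
    that P(p_enrich * p_pert <= c) = c - c * ln(c).
    """
    for p in (p_enrich, p_pert):
        if not 0.0 <= p <= 1.0:
            raise ValueError("P-values must lie in [0, 1]")
    c = p_enrich * p_pert
    if c == 0.0:
        return 0.0  # limit of c - c*ln(c) as c approaches 0
    return c - c * math.log(c)
```

The independence shown by the simulations mentioned in the abstract is what makes a product-based rule like this one reasonable; for correlated evidence it would overstate significance.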
View details for Web of Science ID 000263827202029
Onto-Tools is a freely available web-accessible software suite, composed of an annotation database and nine complementary data-mining tools. This article describes a new tool, Onto-Express-to-go (OE2GO), as well as some new features implemented in Pathway-Express and Onto-Miner over the past year. Pathway-Express (PE) has been enhanced to identify significantly perturbed pathways in a given condition using the differentially expressed genes in the input. OE2GO is a tool for functional profiling using custom annotations. The development of this tool was aimed at the researchers working with organisms for which annotations are not yet available in the public domain. OE2GO allows researchers to use either annotation data from the Onto-Tools database, or their own custom annotations. By removing the necessity to use any specific database, OE2GO makes the functional profiling available for all organisms, with annotations using any ontology. The Onto-Tools are freely available at http://vortex.cs.wayne.edu/projects.htm.
View details for DOI 10.1093/nar/gkm327
View details for Web of Science ID 000255311500039
View details for PubMedID 17584796
View details for Web of Science ID 000248516200030
View details for Web of Science ID 000252725900004
Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces between various databases represent a serious impediment to using the existing annotations at their full potential. Navigating between various such name spaces by mapping IDs from one database to another is a very important issue which is not properly addressed at the moment. We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT is able to map onto each other different types of biological entities from the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT is able to perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT is able to correctly map between 96 and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20 000 IDs can be translated in <30 s, in most cases. OT is a part of Onto-Tools, which is freely available at http://vortex.cs.wayne.edu/Projects.html
View details for DOI 10.1093/bioinformatics/btl372
View details for Web of Science ID 000242246300015
View details for PubMedID 17068090
The Onto-Tools suite is composed of an annotation database and eight complementary, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express and nsSNPCounter. Promoter-Express is a new tool added to the Onto-Tools ensemble that facilitates the identification of transcription factor binding sites active in specific conditions. nsSNPCounter is another new tool that allows computation and analysis of synonymous and non-synonymous codon substitutions for studying evolutionary rates of protein coding genes. Onto-Translate has also been enhanced to expand its scope and accuracy by fully utilizing the capabilities of the Onto-Tools database. Currently, Onto-Translate allows arbitrary mappings between 28 types of IDs for 53 organisms. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkl213
View details for Web of Science ID 000245650200126
View details for PubMedID 16845086
DNA microarrays enable researchers to monitor the expression of thousands of genes simultaneously. However, the current technology has several limitations. Here we discuss problems related to the sensitivity, accuracy, specificity and reproducibility of microarray results. The existing data suggest that for relatively abundant transcripts the existence and direction (but not the magnitude) of expression changes can be reliably detected. However, accurate measurements of absolute expression levels and the reliable detection of low abundance genes are difficult to achieve. The main problems seem to be the sub-optimal design or choice of probes and some incorrect probe annotations. Well-designed data-analysis approaches can rectify some of these problems.
View details for DOI 10.1016/j.tig.2005.12.005
View details for Web of Science ID 000235576900009
View details for PubMedID 16380191
Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of differentially expressed genes. An automatic ontological analysis approach has been recently proposed to help with the biological interpretation of such results. Currently, this approach is the de facto standard for the secondary analysis of high throughput experiments and a large number of tools have been developed for this purpose. We present a detailed comparison of 14 such tools using the following criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data. This detailed analysis of the capabilities of these tools will help researchers choose the most appropriate tool for a given type of analysis. More importantly, in spite of the fact that this type of analysis has been generally adopted, this approach has several important intrinsic drawbacks. These drawbacks are associated with all tools discussed and represent conceptual limitations of the current state-of-the-art in ontological analysis. We propose these as challenges for the next generation of secondary data analysis tools.
View details for DOI 10.1093/bioinformatics/bti565
View details for Web of Science ID 000231694600001
View details for PubMedID 15994189
The correct interpretation of any biological experiment depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are ubiquitous and used by all life scientists in most experiments. However, it is well known that such databases are incomplete and many annotations may also be incorrect. In this paper we describe a technique that can be used to analyze the semantic content of such annotation databases. Our approach is able to extract implicit semantic relationships between genes and functions. This ability allows us to discover novel functions for known genes. This approach is able to identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy. We used our technique to analyze the current annotations of the human genome. From this body of annotations, we were able to predict 212 additional gene-function assignments. A subsequent literature search found that 138 of these gene-function assignments are supported by existing peer-reviewed papers. An additional 23 assignments have been confirmed in the meantime by the addition of the respective annotations in later releases of the Gene Ontology database. Overall, the 161 confirmed assignments represent 75.95% of the proposed gene-function assignments. Only one of our predictions (0.4%) was contradicted by the existing literature. We could not find any relevant articles for 50 of our predictions (23.58%). The method is independent of the organism and can be used to analyze and improve the quality of the data of any public or private annotation database.
View details for DOI 10.1093/bioinformatics/bti538
View details for Web of Science ID 000231360600012
View details for PubMedID 15955782
The Onto-Tools suite is composed of an annotation database and six seamlessly integrated, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner and Pathway-Express. The Onto-Tools database has been expanded to include various types of data from 12 new databases. Our database now integrates different types of genomic data from 19 sequence, gene, protein and annotation databases. Additionally, our database is also expanded to include complete Gene Ontology (GO) annotations. Using the enhanced database and GO annotations, Onto-Express now allows functional profiling for 24 organisms and supports 17 different types of input IDs. Onto-Translate is also enhanced to fully utilize the capabilities of the new Onto-Tools database with an ultimate goal of providing the users with a non-redundant and complete mapping from any type of identification system to any other type. Currently, Onto-Translate allows arbitrary mappings between 29 types of IDs. Pathway-Express is a new tool that helps the users find the most interesting pathways for their input list of genes. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gki472
View details for Web of Science ID 000230271400156
View details for PubMedID 15980579
Sequences that are present in a given species or strain while absent from or different in any other organisms can be used to distinguish the target organism from other related or unrelated species. Such DNA signatures are particularly important for the identification of the genetic source of drug resistance of a strain or for the detection of organisms that can be used as biological agents in warfare or terrorism. Most approaches used to find DNA signatures are laboratory based, require a great deal of effort and can only distinguish between two organisms at a time. We propose a more efficient and cost-effective bioinformatics approach that allows identification of genomic fingerprints for a target organism. We validated our approach using a custom microarray, using sequences identified as DNA fingerprints of Bacillus anthracis. Hybridization results showed that the sequences found using our algorithm were truly unique to B. anthracis and were able to distinguish B. anthracis from its close relatives B. cereus and B. thuringiensis.
View details for Web of Science ID 000230169100021
View details for PubMedID 15759631
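The DNA-signature abstract above does not describe its algorithm, so the following is only a minimal sketch of the general idea: keep the subsequences of a target genome that never occur in any comparison genome. The fixed-length k-mer approach, the function names `kmers` and `candidate_signatures`, and the toy sequences are all assumptions made for illustration, not the authors' method. Reverse complements and near-matches are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(target, others, k):
    """Return k-mers present in `target` and absent from every sequence in `others`."""
    unique = kmers(target, k)
    for other in others:
        unique -= kmers(other, k)  # drop anything shared with a comparison genome
    return unique


# Toy sequences (hypothetical, not real genomic data).
target = "ACGTACGTTAGC"
relatives = ["ACGTACGTTTGC", "CCGTACGTTAGG"]

signatures = candidate_signatures(target, relatives, k=6)
print(sorted(signatures))
```

On real genomes the k-mer sets would be far too large to hold naively in memory, so a practical version would need an indexed or streaming data structure.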
View details for Web of Science ID 000235518600028
The Onto-Tools suite is composed of an annotation database and five seamlessly integrated web-accessible data mining tools: Onto-Express (OE), Onto-Compare (OC), Onto-Design (OD), Onto-Translate (OT) and Onto-Miner (OM). OM is a new tool that provides a unified access point and an application programming interface for most annotations available. Our database has been enhanced with more than 120 new commercial microarrays and annotations for Rattus norvegicus, Drosophila melanogaster and Caenorhabditis elegans. The Onto-Tools have been redesigned to provide better biological insight, improved performance and user convenience. The new features implemented in OE include support for gene names, LocusLink IDs and Gene Ontology (GO) IDs, ability to specify fold changes for the input genes, links to the KEGG pathway database and detailed output files. OC allows comparisons of the functional bias of more than 170 commercial microarrays. The latest version of OD allows the user to specify keywords if the exact GO term is not known as well as providing more details than the previous version. OE, OC and OD now have an integrated GO browser that allows the user to customize the level of abstraction for each GO category. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkh409
View details for Web of Science ID 000222273100090
View details for PubMedID 15215428
Onto-Tools is a set of four seamlessly integrated databases: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Onto-Express (OE) is able to automatically translate lists of genes found to be differentially regulated in a given condition into functional profiles characterizing the impact of the condition studied upon various biological processes and pathways. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. Once the initial exploratory analysis has identified a number of relevant biological processes, specific mechanisms of interactions can be hypothesized for the conditions studied. Currently, many commercial arrays are available for the investigation of specific mechanisms. Each such array is characterized by a biological bias determined by the extent to which the genes present on the array represent specific pathways. Onto-Compare is a tool that allows efficient comparisons of any sets of commercial or custom arrays. Using Onto-Compare, a researcher can determine quickly which array, or set of arrays, best covers the hypotheses studied. In many situations, no commercial arrays are available for specific biological mechanisms. Onto-Design is a tool that allows the user to select genes that represent given functional categories. Onto-Translate allows the user to easily translate lists of accession numbers, UniGene clusters and Affymetrix probes into one another. All tools above are seamlessly integrated. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkg624
View details for Web of Science ID 000183832900108
View details for PubMedID 12824416
Microarrays are at the center of a revolution in biotechnology, allowing researchers to screen tens of thousands of genes simultaneously. Typically, they have been used in exploratory research to help formulate hypotheses. In most cases, this phase is followed by a more focused, hypothesis-driven stage in which certain specific biological processes and pathways are thought to be involved. Since a single biological process can still involve hundreds of genes, microarrays are still the preferred approach as proven by the availability of focused arrays from several manufacturers. Because focused arrays from different manufacturers use different sets of genes, each array will represent any given regulatory pathway to a different extent. We argue that a functional analysis of the arrays available should be the most important criterion used in the array selection. We developed Onto-Compare as a database that can provide this functionality, based on the Gene Ontology Consortium nomenclature. We used this tool to compare several arrays focused on apoptosis, oncogenes, and tumor suppressors. We considered arrays from BD Biosciences Clontech, PerkinElmer, Sigma-Genosys, and SuperArray. We showed that among the oncogene arrays, the PerkinElmer MICROMAX oncogene microarray has a better representation of oncogenesis, protein phosphorylation, and negative control of cell proliferation. The comparison of the apoptosis arrays showed that most apoptosis-related biological processes are equally well represented on the arrays considered. However, functional categories such as immune response, cell-cell signaling, cell-surface receptor linked signal transduction, and interleukins are better represented on the Sigma-Genosys Panorama human apoptosis array. At the same time, processes such as cell cycle control, oncogenesis, and negative control of cell proliferation are better represented on the BD Biosciences Clontech Atlas Select human apoptosis array.
View details for Web of Science ID 000181595900009
View details for PubMedID 12664686
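The array comparison described in the Onto-Compare abstract above can be pictured as counting, for each array, how many of its genes are annotated to each functional category. The sketch below is only an illustration under stated assumptions: the gene names, array names, annotation table, and the helper functions `category_coverage` and `best_array_for` are hypothetical placeholders and do not reflect the tool's actual database or interface.

```python
from collections import Counter


def category_coverage(array_genes, annotations):
    """Count how many distinct genes on an array are annotated to each category."""
    counts = Counter()
    for gene in set(array_genes):
        for term in annotations.get(gene, ()):
            counts[term] += 1
    return counts


def best_array_for(term, arrays, annotations):
    """Return the name of the array with the most genes annotated to `term`."""
    return max(arrays, key=lambda name: category_coverage(arrays[name], annotations)[term])


# Hypothetical annotation table: gene -> functional categories.
annotations = {
    "GENE1": {"apoptosis", "cell cycle control"},
    "GENE2": {"apoptosis"},
    "GENE3": {"immune response"},
    "GENE4": {"oncogenesis", "cell cycle control"},
}

# Hypothetical focused arrays and the genes they carry.
arrays = {
    "array_A": ["GENE1", "GENE2", "GENE3"],
    "array_B": ["GENE1", "GENE4"],
}

for name, genes in arrays.items():
    print(name, dict(category_coverage(genes, annotations)))
print(best_array_for("apoptosis", arrays, annotations))
```

Raw counts favor larger arrays, so a fuller comparison would probably also report each category as a fraction of the array's total genes.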
The typical result of a microarray experiment is a list of tens or hundreds of genes found to be differentially regulated in the condition under study. Independent of the methods used to select these genes, the common task faced by any researcher is to translate these lists of genes into a better understanding of the biological phenomena involved. Currently, this is done through a tedious combination of searches through the literature and a number of public databases. We developed Onto-Express (OE) as a novel tool able to automatically translate such lists of differentially regulated genes into functional profiles characterizing the impact of the condition studied. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function, and chromosome location. Statistical significance values are calculated for each category. We demonstrate the validity and the utility of this comprehensive global analysis of gene function by analyzing two breast cancer datasets from two separate laboratories. OE was able to identify correctly all biological processes postulated by the original authors, as well as discover novel relevant mechanisms.
View details for DOI 10.1016/S0888-7543(02)00021-6
View details for Web of Science ID 000181532700002
View details for PubMedID 12620386
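The Onto-Express abstract above states that a statistical significance value is calculated for each functional category but does not name the model. One common choice for this kind of over-representation question is a one-sided hypergeometric test, sketched below as an assumption rather than as the tool's documented method. The function name `hypergeom_enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes on the array (the reference set)
    K: reference genes annotated to the category
    n: genes in the differentially regulated list
    k: listed genes annotated to the category
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 1000 genes on the array, 50 annotated to a category,
# 100 genes in the regulated list, 12 of which carry the annotation.
p_value = hypergeom_enrichment_p(N=1000, K=50, n=100, k=12)
print(f"p = {p_value:.3g}")
```

Because one such test is run per category, the resulting p-values would normally need a correction for multiple comparisons before being interpreted.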
Gene expression profiles obtained through microarray or data mining analyses often exist as vast data strings. To interpret the biology of these genetic profiles, investigators must analyze this data in the context of other information such as the biological, biochemical, or molecular function of the translated proteins. This is particularly challenging for a human analyst because large quantities of less than relevant data often bury such information. To address this need we implemented an automated routine, called Onto-Express (http://vortex.cs.wayne.edu:8080), to systematically translate genetic fingerprints into functional profiles. Using strings of accession or cluster identification numbers, Onto-Express searches the public databases and returns tables that correlate expression profiles with the cytogenetic locations, biochemical and molecular functions, biological processes, cellular components, and cellular roles of the translated proteins. The profiles created by Onto-Express fundamentally increase the value of gene expression analyses by facilitating the translation of quantitative value sets to records that contain biological implications.
View details for DOI 10.1006/geno.2002.6698
View details for Web of Science ID 000173628100016
View details for PubMedID 11829497