Assistant Professor, Institute for Immunity, Transplantation and Infection (2014 - Present)
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that different methods have virtually no or minimal effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than those using IRIS and LM22, across all methods. Our results demonstrate the need and importance of incorporating biological and technical heterogeneity in a basis matrix for achieving consistently high accuracy.
View details for DOI 10.1038/s41467-018-07242-6
View details for PubMedID 30413720
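As a rough illustration of the deconvolution idea in the abstract above, the sketch below estimates cell proportions from a mixed expression profile and a basis matrix using non-negative least squares solved by multiplicative updates. This is a stand-in technique chosen for brevity, not one of the methods evaluated in the paper. The function name `deconvolve`, the toy basis matrix, and the toy proportions are all hypothetical and are not immunoStates values.

```python
def deconvolve(basis, mixture, n_iter=500):
    """Estimate cell-type proportions by non-negative least squares.

    basis:   list of rows, one per gene; each row holds the expression of
             that gene in each reference cell type (all values >= 0).
    mixture: expression of the same genes in the mixed sample.
    Uses multiplicative updates x <- x * (B^T m) / (B^T B x), which keep
    x non-negative, then rescales the result to sum to 1.
    """
    n_genes = len(basis)
    n_types = len(basis[0])
    # Precompute B^T m and B^T B once.
    btm = [sum(basis[g][k] * mixture[g] for g in range(n_genes))
           for k in range(n_types)]
    btb = [[sum(basis[g][i] * basis[g][j] for g in range(n_genes))
            for j in range(n_types)] for i in range(n_types)]
    x = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        denom = [sum(btb[i][j] * x[j] for j in range(n_types))
                 for i in range(n_types)]
        x = [x[i] * btm[i] / denom[i] if denom[i] > 0 else 0.0
             for i in range(n_types)]
    total = sum(x)
    return [v / total for v in x]


# Toy data (hypothetical): 4 genes, 2 cell types, noise-free mixture.
toy_basis = [[10.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 7.0]]
true_props = [0.3, 0.7]
toy_mixture = [sum(b * p for b, p in zip(row, true_props)) for row in toy_basis]
estimated = deconvolve(toy_basis, toy_mixture)
```

On noise-free toy data the estimate recovers the proportions used to build the mixture. With real data, the abstract's point is that the choice of basis matrix matters more than the choice of solver.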
The spelling of author Qianting Yang was corrected; the affiliation of author Stephanus T. Malherbe was corrected; and graphs in Fig. 4b and c were corrected owing to reanalysis of the data into the correct timed intervals.
View details for DOI 10.1038/s41586-018-0635-8
View details for PubMedID 30377311
Most infections with Mycobacterium tuberculosis (Mtb) manifest as a clinically asymptomatic, contained state, known as latent tuberculosis infection, that affects approximately one-quarter of the global population1. Although fewer than one in ten individuals eventually progress to active disease2, tuberculosis is a leading cause of death from infectious disease worldwide3. Despite intense efforts, immune factors that influence the infection outcomes remain poorly defined. Here we used integrated analyses of multiple cohorts to identify stage-specific host responses to Mtb infection. First, using high-dimensional mass cytometry analyses and functional assays of a cohort of South African adolescents, we show that latent tuberculosis is associated with enhanced cytotoxic responses, which are mostly mediated by CD16 (also known as FcgammaRIIIa) and natural killer cells, and continuous inflammation coupled with immune deviations in both T and B cell compartments. Next, using cell-type deconvolution of transcriptomic data from several cohorts of different ages, genetic backgrounds, geographical locations and infection stages, we show that although deviations in peripheral B and T cell compartments generally start at latency, they are heterogeneous across cohorts. However, an increase in the abundance of circulating natural killer cells in tuberculosis latency, with a corresponding decrease during active disease and a return to baseline levels upon clinical cure are features that are common to all cohorts. Furthermore, by analysing three longitudinal cohorts, we find that changes in peripheral levels of natural killer cells can inform disease progression and treatment responses, and inversely correlate with the inflammatory state of the lungs of patients with active tuberculosis. Together, our findings offer crucial insights into the underlying pathophysiology of tuberculosis latency, and identify factors that may influence infection outcomes.
View details for DOI 10.1038/s41586-018-0439-x
View details for PubMedID 30135583
Influenza infects tens of millions of people every year in the USA. Other than notable risk groups, such as children and the elderly, it is difficult to predict what subpopulations are at higher risk of infection. Viral challenge studies, where healthy human volunteers are inoculated with live influenza virus, provide a unique opportunity to study infection susceptibility. Biomarkers predicting influenza susceptibility would be useful for identifying risk groups and designing vaccines. We applied cell mixture deconvolution to estimate immune cell proportions from whole blood transcriptome data in four independent influenza challenge studies. We compared immune cell proportions in the blood between symptomatic shedders and asymptomatic nonshedders across three discovery cohorts prior to influenza inoculation and tested results in a held-out validation challenge cohort. Natural killer (NK) cells were significantly lower in symptomatic shedders at baseline in both discovery and validation cohorts. Hematopoietic stem and progenitor cells (HSPCs) were higher in symptomatic shedders at baseline in discovery cohorts. Although the HSPCs were higher in symptomatic shedders in the validation cohort, the increase was statistically nonsignificant. We observed that a gene associated with NK cells, KLRD1, which encodes CD94, was expressed at lower levels in symptomatic shedders at baseline in discovery and validation cohorts. KLRD1 expression in the blood at baseline negatively correlated with influenza infection symptom severity. KLRD1 expression 8 h post-infection in the nasal epithelium from a rhinovirus challenge study also negatively correlated with symptom severity. We identified KLRD1-expressing NK cells as a potential biomarker for influenza susceptibility. Expression of KLRD1 was inversely correlated with symptom severity. Our results support a model where an early response by KLRD1-expressing NK cells may control influenza infection.
View details for DOI 10.1186/s13073-018-0554-1
View details for Web of Science ID 000435421500001
View details for PubMedID 29898768
View details for PubMedCentralID PMC6001128
Post-translational modifications of histone proteins and exchanges of histone variants of chromatin are central to the regulation of nearly all DNA-templated biological processes. However, the degree and variability of chromatin modifications in specific human immune cells remain largely unknown. Here, we employ a highly multiplexed mass cytometry analysis to profile the global levels of a broad array of chromatin modifications in primary human immune cells at the single-cell level. Our data reveal markedly different cell-type- and hematopoietic-lineage-specific chromatin modification patterns. Differential analysis between younger and older adults shows that aging is associated with increased heterogeneity between individuals and elevated cell-to-cell variability in chromatin modifications. Analysis of a twin cohort unveils heritability of chromatin modifications and demonstrates that aging-related chromatin alterations are predominantly driven by non-heritable influences. Together, we present a powerful platform for chromatin and immunology research. Our discoveries highlight the profound impacts of aging on chromatin modifications.
View details for DOI 10.1016/j.cell.2018.03.079
View details for PubMedID 29706550
Improved risk stratification and prognosis prediction in sepsis is a critical unmet need. Clinical severity scores and available assays such as blood lactate reflect global illness severity with suboptimal performance, and do not specifically reveal the underlying dysregulation of sepsis. Here, we present prognostic models for 30-day mortality generated independently by three scientific groups by using 12 discovery cohorts containing transcriptomic data collected from primarily community-onset sepsis patients. Predictive performance is validated in five cohorts of community-onset sepsis patients in which the models show summary AUROCs ranging from 0.765 to 0.89. Similar performance is observed in four cohorts of hospital-acquired sepsis. Combining the new gene-expression-based prognostic models with prior clinical severity scores leads to significant improvement in prediction of 30-day mortality as measured via AUROC and net reclassification improvement index. These models provide an opportunity to develop molecular bedside tests that may improve risk stratification and mortality prediction in patients with sepsis.
View details for DOI 10.1038/s41467-018-03078-2
View details for Web of Science ID 000425088000007
View details for PubMedID 29449546
View details for PubMedCentralID PMC5814463
The immune system can mount T cell responses against tumors; however, the antigen specificities of tumor-infiltrating lymphocytes (TILs) are not well understood. We used yeast-display libraries of peptide-human leukocyte antigen (pHLA) to screen for antigens of "orphan" T cell receptors (TCRs) expressed on TILs from human colorectal adenocarcinoma. Four TIL-derived TCRs exhibited strong selection for peptides presented in a highly diverse pHLA-A∗02:01 library. Three of the TIL TCRs were specific for non-mutated self-antigens, two of which were present in separate patient tumors, and shared specificity for a non-mutated self-antigen derived from U2AF2. These results show that the exposed recognition surface of MHC-bound peptides accessible to the TCR contains sufficient structural information to enable the reconstruction of sequences of peptide targets for pathogenic TCRs of unknown specificity. This finding underscores the surprising specificity of TCRs for their cognate antigens and enables the facile identification of tumor antigens through unbiased screening.
View details for DOI 10.1016/j.cell.2017.11.043
View details for Web of Science ID 000423447600014
View details for PubMedID 29275860
View details for PubMedCentralID PMC5786495
OBJECTIVES: To find and validate generalizable sepsis subtypes using data-driven clustering. DESIGN: We used advanced informatics techniques to pool data from 14 bacterial sepsis transcriptomic datasets from eight different countries (n = 700). SETTING: Retrospective analysis. SUBJECTS: Persons admitted to the hospital with bacterial sepsis. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: A unified clustering analysis across 14 discovery datasets revealed three subtypes, which, based on functional analysis, we termed "Inflammopathic, Adaptive, and Coagulopathic." We then validated these subtypes in nine independent datasets from five different countries (n = 600). In both discovery and validation data, the Adaptive subtype is associated with a lower clinical severity and lower mortality rate, and the Coagulopathic subtype is associated with higher mortality and clinical coagulopathy. Further, these clusters are statistically associated with clusters derived by others in independent single sepsis cohorts. CONCLUSIONS: The three sepsis subtypes may represent a unifying framework for understanding the molecular heterogeneity of the sepsis syndrome. Further study could potentially enable a precision medicine approach of matching novel immunomodulatory therapies with septic patients most likely to benefit.
View details for DOI 10.1097/CCM.0000000000003084
View details for PubMedID 29537985
View details for Web of Science ID 000434326500001
Findings from clinical and biological studies are often not reproducible when tested in independent cohorts. Due to the testing of a large number of hypotheses and relatively small sample sizes, results from whole-genome expression studies in particular are often not reproducible. Compared to single-study analysis, gene expression meta-analysis can improve reproducibility by integrating data from multiple studies. However, there are multiple choices in designing and carrying out a meta-analysis. Yet, clear guidelines on best practices are scarce. Here, we hypothesized that studying subsets of very large meta-analyses would allow for systematic identification of best practices to improve reproducibility. We therefore constructed three very large gene expression meta-analyses from clinical samples, and then examined meta-analyses of subsets of the datasets (all combinations of datasets with up to N/2 samples and K/2 datasets) compared to a 'silver standard' of differentially expressed genes found in the entire cohort. We tested three random-effects meta-analysis models using this procedure. We showed relatively greater reproducibility with more-stringent effect size thresholds with relaxed significance thresholds; relatively lower reproducibility when imposing extraneous constraints on residual heterogeneity; and an underestimation of actual false positive rate by Benjamini-Hochberg correction. In addition, multivariate regression showed that the accuracy of a meta-analysis increased significantly with more included datasets even when controlling for sample size.
View details for DOI 10.1093/nar/gkw797
View details for PubMedID 27634930
View details for PubMedCentralID PMC5224496
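To make the idea of a random-effects gene expression meta-analysis concrete, the sketch below pools one gene's per-dataset effect sizes with the DerSimonian-Laird estimator. The abstract above does not name the three random-effects models that were tested, so the choice of DerSimonian-Laird is an assumption for illustration, and the function name and toy numbers are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Pool per-dataset effect sizes with a random-effects model.

    effects:   effect size of one gene in each dataset (e.g. Hedges' g).
    variances: within-dataset variance of each effect size.
    Returns (pooled effect, standard error, between-dataset variance tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]              # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to every within-dataset variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = (1.0 / sum_w_star) ** 0.5
    return pooled, se, tau2


# Toy example (hypothetical numbers): three datasets, heterogeneous effects.
pooled, se, tau2 = dersimonian_laird([0.2, 0.8, 1.4], [0.01, 0.01, 0.01])
```

In this toy example the three datasets disagree strongly, so the estimated between-dataset variance is well above zero and the pooled standard error is wider than a fixed-effect analysis would report.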
Improved diagnostics for acute infections could decrease morbidity and mortality by increasing early antibiotics for patients with bacterial infections and reducing unnecessary antibiotics for patients without bacterial infections. Several groups have used gene expression microarrays to build classifiers for acute infections, but these have been hampered by the size of the gene sets, use of overfit models, or lack of independent validation. We used multicohort analysis to derive a set of seven genes for robust discrimination of bacterial and viral infections, which we then validated in 30 independent cohorts. We next used our previously published 11-gene Sepsis MetaScore together with the new bacterial/viral classifier to build an integrated antibiotics decision model. In a pooled analysis of 1057 samples from 20 cohorts (excluding infants), the integrated antibiotics decision model had a sensitivity and specificity for bacterial infections of 94.0 and 59.8%, respectively (negative likelihood ratio, 0.10). Prospective clinical validation will be needed before these findings are implemented for patient care.
View details for DOI 10.1126/scitranslmed.aaf7165
View details for PubMedID 27384347
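The negative likelihood ratio quoted in the abstract above follows directly from the reported sensitivity and specificity. The short sketch below reproduces that arithmetic using the standard definitions. The function name is hypothetical, and the positive likelihood ratio is computed here only for completeness; it is not reported in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Values reported in the abstract: sensitivity 94.0%, specificity 59.8%.
lr_pos, lr_neg = likelihood_ratios(0.940, 0.598)
print(round(lr_neg, 2))  # 0.1, matching the reported 0.10
```

A negative likelihood ratio near 0.10 means a negative result lowers the odds of bacterial infection roughly tenfold, which is why the model is framed as a tool for reducing unnecessary antibiotics.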
Active pulmonary tuberculosis is difficult to diagnose and treatment response is difficult to effectively monitor. A WHO consensus statement has called for new non-sputum diagnostics. The aim of this study was to use an integrated multicohort analysis of samples from publicly available datasets to derive a diagnostic gene set in the peripheral blood of patients with active tuberculosis. We searched two public gene expression microarray repositories and retained datasets that examined clinical cohorts of active pulmonary tuberculosis infection in whole blood. We compared gene expression in patients with either latent tuberculosis or other diseases versus patients with active tuberculosis using our validated multicohort analysis framework. Three datasets were used as discovery datasets and meta-analytical methods were used to assess gene effects in these cohorts. We then validated the diagnostic capacity of the three-gene set in the remaining 11 datasets. A total of 14 datasets containing 2572 samples from 10 countries from both adult and paediatric patients were included in the analysis. Of these, three datasets (N=1023) were used to discover a set of three genes (GBP5, DUSP3, and KLF2) that are highly diagnostic for active tuberculosis. We validated the diagnostic power of the three-gene set to separate active tuberculosis from healthy controls (global area under the ROC curve (AUC) 0·90 [95% CI 0·85-0·95]), latent tuberculosis (0·88 [0·84-0·92]), and other diseases (0·84 [0·80-0·95]) in eight independent datasets composed of both children and adults from ten countries. Expression of the three-gene set was not confounded by HIV infection status, bacterial drug resistance, or BCG vaccination.
Furthermore, in four additional cohorts, we showed that the tuberculosis score declined during treatment of patients with active tuberculosis. Overall, our integrated multicohort analysis yielded a three-gene set in whole blood that is robustly diagnostic for active tuberculosis, that was validated in multiple independent cohorts, and that has potential clinical application for diagnosis and monitoring treatment response. Prospective laboratory validation will be required before it can be used in a clinical setting. Funding: National Institute of Allergy and Infectious Diseases, National Library of Medicine, the Stanford Child Health Research Institute, the Society for University Surgeons, and the Bill and Melinda Gates Foundation.
View details for DOI 10.1016/S2213-2600(16)00048-5
View details for PubMedID 26907218
Respiratory viral infections are a significant burden to healthcare worldwide. Many whole genome expression profiles have identified different respiratory viral infection signatures, but these have not translated to clinical practice. Here, we performed two integrated, multi-cohort analyses of publicly available transcriptional data of viral infections. First, we identified a common host signature across different respiratory viral infections that could distinguish (1) individuals with viral infections from healthy controls and from those with bacterial infections, and (2) symptomatic from asymptomatic subjects prior to symptom onset in challenge studies. Second, we identified an influenza-specific host response signature that (1) could distinguish influenza-infected samples from those with bacterial and other respiratory viral infections, (2) was a diagnostic and prognostic marker in influenza-pneumonia patients and influenza challenge studies, and (3) was predictive of response to influenza vaccine. Our results have applications in the diagnosis, prognosis, and identification of drug targets in viral infections.
View details for DOI 10.1016/j.immuni.2015.11.003
View details for Web of Science ID 000366846600022
View details for PubMedID 26682989
View details for PubMedCentralID PMC4684904
Lung cancer remains the most common cause of cancer-related death worldwide and it continues to lack effective treatment. The increasingly large and diverse public databases of lung cancer gene expression constitute a rich source of candidate oncogenic drivers and therapeutic targets. To define novel targets for lung adenocarcinoma, we conducted a large-scale meta-analysis of genes specifically overexpressed in adenocarcinoma. We identified an 11-gene signature that was overexpressed consistently in adenocarcinoma specimens relative to normal lung tissue. Six genes in this signature were specifically overexpressed in adenocarcinoma relative to other subtypes of non-small cell lung cancer (NSCLC). Among these genes was the little studied protein tyrosine kinase PTK7. Immunohistochemical analysis confirmed that PTK7 is highly expressed in primary adenocarcinoma patient samples. RNA interference-mediated attenuation of PTK7 decreased cell viability and increased apoptosis in a subset of adenocarcinoma cell lines. Further, loss of PTK7 activated the MKK7-JNK stress response pathway and impaired tumor growth in xenotransplantation assays. Our work defines PTK7 as a highly and specifically expressed gene in adenocarcinoma and a potential therapeutic target in this subset of NSCLC. Cancer Res; 74(10); 2892-902. ©2014 AACR.
View details for DOI 10.1158/0008-5472.CAN-13-2775
View details for PubMedID 24654231
Using meta-analysis of eight independent transplant datasets (236 graft biopsy samples) from four organs, we identified a common rejection module (CRM) consisting of 11 genes that were significantly overexpressed in acute rejection (AR) across all transplanted organs. The CRM genes could diagnose AR with high specificity and sensitivity in three additional independent cohorts (794 samples). In another two independent cohorts (151 renal transplant biopsies), the CRM genes correlated with the extent of graft injury and predicted future injury to a graft using protocol biopsies. Inferred drug mechanisms from the literature suggested that two FDA-approved drugs (atorvastatin and dasatinib), approved for nontransplant indications, could regulate specific CRM genes and reduce the number of graft-infiltrating cells during AR. We treated mice with HLA-mismatched mouse cardiac transplant with atorvastatin and dasatinib and showed reduction of the CRM genes, significant reduction of graft-infiltrating cells, and extended graft survival. We further validated the beneficial effect of atorvastatin on graft survival by retrospective analysis of electronic medical records of a single-center cohort of 2,515 renal transplant patients followed for up to 22 yr. In conclusion, we identified a CRM in transplantation that provides new opportunities for diagnosis, drug repositioning, and rational drug design.
View details for DOI 10.1084/jem.20122709
View details for PubMedID 24127489
View details for PubMedCentralID PMC3804941
Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base-driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
View details for DOI 10.1371/journal.pcbi.1002375
View details for Web of Science ID 000300729900019
View details for PubMedID 22383865
We describe cell type-specific significance analysis of microarrays (csSAM) for analyzing differential gene expression for each cell type in a biological sample from microarray data and relative cell-type frequencies. First, we validated csSAM with predesigned mixtures and then applied it to whole-blood gene expression datasets from stable post-transplant kidney transplant recipients and those experiencing acute transplant rejection, which revealed hundreds of differentially expressed genes that were otherwise undetectable.
View details for DOI 10.1038/NMETH.1439
View details for Web of Science ID 000276150600017
View details for PubMedID 20208531
A common challenge in the analysis of genomics data is trying to understand the underlying phenomenon in the context of all complex interactions taking place on various signaling pathways. A statistical approach using various models is universally used to identify the most relevant pathways in a given experiment. Here, we show that the existing pathway analysis methods fail to take into consideration important biological aspects and may provide incorrect results in certain situations. By using a systems biology approach, we developed an impact analysis that includes the classical statistics but also considers other crucial factors such as the magnitude of each gene's expression change, their type and position in the given pathways, their interactions, etc. The impact analysis is an attempt at a deeper level of statistical analysis, informed by more pathway-specific biology than the existing techniques. On several illustrative data sets, the classical analysis produces both false positives and false negatives, while the impact analysis provides biologically meaningful results. This analysis method has been implemented as a Web-based tool, Pathway-Express, freely available as part of the Onto-Tools (http://vortex.cs.wayne.edu).
View details for DOI 10.1101/gr.6202607
View details for Web of Science ID 000249869200015
View details for PubMedID 17785539
Findings from several studies support the conclusion that spermatozoa contain a complex repertoire of mRNAs. Even though these mRNAs are thought to provide an insight into past events of spermatogenesis, their complexity and function have yet to be established. Our aim was to determine whether we could use spermatozoal mRNAs to generate a genetic fingerprint of normal fertile men. We used a suite of microarrays containing 27016 unique expressed sequence tags (ESTs) to investigate cDNAs from a pool of 19 testes, cDNAs from a pool of nine individual ejaculate spermatozoal mRNAs, and cDNAs constructed from spermatozoal mRNAs from a single ejaculate. We also used ontological data mining to determine the function of the genes identified in each EST profile. The cDNAs from the testes, pooled ejaculate, and single ejaculate hybridised to 7157, 3281, and 2780 ESTs, respectively. The testicular population contained all of the ESTs identified by the cDNAs from the pooled and individual ejaculate. The pooled ejaculate population contained all but four ESTs identified from the individual ejaculate. A subset of the spermatozoal mRNAs was associated with embryo development. The microarray data from testes and spermatozoa (pooled and individual) were concordant, supporting the view that a spermatozoal mRNA fingerprint can be obtained from normal fertile men. Thus, profiling can be used to monitor past events (ie, gene expression of spermatogenesis). Moreover, the data suggest that, in addition to delivering the paternal genome, spermatozoa provide the zygote with a unique suite of paternal mRNAs. Ejaculate spermatozoa can now be used as a non-invasive proxy for investigations of testis-specific infertility.
View details for Web of Science ID 000177933000019
View details for PubMedID 12241836
Systems immunology has the potential to offer invaluable insights into the development of the immune system. Two recent studies offer an in-depth view of both the dynamics of immune system development and the heritability of the levels of key immune modulators at birth.
View details for DOI 10.1186/s13073-018-0599-1
View details for PubMedID 30470248
RATIONALE: Pulmonary arterial hypertension (PAH) is characterized by progressive narrowing of pulmonary arteries resulting in right heart failure and death. Bone Morphogenetic Protein Receptor type-2 (BMPR2) mutations account for most familial PAH (FPAH) forms while reduced BMPR2 is present in many idiopathic PAH (IPAH) forms, suggesting dysfunctional BMPR2 signaling to be a key feature of PAH. Modulating BMPR2 signaling is therapeutically promising, yet how BMPR2 is downregulated in PAH is unclear. OBJECTIVES: We intended to identify and pharmaceutically target BMPR2 modifier genes to improve PAH. METHODS: We combined siRNA High Throughput Screening (HTS) of >20,000 genes with a multi-cohort analysis of publicly available PAH RNA expression data to identify clinically relevant BMPR2-modifiers. After confirming gene dysregulation in PAH patient tissue, we determined the functional roles of BMPR2-modifiers in vitro and tested the repurposed drug Enzastaurin for its propensity to improve experimental PH. MEASUREMENTS AND MAIN RESULTS: We discovered Fragile Histidine Triad (FHIT) as a novel BMPR2-modifier. BMPR2 and FHIT expression were reduced in PAH patients. FHIT reductions were associated with endothelial and smooth muscle cell dysfunction, rescued by Enzastaurin through a dual mechanism: upregulation of FHIT as well as miR17-5 repression. Fhit-/- mice had exaggerated hypoxic PH and failed to recover in normoxia. Enzastaurin reversed PH in the Sugen5416/Hypoxia/Normoxia rat model, by improving Right Ventricular Systolic Pressure (RVSP), RV hypertrophy, cardiac fibrosis and vascular remodeling. CONCLUSIONS: This study highlights the importance of the novel BMPR2 modifier FHIT in PH and the clinical value of the repurposed drug Enzastaurin as a potential novel therapeutic strategy to improve PAH.
View details for DOI 10.1164/rccm.201712-2553OC
View details for PubMedID 30107138
Pneumonia is a complex pulmonary disease in need of new clinical approaches. Although triggered by a pathogen, pneumonia often results from dysregulations of host defense that likely precede infection. The coordinated activities of immune resistance and tissue resilience then dictate whether and how pneumonia progresses or resolves. Inadequate or inappropriate host responses lead to more severe outcomes such as acute respiratory distress syndrome and to organ dysfunction beyond the lungs and over extended time frames after pathogen clearance, some of which increase the risk for subsequent pneumonia. Improved understanding of such host responses will guide the development of novel approaches for preventing and curing pneumonia and for mitigating the subsequent pulmonary and extrapulmonary complications of pneumonia. The NHLBI assembled a working group of extramural investigators to prioritize avenues of host-directed pneumonia research that should yield novel approaches for interrupting the cycle of unhealthy decline caused by pneumonia. This report summarizes the working group's specific recommendations in the areas of pneumonia susceptibility, host response, and consequences. Overarching goals include the development of more host-focused clinical approaches for preventing and treating pneumonia, the generation of predictive tools (for pneumonia occurrence, severity, and outcome), and the elucidation of mechanisms mediating immune resistance and tissue resilience in the lung. Specific areas of research are highlighted as especially promising for making advances against pneumonia.
View details for DOI 10.1164/rccm.201801-0139WS
View details for Web of Science ID 000438880000017
View details for PubMedID 29546996
Modifications of histone proteins are fundamental to the regulation of epigenetic phenotypes. Dysregulations of histone modifications have been linked to the pathogenesis of diverse human diseases. However, identifying differential histone modifications in patients with immune-mediated diseases has been challenging, in part due to the lack of a powerful analytic platform to study histone modifications in the complex human immune system. We recently developed a highly multiplexed platform, Epigenetic landscape profiling using cytometry by Time-Of-Flight (EpiTOF), to analyze the global levels of a broad array of histone modifications in single cells using mass cytometry. In this review, we summarize the development of EpiTOF and discuss its potential applications in biomedical research. We anticipate that this platform will provide new insights into the roles of epigenetic regulation in hematopoiesis, immune cell functions and immune system aging, and reveal aberrant epigenetic patterns associated with immune-mediated diseases.
View details for DOI 10.1016/j.clim.2018.06.009
View details for PubMedID 29960011
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analyses results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
View details for DOI 10.1038/s41598-018-23395-2
View details for Web of Science ID 000428162800001
View details for PubMedID 29572502
View details for PubMedCentralID PMC5865181
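For readers unfamiliar with enrichment analysis, the sketch below shows one common formulation: a one-sided hypergeometric (over-representation) test for a single GO term. The abstract above does not state which enrichment statistic was used, so this choice is an assumption, and the function name, parameter names, and toy counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(n_background, n_annotated, n_signature, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_background: genes in the background set.
    n_annotated:  background genes annotated to the GO term.
    n_signature:  genes in the signature being tested.
    n_overlap:    signature genes annotated to the GO term.
    Returns P(overlap >= n_overlap) when the signature is drawn at random.
    """
    upper = min(n_annotated, n_signature)
    total = comb(n_background, n_signature)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_signature - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example (hypothetical counts): 10 background genes, 3 annotated,
# a 2-gene signature in which both genes carry the annotation.
p_value = enrichment_pvalue(10, 3, 2, 2)
```

Because the p-value depends on how many genes are annotated to the term, the same signature can give different results as the ontology and its annotations change between GO versions, which is the effect the abstract describes.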
We found tremendous inequality across gene and protein annotation resources. We observed that this bias leads biomedical researchers to focus on richly annotated genes instead of those with the strongest molecular data. We advocate that researchers reduce these biases by pursuing data-driven hypotheses.
View details for DOI 10.1038/s41598-018-19333-x
View details for Web of Science ID 000422912100045
View details for PubMedID 29358745
View details for PubMedCentralID PMC5778030
Activation of the unfolded protein response (UPR) signaling pathways is linked to multiple human diseases including cancer. The inositol-requiring kinase 1 (IRE1)-X-box binding protein 1 (XBP1) pathway is the most evolutionarily conserved of the three major signaling branches of the UPR. Here, we performed a genome-wide siRNA screen to obtain a systematic assessment of genes integrated in the IRE1-XBP1 axis. We monitored the expression of an XBP1-luciferase chimeric protein in which luciferase was fused in-frame with the spliced (active) form of XBP1. Using cells expressing this reporter construct, we identified 162 genes for which siRNA inhibition resulted in alteration in XBP1 splicing. These genes express diverse types of proteins modulating a wide range of cellular processes. Pathway analysis identified a set of genes implicated in the pathogenesis of breast cancer. Several genes including BCL10, GCLM, and IGF1R correlated with worse relapse-free survival (RFS) in an analysis of patients with triple negative breast cancer (TNBC). However, in this cohort of 1908 patients, only high GCLM expression correlated with worse RFS in both TNBC and non-TNBC patients. Altogether, our study revealed unidentified roles of novel pathways regulating the UPR and these findings may serve as a paradigm for exploring novel therapeutic opportunities based on modulating the UPR. Genome-wide RNAi screen identifies novel genes/pathways that modulate IRE1-XBP1 signaling in human tumor cells and leads to the development of improved therapeutic approaches targeting the UPR.
View details for DOI 10.1158/1541-7786.MCR-17-0307
View details for PubMedID 29440447
To identify a novel, generalizable diagnostic for acute respiratory distress syndrome using whole-blood gene expression arrays from multiple acute respiratory distress syndrome cohorts of varying etiologies. We performed a systematic search for human whole-blood gene expression arrays of acute respiratory distress syndrome in National Institutes of Health Gene Expression Omnibus and ArrayExpress. We also included the Glue Grant gene expression cohorts. We included investigator-defined acute respiratory distress syndrome within 48 hours of diagnosis and compared these with relevant critically ill controls. We used multicohort analysis of gene expression to identify genes significantly associated with acute respiratory distress syndrome, both with and without adjustment for clinical severity score. We performed gene ontology enrichment using Database for Annotation, Visualization and Integrated Discovery and cell type enrichment tests for both immune cells and pneumocyte gene expression. Finally, we selected a gene set optimized for diagnostic power across the datasets and used leave-one-dataset-out cross validation to assess robustness of the model. We identified datasets from three adult cohorts with sepsis, one pediatric cohort with acute respiratory failure, and two datasets of adult patients with trauma and burns, for a total of 148 acute respiratory distress syndrome cases and 268 critically ill controls. We identified 30 genes that were significantly associated with acute respiratory distress syndrome (false discovery rate < 20% and effect size > 1.3), many of which had been previously associated with sepsis. When metaregression was used to adjust for clinical severity scores, none of these genes remained significant. Cell type enrichment was notable for bands and neutrophils, suggesting that the gene expression signature is one of acute inflammation rather than lung injury per se. Finally, an attempt to develop a generalizable diagnostic gene set for acute respiratory distress syndrome showed a mean area under the receiver-operating characteristic curve of only 0.63 on leave-one-dataset-out cross validation. The whole-blood gene expression signature across a wide clinical spectrum of acute respiratory distress syndrome is likely confounded by systemic inflammation, limiting the utility of whole-blood gene expression studies for uncovering a generalizable diagnostic gene signature.
View details for DOI 10.1097/CCM.0000000000002839
View details for PubMedID 29337789
View details for PubMedCentralID PMC5774019
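The leave-one-dataset-out cross validation mentioned in the abstract above can be sketched as follows: each dataset is held out in turn, a gene signature is chosen from the remaining datasets only, and the held-out samples are scored to obtain an area under the receiver-operating characteristic curve (AUC). This is a simplified illustration, not the study's pipeline. The gene-selection rule (largest average case-minus-control difference), the data layout, and all names and values are assumptions made for the example.

```python
from statistics import mean


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def mean_difference(dataset, gene):
    """Case-minus-control difference in mean expression for one gene."""
    cases = [x[gene] for x, y in zip(dataset["X"], dataset["y"]) if y == 1]
    controls = [x[gene] for x, y in zip(dataset["X"], dataset["y"]) if y == 0]
    return mean(cases) - mean(controls)


def select_signature(training, n_genes):
    """Keep the n_genes with the largest average |difference|, with their sign."""
    genes = training[0]["X"][0].keys()
    avg = {g: mean(mean_difference(d, g) for d in training) for g in genes}
    top = sorted(avg, key=lambda g: abs(avg[g]), reverse=True)[:n_genes]
    return {g: (1 if avg[g] > 0 else -1) for g in top}


def leave_one_dataset_out(datasets, n_genes=1):
    """Return one AUC per dataset, each computed with that dataset held out."""
    aucs = []
    for i, held_out in enumerate(datasets):
        training = datasets[:i] + datasets[i + 1:]
        signature = select_signature(training, n_genes)  # never sees held_out
        scores = [sum(sign * x[g] for g, sign in signature.items())
                  for x in held_out["X"]]
        aucs.append(auc(scores, held_out["y"]))
    return aucs


# Toy, made-up data: G1 separates cases (1) from controls (0); G2 is noise.
toy = [
    {"X": [{"G1": 5.0, "G2": 1.0}, {"G1": 6.0, "G2": 2.0},
           {"G1": 2.0, "G2": 1.5}, {"G1": 1.0, "G2": 1.8}], "y": [1, 1, 0, 0]},
    {"X": [{"G1": 7.0, "G2": 3.0}, {"G1": 6.5, "G2": 2.0},
           {"G1": 3.0, "G2": 2.4}, {"G1": 2.5, "G2": 2.9}], "y": [1, 1, 0, 0]},
    {"X": [{"G1": 4.0, "G2": 0.5}, {"G1": 4.5, "G2": 0.9},
           {"G1": 1.0, "G2": 0.6}, {"G1": 0.5, "G2": 0.7}], "y": [1, 1, 0, 0]},
]
aucs = leave_one_dataset_out(toy, n_genes=1)
print("per-dataset AUC:", aucs, "mean:", mean(aucs))
```

The key design point is that the signature is re-derived inside the loop, so the held-out dataset never influences gene selection. A mean AUC near 0.5 under this scheme, as with the 0.63 reported above, indicates a signature that does not generalize across cohorts.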
Late allograft failure is characterized by cumulative subclinical insults manifesting over many years. Although immunomodulatory therapies targeting host T cells have improved short-term survival rates, rates of chronic allograft loss remain high. We hypothesized that other immune cell types may drive subclinical injury, ultimately leading to graft failure. We collected whole-genome transcriptome profiles from 15 independent cohorts composed of 1,697 biopsy samples to assess the association of an inflammatory macrophage polarization-specific gene signature with subclinical injury. We applied penalized regression to a subset of the data sets and identified a 3-gene inflammatory macrophage-derived signature. We validated discriminatory power of the 3-gene signature in 3 independent renal transplant data sets with mean AUC of 0.91. In a longitudinal cohort, the 3-gene signature strongly correlated with extent of injury and accurately predicted progression of subclinical injury 18 months before clinical manifestation. The 3-gene signature also stratified patients at high risk of graft failure as soon as 15 days after biopsy. We found that the 3-gene signature also distinguished acute rejection (AR) accurately in 3 heart transplant data sets but not in lung transplant. Overall, we identified a parsimonious signature capable of diagnosing AR, recognizing subclinical injury, and risk-stratifying renal transplant patients. Our results strongly suggest that inflammatory macrophages may be a viable therapeutic target to improve long-term outcomes for organ transplantation patients.
View details for DOI 10.1172/jci.insight.95659
View details for PubMedID 29367465
Recent transcriptomic studies describe two subgroups of adults with sepsis differentiated by a sepsis response signature. The implied biology and related clinical associations are comparable with recently reported pediatric sepsis endotypes, labeled "A" and "B." We classified adults with sepsis using the pediatric endotyping strategy and the sepsis response signature and determined how endotype assignment, sepsis response signature membership, and age interact with respect to mortality. Retrospective analysis of publicly available transcriptomic data representing critically ill adults with sepsis from which the sepsis response signature groups were derived and validated. Multiple ICUs. Adults with sepsis. None. Transcriptomic data were conormalized into a single dataset yielding 549 unique cases with sepsis response signature assignments. Each subject was assigned to endotype A or B using the expression data for the 100 endotyping genes. There were 163 subjects (30%) assigned to endotype A and 386 to endotype B. There was a weak, positive correlation between endotype assignment and sepsis response signature membership. Mortality rates were similar between patients assigned endotype A and those assigned endotype B. A multivariable logistic regression model fit to endotype assignment, sepsis response signature membership, age, and the respective two-way interactions revealed that endotype A, sepsis response signature 1 membership, older age, and the interactions between them were associated with mortality. Subjects coassigned to endotype A and sepsis response signature 1 had the highest mortality. Combining the pediatric endotyping strategy with sepsis response signature membership might provide complementary, age-dependent, biological, and prognostic information.
View details for DOI 10.1097/CCM.0000000000002733
View details for Web of Science ID 000416235200011
View details for PubMedID 28991828
View details for PubMedCentralID PMC5693699
View details for Web of Science ID 000411824103204
Sepsis is a major cause of morbidity and mortality, especially at the extremes of age. To understand the human age-specific transcriptomic response to sepsis, a multi-cohort, pooled analysis was conducted on adults, children, infants, and neonates with and without sepsis. Nine public whole-blood gene expression datasets (636 patients) were employed. Age impacted the transcriptomic host response to sepsis. Gene expression from septic neonates and adults was more dissimilar whereas infants and children were more similar. Neonates showed reductions in inflammatory recognition and signaling pathways compared to all other age groups. Likewise, adults demonstrated decreased pathogen sensing, inflammation, and myeloid cell function, as compared to children. This may help to explain the increased incidence of sepsis-related organ failure and death in adults. The number of dysregulated genes in septic patients was proportional to age and significantly differed among septic adults, children, infants, and neonates. Overall, children manifested a greater transcriptomic intensity to sepsis as compared to the other age groups. The transcriptomic magnitude for adults and neonates was dramatically reduced as compared to children and infants. These findings suggest that the transcriptomic response to sepsis is age-dependent, and diagnostic and therapeutic efforts to identify and treat sepsis will have to consider age as an important variable.
View details for DOI 10.1371/journal.pone.0184159
View details for Web of Science ID 000410001100071
View details for PubMedID 28886074
View details for PubMedCentralID PMC5590890
Celiac disease (CeD) provides an opportunity to study autoimmunity and the transition in immune cells as dietary gluten induces small intestinal lesions. Seventy-three celiac disease patients on a long-term, gluten-free diet ingested a known amount of gluten daily for 6 weeks. A peripheral blood sample and intestinal biopsy specimens were taken before and 6 weeks after initiating the gluten challenge. Biopsy results were reported on a continuous numeric scale that measured the villus-height-to-crypt-depth ratio to quantify gluten-induced intestinal injury. Pooled B and T cells were isolated from whole blood, and RNA was analyzed by DNA microarray looking for changes in peripheral B- and T-cell gene expression that correlated with changes in villus height to crypt depth, as patients maintained a relatively healthy intestinal mucosa or deteriorated in the face of a gluten challenge. Gluten-dependent intestinal damage from baseline to 6 weeks varied widely across all patients, ranging from no change to extensive damage. Genes differentially expressed in B cells correlated strongly with the extent of intestinal damage. A relative increase in B-cell gene expression correlated with a lack of sensitivity to gluten whereas their relative decrease correlated with gluten-induced mucosal injury. A core B-cell gene module, representing a subset of B-cell genes analyzed, accounted for the correlation with intestinal injury. Genes comprising the core B-cell module showed a net increase in expression from baseline to 6 weeks in patients with little to no intestinal damage, suggesting that these individuals may have mounted a B-cell immune response to maintain mucosal homeostasis and circumvent inflammation. DNA microarray data were deposited at the GEO repository (accession number: GSE87629; available: https://www.ncbi.nlm.nih.gov/geo/).
View details for DOI 10.1016/j.jcmgh.2017.01.011
View details for PubMedID 28508029
Neonates are at increased risk for developing sepsis, but this population often exhibits ambiguous clinical signs that complicate the diagnosis of infection. No biomarker has yet shown enough diagnostic accuracy to rule out sepsis at the time of clinical suspicion. We show that a gene-expression-based signature is an accurate objective measure of the risk of sepsis in a neonate or preterm infant, and it substantially improves diagnostic accuracy over that of commonly used laboratory-based testing. Implementation might decrease inappropriate antibiotic use. Neonatal sepsis can have devastating consequences, but accurate diagnosis is difficult. As a result, up to 200 neonates with suspected sepsis are treated with empiric antibiotics for every 1 case of microbiologically confirmed sepsis. These unnecessary antibiotics enhance bacterial antibiotic resistance, increase economic costs, and alter gut microbiota composition. We recently reported an 11-gene diagnostic test for sepsis (Sepsis MetaScore) based on host whole-blood gene expression in children and adults, but this test has not been evaluated in neonates. We identified existing gene expression microarray-based cohorts of neonates with sepsis. We then tested the accuracy of the Sepsis MetaScore both alone and in combination with standard diagnostic laboratory tests in diagnosing sepsis. We found 3 cohorts with a total of 213 samples from control neonates and neonates with sepsis. The Sepsis MetaScore had an area under the receiver operating characteristic curve of 0.92-0.93 in all 3 cohorts. We also found that, as a diagnostic test for sepsis, it outperformed standard laboratory measurements alone and, when used in combination with another test(s), resulted in a significant net reclassification index (0.3-0.69) in 5 of 6 comparisons. The mean point estimates for sensitivity and specificity were 95% and 60%, respectively, which, if confirmed prospectively and applied in a high-risk cohort, could reduce inappropriate antibiotic usage substantially. The Sepsis MetaScore had excellent diagnostic accuracy across 3 separate cohorts of neonates from 3 different countries. Further prospective targeted study will be needed before clinical application.
View details for DOI 10.1093/jpids/pix021
View details for PubMedID 28419265
We describe a new library generation method, Machine-based Identification of Molecules Inside Characterized Space (MIMICS), that generates sets of molecules inspired by a text-based input. MIMICS-generated libraries were found to preserve distributions of properties while simultaneously increasing structural diversity. Newly identified MIMICS-generated compounds were found to be bioactive as inhibitors of specific components of the unfolded protein response (UPR) and the VEGFR2 pathway in cell-based assays, thus confirming the applicability of this methodology toward drug design applications. Wider application of MIMICS could facilitate the efficient utilization of chemical space.
View details for DOI 10.1021/acs.jcim.6b00754
View details for Web of Science ID 000400204900023
View details for PubMedID 28257191
Rationale: The relevance of animal models to human diseases is an area of intense scientific debate. The degree to which mouse models of lung injury recapitulate human lung injury has never been assessed. Integrating data from both human and animal expression studies allows for increased statistical power and identification of conserved differential gene expression across organisms and conditions. Objectives: Comprehensive integration of gene expression data in experimental ALI in rodents compared to humans. Methods: We performed two separate gene expression multi-cohort analyses to determine differential gene expression in experimental animal and human lung injury. We used correlational and pathway analyses combined with external in vitro gene expression data to identify both potential drivers of underlying inflammation and therapeutic drug candidates. Main Results: We identified 21 animal lung tissue datasets and 3 human lung injury BAL datasets. We show that the meta-signatures of animal and human experimental ALI are significantly correlated despite these widely varying experimental conditions. The gene expression changes among mice and rats across diverse injury models (ozone, VILI, LPS) are significantly correlated with human models of lung injury (Pearson r = 0.33-0.45, P < 1e-16). Neutrophil signatures are enriched in both animal and human lung injury. Predicted therapeutic targets, peptide ligand signatures, and pathway analyses are also all highly overlapping. Conclusions: Gene expression changes are similar in animal and human experimental ALI, and provide several physiologic and therapeutic insights to the disease.
View details for DOI 10.1165/rcmb.2016-0395OC
View details for PubMedID 28324666
KRAS mutated tumours represent a large fraction of human cancers, but the vast majority remains refractory to current clinical therapies. Thus, a deeper understanding of the molecular mechanisms triggered by KRAS oncogene may yield alternative therapeutic strategies. Here we report the identification of a common transcriptional signature across mutant KRAS cancers of distinct tissue origin that includes the transcription factor FOSL1. High FOSL1 expression identifies mutant KRAS lung and pancreatic cancer patients with the worst survival outcome. Furthermore, FOSL1 genetic inhibition is detrimental to both KRAS-driven tumour types. Mechanistically, FOSL1 links the KRAS oncogene to components of the mitotic machinery, a pathway previously postulated to function orthogonally to oncogenic KRAS. FOSL1 targets include AURKA, whose inhibition impairs viability of mutant KRAS cells. Lastly, combination of AURKA and MEK inhibitors induces a deleterious effect on mutant KRAS cells. Our findings unveil KRAS downstream effectors that provide opportunities to treat KRAS-driven cancers.
View details for DOI 10.1038/ncomms14294
View details for PubMedID 28220783
View details for PubMedCentralID PMC5321758
In response to a need for better sepsis diagnostics, several new gene expression classifiers have been recently published, including the 11-gene "Sepsis MetaScore," the "FAIM3-to-PLAC8" ratio, and the Septicyte Lab. We performed a systematic search for publicly available gene expression data in sepsis and tested each gene expression classifier in all included datasets. We also created a public repository of sepsis gene expression data to encourage their future reuse. We searched National Institutes of Health Gene Expression Omnibus and EBI ArrayExpress for human gene expression microarray datasets. We also included the Glue Grant trauma gene expression cohorts. We selected clinical, time-matched, whole blood studies of sepsis and acute infections as compared to healthy and/or noninfectious inflammation patients. We identified 39 datasets composed of 3,241 samples from 2,604 patients. All data were renormalized from raw data, when available, using consistent methods. Mean validation areas under the receiver operating characteristic curve for discriminating septic patients from patients with noninfectious inflammation for the Sepsis MetaScore, the FAIM3-to-PLAC8 ratio, and the Septicyte Lab were 0.82 (range, 0.73-0.89), 0.78 (range, 0.49-0.96), and 0.73 (range, 0.44-0.90), respectively. Paired-sample t tests of validation datasets showed no significant differences in area under the receiver operating characteristic curves. Mean validation area under the receiver operating characteristic curves for discriminating infected patients from healthy controls for the Sepsis MetaScore, FAIM3-to-PLAC8 ratio, and Septicyte Lab were 0.97 (range, 0.85-1.0), 0.94 (range, 0.65-1.0), and 0.71 (range, 0.24-1.0), respectively. There were few significant differences in any diagnostics due to pathogen type. The three diagnostics do not show significant differences in overall ability to distinguish noninfectious systemic inflammatory response syndrome from sepsis, though the performance in some datasets was low (area under the receiver operating characteristic curve, < 0.7) for the FAIM3-to-PLAC8 ratio and Septicyte Lab. The Septicyte Lab also demonstrated significantly worse performance in discriminating infections as compared to healthy controls. Overall, public gene expression data are a useful tool for benchmarking gene expression diagnostics.
View details for DOI 10.1097/CCM.0000000000002021
View details for Web of Science ID 000390619000001
View details for PubMedID 27681387
Purpose: Triple-negative breast cancers (TNBCs) are associated with a worse prognosis and patients with TNBC have fewer therapeutic options than patients with non-TNBC. Recently, the IRE1alpha-XBP1 branch of the unfolded protein response (UPR) was implicated in TNBC prognosis on the basis of a relatively small patient population, suggesting the diagnostic and therapeutic value of this pathway in TNBCs. In addition, the IRE1alpha-XBP1 and hypoxia-induced factor 1 alpha (HIF1alpha) pathways have been identified as interacting partners in TNBC, suggesting a novel mechanism of regulation. To comprehensively evaluate and validate these findings, we investigated the relative activities and relevance to patient survival of the UPR and HIF1alpha pathways in different breast cancer subtypes in large populations of patients. Materials and Methods: We performed a comprehensive analysis of gene expression and survival data from large cohorts of patients with breast cancer. The patients were stratified based on the average expression of the UPR or HIF1alpha gene signatures. Results: We identified a strong positive association between the XBP1 gene signature and estrogen receptor-positive status or the HIF1alpha gene signature, as well as the predictive value of the XBP1 gene signature for survival of patients who are estrogen receptor negative, or have TNBC or HER2+. In contrast, another important UPR branch, the ATF4/CHOP pathway, lacks prognostic value in breast cancer in general. Activity of the HIF1alpha pathway is correlated with patient survival in all the subtypes evaluated. Conclusion: These findings clarify the relevance of the UPR pathways in different breast cancer subtypes and underscore the potential therapeutic importance of the IRE1alpha-XBP1 branch in breast cancer treatment.
View details for DOI 10.1200/PO.16.00073
View details for PubMedID 29888341
Systemic sclerosis (SSc) is a rare autoimmune disease with the highest case-fatality rate of all connective tissue diseases. Current efforts to determine patient response to a given treatment using the modified Rodnan skin score (mRSS) are complicated by interclinician variability, confounding, and the time required between sequential mRSS measurements to observe meaningful change. There is an unmet critical need for an objective metric of SSc disease severity. Here, we performed an integrated, multicohort analysis of SSc transcriptome data across 7 datasets from 6 centers composed of 515 samples. Using 158 skin samples from SSc patients and healthy controls recruited at 2 centers as a discovery cohort, we identified a 415-gene expression signature specific for SSc, and validated its ability to distinguish SSc patients from healthy controls in an additional 357 skin samples from 5 independent cohorts. Next, we defined the SSc skin severity score (4S). In every SSc cohort of skin biopsy samples analyzed in our study, 4S correlated significantly with mRSS, allowing objective quantification of SSc disease severity. Using transcriptome data from the largest longitudinal trial of SSc patients to date, we showed that 4S allowed us to objectively monitor individual SSc patients over time, as (a) the change in 4S of a patient is significantly correlated with change in the mRSS, and (b) the change in 4S at 12 months of treatment could predict the change in mRSS at 24 months. Our results suggest that 4S could be used to distinguish treatment responders from nonresponders prior to mRSS change. Our results demonstrate the potential clinical utility of a novel robust molecular signature and a computational approach to SSc disease severity quantification.
View details for DOI 10.1172/jci.insight.89073
View details for PubMedID 28018971
View details for PubMedCentralID PMC5161207
Cell death and release of proinflammatory mediators contribute to mortality during sepsis. Specifically, caspase-11-dependent cell death contributes to pathology and decreases in survival time in sepsis models. Priming of the host cell, through TLR4 and interferon receptors, induces caspase-11 expression, and cytosolic LPS directly stimulates caspase-11 activation, promoting the release of proinflammatory cytokines through pyroptosis and caspase-1 activation. Using a CRISPR-Cas9-mediated genome-wide screen, we identified novel mediators of caspase-11-dependent cell death. We found a complement-related peptidase, carboxypeptidase B1 (Cpb1), to be required for caspase-11 gene expression and subsequent caspase-11-dependent cell death. Cpb1 modifies a cleavage product of C3, which binds to and activates C3aR, and then modulates innate immune signaling. We find the Cpb1-C3-C3aR pathway induces caspase-11 expression through amplification of MAPK activity downstream of TLR4 and Ifnar activation, and mediates severity of LPS-induced sepsis (endotoxemia) and disease outcome in mice. We show C3aR is required for up-regulation of caspase-11 orthologues, caspase-4 and -5, in primary human macrophages during inflammation and that c3aR1 and caspase-5 transcripts are highly expressed in patients with severe sepsis; thus, suggesting that these pathways are important in human sepsis. Our results highlight a novel role for complement and the Cpb1-C3-C3aR pathway in proinflammatory signaling, caspase-11 cell death, and sepsis severity.
View details for PubMedID 27697835
View details for PubMedCentralID PMC5068231
Pancreatic ductal adenocarcinoma (PDAC) is a lethal form of cancer with few therapeutic options. We found that levels of the lysine methyltransferase SMYD2 (SET and MYND domain 2) are elevated in PDAC and that genetic and pharmacological inhibition of SMYD2 restricts PDAC growth. We further identified the stress response kinase MAPKAPK3 (MK3) as a new physiologic substrate of SMYD2 in PDAC cells. Inhibition of MAPKAPK3 impedes PDAC growth, identifying a potential new kinase target in PDAC. Finally, we show that inhibition of SMYD2 cooperates with standard chemotherapy to treat PDAC cells and tumors. These findings uncover a pivotal role for SMYD2 in promoting pancreatic cancer.
View details for DOI 10.1101/gad.275529.115
View details for PubMedID 26988419
The utility of multi-cohort two-class meta-analysis to identify robust differentially expressed gene signatures has been well established. However, many biomedical applications, such as gene signatures of disease progression, require one-class analysis. Here we describe an R package, MetaCorrelator, that can identify a reproducible transcriptional signature that is correlated with a continuous disease phenotype across multiple datasets. We successfully applied this framework to extract a pattern of gene expression that can predict lung function in patients with chronic obstructive pulmonary disease (COPD) in both peripheral blood mononuclear cells (PBMCs) and tissue. Our results point to a dysregulation in the oxidation state of the lungs of patients with COPD, as well as underscore the classically recognized inflammatory state that underlies this disease.
View details for PubMedID 27896981
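One generic way to combine a gene's correlation with a continuous phenotype across several datasets is a fixed-effect pooling of Fisher z-transformed Pearson correlations, weighted by sample size. The sketch below illustrates that textbook approach only. It is not the MetaCorrelator API or its algorithm, which the abstract above does not specify, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pooled_correlation(datasets):
    """Fixed-effect pooled correlation over (expression, phenotype) pairs.

    Each r is mapped to z = atanh(r) and weighted by n - 3, the inverse of
    the approximate variance of z; the weighted mean is mapped back with tanh.
    """
    num = den = 0.0
    for expression, phenotype in datasets:
        n = len(expression)
        if n <= 3:
            raise ValueError("each dataset needs more than 3 samples")
        # Clamp so atanh stays finite when a dataset is perfectly correlated.
        r = max(-0.999999, min(0.999999, pearson(expression, phenotype)))
        weight = n - 3
        num += weight * math.atanh(r)
        den += weight
    return math.tanh(num / den)


# Toy, made-up values: one gene's expression vs. a continuous phenotype.
cohort_a = ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
cohort_b = ([1, 2, 3, 4, 5, 6], [1, 3, 2, 5, 4, 6])
pooled = pooled_correlation([cohort_a, cohort_b])
print(f"pooled r = {pooled:.3f}")
```

Repeating this per gene and keeping genes whose pooled correlation is strong and consistent in sign across datasets would give a candidate signature. A random-effects model would be the natural alternative when between-dataset heterogeneity is expected.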
A major contributor to the scientific reproducibility crisis has been that the results from homogeneous, single-center studies do not generalize to heterogeneous, real world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make the multi-cohort analysis process more feasible, we have assembled an analysis pipeline which implements rigorously studied meta-analysis best practices. We have compiled and made publicly available the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples, through a novel and interactive web application. As a result, we have made both the process of and the results from multi-cohort gene expression analysis more approachable for non-technical users.
View details for PubMedID 27896970
View details for PubMedCentralID PMC5167529
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.
View details for DOI 10.1093/jamia/ocv048
View details for PubMedID 26112029
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal human cancers and shows resistance to any therapeutic strategy used. Here we tested small-molecule inhibitors targeting chromatin regulators as possible therapeutic agents in PDAC. We show that JQ1, an inhibitor of the bromodomain and extraterminal (BET) family of proteins, suppresses PDAC development in mice by inhibiting both MYC activity and inflammatory signals. The histone deacetylase (HDAC) inhibitor SAHA synergizes with JQ1 to augment cell death and more potently suppress advanced PDAC. Finally, using a CRISPR-Cas9-based method for gene editing directly in the mouse adult pancreas, we show that de-repression of p57 (also known as KIP2 or CDKN1C) upon combined BET and HDAC inhibition is required for the induction of combination therapy-induced cell death in PDAC. SAHA is approved for human use, and molecules similar to JQ1 are being tested in clinical trials. Thus, these studies identify a promising epigenetic-based therapeutic strategy that may be rapidly implemented in fatal human tumors.
View details for DOI 10.1038/nm.3952
View details for PubMedID 26390243
View details for Web of Science ID 000346211801253
Deregulation of lysine methylation signalling has emerged as a common aetiological factor in cancer pathogenesis, with inhibitors of several histone lysine methyltransferases (KMTs) being developed as chemotherapeutics. The largely cytoplasmic KMT SMYD3 (SET and MYND domain containing protein 3) is overexpressed in numerous human tumours. However, the molecular mechanism by which SMYD3 regulates cancer pathways and its relationship to tumorigenesis in vivo are largely unknown. Here we show that methylation of MAP3K2 by SMYD3 increases MAP kinase signalling and promotes the formation of Ras-driven carcinomas. Using mouse models for pancreatic ductal adenocarcinoma and lung adenocarcinoma, we found that abrogating SMYD3 catalytic activity inhibits tumour development in response to oncogenic Ras. We used protein array technology to identify the MAP3K2 kinase as a target of SMYD3. In cancer cell lines, SMYD3-mediated methylation of MAP3K2 at lysine 260 potentiates activation of the Ras/Raf/MEK/ERK signalling module and SMYD3 depletion synergizes with a MEK inhibitor to block Ras-driven tumorigenesis. Finally, the PP2A phosphatase complex, a key negative regulator of the MAP kinase pathway, binds to MAP3K2 and this interaction is blocked by methylation. Together, our results elucidate a new role for lysine methylation in integrating cytoplasmic kinase-signalling cascades and establish a pivotal role for SMYD3 in the regulation of oncogenic Ras signalling.
View details for DOI 10.1038/nature13320
View details for PubMedID 24847881
We propose and discuss a method for doing gene expression meta-analysis (multiple datasets) across multiplex measurement modalities measuring the expression of many genes simultaneously (e.g. microarrays and RNAseq) using external control samples and a method of heterogeneity detection to identify and filter on comparable gene expression measurements. We demonstrate this approach on publicly available gene expression datasets from samples of medulloblastoma and normal cerebellar tissue and identify some potential new targets in the treatment of medulloblastoma.
View details for PubMedID 24297537
Neurodegenerative diseases share common pathologic features including neuroinflammation, mitochondrial dysfunction and protein aggregation, suggesting common underlying mechanisms of neurodegeneration. We undertook a meta-analysis of public gene expression data for neurodegenerative diseases to identify a common transcriptional signature of neurodegeneration. Using 1,270 post-mortem central nervous system tissue samples from 13 patient cohorts covering four neurodegenerative diseases, we identified 243 differentially expressed genes, which were similarly dysregulated in 15 additional patient cohorts of 205 samples including seven neurodegenerative diseases. This gene signature correlated with histologic disease severity. Metallothioneins featured prominently among differentially expressed genes, and functional pathway analysis identified specific convergent themes of dysregulation. MetaCore network analyses revealed various novel candidate hub genes (e.g. STAU2). Genes associated with M1-polarized macrophages and reactive astrocytes were strongly enriched in the meta-analysis data. Evaluation of genes enriched in neurons revealed 70 down-regulated genes, over half not previously associated with neurodegeneration. Comparison with aging brain data (3 patient cohorts, 221 samples) revealed 53 of these to be unique to neurodegenerative disease, many of which are strong candidates to be important in neuropathogenesis (e.g. NDN, NAP1L2). ENCODE ChIP-seq analysis predicted common upstream transcriptional regulators not associated with normal aging (REST, RBBP5, SIN3A, SP2, YY1, ZNF143, IKZF1). Finally, we removed genes common to neurodegeneration from disease-specific gene signatures, revealing uniquely robust immune response and JAK-STAT signaling in amyotrophic lateral sclerosis. Our results implicate pervasive bioenergetic deficits, M1-type microglial activation and gliosis as unifying themes of neurodegeneration, and identify numerous novel genes associated with neurodegenerative processes.
View details for DOI 10.1186/s40478-014-0093-y
View details for PubMedID 25187168
View details for PubMedCentralID PMC4167139
Small cell lung cancer (SCLC) is an aggressive neuroendocrine subtype of lung cancer with high mortality. We used a systematic drug repositioning bioinformatics approach querying a large compendium of gene expression profiles to identify candidate U.S. Food and Drug Administration (FDA)-approved drugs to treat SCLC. We found that tricyclic antidepressants and related molecules potently induce apoptosis in both chemonaïve and chemoresistant SCLC cells in culture, in mouse and human SCLC tumors transplanted into immunocompromised mice, and in endogenous tumors from a mouse model for human SCLC. The candidate drugs activate stress pathways and induce cell death in SCLC cells, at least in part by disrupting autocrine survival signals involving neurotransmitters and their G protein-coupled receptors. The candidate drugs inhibit the growth of other neuroendocrine tumors, including pancreatic neuroendocrine tumors and Merkel cell carcinoma. These experiments identify novel targeted strategies that can be rapidly evaluated in patients with neuroendocrine tumors through the repurposing of approved drugs. Our work shows the power of bioinformatics-based drug approaches to rapidly repurpose FDA-approved drugs and identifies a novel class of molecules to treat patients with SCLC, a cancer for which no effective novel systemic treatments have been identified in several decades. In addition, our experiments highlight the importance of novel autocrine mechanisms in promoting the growth of neuroendocrine tumor cells.
View details for DOI 10.1158/2159-8290.CD-13-0183
View details for Web of Science ID 000328257500023
View details for PubMedID 24078773
View details for PubMedCentralID PMC3864571
Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.
View details for DOI 10.1158/0008-5472.CAN-12-1097
View details for Web of Science ID 000311141300012
View details for PubMedID 22962265
Monitoring of renal graft status through peripheral blood (PB) rather than invasive biopsy is important as it will lessen the risk of infection and other stresses, while reducing the costs of rejection diagnosis. Blood gene biomarker panels were discovered by microarrays at a single center and subsequently validated and cross-validated by QPCR in the NIH SNSO1 randomized study from 12 US pediatric transplant programs. A total of 367 unique human PB samples, each paired with a graft biopsy for centralized, blinded phenotype classification, were analyzed (115 acute rejection (AR), 180 stable and 72 other causes of graft injury). Of the differentially expressed genes by microarray, Q-PCR analysis of a five gene-set (DUSP1, PBEF1, PSEN1, MAPK9 and NKTR) classified AR with high accuracy. A logistic regression model was built on independent training-set (n = 47) and validated on independent test-set (n = 198) samples, discriminating AR from STA with 91% sensitivity and 94% specificity and AR from all other non-AR phenotypes with 91% sensitivity and 90% specificity. The 5-gene set can diagnose AR potentially avoiding the need for invasive renal biopsy. These data support the conduct of a prospective study to validate the clinical predictive utility of this diagnostic tool.
View details for DOI 10.1111/j.1600-6143.2012.04253.x
View details for Web of Science ID 000309180000018
View details for PubMedID 23009139
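The sensitivity and specificity figures quoted in the renal graft abstract above are ratios taken from a confusion matrix. A minimal sketch in Python of how such figures are computed; the function name and the counts are hypothetical, since the abstract does not report per-cell counts:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: rejection samples classified correctly / missed.
    tn/fp: non-rejection samples classified correctly / wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of true rejections detected
    specificity = tn / (tn + fp)  # fraction of non-rejections cleared
    return sensitivity, specificity


# Hypothetical counts for illustration only, not taken from the study.
sens, spec = sensitivity_specificity(tp=91, fn=9, tn=94, fp=6)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```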
Chronic allograft injury (CAI) results from a humoral response to mismatches in immunogenic epitopes between the donor and recipient. Although alloantibodies against HLA antigens contribute to the pathogenesis of CAI, alloantibodies against non-HLA antigens likely contribute as well. Here, we used high-density protein arrays to identify non-HLA antibodies in CAI and subsequently validated a subset in a cohort of 172 serum samples collected serially post-transplantation. There were 38 de novo non-HLA antibodies that significantly associated with the development of CAI (P<0.01) on protocol post-transplant biopsies, with enrichment of their corresponding antigens in the renal cortex. Baseline levels of preformed antibodies to MIG (also called CXCL9), ITAC (also called CXCL11), IFN-γ, and glial-derived neurotrophic factor positively correlated with histologic injury at 24 months. Measuring levels of these four antibodies could help clinicians predict the development of CAI with >80% sensitivity and 100% specificity. In conclusion, pretransplant serum levels of a defined panel of alloantibodies targeting non-HLA immunogenic antigens associate with histologic CAI in the post-transplant period. Validation in a larger, prospective transplant cohort may lead to a noninvasive method to predict and monitor for CAI.
View details for DOI 10.1681/ASN.2011060596
View details for Web of Science ID 000302333300022
View details for PubMedID 22302197
IgG commonly co-exists with IgA in the glomerular mesangium of patients with IgA nephropathy (IgAN) with unclear clinical relevance. Autoantibody (autoAb) biomarkers to detect and track progression of IgAN are an unmet clinical need. The objective of the study was to identify IgA-specific autoAbs specific to IgAN. High-density protein microarrays were used to evaluate IgG autoAbs in the serum of IgAN patients (n = 22) and controls (n = 10). Clinical parameters, including annual GFR and urine protein measurements, were collected on all patients over 5 years. Bioinformatic data analysis was performed to select targets for further validation by immunohistochemistry (IHC). One hundred seventeen (1.4%) specific antibodies were increased in IgAN. Among the most significant were the autoAb to the Ig family of proteins. IgAN-specific autoAbs (approximately 50%) were mounted against proteins predominantly expressed in glomeruli and tubules, and selected candidates were verified by IHC. Receiver operating characteristic analysis of our study demonstrated that IgG autoAb levels (matriline 2, ubiquitin-conjugating enzyme E2W, DEAD box protein, and protein kinase D1) might be used in combination with 24-hour proteinuria to improve prediction of the progression of IgAN (area under the curve = 0.86, P = 0.02). IgAN is associated with elevated IgG autoAbs to multiple proteins in the kidney. This first analysis of the repertoire of autoAbs in IgAN identifies novel, immunogenic protein targets that are highly expressed in the kidney glomerulus and tubules that may bear relevance in the pathogenesis and progression of IgAN.
View details for DOI 10.2215/CJN.04600511
View details for Web of Science ID 000297948900009
View details for PubMedID 22157707
View details for PubMedCentralID PMC3255376
The degree of progressive chronic histological damage is associated with long-term renal allograft survival. In order to identify promising molecular targets for timely intervention, we examined renal allograft protocol and indication biopsies from 120 low-risk pediatric and adolescent recipients by whole-genome microarray expression profiling. In data-driven analysis, we found a highly regulated pattern of adaptive and innate immune gene expression that correlated with established or ongoing histological chronic injury, and also with development of future chronic histological damage, even in histologically pristine kidneys. Hence, histologically unrecognized immunological injury at a molecular level sets the stage for the development of chronic tissue injury, while the same molecular response is accentuated during established and worsening chronic allograft damage. Irrespective of the hypothesized immune or nonimmune trigger for chronic allograft injury, a highly orchestrated regulation of innate and adaptive immune responses was found in the graft at the molecular level. This occurred months before histologic lesions appear, and quantitatively below the diagnostic threshold of classic T-cell or antibody-mediated rejection. Thus, measurement of specific immune gene expression in protocol biopsies may be warranted to predict the development of subsequent chronic injury in histologically quiescent grafts and as a means to titrate immunosuppressive therapy.
View details for DOI 10.1038/ki.2011.245
View details for Web of Science ID 000297541900014
View details for PubMedID 21881554
Technological advances in molecular and in silico research have enabled significant progress towards personalized transplantation medicine. It is now possible to conduct comprehensive biomarker development studies of transplant organ pathologies, correlating genomic, transcriptomic and proteomic information from donor and recipient with clinical and histological phenotypes. Translation of these advances to the clinical setting will allow assessment of an individual patient's risk of allograft damage or accommodation. Transplantation biomarkers are needed for active monitoring of immunosuppression, to reduce patient morbidity, and to improve long-term allograft function and life expectancy. Here, we highlight recent pre- and post-transplantation biomarkers of acute and chronic allograft damage or adaptation, focusing on peripheral blood-based methodologies for non-invasive application. We then critically discuss current findings with respect to their future application in routine clinical transplantation medicine. Complement-system-associated SNPs present potential biomarkers that may be used to indicate the baseline risk for allograft damage prior to transplantation. The detection of antibodies against novel, non-HLA, MICA antigens, and the expression of cytokine genes and proteins and cytotoxicity-related genes have been correlated with allograft damage and are potential post-transplantation biomarkers indicating allograft damage at the molecular level, although these do not have clinical relevance yet. Several multi-gene expression-based biomarker panels have been identified that accurately predicted graft accommodation in liver transplant recipients and may be developed into a predictive biomarker assay.
View details for DOI 10.1186/gm253
View details for PubMedID 21658299
Combining the results of studies using highly parallelized measurements of gene expression such as microarrays and RNAseq offers unique challenges in meta analysis. Motivated by a need for a deeper understanding of organ transplant rejection, we combine the data from five separate studies to compare acute rejection versus stability after solid organ transplantation, and use this data to examine approaches to multiplex meta analysis. We demonstrate that a commonly used parametric effect size estimate approach and a commonly used non-parametric method give very different results in prioritizing genes. The parametric method providing a meta effect estimate was superior at ranking genes based on our gold-standard of identifying immune response genes in the transplant rejection datasets. Different methods of multiplex analysis can give substantially different results. The method which is best for any given application will likely depend on the particular domain, and it remains for future work to see if any one method is consistently better at identifying important biological signal across gene expression experiments.
View details for DOI 10.1186/1471-2105-11-S9-S6
View details for Web of Science ID 000290218700006
View details for PubMedID 21044364
View details for PubMedCentralID PMC2967747
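The "meta effect estimate" mentioned in the transplant rejection meta-analysis abstract above can be illustrated with inverse-variance pooling, one common parametric way to combine per-study effect sizes for a gene. This is a sketch under that assumption: the abstract does not name the exact estimator, and the function name and inputs are hypothetical.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect, inverse-variance pooled estimate for one gene.

    effects:   per-study effect sizes (e.g. standardized mean differences).
    variances: per-study sampling variances of those effect sizes.
    Returns (pooled estimate, standard error of the pooled estimate).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # precise studies count more
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error


# Hypothetical effect sizes for one gene across five studies.
est, se = pooled_effect([0.8, 1.1, 0.6, 0.9, 1.0], [0.10, 0.20, 0.15, 0.10, 0.25])
print(f"pooled effect = {est:.3f} +/- {se:.3f}")
```

Genes can then be ranked by the pooled estimate divided by its standard error. A random-effects model would add a between-study variance term to each weight.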
The gene expression changes produced by moderate hypothermia are not fully known, but appear to differ in important ways from those produced by heat shock. We examined the gene expression changes produced by moderate hypothermia and tested the hypothesis that rewarming after hypothermia approximates a heat-shock response. Six sets of human HepG2 hepatocytes were subjected to moderate hypothermia (31 degrees C for 16 h), a conventional in vitro heat shock (43 degrees C for 30 min) or control conditions (37 degrees C), then harvested immediately or allowed to recover for 3 h at 37 degrees C. Expression analysis was performed with Affymetrix U133A gene chips, using analysis of variance-based techniques. Moderate hypothermia led to distinct time-dependent expression changes, as did heat shock. Hypothermia initially caused statistically significant, greater than or equal to twofold changes in expression (relative to controls) of 409 sequences (143 increased and 266 decreased), whereas heat shock affected 71 (35 increased and 36 decreased). After 3 h of recovery, 192 sequences (83 increased, 109 decreased) were affected by hypothermia and 231 (146 increased, 85 decreased) by heat shock. Expression of many heat shock proteins was decreased by hypothermia but significantly increased after rewarming. A comparison of sequences affected by thermal stress without regard to the magnitude of change revealed that the overlap between heat and cold stress was greater after 3 h of recovery than immediately following thermal stress. Thus, while some overlap occurs (particularly after rewarming), moderate hypothermia produces extensive, time-dependent gene expression changes in HepG2 cells that differ in important ways from those induced by heat shock.
View details for DOI 10.1007/s12192-010-0181-2
View details for Web of Science ID 000280781800021
View details for PubMedID 20526826
View details for Web of Science ID 000275921701073
We have developed NetPath as a resource of curated human signaling pathways. As an initial step, NetPath provides detailed maps of a number of immune signaling pathways, which include approximately 1,600 reactions annotated from the literature and more than 2,800 instances of transcriptionally regulated genes - all linked to over 5,500 published articles. We anticipate NetPath to become a consolidated resource for human signaling pathways that should enable systems biology approaches.
View details for DOI 10.1186/gb-2010-11-1-r3
View details for Web of Science ID 000276433600011
View details for PubMedID 20067622
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here is able to take into consideration the hierarchical structure of the Gene Ontology (GO) and can weight differently GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine such schemes were previously used in other problem domains, while six of them are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Out of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent of them, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
View details for DOI 10.1109/TCBB.2008.29
View details for Web of Science ID 000274063600008
View details for PubMedID 20150671
In the last decade, microarray technology has revolutionized biological research by allowing the screening of tens of thousands of genes simultaneously. This article reviews recent studies in organ transplantation using microarrays and highlights the issues that should be addressed in order to use microarrays in diagnosis of rejection. Microarrays have been useful in identifying potential biomarkers for chronic rejection in peripheral blood mononuclear cells, novel pathways for induction of tolerance, and genes involved in protecting the graft from the host immune system. Microarray analysis of peripheral blood mononuclear cells from chronic antibody-mediated rejection has identified potential noninvasive biomarkers. In a recent study, correlation of pathogenesis-based transcripts with histopathologic lesions is a promising step towards inclusion of microarrays in clinics for organ transplants. Despite promising results in diagnosis of histopathologic lesions using microarrays, the low dynamic range of microarrays and large measured expression changes within the probes for the same gene continue to cast doubts on their readiness for diagnosis of rejection. More studies must be performed to resolve these issues. Dominating expression of globin genes in whole blood poses another challenge for identification of noninvasive biomarkers. In addition, studies are also needed to demonstrate effects of different immunosuppression therapies and their outcomes.
View details for DOI 10.1097/MOT.0b013e32831e13d0
View details for Web of Science ID 000264312900007
View details for PubMedID 19337144
Gene expression class comparison studies may identify hundreds or thousands of genes as differentially expressed (DE) between sample groups. Gaining biological insight from the result of such experiments can be approached, for instance, by identifying the signaling pathways impacted by the observed changes. Most of the existing pathway analysis methods focus on either the number of DE genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat the pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe. We describe a novel signaling pathway impact analysis (SPIA) that combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. A bootstrap procedure is used to assess the significance of the observed total pathway perturbation. Using simulations we show that the evidence derived from perturbations is independent of the pathway enrichment evidence. This allows us to calculate a global pathway significance P-value, which combines the enrichment and perturbation P-values. We illustrate the capabilities of the novel method on four real datasets. The results obtained on these data show that SPIA has better specificity and more sensitivity than several widely used pathway analysis methods. SPIA was implemented as an R package available at http://vortex.cs.wayne.edu/ontoexpress/
View details for DOI 10.1093/bioinformatics/btn577
View details for Web of Science ID 000261996400012
View details for PubMedID 18990722
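The SPIA abstract above describes combining an enrichment P-value and a perturbation P-value into one global P-value, which is valid because the two are independent. The abstract does not give the formula; the sketch below assumes Fisher's product rule for two independent P-values, where the combined value is c - c ln(c) with c the product. The function name is hypothetical and this is not the SPIA R package API.

```python
import math


def combine_two_pvalues(p_enrichment, p_perturbation):
    """Combine two independent P-values with Fisher's product rule.

    Under the null, both P-values are uniform on (0, 1], and the
    probability that their product is at most c equals c - c * ln(c).
    """
    for p in (p_enrichment, p_perturbation):
        if not 0.0 <= p <= 1.0:
            raise ValueError("P-values must lie in [0, 1]")
    c = p_enrichment * p_perturbation
    if c == 0.0:
        return 0.0  # limit of c - c * ln(c) as c approaches 0
    return c - c * math.log(c)


# Two moderately small P-values give a smaller combined P-value.
print(round(combine_two_pvalues(0.05, 0.05), 6))
```

The combined value is never smaller than the product c itself, so a pathway needs support from both kinds of evidence, or very strong support from one, to rank highly.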
View details for Web of Science ID 000263827202029
Onto-Tools is a freely available web-accessible software suite, composed of an annotation database and nine complementary data-mining tools. This article describes a new tool, Onto-Express-to-go (OE2GO), as well as some new features implemented in Pathway-Express and Onto-Miner over the past year. Pathway-Express (PE) has been enhanced to identify significantly perturbed pathways in a given condition using the differentially expressed genes in the input. OE2GO is a tool for functional profiling using custom annotations. The development of this tool was aimed at the researchers working with organisms for which annotations are not yet available in the public domain. OE2GO allows researchers to use either annotation data from the Onto-Tools database, or their own custom annotations. By removing the necessity to use any specific database, OE2GO makes the functional profiling available for all organisms, with annotations using any ontology. The Onto-Tools are freely available at http://vortex.cs.wayne.edu/projects.htm.
View details for DOI 10.1093/nar/gkm327
View details for Web of Science ID 000255311500039
View details for PubMedID 17584796
View details for Web of Science ID 000248516200030
View details for Web of Science ID 000252725900004
Annotation databases are widely used as public repositories of biological knowledge. However, most of these resources have been developed by independent groups which used different designs and different identifiers for the same biological entities. As we show in this article, incoherent name spaces between various databases represent a serious impediment to using the existing annotations at their full potential. Navigating between various such name spaces by mapping IDs from one database to another is a very important issue which is not properly addressed at the moment. We have developed a web-based resource, Onto-Translate (OT), which effectively addresses this problem. OT is able to map onto each other different types of biological entities from the following annotation databases: Swiss-Prot, TrEMBL, NREF, PIR, Gene Ontology, KEGG, Entrez Gene, GenBank, GenPept, IMAGE, RefSeq, UniGene, OMIM, PDB, Eukaryotic Promoter Database, HUGO Gene Nomenclature Committee and NetAffx. Currently, OT is able to perform 462 types of mappings between 29 different types of IDs from 17 databases concerning 53 organisms. Among these, over 300 types of translations and 15 types of IDs are not currently supported by any other tool or resource. On average, OT is able to correctly map between 96 and 99% of the biological entities provided as input. In terms of speed, sets of approximately 20 000 IDs can be translated in <30 s, in most cases. OT is a part of Onto-Tools, which is freely available at http://vortex.cs.wayne.edu/Projects.html
View details for DOI 10.1093/bioinformatics/btl372
View details for Web of Science ID 000242246300015
View details for PubMedID 17068090
The Onto-Tools suite is composed of an annotation database and eight complementary, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner, Pathway-Express, Promoter-Express and nsSNPCounter. Promoter-Express is a new tool added to the Onto-Tools ensemble that facilitates the identification of transcription factor binding sites active in specific conditions. nsSNPCounter is another new tool that allows computation and analysis of synonymous and non-synonymous codon substitutions for studying evolutionary rates of protein coding genes. Onto-Translate has also been enhanced to expand its scope and accuracy by fully utilizing the capabilities of the Onto-Tools database. Currently, Onto-Translate allows arbitrary mappings between 28 types of IDs for 53 organisms. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkl213
View details for Web of Science ID 000245650200126
View details for PubMedID 16845086
DNA microarrays enable researchers to monitor the expression of thousands of genes simultaneously. However, the current technology has several limitations. Here we discuss problems related to the sensitivity, accuracy, specificity and reproducibility of microarray results. The existing data suggest that for relatively abundant transcripts the existence and direction (but not the magnitude) of expression changes can be reliably detected. However, accurate measurements of absolute expression levels and the reliable detection of low abundance genes are difficult to achieve. The main problems seem to be the sub-optimal design or choice of probes and some incorrect probe annotations. Well-designed data-analysis approaches can rectify some of these problems.
View details for DOI 10.1016/j.tig.2005.12.005
View details for Web of Science ID 000235576900009
View details for PubMedID 16380191
Independent of the platform and the analysis methods used, the result of a microarray experiment is, in most cases, a list of differentially expressed genes. An automatic ontological analysis approach has been recently proposed to help with the biological interpretation of such results. Currently, this approach is the de facto standard for the secondary analysis of high throughput experiments and a large number of tools have been developed for this purpose. We present a detailed comparison of 14 such tools using the following criteria: scope of the analysis, visualization capabilities, statistical model(s) used, correction for multiple comparisons, reference microarrays available, installation issues and sources of annotation data. This detailed analysis of the capabilities of these tools will help researchers choose the most appropriate tool for a given type of analysis. More importantly, in spite of the fact that this type of analysis has been generally adopted, this approach has several important intrinsic drawbacks. These drawbacks are associated with all tools discussed and represent conceptual limitations of the current state-of-the-art in ontological analysis. We propose these as challenges for the next generation of secondary data analysis tools.
View details for DOI 10.1093/bioinformatics/bti565
View details for Web of Science ID 000231694600001
View details for PubMedID 15994189
The correct interpretation of any biological experiment depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are ubiquitous and used by all life scientists in most experiments. However, it is well known that such databases are incomplete and many annotations may also be incorrect. In this paper we describe a technique that can be used to analyze the semantic content of such annotation databases. Our approach is able to extract implicit semantic relationships between genes and functions. This ability allows us to discover novel functions for known genes. This approach is able to identify missing and inaccurate annotations in existing annotation databases, and thus help improve their accuracy. We used our technique to analyze the current annotations of the human genome. From this body of annotations, we were able to predict 212 additional gene-function assignments. A subsequent literature search found that 138 of these gene-function assignments are supported by existing peer-reviewed papers. An additional 23 assignments have been confirmed in the meantime by the addition of the respective annotations in later releases of the Gene Ontology database. Overall, the 161 confirmed assignments represent 75.95% of the proposed gene-function assignments. Only one of our predictions (0.4%) was contradicted by the existing literature. We could not find any relevant articles for 50 of our predictions (23.58%). The method is independent of the organism and can be used to analyze and improve the quality of the data of any public or private annotation database.
View details for DOI 10.1093/bioinformatics/bti538
View details for Web of Science ID 000231360600012
View details for PubMedID 15955782
The Onto-Tools suite is composed of an annotation database and six seamlessly integrated, web-accessible data mining tools: Onto-Express, Onto-Compare, Onto-Design, Onto-Translate, Onto-Miner and Pathway-Express. The Onto-Tools database has been expanded to include various types of data from 12 new databases. Our database now integrates different types of genomic data from 19 sequence, gene, protein and annotation databases. Additionally, our database is also expanded to include complete Gene Ontology (GO) annotations. Using the enhanced database and GO annotations, Onto-Express now allows functional profiling for 24 organisms and supports 17 different types of input IDs. Onto-Translate is also enhanced to fully utilize the capabilities of the new Onto-Tools database with an ultimate goal of providing the users with a non-redundant and complete mapping from any type of identification system to any other type. Currently, Onto-Translate allows arbitrary mappings between 29 types of IDs. Pathway-Express is a new tool that helps the users find the most interesting pathways for their input list of genes. Onto-Tools are freely available at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gki472
View details for Web of Science ID 000230271400156
View details for PubMedID 15980579
Sequences that are present in a given species or strain while absent from or different in any other organisms can be used to distinguish the target organism from other related or unrelated species. Such DNA signatures are particularly important for the identification of the genetic source of drug resistance of a strain or for the detection of organisms that can be used as biological agents in warfare or terrorism. Most approaches used to find DNA signatures are laboratory based, require a great deal of effort and can only distinguish between two organisms at a time. We propose a more efficient and cost-effective bioinformatics approach that allows identification of genomic fingerprints for a target organism. We validated our approach using a custom microarray, using sequences identified as DNA fingerprints of Bacillus anthracis. Hybridization results showed that the sequences found using our algorithm were truly unique to B. anthracis and were able to distinguish B. anthracis from its close relatives B. cereus and B. thuringiensis.
View details for Web of Science ID 000230169100021
View details for PubMedID 15759631
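The abstract above does not describe the signature-finding algorithm itself. As a rough illustration of the general idea only (sequences present in a target and absent from all comparison organisms), the following minimal sketch uses a naive k-mer set difference. The function names, the toy sequences and the choice of k are all assumptions made for this example, and the sketch ignores reverse complements, mismatches and genome-scale memory concerns.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(target, others, k):
    """Return k-mers found in the target but in none of the other sequences.

    This is a naive set difference, not the published method.
    """
    background = set()
    for seq in others:
        background |= kmers(seq, k)
    return kmers(target, k) - background


# Toy sequences invented for illustration; they are not real genomes.
target = "ACGTTGCA"
relatives = ["ACGTAGCA", "TTGCAACG"]
signatures = candidate_signatures(target, relatives, 4)
print(sorted(signatures))
```

In practice k would have to be long enough for a probe to hybridize specifically, and the background set would cover every available genome rather than two short strings.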
View details for Web of Science ID 000235518600028
The Onto-Tools suite is composed of an annotation database and five seamlessly integrated web-accessible data mining tools: Onto-Express (OE), Onto-Compare (OC), Onto-Design (OD), Onto-Translate (OT) and Onto-Miner (OM). OM is a new tool that provides a unified access point and an application programming interface for most annotations available. Our database has been enhanced with more than 120 new commercial microarrays and annotations for Rattus norvegicus, Drosophila melanogaster and Caenorhabditis elegans. The Onto-Tools have been redesigned to provide better biological insight, improved performance and user convenience. The new features implemented in OE include support for gene names, LocusLink IDs and Gene Ontology (GO) IDs, ability to specify fold changes for the input genes, links to the KEGG pathway database and detailed output files. OC allows comparisons of the functional bias of more than 170 commercial microarrays. The latest version of OD allows the user to specify keywords if the exact GO term is not known as well as providing more details than the previous version. OE, OC and OD now have an integrated GO browser that allows the user to customize the level of abstraction for each GO category. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkh409
View details for Web of Science ID 000222273100090
View details for PubMedID 15215428
Onto-Tools is a set of four seamlessly integrated databases: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Onto-Express is able to automatically translate lists of genes found to be differentially regulated in a given condition into functional profiles characterizing the impact of the condition studied upon various biological processes and pathways. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. Once the initial exploratory analysis identified a number of relevant biological processes, specific mechanisms of interactions can be hypothesized for the conditions studied. Currently, many commercial arrays are available for the investigation of specific mechanisms. Each such array is characterized by a biological bias determined by the extent to which the genes present on the array represent specific pathways. Onto-Compare is a tool that allows efficient comparisons of any sets of commercial or custom arrays. Using Onto-Compare, a researcher can determine quickly which array, or set of arrays, covers best the hypotheses studied. In many situations, no commercial arrays are available for specific biological mechanisms. Onto-Design is a tool that allows the user to select genes that represent given functional categories. Onto-Translate allows the user to translate easily lists of accession numbers, UniGene clusters and Affymetrix probes into one another. All tools above are seamlessly integrated. The Onto-Tools are available online at http://vortex.cs.wayne.edu/Projects.html.
View details for DOI 10.1093/nar/gkg624
View details for Web of Science ID 000183832900108
View details for PubMedID 12824416
Microarrays are at the center of a revolution in biotechnology, allowing researchers to screen tens of thousands of genes simultaneously. Typically, they have been used in exploratory research to help formulate hypotheses. In most cases, this phase is followed by a more focused, hypothesis-driven stage in which certain specific biological processes and pathways are thought to be involved. Since a single biological process can still involve hundreds of genes, microarrays are still the preferred approach as proven by the availability of focused arrays from several manufacturers. Because focused arrays from different manufacturers use different sets of genes, each array will represent any given regulatory pathway to a different extent. We argue that a functional analysis of the arrays available should be the most important criterion used in the array selection. We developed Onto-Compare as a database that can provide this functionality, based on the Gene Ontology Consortium nomenclature. We used this tool to compare several arrays focused on apoptosis, oncogenes, and tumor suppressors. We considered arrays from BD Biosciences Clontech, PerkinElmer, Sigma-Genosys, and SuperArray. We showed that among the oncogene arrays, the PerkinElmer MICROMAX oncogene microarray has a better representation of oncogenesis, protein phosphorylation, and negative control of cell proliferation. The comparison of the apoptosis arrays showed that most apoptosis-related biological processes are equally well represented on the arrays considered. However, functional categories such as immune response, cell-cell signaling, cell-surface receptor linked signal transduction, and interleukins are better represented on the Sigma-Genosys Panorama human apoptosis array. At the same time, processes such as cell cycle control, oncogenesis, and negative control of cell proliferation are better represented on the BD Biosciences Clontech Atlas Select human apoptosis array.
View details for Web of Science ID 000181595900009
View details for PubMedID 12664686
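The kind of functional comparison described in the Onto-Compare abstract above can be pictured as counting, for each functional category, how many genes on each array are annotated to it. The sketch below is an illustration under stated assumptions, not the Onto-Compare implementation: the gene identifiers, array names and annotation table are hypothetical placeholders, and a real comparison would draw annotations from Gene Ontology.

```python
def category_coverage(array_genes, annotations):
    """Count, per functional category, the genes on one array annotated to it."""
    counts = {}
    for gene in array_genes:
        for category in annotations.get(gene, ()):
            counts[category] = counts.get(category, 0) + 1
    return counts


def compare_arrays(arrays, annotations):
    """Map each category to a dict of {array name: number of annotated genes}."""
    table = {}
    for name, genes in arrays.items():
        for category, n in category_coverage(genes, annotations).items():
            table.setdefault(category, {})[name] = n
    return table


# Hypothetical annotation table and array contents, for illustration only.
annotations = {
    "g1": {"apoptosis"},
    "g2": {"apoptosis", "immune response"},
    "g3": {"immune response"},
    "g4": {"cell cycle control"},
}
arrays = {"array_A": {"g1", "g2", "g3"}, "array_B": {"g1", "g4"}}

for category, per_array in sorted(compare_arrays(arrays, annotations).items()):
    print(category, per_array)
```

Raw counts favor larger arrays, so a fuller comparison would probably also report each count as a fraction of the array size.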
The typical result of a microarray experiment is a list of tens or hundreds of genes found to be differentially regulated in the condition under study. Independent of the methods used to select these genes, the common task faced by any researcher is to translate these lists of genes into a better understanding of the biological phenomena involved. Currently, this is done through a tedious combination of searches through the literature and a number of public databases. We developed Onto-Express (OE) as a novel tool able to automatically translate such lists of differentially regulated genes into functional profiles characterizing the impact of the condition studied. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function, and chromosome location. Statistical significance values are calculated for each category. We demonstrate the validity and the utility of this comprehensive global analysis of gene function by analyzing two breast cancer datasets from two separate laboratories. OE was able to identify correctly all biological processes postulated by the original authors, as well as discover novel relevant mechanisms.
View details for DOI 10.1016/S0888-7543(02)00021-6
View details for Web of Science ID 000181532700002
View details for PubMedID 12620386
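The Onto-Express abstract above states that significance values are calculated for each category but does not name the statistical model. One common way to score over-representation of a category in a gene list is a one-sided hypergeometric test, sketched here as an assumption rather than as the tool's actual method. The function names and the toy gene sets are hypothetical.

```python
from math import comb


def hypergeom_pvalue(N, M, n, k):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    M of which belong to the category of interest."""
    upper = min(M, n)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def functional_profile(selected, reference, annotations):
    """Return {category: (count in selected list, p-value)} for every
    category annotated to at least one selected gene."""
    N, n = len(reference), len(selected)
    profile = {}
    categories = set().union(*(annotations.get(g, set()) for g in selected))
    for category in categories:
        M = sum(1 for g in reference if category in annotations.get(g, ()))
        k = sum(1 for g in selected if category in annotations.get(g, ()))
        profile[category] = (k, hypergeom_pvalue(N, M, n, k))
    return profile


# Hypothetical reference gene set: 10 genes, 5 of them annotated to "apoptosis".
reference = [f"g{i}" for i in range(10)]
annotations = {f"g{i}": {"apoptosis"} for i in range(5)}
selected = ["g0", "g1", "g2", "g3", "g4"]
print(functional_profile(selected, reference, annotations))
```

When many categories are tested at once, the resulting p-values would normally need a multiple-testing correction before interpretation.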
Gene expression profiles obtained through microarray or data mining analyses often exist as vast data strings. To interpret the biology of these genetic profiles, investigators must analyze this data in the context of other information such as the biological, biochemical, or molecular function of the translated proteins. This is particularly challenging for a human analyst because large quantities of less than relevant data often bury such information. To address this need we implemented an automated routine, called Onto-Express (http://vortex.cs.wayne.edu:8080), to systematically translate genetic fingerprints into functional profiles. Using strings of accession or cluster identification numbers, Onto-Express searches the public databases and returns tables that correlate expression profiles with the cytogenetic locations, biochemical and molecular functions, biological processes, cellular components, and cellular roles of the translated proteins. The profiles created by Onto-Express fundamentally increase the value of gene expression analyses by facilitating the translation of quantitative value sets to records that contain biological implications.
View details for DOI 10.1006/geno.2002.6698
View details for Web of Science ID 000173628100016
View details for PubMedID 11829497