Honors & Awards

  • Faculty Fellow at the Stanford Center at Peking University, SCPKU (September-October 2016)
  • Henri Benedictus Fellow, King Baudouin Foundation (June 2009)
  • Honorary Fellow, Belgian American Educational Foundation (BAEF) (June 2009)

Boards, Advisory Committees, Professional Organizations

  • Member, International Society for Computational Biology (ISCB) (2006 - Present)
  • Member, American Association for Cancer Research (AACR) (2010 - Present)

Professional Education

  • Certificate, Stanford Business School, Stanford Ignite (2012)
  • Ph.D., University of Leuven, Belgium, Bioinformatics (2008)
  • M.S., University of Leuven, Belgium, Artificial Intelligence (2004)
  • M.S., University College, Ghent, Belgium, Electrical Engineering/Computer Science (2003)

Research & Scholarship

Current Research and Scholarly Interests

My lab focuses on biomedical data fusion: the development of machine learning methods for biomedical decision support using multi-scale biomedical data. Previously, we pioneered data fusion work using Bayesian and kernel methods to study breast and ovarian cancer. We also developed computational algorithms that identify driver genes from multi-omics data. In addition, we are working on multi-scale biomedical data fusion methods that bridge the molecular scale (omics data), the cellular scale (pathology data) and the tissue scale (medical imaging data).


Graduate and Fellowship Programs

  • Biomedical Informatics (PhD Program)


All Publications

  • MethylMix 2.0: an R package for identifying DNA methylation genes. Bioinformatics (Oxford, England) Cedoz, P., Prunello, M., Brennan, K., Gevaert, O. 2018


    Summary: DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper and hypomethylation of genes is a major mechanism of gene expression deregulation in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed generating vast amounts of genome wide DNA methylation measurements. We developed MethylMix, an algorithm implemented in R to identify disease specific hyper and hypomethylated genes. Here we present a new version of MethylMix that automates the construction of DNA-methylation and gene expression datasets from The Cancer Genome Atlas (TCGA). More precisely, MethylMix 2.0 incorporates two major updates: the automated downloading of DNA methylation and gene expression datasets from TCGA and the automated preprocessing of such datasets: value imputation, batch correction and CpG sites clustering within each gene. The resulting datasets can subsequently be analyzed with MethylMix to identify transcriptionally predictive methylation states. We show that the Differential Methylation Values created by MethylMix can be used for cancer subtyping. Availability and implementation: MethylMix 2.0 was implemented as an R package and is available in Bioconductor.

    View details for DOI 10.1093/bioinformatics/bty156

    View details for PubMedID 29668835

  • Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation CELL Malta, T. M., Sokolov, A., Gentles, A. J., Burzykowski, T., Poisson, L., Weinstein, J. N., Kaminska, B., Huelsken, J., Omberg, L., Gevaert, O., Colaprico, A., Czerwinska, P., Mazurek, S., Mishra, L., Heyn, H., Krasnitz, A., Godwin, A. K., Lazar, A. J., Stuart, J. M., Hoadley, K. A., Laird, P. W., Noushmehr, H., Wiznerowicz, M., Cancer Genome Atlas Res Network 2018; 173 (2): 338-+


    Cancer progression involves the gradual loss of a differentiated phenotype and acquisition of progenitor and stem-cell-like features. Here, we provide novel stemness indices for assessing the degree of oncogenic dedifferentiation. We used an innovative one-class logistic regression (OCLR) machine-learning algorithm to extract transcriptomic and epigenetic feature sets derived from non-transformed pluripotent stem cells and their differentiated progeny. Using OCLR, we were able to identify previously undiscovered biological mechanisms associated with the dedifferentiated oncogenic state. Analyses of the tumor microenvironment revealed unanticipated correlation of cancer stemness with immune checkpoint expression and infiltrating immune cells. We found that the dedifferentiated oncogenic phenotype was generally most prominent in metastatic tumors. Application of our stemness indices to single-cell data revealed patterns of intra-tumor molecular heterogeneity. Finally, the indices allowed for the identification of novel targets and possible targeted therapies aimed at tumor differentiation.

    View details for PubMedID 29625051

  • Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas CELL REPORTS Campbell, J. D., Yau, C., Bowlby, R., Liu, Y., Brennan, K., Fan, H., Taylor, A. M., Wang, C., Walter, V., Akbani, R., Byers, L., Creighton, C. J., Coarfa, C., Shih, J., Cherniack, A. D., Gevaert, O., Prunello, M., Shen, H., Anur, P., Chen, J., Cheng, H., Hayes, D., Bullman, S., Pedamallu, C., Ojesina, A. I., Sadeghi, S., Mungall, K. L., Robertson, A., Benz, C., Schultz, A., Kanchi, R. S., Gay, C. M., Hegde, A., Diao, L., Wang, J., Ma, W., Sumazin, P., Chiu, H., Chen, T., Gunaratne, P., Donehower, L., Rader, J. S., Zuna, R., Al-Ahmadie, H., Lazar, A. J., Flores, E. R., Tsai, K. Y., Zhou, J. H., Rustgi, A. K., Drill, E., Shen, R., Wong, C. K., Stuart, J. M., Laird, P. W., Hoadley, K. A., Weinstein, J. N., Peto, M., Pickering, C. R., Chen, Z., Van Waes, C., Canc Genome Atlas Res Network 2018; 23 (1): 194-+


    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smoking and/or human papillomavirus (HPV). SCCs harbor 3q, 5p, and other recurrent chromosomal copy-number alterations (CNAs), DNA mutations, and/or aberrant methylation of genes and microRNAs, which are correlated with the expression of multi-gene programs linked to squamous cell stemness, epithelial-to-mesenchymal differentiation, growth, genomic integrity, oxidative damage, death, and inflammation. Low-CNA SCCs tended to be HPV(+) and display hypermethylation with repression of TET1 demethylase and FANCF, previously linked to predisposition to SCC, or harbor mutations affecting CASP8, RAS-MAPK pathways, chromatin modifiers, and immunoregulatory molecules. We uncovered hypomethylation of the alternative promoter that drives expression of the ΔNp63 oncogene and embedded miR944. Co-expression of immune checkpoint, T-regulatory, and Myeloid suppressor cells signatures may explain reduced efficacy of immune therapy. These findings support possibilities for molecular classification and therapeutic approaches.

    View details for PubMedID 29617660

  • Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response. EBioMedicine Champion, M., Brennan, K., Croonenborghs, T., Gentles, A. J., Pochet, N., Gevaert, O. 2018; 27: 156–66


    The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and 'antiviral' interferon-modulated innate immune response. AMARETTO is available as an R package at

    View details for DOI 10.1016/j.ebiom.2017.11.028

    View details for PubMedID 29331675

  • Identification of an atypical etiological head and neck squamous carcinoma subtype featuring the CpG island methylator phenotype. EBioMedicine Brennan, K., Koenig, J. L., Gentles, A. J., Sunwoo, J. B., Gevaert, O. 2017; 17: 223-236


    Head and neck squamous cell carcinoma (HNSCC) is broadly classified into HNSCC associated with human papilloma virus (HPV) infection, and HPV negative HNSCC, which is typically smoking-related. A subset of HPV negative HNSCCs occur in patients without smoking history, however, and these etiologically 'atypical' HNSCCs disproportionately occur in the oral cavity, and in female patients, suggesting a distinct etiology. To investigate the determinants of clinical and molecular heterogeneity, we performed unsupervised clustering to classify 528 HNSCC patients from The Cancer Genome Atlas (TCGA) into putative intrinsic subtypes based on their profiles of epigenetically (DNA methylation) deregulated genes. HNSCCs clustered into five subtypes, including one HPV positive subtype, two smoking-related subtypes, and two atypical subtypes. One atypical subtype was particularly genomically stable, but featured widespread gene silencing associated with the 'CpG island methylator phenotype' (CIMP). Further distinguishing features of this 'CIMP-Atypical' subtype include an antiviral gene expression profile associated with pro-inflammatory M1 macrophages and CD8+ T cell infiltration, CASP8 mutations, and a well-differentiated state corresponding to normal SOX2 copy number and SOX2OT hypermethylation. We developed a gene expression classifier for the CIMP-Atypical subtype that could classify atypical disease features in two independent patient cohorts, demonstrating the reproducibility of this subtype. Taken together, these findings provide unprecedented evidence that atypical HNSCC is molecularly distinct, and postulate the CIMP-Atypical subtype as a distinct clinical entity that may be caused by chronic inflammation.

    View details for DOI 10.1016/j.ebiom.2017.02.025

    View details for PubMedID 28314692

    View details for PubMedCentralID PMC5360591

  • Intestinal Enteroendocrine Lineage Cells Possess Homeostatic and Injury-Inducible Stem Cell Activity. Cell stem cell Yan, K. S., Gevaert, O., Zheng, G. X., Anchang, B., Probert, C. S., Larkin, K. A., Davies, P. S., Cheng, Z. F., Kaddis, J. S., Han, A., Roelf, K., Calderon, R. I., Cynn, E., Hu, X., Mandleywala, K., Wilhelmy, J., Grimes, S. M., Corney, D. C., Boutet, S. C., Terry, J. M., Belgrader, P., Ziraldo, S. B., Mikkelsen, T. S., Wang, F., von Furstenberg, R. J., Smith, N. R., Chandrakesan, P., May, R., Chrissy, M. A., Jain, R., Cartwright, C. A., Niland, J. C., Hong, Y. K., Carrington, J., Breault, D. T., Epstein, J., Houchen, C. W., Lynch, J. P., Martin, M. G., Plevritis, S. K., Curtis, C., Ji, H. P., Li, L., Henning, S. J., Wong, M. H., Kuo, C. J. 2017; 21 (1): 78–90.e6


    Several cell populations have been reported to possess intestinal stem cell (ISC) activity during homeostasis and injury-induced regeneration. Here, we explored inter-relationships between putative mouse ISC populations by comparative RNA-sequencing (RNA-seq). The transcriptomes of multiple cycling ISC populations closely resembled Lgr5+ ISCs, the most well-defined ISC pool, but Bmi1-GFP+ cells were distinct and enriched for enteroendocrine (EE) markers, including Prox1. Prox1-GFP+ cells exhibited sustained clonogenic growth in vitro, and lineage-tracing of Prox1+ cells revealed long-lived clones during homeostasis and after radiation-induced injury in vivo. Single-cell mRNA-seq revealed two subsets of Prox1-GFP+ cells, one of which resembled mature EE cells while the other displayed low-level EE gene expression but co-expressed tuft cell markers, Lgr5 and Ascl2, reminiscent of label-retaining secretory progenitors. Our data suggest that the EE lineage, including mature EE cells, comprises a reservoir of homeostatic and injury-inducible ISCs, extending our understanding of cellular plasticity and stemness.

    View details for PubMedID 28686870

    View details for PubMedCentralID PMC5642297

  • Noninvasive radiomics signature based on quantitative analysis of computed tomography images as a surrogate for microvascular invasion in hepatocellular carcinoma: a pilot study. Journal of medical imaging (Bellingham, Wash.) Bakr, S., Echegaray, S., Shah, R., Kamaya, A., Louie, J., Napel, S., Kothary, N., Gevaert, O. 2017; 4 (4): 041303


    We explore noninvasive biomarkers of microvascular invasion (mVI) in patients with hepatocellular carcinoma (HCC) using quantitative and semantic image features extracted from contrast-enhanced, triphasic computed tomography (CT). Under institutional review board approval, we selected 28 treatment-naive HCC patients who underwent surgical resection. Four radiologists independently selected and delineated tumor margins on three axial CT images and extracted computational features capturing tumor shape, image intensities, and texture. We also computed two types of "delta features," defined as the absolute difference and the ratio computed from all pairs of imaging phases for each feature. 717 arterial, portal-venous, delayed single-phase, and delta-phase features were robust against interreader variability ([Formula: see text]). An enhanced cross-validation analysis showed that combining robust single-phase and delta features in the arterial and venous phases identified mVI (AUC [Formula: see text]). Compared to a previously reported semantic feature signature (AUC 0.47 to 0.58), these features in our cohort showed only slight to moderate agreement (Cohen's kappa range: 0.03 to 0.59). Though preliminary, quantitative analysis of image features in arterial and venous phases may be potential surrogate biomarkers for mVI in HCC. Further study in a larger cohort is warranted.

    View details for DOI 10.1117/1.JMI.4.4.041303

    View details for PubMedID 28840174

    View details for PubMedCentralID PMC5565686
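
    The "delta features" in the abstract above are defined as the absolute difference and the ratio of a feature across every pair of imaging phases. A minimal Python sketch of that definition follows. It is an illustration only, not the study's code: the function name, the phase labels, the direction of the ratio (first phase divided by second, in alphabetical order) and the handling of a zero denominator are all assumptions.

```python
from itertools import combinations


def delta_features(phase_values):
    """Compute pairwise delta features for ONE image feature.

    phase_values: dict mapping a phase name (hypothetical labels) to the
    value of the feature measured in that phase.
    Returns a dict with an absolute difference and a ratio per phase pair.
    """
    deltas = {}
    # Sort phase names so the pair order (and ratio direction) is deterministic.
    for a, b in combinations(sorted(phase_values), 2):
        va, vb = phase_values[a], phase_values[b]
        deltas[f"absdiff_{a}_{b}"] = abs(va - vb)
        # Assumption: the ratio is left undefined (None) when the denominator is zero.
        deltas[f"ratio_{a}_{b}"] = va / vb if vb != 0 else None
    return deltas


# Made-up values for a single feature in three phases.
example = delta_features(
    {"arterial": 120.0, "portal_venous": 80.0, "delayed": 60.0}
)
for name, value in example.items():
    print(name, value)
```

    With three phases, each feature yields three phase pairs and therefore six delta features.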

  • Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities. Science translational medicine Itakura, H., Achrol, A. S., Mitchell, L. A., Loya, J. J., Liu, T., Westbroek, E. M., Feroze, A. H., Rodriguez, S., Echegaray, S., Azad, T. D., Yeom, K. W., Napel, S., Rubin, D. L., Chang, S. D., Harsh, G. R., Gevaert, O. 2015; 7 (303): 303ra138


    Glioblastoma (GBM) is the most common and highly lethal primary malignant brain tumor in adults. There is a dire need for easily accessible, noninvasive biomarkers that can delineate underlying molecular activities and predict response to therapy. To this end, we sought to identify subtypes of GBM, differentiated solely by quantitative magnetic resonance (MR) imaging features, that could be used for better management of GBM patients. Quantitative image features capturing the shape, texture, and edge sharpness of each lesion were extracted from MR images of 121 single-institution patients with de novo, solitary, unilateral GBM. Three distinct phenotypic "clusters" emerged in the development cohort using consensus clustering with 10,000 iterations on these image features. These three clusters--pre-multifocal, spherical, and rim-enhancing, names reflecting their image features--were validated in an independent cohort consisting of 144 multi-institution patients with similar tumor characteristics from The Cancer Genome Atlas (TCGA). Each cluster mapped to a unique set of molecular signaling pathways using pathway activity estimates derived from the analysis of TCGA tumor copy number and gene expression data with the PARADIGM (Pathway Recognition Algorithm Using Data Integration on Genomic Models) algorithm. Distinct pathways, such as c-Kit and FOXA, were enriched in each cluster, indicating differential molecular activities as determined by the image features. Each cluster also demonstrated differential probabilities of survival, indicating prognostic importance. Our imaging method offers a noninvasive approach to stratify GBM patients and also provides unique sets of molecular signatures to inform targeted therapy and personalized treatment of GBM.

    View details for DOI 10.1126/scitranslmed.aaa7582

    View details for PubMedID 26333934

  • MethylMix: an R package for identifying DNA methylation-driven genes BIOINFORMATICS Gevaert, O. 2015; 31 (11): 1839-1841


    DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper and hypomethylation of genes is an alternative mechanism to deregulate gene expression in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed generating vast amounts of genome wide DNA methylation measurements. Yet, few tools exist that can formally identify hypo and hypermethylated genes that are predictive of transcription and thus functionally relevant for a particular disease. To accommodate this lack of tools, we developed MethylMix, an algorithm implemented in R to identify disease specific hyper and hypomethylated genes. MethylMix is based on a beta mixture model to identify methylation states and compares them with the normal DNA methylation state. MethylMix introduces a novel metric, the 'Differential Methylation value' or DM-value defined as the difference of a methylation state with the normal methylation state. Finally, matched gene expression data are used to identify, besides differential, transcriptionally predictive methylation states by focusing on methylation changes that affect gene expression. MethylMix was implemented as an R package and is available in Bioconductor.

    View details for DOI 10.1093/bioinformatics/btv020

    View details for Web of Science ID 000356625300021

    View details for PubMedID 25609794
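
    The DM-value in the abstract above is the difference between a methylation state and the normal methylation state. A minimal Python sketch of that arithmetic follows. It is not the MethylMix package, which is written in R and fits a beta mixture model. The function names are hypothetical, and the sketch assumes each state is summarized by the mean beta value of its samples.

```python
from statistics import mean


def dm_value(state_betas, normal_betas):
    """Difference between a methylation state and the normal state.

    Assumption: each state is summarized by the mean beta value (0 to 1)
    of the samples assigned to it; fitting the mixture is out of scope.
    """
    return mean(state_betas) - mean(normal_betas)


def methylation_call(dm):
    """Label a DM-value: positive is hyper-, negative is hypomethylated."""
    if dm > 0:
        return "hypermethylated"
    if dm < 0:
        return "hypomethylated"
    return "unchanged"


# Made-up beta values for one gene: tumor samples in one state vs. normal tissue.
dm = dm_value([0.8, 0.9], [0.2, 0.3])
print(round(dm, 2), methylation_call(dm))
```

    Because the sign of the DM-value encodes direction, a single number per state separates hyper- from hypomethylated genes.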

  • Pancancer analysis of DNA methylation-driven genes using MethylMix GENOME BIOLOGY Gevaert, O., Tibshirani, R., Plevritis, S. K. 2015; 16


    Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated genes that are also predictive of transcription. We apply MethylMix to 12 individual cancer sites, and additionally combine all cancer sites in a pancancer analysis. We discover pancancer hypo- and hypermethylated genes and identify novel methylation-driven subgroups with clinical implications. MethylMix analysis on combined cancer sites reveals 10 pancancer clusters reflecting new similarities across malignantly transformed tissues.

    View details for DOI 10.1186/s13059-014-0579-8

    View details for Web of Science ID 000351817300001

    View details for PubMedID 25631659

    View details for PubMedCentralID PMC4365533

  • CaMoDi: a new method for cancer module discovery BMC GENOMICS Manolakos, A., Ochoa, I., Venkat, K., Goldsmith, A. J., Gevaert, O. 2014; 15


    Identification of genomic patterns in tumors is an important problem, which would enable the community to understand and extend effective therapies across the current tissue-based tumor boundaries. With this in mind, in this work we develop a robust and fast algorithm to discover cancer driver genes using an unsupervised clustering of similarly expressed genes across cancer patients. Specifically, we introduce CaMoDi, a new method for module discovery which demonstrates superior performance across a number of computational and statistical metrics. The proposed algorithm CaMoDi demonstrates effective statistical performance compared to the state of the art, and is algorithmically simple and scalable, which makes it suitable for tissue-independent genomic characterization of individual tumors as well as groups of tumors. We perform an extensive comparative study between CaMoDi and two previously developed methods (CONEXIC and AMARETTO), across 11 individual tumors and 8 combinations of tumors from The Cancer Genome Atlas. We demonstrate that CaMoDi is able to discover modules with better average consistency and homogeneity, with similar or better adjusted R² performance compared to CONEXIC and AMARETTO. We present a novel method for Cancer Module Discovery, CaMoDi, and demonstrate through extensive simulations on the TCGA Pan-Cancer dataset that it achieves comparable or better performance than that of CONEXIC and AMARETTO, while achieving an order-of-magnitude improvement in computational run time compared to the other methods.

    View details for DOI 10.1186/1471-2164-15-S10-S8

    View details for Web of Science ID 000346166900008

    View details for PubMedID 25560933

  • Glioblastoma Multiforme: Exploratory Radiogenomic Analysis by Using Quantitative Image Features RADIOLOGY Gevaert, O., Mitchell, L. A., Achrol, A. S., Xu, J., Echegaray, S., Steinberg, G. K., Cheshier, S. H., Napel, S., Zaharchuk, G., Plevritis, S. K. 2014; 273 (1): 168-174


    To derive quantitative image features from magnetic resonance (MR) images that characterize the radiographic phenotype of glioblastoma multiforme (GBM) lesions and to create radiogenomic maps associating these features with various molecular data. Clinical, molecular, and MR imaging data for GBMs in 55 patients were obtained from the Cancer Genome Atlas and the Cancer Imaging Archive after local ethics committee and institutional review board approval. Regions of interest (ROIs) corresponding to enhancing necrotic portions of tumor and peritumoral edema were drawn, and quantitative image features were derived from these ROIs. Robust quantitative image features were defined on the basis of an intraclass correlation coefficient of 0.6 for a digital algorithmic modification and a test-retest analysis. The robust features were visualized by using hierarchic clustering and were correlated with survival by using Cox proportional hazards modeling. Next, these robust image features were correlated with manual radiologist annotations from the Visually Accessible Rembrandt Images (VASARI) feature set and GBM molecular subgroups by using nonparametric statistical tests. A bioinformatic algorithm was used to create gene expression modules, defined as a set of coexpressed genes together with a multivariate model of cancer driver genes predictive of the module's expression pattern. Modules were correlated with robust image features by using the Spearman correlation test to create radiogenomic maps and to link robust image features with molecular pathways. Eighteen image features passed the robustness analysis and were further analyzed for the three types of ROIs, for a total of 54 image features. Three enhancement features were significantly correlated with survival, 77 significant correlations were found between robust quantitative features and the VASARI feature set, and seven image features were correlated with molecular subgroups (P < .05 for all). A radiogenomics map was created to link image features with gene expression modules and allowed linkage of 56% (30 of 54) of the image features with biologic processes. Radiogenomic approaches in GBM have the potential to predict clinical and molecular characteristics of tumors noninvasively. Online supplemental material is available for this article.

    View details for DOI 10.1148/radiol.14131731

    View details for Web of Science ID 000344232100019

    View details for PubMedCentralID PMC4263772

  • Oncogenic transformation of diverse gastrointestinal tissues in primary organoid culture NATURE MEDICINE Li, X., Nadauld, L., Ootani, A., Corney, D. C., Pai, R. K., Gevaert, O., Cantrell, M. A., Rack, P. G., Neal, J. T., Chan, C. W., Yeung, T., Gong, X., Yuan, J., Wilhelmy, J., Robine, S., Attardi, L. D., Plevritis, S. K., Hung, K. E., Chen, C., Ji, H. P., Kuo, C. J. 2014; 20 (7): 769-777


    The application of primary organoid cultures containing epithelial and mesenchymal elements to cancer modeling holds promise for combining the accurate multilineage differentiation and physiology of in vivo systems with the facile in vitro manipulation of transformed cell lines. Here we used a single air-liquid interface culture method without modification to engineer oncogenic mutations into primary epithelial and mesenchymal organoids from mouse colon, stomach and pancreas. Pancreatic and gastric organoids exhibited dysplasia as a result of expression of Kras carrying the G12D mutation (Kras(G12D)), p53 loss or both and readily generated adenocarcinoma after in vivo transplantation. In contrast, primary colon organoids required combinatorial Apc, p53, Kras(G12D) and Smad4 mutations for progressive transformation to invasive adenocarcinoma-like histology in vitro and tumorigenicity in vivo, recapitulating multi-hit models of colorectal cancer (CRC), as compared to the more promiscuous transformation of small intestinal organoids. Colon organoid culture functionally validated the microRNA miR-483 as a dominant driver oncogene at the IGF2 (insulin-like growth factor-2) 11p15.5 CRC amplicon, inducing dysplasia in vitro and tumorigenicity in vivo. These studies demonstrate the general utility of a highly tractable primary organoid system for cancer modeling and driver oncogene validation in diverse gastrointestinal tissues.

    View details for DOI 10.1038/nm.3585

    View details for Web of Science ID 000338689500021

  • Identification of ovarian cancer driver genes by using module network integration of multi-omics data INTERFACE FOCUS Gevaert, O., Villalobos, V., Sikic, B. I., Plevritis, S. K. 2013; 3 (4)

  • Identifying master regulators of cancer and their downstream targets by integrating genomic and epigenomic features. Pacific Symposium on Biocomputing Gevaert, O., Plevritis, S. 2013: 123-134


    Vast amounts of molecular data characterizing the genome, epigenome and transcriptome are becoming available for a variety of cancers. The current challenge is to integrate these diverse layers of molecular biology information to create a more comprehensive view of key biological processes underlying cancer. We developed a biocomputational algorithm that integrates copy number, DNA methylation, and gene expression data to study master regulators of cancer and identify their targets. Our algorithm starts by generating a list of candidate driver genes based on the rationale that genes that are driven by multiple genomic events in a subset of samples are unlikely to be randomly deregulated. We then select the master regulators from the candidate driver and identify their targets by inferring the underlying regulatory network of gene expression. We applied our biocomputational algorithm to identify master regulators and their targets in glioblastoma multiforme (GBM) and serous ovarian cancer. Our results suggest that the expression of candidate drivers is more likely to be influenced by copy number variations than DNA methylation. Next, we selected the master regulators and identified their downstream targets using module networks analysis. As a proof-of-concept, we show that the GBM and ovarian cancer module networks recapitulate known processes in these cancers. In addition, we identify master regulators that have not been previously reported and suggest their likely role. In summary, focusing on genes whose expression can be explained by their genomic and epigenomic aberrations is a promising strategy to identify master regulators of cancer.

    View details for PubMedID 23424118

  • Prognostic PET F-18-FDG Uptake Imaging Features Are Associated with Major Oncogenomic Alterations in Patients with Resected Non-Small Cell Lung Cancer CANCER RESEARCH Nair, V. S., Gevaert, O., Davidzon, G., Napel, S., Graves, E. E., Hoang, C. D., Shrager, J. B., Quon, A., Rubin, D. L., Plevritis, S. K. 2012; 72 (15): 3725-3734


    Although 2[18F]fluoro-2-deoxy-d-glucose (FDG) uptake during positron emission tomography (PET) predicts post-surgical outcome in patients with non-small cell lung cancer (NSCLC), the biologic basis for this observation is not fully understood. Here, we analyzed 25 tumors from patients with NSCLCs to identify tumor PET-FDG uptake features associated with gene expression signatures and survival. Fourteen quantitative PET imaging features describing FDG uptake were correlated with gene expression for single genes and coexpressed gene clusters (metagenes). For each FDG uptake feature, an associated metagene signature was derived, and a prognostic model was identified in an external cohort and then tested in a validation cohort of patients with NSCLC. Four of eight single genes associated with FDG uptake (LY6E, RNF149, MCM6, and FAP) were also associated with survival. The most prognostic metagene signature was associated with a multivariate FDG uptake feature [maximum standard uptake value (SUV(max)), SUV(variance), and SUV(PCA2)], each highly associated with survival in the external [HR, 5.87; confidence interval (CI), 2.49-13.8] and validation (HR, 6.12; CI, 1.08-34.8) cohorts, respectively. Cell-cycle, proliferation, death, and self-recognition pathways were altered in this radiogenomic profile. Together, our findings suggest that leveraging tumor genomics with an expanded collection of PET-FDG imaging features may enhance our understanding of FDG uptake as an imaging biomarker beyond its association with glycolysis.

    View details for DOI 10.1158/0008-5472.CAN-11-3943

    View details for Web of Science ID 000307354100004

    View details for PubMedID 22710433

    View details for PubMedCentralID PMC3596510

  • Non-Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene Expression Microarray Data-Methods and Preliminary Results RADIOLOGY Gevaert, O., Xu, J., Hoang, C. D., Leung, A. N., Xu, Y., Quon, A., Rubin, D. L., Napel, S., Plevritis, S. K. 2012; 264 (2): 387-396


    To identify prognostic imaging biomarkers in non-small cell lung cancer (NSCLC) by means of a radiogenomics strategy that integrates gene expression and medical images in patients for whom survival outcomes are not available by leveraging survival data in public gene expression data sets. A radiogenomics strategy for associating image features with clusters of coexpressed genes (metagenes) was defined. First, a radiogenomics correlation map is created for a pairwise association between image features and metagenes. Next, predictive models of metagenes are built in terms of image features by using sparse linear regression. Similarly, predictive models of image features are built in terms of metagenes. Finally, the prognostic significance of the predicted image features is evaluated in a public gene expression data set with survival outcomes. This radiogenomics strategy was applied to a cohort of 26 patients with NSCLC for whom gene expression and 180 image features from computed tomography (CT) and positron emission tomography (PET)/CT were available. There were 243 statistically significant pairwise correlations between image features and metagenes of NSCLC. Metagenes were predicted in terms of image features with an accuracy of 59%-83%. One hundred fourteen of 180 CT image features and the PET standardized uptake value were predicted in terms of metagenes with an accuracy of 65%-86%. When the predicted image features were mapped to a public gene expression data set with survival outcomes, tumor size, edge shape, and sharpness ranked highest for prognostic significance. This radiogenomics strategy for identifying imaging biomarkers may enable a more rapid evaluation of novel imaging modalities, thereby accelerating their translation to personalized medicine.

    View details for DOI 10.1148/radiol.12111607

    View details for Web of Science ID 000306660000010

    View details for PubMedID 22723499

    View details for PubMedCentralID PMC3401348

  • A Seven-Gene Set Associated with Chronic Hypoxia of Prognostic Importance in Hepatocellular Carcinoma CLINICAL CANCER RESEARCH Van Malenstein, H., Gevaert, O., Libbrecht, L., Daemen, A., Allemeersch, J., Nevens, F., Van Cutsem, E., Cassiman, D., De Moor, B., Verslype, C., van Pelt, J. 2010; 16 (16): 4278-4288


    Hepatocellular carcinomas (HCC) have an unpredictable clinical course, and molecular classification could provide better insights into prognosis and patient-directed therapy. We hypothesized that in HCC, certain microenvironmental regions exist with a characteristic gene expression related to chronic hypoxia which would induce aggressive behavior. We determined the gene expression pattern for human HepG2 liver cells under chronic hypoxia by microarray analysis. Differentially expressed genes were selected and their clinical values were assessed. In our hypothesis-driven analysis, we included available independent microarray studies of patients with HCC in one single analysis. Three microarray studies encompassing 272 patients were used as training sets to determine a minimal prognostic gene set, and one recent study of 91 patients was used for validation. Using computational methods, we identified seven genes (out of 3,592 differentially expressed under chronic hypoxia) that showed correlation with poor prognostic indicators in all three training sets (65/139/73 patients) and this was validated in a fourth data set (91 patients). Retrospectively, the seven-gene set was associated with poor survival (hazard ratio, 1.39; P = 0.007) and early recurrence (hazard ratio, 2.92; P = 0.007) in 135 patients. Moreover, using a hypoxia score based on this seven-gene set, we found that patients with a score of >0.35 (n = 42) had a median survival of 307 days, whereas patients with a score of ≤0.35 (n = 93) had a median survival of 1,602 days (P = 0.005). We identified a unique, liver-specific, seven-gene signature associated with chronic hypoxia that correlates with poor prognosis in HCCs.

    View details for DOI 10.1158/1078-0432.CCR-09-3274

    View details for Web of Science ID 000280830300024

    View details for PubMedID 20592013

  • Intrinsic Gene Expression Profiles of Gliomas Are a Better Predictor of Survival than Histology CANCER RESEARCH Gravendeel, L. A., Kouwenhoven, M. C., Gevaert, O., de Rooi, J. J., Stubbs, A. P., Duijm, J. E., Daemen, A., Bleeker, F. E., Bralten, L. B., Kloosterhof, N. K., De Moor, B., Eilers, P. H., van der Spek, P. J., Kros, J. M., Smitt, P. A., van den Bent, M. J., French, P. J. 2009; 69 (23): 9065-9072


    Gliomas are the most common primary brain tumors with heterogeneous morphology and variable prognosis. Treatment decisions in patients rely mainly on histologic classification and clinical parameters. However, differences between histologic subclasses and grades are subtle, and classifying gliomas is subject to a large interobserver variability. To improve current classification standards, we have performed gene expression profiling on a large cohort of glioma samples of all histologic subtypes and grades. We identified seven distinct molecular subgroups that correlate with survival. These include two favorable prognostic subgroups (median survival, >4.7 years), two with intermediate prognosis (median survival, 1-4 years), two with poor prognosis (median survival, <1 year), and one control group. The intrinsic molecular subtypes of glioma are different from histologic subgroups and correlate better to patient survival. The prognostic value of molecular subgroups was validated on five independent sample cohorts (The Cancer Genome Atlas, Repository for Molecular Brain Neoplasia Data, GSE12907, GSE4271, and Li and colleagues). The power of intrinsic subtyping is shown by its ability to identify a subset of prognostically favorable tumors within an external data set that contains only histologically confirmed glioblastomas (GBM). Specific genetic changes (epidermal growth factor receptor amplification, IDH1 mutation, and 1p/19q loss of heterozygosity) segregate in distinct molecular subgroups. We identified a subgroup with molecular features associated with secondary GBM, suggesting that different genetic changes drive gene expression profiles. Finally, we assessed response to treatment in molecular subgroups. Our data provide compelling evidence that expression profiling is a more accurate and objective method to classify gliomas than histologic classification. Molecular classification therefore may aid diagnosis and can guide clinical decision making.

    View details for DOI 10.1158/0008-5472.CAN-09-2307

    View details for Web of Science ID 000272362800029

    View details for PubMedID 19920198

  • Recurrent Copy Number Alterations in BRCA1-Mutated Ovarian Tumors Alter Biological Pathways HUMAN MUTATION Leunen, K., Gevaert, O., Daemen, A., Vanspauwen, V., Michils, G., De Moor, B., Moerman, P., Vergote, I., Legius, E. 2009; 30 (12): 1693-1702


    Array CGH was used to identify recurrent copy number alterations (RCNA) characteristic of either BRCA1-related or sporadic ovarian cancer. After preprocessing, both groups of patients were modeled using a recurrent Hidden Markov Model to detect RCNA. RCNA with a probability higher than 80% were called. After removing RCNA present in both groups, the genes present in the remaining RCNA were investigated for enrichment of pathways from external databases. More RCNA were observed in the BRCA1 group, and they display more losses than gains compared to the sporadic group. When focusing on the type of RCNA, no significant difference in length was seen for the gains, but there was a statistically significant difference for the losses. In the sporadic group, a great proportion of the altered regions contain genes known to have a function in cell adhesion and complement activation, whereas the BRCA1 samples are characterized by alterations in the HOX genes, metalloproteinases, tumor suppressor genes, and the estrogen-signaling pathways. We conclude that BRCA1 ovarian tumors present a different type, number, and length of RCNA; a huge amount of the genome is lost, resulting in important genomic instability. Moreover, important biological pathways are altered differentially when compared to the sporadic group.

    View details for DOI 10.1002/humu.21135

    View details for Web of Science ID 000272796400011

    View details for PubMedID 19802895

  • Combined Analysis of Metabolomes, Proteomes, and Transcriptomes of HCV-infected Cells and Liver to Identify Pathways Associated With Disease Development. Gastroenterology Lupberger, J., Croonenborghs, T., Roca Suarez, A. A., Van Renne, N., Juhling, F., Oudot, M. A., Virzi, A., Bandiera, S., Jamey, C., Meszaros, G., Brumaru, D., Mukherji, A., Durand, S. C., Heydmann, L., Verrier, E. R., El Saghire, H., Hamdane, N., Bartenschlager, R., Fereshetian, S., Ramberger, E., Sinha, R., Nabian, M., Everaert, C., Jovanovic, M., Mertins, P., Carr, S. A., Chayama, K., Dali-Youcef, N., Ricci, R., Bardeesy, N. M., Fujiwara, N., Gevaert, O., Zeisel, M. B., Hoshida, Y., Pochet, N., Baumert, T. F. 2019


    BACKGROUND & AIMS: The mechanisms of hepatitis C virus (HCV) infection, liver disease progression, and hepatocarcinogenesis are only partially understood. We performed genomic, proteomic, and metabolomic analyses of HCV-infected cells and chimeric mice to learn more about these processes. METHODS: Huh7.5.1dif (hepatocyte-like cells) were infected with culture-derived HCV and used in RNA-Seq, proteomic, metabolomic, and integrative genomic analyses. uPA/SCID mice were injected with serum from HCV-infected patients; 8 weeks later, liver tissues were collected and analyzed by RNA-seq and proteomics. Using differential expression, gene set enrichment analyses, and protein interaction mapping, we identified pathways that changed in response to HCV infection. We validated our findings in studies of liver tissues from 216 patients with HCV infection and early-stage cirrhosis and paired biopsies from 99 patients with hepatocellular carcinoma, including 17 patients with histologic features of steatohepatitis. Cirrhotic liver tissues from patients with HCV infection were classified into 2 groups based on relative peroxisome function; outcomes assessed included Child-Pugh class, development of hepatocellular carcinoma, survival and steatohepatitis. Hepatocellular carcinomas were classified according to steatohepatitis; the outcome was relative peroxisomal function. RESULTS: We quantified 21,950 mRNAs and 8297 proteins in HCV-infected cells. Upon HCV infection of hepatocyte-like cells and chimeric mice, we observed significant changes in levels of mRNAs and proteins involved in metabolism and hepatocarcinogenesis. HCV infection of hepatocyte-like cells significantly increased levels of mRNAs, but not proteins, that regulate the innate immune response; we believe this was due to the inhibition of translation in these cells. HCV infection of hepatocyte-like cells increased glucose consumption and metabolism and the STAT3 signaling pathway and reduced peroxisome function. Peroxisomes mediate beta-oxidation of very long-chain fatty acids (VLCFAs); we found intracellular accumulation of VLCFAs in HCV-infected cells, which is also observed in patients with fatty liver disease. Cells in livers from HCV-infected mice had significant reductions in levels of mRNAs and proteins associated with peroxisome function, indicating perturbation of peroxisomes. We associated defects in peroxisome function with outcomes and features of HCV-associated cirrhosis, fatty liver disease, and hepatocellular carcinoma in patients. CONCLUSIONS: We performed combined transcriptome, proteome, and metabolome analyses of liver tissues from HCV-infected hepatocyte-like cells and HCV-infected mice. We found that HCV infection increases glucose metabolism and the STAT3 signaling pathway and thereby reduces peroxisome function; alterations in expression of peroxisome genes were associated with outcomes of patients with liver diseases. These findings provide insights into liver disease pathogenesis and might be used to identify new therapeutic targets.

    View details for PubMedID 30978357

  • Predicting EGFR Mutation Status in Lung Adenocarcinoma on CT Image Using Deep Learning. The European respiratory journal Wang, S., Shi, J., Ye, Z., Dong, D., Yu, D., Zhou, M., Liu, Y., Gevaert, O., Wang, K., Zhu, Y., Zhou, H., Liu, Z., Tian, J. 2019


    Epidermal Growth Factor Receptor (EGFR) genotyping is critical for treatment guidelines such as the use of tyrosine kinase inhibitors in lung adenocarcinoma (LA). Conventional identification of EGFR genotype requires biopsy and sequence testing, which is invasive and may suffer from the difficulty in accessing tissue samples. Here, we proposed a deep learning (DL) model to predict the EGFR mutation status in LA by non-invasive computed tomography (CT). We retrospectively collected 844 LA patients with preoperative CT image, EGFR mutation and clinical information from two hospitals. An end-to-end DL model was proposed to predict the EGFR mutation status by CT scanning. By training on 14926 CT images, the DL model achieved encouraging predictive performance in both the primary cohort (n=603; AUC=0.85, 95% CI 0.83-0.88) and the independent validation cohort (n=241; AUC=0.81, 95% CI 0.79-0.83), which showed significant improvement over previous studies using hand-crafted CT features or clinical characteristics (p<0.001). The deep learning score demonstrated significant difference in EGFR-mutant and EGFR-wild type tumours (p<0.001). Since CT is routinely used in lung cancer diagnosis, the DL model provides a non-invasive and easy-to-use method for EGFR mutation status prediction.

    View details for PubMedID 30635290

  • Non-invasive genotype prediction of chromosome 1p/19q co-deletion by development and validation of an MRI-based radiomics signature in lower-grade gliomas JOURNAL OF NEURO-ONCOLOGY Han, Y., Xie, Z., Zang, Y., Zhang, S., Gu, D., Zhou, M., Gevaert, O., Wei, J., Li, C., Chen, H., Du, J., Liu, Z., Dong, D., Tian, J., Zhou, D. 2018; 140 (2): 297–306


    To perform radiomics analysis for non-invasively predicting chromosome 1p/19q co-deletion in World Health Organization grade II and III (lower-grade) gliomas. This retrospective study included 277 patients histopathologically diagnosed with lower-grade glioma. Clinical parameters were recorded for each patient. We performed a radiomics analysis by extracting 647 MRI-based features and applied the random forest algorithm to generate a radiomics signature for predicting 1p/19q co-deletion in the training cohort (n = 184). The clinical model consisted of pertinent clinical factors, and was built using a logistic regression algorithm. A combined model, incorporating both the radiomics signature and related clinical factors, was also constructed. The receiver operating characteristics curve was used to evaluate the predictive performance. We further validated the predictability of the three developed models using a time-independent validation cohort (n = 93). The radiomics signature was constructed as an independent predictor for differentiating 1p/19q co-deletion genotypes, which demonstrated superior performance on both the training and validation cohorts with areas under curve (AUCs) of 0.887 and 0.760, respectively. These results outperformed the clinical model (AUCs of 0.580 and 0.627 on training and validation cohorts). The AUCs of the combined model were 0.885 and 0.753 on training and validation cohorts, respectively, which indicated that clinical factors did not present additional improvement for the prediction. Our study highlighted that an MRI-based radiomics signature can effectively identify the 1p/19q co-deletion in histopathologically diagnosed lower-grade gliomas, thereby offering the potential to facilitate non-invasive molecular subtype prediction of gliomas.

    View details for DOI 10.1007/s11060-018-2953-y

    View details for Web of Science ID 000450472400011

    View details for PubMedID 30097822

  • A radiogenomic dataset of non-small cell lung cancer. Scientific data Bakr, S., Gevaert, O., Echegaray, S., Ayers, K., Zhou, M., Shafiq, M., Zheng, H., Benson, J. A., Zhang, W., Leung, A. N., Kadoch, M., D Hoang, C., Shrager, J., Quon, A., Rubin, D. L., Plevritis, S. K., Napel, S. 2018; 5: 180202


    Medical image biomarkers of cancer promise improvements in patient care through advances in precision medicine. Compared to genomic biomarkers, image biomarkers provide the advantages of being non-invasive, and characterizing a heterogeneous tumor in its entirety, as opposed to limited tissue available via biopsy. We developed a unique radiogenomic dataset from a Non-Small Cell Lung Cancer (NSCLC) cohort of 211 subjects. The dataset comprises Computed Tomography (CT), Positron Emission Tomography (PET)/CT images, semantic annotations of the tumors as observed on the medical images using a controlled vocabulary, and segmentation maps of tumors in the CT scans. Imaging data are also paired with results of gene mutation analyses, gene expression microarrays and RNA sequencing data from samples of surgically excised tumor tissue, and clinical data, including survival outcomes. This dataset was created to facilitate the discovery of the underlying relationship between tumor molecular and medical image features, as well as the development and evaluation of prognostic medical image biomarkers.

    View details for PubMedID 30325352

  • Non-Small Cell Lung Cancer Radiogenomics Map Identifies Relationships between Molecular and Imaging Phenotypes with Prognostic Implications. Radiology Zhou, M., Leung, A., Echegaray, S., Gentles, A., Shrager, J. B., Jensen, K. C., Berry, G. J., Plevritis, S. K., Rubin, D. L., Napel, S., Gevaert, O. 2018; 286 (1): 307–15


    Purpose To create a radiogenomic map linking computed tomographic (CT) image features and gene expression profiles generated by RNA sequencing for patients with non-small cell lung cancer (NSCLC). Materials and Methods A cohort of 113 patients with NSCLC diagnosed between April 2008 and September 2014 who had preoperative CT data and tumor tissue available was studied. For each tumor, a thoracic radiologist recorded 87 semantic image features, selected to reflect radiologic characteristics of nodule shape, margin, texture, tumor environment, and overall lung characteristics. Next, total RNA was extracted from the tissue and analyzed with RNA sequencing technology. Ten highly coexpressed gene clusters, termed metagenes, were identified, validated in publicly available gene-expression cohorts, and correlated with prognosis. Next, a radiogenomics map was built that linked semantic image features to metagenes by using the t statistic and the Spearman correlation metric with multiple testing correction. Results RNA sequencing analysis resulted in 10 metagenes that capture a variety of molecular pathways, including the epidermal growth factor (EGF) pathway. A radiogenomic map was created with 32 statistically significant correlations between semantic image features and metagenes. For example, nodule attenuation and margins are associated with the late cell-cycle genes, and a metagene that represents the EGF pathway was significantly correlated with the presence of ground-glass opacity and irregular nodules or nodules with poorly defined margins. Conclusion Radiogenomic analysis of NSCLC showed multiple associations between semantic image features and metagenes that represented canonical molecular pathways, and it can result in noninvasive identification of molecular properties of NSCLC. Online supplemental material is available for this article.

    View details for DOI 10.1148/radiol.2017161845

    View details for PubMedID 28727543

    View details for PubMedCentralID PMC5749594

  • The ENGAGE study: Integrating neuroimaging, virtual reality and smartphone sensing to understand self-regulation for managing depression and obesity in a precision medicine model. Behaviour research and therapy Williams, L. M., Pines, A., Goldstein-Piekarski, A. N., Rosas, L. G., Kullar, M., Sacchet, M. D., Gevaert, O., Bailenson, J., Lavori, P. W., Dagum, P., Wandell, B., Correa, C., Greenleaf, W., Suppes, T., Perry, L. M., Smyth, J. M., Lewis, M. A., Venditti, E. M., Snowden, M., Simmons, J. M., Ma, J. 2018; 101: 58–70


    Precision medicine models for personalizing and achieving sustained behavior change are largely outside of current clinical practice. Yet, changing self-regulatory behaviors is fundamental to the self-management of complex lifestyle-related chronic conditions such as depression and obesity, two top contributors to the global burden of disease and disability. To optimize treatments and address these burdens, behavior change and self-regulation must be better understood in relation to their neurobiological underpinnings. Here, we present the conceptual framework and protocol for a novel study, "Engaging self-regulation targets to understand the mechanisms of behavior change and improve mood and weight outcomes (ENGAGE)". The ENGAGE study integrates neuroscience with behavioral science to better understand the self-regulation related mechanisms of behavior change for improving mood and weight outcomes among adults with comorbid depression and obesity. We collect assays of three self-regulation targets (emotion, cognition, and self-reflection) in multiple settings: neuroimaging and behavioral lab-based measures, virtual reality, and passive smartphone sampling. By connecting human neuroscience and behavioral science in this manner within the ENGAGE study, we develop a prototype for elucidating the underlying self-regulation mechanisms of behavior change outcomes and their application in optimizing intervention strategies for multiple chronic diseases.

    View details for DOI 10.1016/j.brat.2017.09.012

    View details for PubMedID 29074231

  • Development and validation of an MRI-based model to predict response to chemoradiotherapy for rectal cancer. Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology Bulens, P., Couwenberg, A., Haustermans, K., Debucquoy, A., Vandecaveye, V., Philippens, M., Zhou, M., Gevaert, O., Intven, M. 2018; 126 (3): 437–42


    To safely implement organ-preserving treatment strategies for patients with rectal cancer, well-considered selection of patients with favourable response is needed. In this study, we develop and validate an MRI-based response predicting model. A multivariate model using T2-volumetric and DWI parameters before and 6 weeks after chemoradiation (CRT) was developed using a cohort of 85 rectal cancer patients and validated in an external cohort of 55 patients that underwent preoperative CRT. Twenty-two patients (26%) achieved ypT0-1N0 response in the development cohort versus 13 patients (24%) in the validation cohort. Two T2-volumetric parameters (ΔVolume% and Sphere_post) and two DWI parameters (ADC_avg_post and ADCratio_avg) were retained in a model predicting (near-)complete response (ypT0-1N0). In the development cohort, this model had a good predictive performance (AUC = 0.89; 95% CI 0.80-0.98). Validation of the model in an external cohort resulted in a similar performance (AUC = 0.88; 95% CI 0.79-0.98). An MRI-based prediction model of (near-)complete pathological response following CRT in rectal cancer patients shows a high predictive performance in an external validation cohort. The clinically relevant features in the model make it an interesting tool for implementation of organ-preserving strategies in rectal cancer.

    View details for DOI 10.1016/j.radonc.2018.01.008

    View details for PubMedID 29395287

  • Prediction of EGFR and KRAS mutation in non-small cell lung cancer using quantitative 18F FDG-PET/CT metrics. Oncotarget Minamimoto, R., Jamali, M., Gevaert, O., Echegaray, S., Khuong, A., Hoang, C. D., Shrager, J. B., Plevritis, S. K., Rubin, D. L., Leung, A. N., Napel, S., Quon, A. 2017


    This study investigated the relationship between epidermal growth factor receptor (EGFR) and Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations in non-small-cell lung cancer (NSCLC) and quantitative FDG-PET/CT parameters including tumor heterogeneity. 131 patients with NSCLC underwent staging FDG-PET/CT followed by tumor resection and histopathological analysis that included testing for the EGFR and KRAS gene mutations. Patient and lesion characteristics, including smoking habits and FDG uptake parameters, were correlated to each gene mutation. Never-smoker (p < 0.001) or low pack-year smoking history (p = 0.002) and female gender (p = 0.047) were predictive factors for the presence of the EGFR mutations. Being a current or former smoker was a predictive factor for the KRAS mutations (p = 0.018). The maximum standardized uptake value (SUVmax) of FDG uptake in lung lesions was a predictive factor of the EGFR mutations (p = 0.029), while metabolic tumor volume and total lesion glycolysis were not predictive. Amongst several tumor heterogeneity metrics included in our analysis, inverse coefficient of variation (1/COV) was a predictive factor (p < 0.02) of EGFR mutation status, independent of metabolic tumor diameter. Multivariate analysis showed that being a never-smoker was the most significant factor (p < 0.001) for the EGFR mutations in lung cancer overall. The tumor heterogeneity metric 1/COV and SUVmax were both predictive for the EGFR mutations in NSCLC in a univariate analysis. Overall, smoking status was the most significant factor for the presence of the EGFR and KRAS mutations in lung cancer.

    View details for DOI 10.18632/oncotarget.17782

    View details for PubMedID 28538213

  • Predictive radiogenomics modeling of EGFR mutation status in lung cancer SCIENTIFIC REPORTS Gevaert, O., Echegaray, S., Khuong, A., Hoang, C. D., Shrager, J. B., Jensen, K. C., Berry, G. J., Guo, H. H., Lau, C., Plevritis, S. K., Rubin, D. L., Napel, S., Leung, A. N. 2017; 7


    Molecular analysis of the mutation status for EGFR and KRAS is now routine in the management of non-small cell lung cancer. Radiogenomics, the linking of medical images with the genomic properties of human tumors, provides exciting opportunities for non-invasive diagnostics and prognostics. We investigated whether EGFR and KRAS mutation status can be predicted using imaging data. To accomplish this, we studied 186 cases of NSCLC with preoperative thin-slice CT scans. A thoracic radiologist annotated 89 semantic image features of each patient's tumor. Next, we built a decision tree to predict the presence of EGFR and KRAS mutations. We found a statistically significant model for predicting EGFR but not for KRAS mutations. The test set area under the ROC curve for predicting EGFR mutation status was 0.89. The final decision tree used four variables: emphysema, airway abnormality, the percentage of ground glass component and the type of tumor margin. The presence of either of the first two features predicts a wild type status for EGFR while the presence of any ground glass component indicates EGFR mutations. These results show the potential of quantitative imaging to predict molecular properties in a non-invasive manner, as CT imaging is more readily available than biopsies.

    View details for DOI 10.1038/srep41674

    View details for Web of Science ID 000393094200001

    View details for PubMedID 28139704

    View details for PubMedCentralID PMC5282551

  • MicroRNA based Pan-Cancer Diagnosis and Treatment Recommendation BMC BIOINFORMATICS Cheerla, N., Gevaert, O. 2017; 18


    The current state-of-the-art in cancer diagnosis and treatment is not ideal; diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" instead of being personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their ease of access (circulating miRNA in the blood) and stability. There have been many studies showing the effectiveness of miRNA data in diagnosing specific cancer types, but few studies explore the role of miRNA in predicting treatment outcome. Here we go a step further, using tissue miRNA and clinical data across 21 cancers from The Cancer Genome Atlas (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system, and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options. We achieved 97.2% accuracy for classification using a support vector machine classifier with radial basis. The accuracies improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define the accuracy as the ratio of the total number of instances correctly classified to the total instances. The classifier also performed well, achieving greater than 80% sensitivity for many cancer types on independent validation datasets. Many miRNAs selected by our feature selection algorithm had strong previous associations to various cancers and tumor progression. Then, using miRNA, clinical and treatment data and encoding it in a machine-learning readable format, we built a prognosis predictor model to predict the outcome of treatment with 85% accuracy. We used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis model, incorporating semi-supervised learning techniques to improve their accuracies with repeated use, were uploaded online for easy access. Our research is a step towards the final goal of diagnosing cancer and predicting treatment recommendations using non-invasive blood tests.

    View details for DOI 10.1186/s12859-016-1421-y

    View details for Web of Science ID 000392171000002

    View details for PubMedID 28086747

    View details for PubMedCentralID PMC5237282

  • A multi-view deep convolutional neural networks for lung nodule segmentation. Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference Gevaert, O. 2017; 2017: 1752–55


    We present a multi-view convolutional neural network (MV-CNN) for lung nodule segmentation. The MV-CNN specializes in capturing a diverse set of nodule-sensitive features from axial, coronal and sagittal views in CT images simultaneously. The proposed network architecture consists of three CNN branches, where each branch includes seven stacked layers and takes multi-scale nodule patches as input. The three CNN branches are then integrated with a fully connected layer to predict whether the patch center voxel belongs to the nodule. The proposed method has been evaluated on 893 nodules from the public LIDC-IDRI dataset, where ground-truth annotations and CT imaging data were provided. We showed that MV-CNN demonstrated encouraging performance for segmenting various types of nodules including juxta-pleural, cavitary, and non-solid nodules, achieving an average dice similarity coefficient (DSC) of 77.67% and average surface distance (ASD) of 0.24, outperforming conventional image segmentation approaches.

    View details for DOI 10.1109/EMBC.2017.8037182

    View details for PubMedID 29060226

  • Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations. AMIA ... Annual Symposium proceedings. AMIA Symposium Martinez-Romero, M., O'Connor, M. J., Shankar, R. D., Panahiazar, M., Willrett, D., Egyedi, A. L., Gevaert, O., Graybeal, J., Musen, M. A. 2017; 2017: 1272–81


    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

    View details for PubMedID 29854196

  • Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO). Journal of biomedical informatics Panahiazar, M., Dumontier, M., Gevaert, O. 2017; 72: 132–39


    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the dimensionality of the unique values of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse.

    View details for PubMedID 28625880

  • Radiomics in Brain Tumor: Image Assessment, Quantitative Feature Descriptors, and Machine-Learning Approaches. AJNR. American journal of neuroradiology Zhou, M., Scott, J., Chaudhury, B., Hall, L., Goldgof, D., Yeom, K. W., Iv, M., Ou, Y., Kalpathy-Cramer, J., Napel, S., Gillies, R., Gevaert, O., Gatenby, R. 2017


    Radiomics describes a broad set of computational methods that extract quantitative features from radiographic images. The resulting features can be used to inform imaging diagnosis, prognosis, and therapy response in oncology. However, major challenges remain for methodologic developments to optimize feature extraction and provide rapid information flow in clinical settings. Equally important, to be clinically useful, predictive radiomic properties must be clearly linked to meaningful biologic characteristics and qualitative imaging properties familiar to radiologists. Here we use a cross-disciplinary approach to highlight studies in radiomics. We review brain tumor radiologic studies (eg, imaging interpretation) through computational models (eg, computer vision and machine learning) that provide novel clinical insights. We outline current quantitative image feature extraction and prediction strategies with different levels of available clinical classes for supporting clinical decision-making. We further discuss machine-learning challenges and data opportunities to advance radiomic studies.

    View details for DOI 10.3174/ajnr.A5391

    View details for PubMedID 28982791

  • Quantitative imaging outperforms molecular markers when predicting response to chemoradiotherapy for rectal cancer. Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology Joye, I., Debucquoy, A., Deroose, C. M., Vandecaveye, V., Cutsem, E. V., Wolthuis, A., D'Hoore, A., Sagaert, X., Zhou, M., Gevaert, O., Haustermans, K. 2017; 124 (1): 104–9


    To explore the integration of imaging and molecular data for response prediction to chemoradiotherapy (CRT) for rectal cancer. Eighty-five rectal cancer patients underwent preoperative CRT. 18F-FDG PET/CT and diffusion-weighted imaging (DWI) were acquired before (TP1) and during CRT (TP2) and prior to surgery (TP3). Inflammatory cytokines and gene expression were analysed. Tumour response was defined as ypT0-1N0. Multivariate models were built combining the obtained parameters. Final models were calculated on the data combination with the highest AUC. Twenty-two patients (26%) achieved ypT0-1N0 response. 18F-FDG PET/CT had worse predictive performance than DWI and T2-volumetry (AUC 0.61±0.04, 0.72±0.03, and 0.72±0.02, respectively). Combining all imaging parameters increased the AUC to 0.81±0.03. Adding cytokines or gene expression did not improve the AUC (AUC of 0.72±0.06 and 0.79±0.04 respectively). Final models combining 18F-FDG PET/CT, DWI, and T2-weighted volumetry at all TPs and using only TP1 and TP3, allowed ypT0-1N0 prediction with a 75% sensitivity, 94% specificity and PPV of 80%. Combining 18F-FDG PET/CT, DWI, and T2-weighted MRI volumetry obtained before CRT and prior to surgery may help physicians in selecting rectal cancer patients for organ-preservation.

    View details for PubMedID 28647399

  • NSD1 inactivation defines an immune cold, DNA hypomethylated subtype in squamous cell carcinoma. Scientific reports Brennan, K., Shin, J. H., Tay, J. K., Prunello, M., Gentles, A. J., Sunwoo, J. B., Gevaert, O. 2017; 7 (1): 17064


    Chromatin modifying enzymes are frequently mutated in cancer, resulting in widespread epigenetic deregulation. Recent reports indicate that inactivating mutations in the histone methyltransferase NSD1 define an intrinsic subtype of head and neck squamous cell carcinoma (HNSC) that features pronounced DNA hypomethylation. Here, we describe a similar hypomethylated subtype of lung squamous cell carcinoma (LUSC) that is enriched for both inactivating mutations and deletions in NSD1. The 'NSD1 subtypes' of HNSC and LUSC are highly correlated at the DNA methylation and gene expression levels, featuring ectopic expression of developmental transcription factors and genes that are also hypomethylated in Sotos syndrome, a congenital disorder caused by germline NSD1 mutations. Further, the NSD1 subtype of HNSC displays an 'immune cold' phenotype characterized by low infiltration of tumor-associated leukocytes, particularly macrophages and CD8+ T cells, as well as low expression of genes encoding the immunotherapy target PD-1 immune checkpoint receptor and its ligands. Using an in vivo model, we demonstrate that NSD1 inactivation results in reduced T cell infiltration into the tumor microenvironment, implicating NSD1 as a tumor cell-intrinsic driver of an immune cold phenotype. NSD1 inactivation therefore causes epigenetic deregulation across cancer sites, and has implications for immunotherapy.

    View details for DOI 10.1038/s41598-017-17298-x

    View details for PubMedID 29213088

  • Magnetic resonance perfusion image features uncover an angiogenic subgroup of glioblastoma patients with poor survival and better response to antiangiogenic treatment. Neuro-oncology Liu, T. T., Achrol, A. S., Mitchell, L. A., Rodriguez, S. A., Feroze, A., Kim, C., Chaudhary, N., Gevaert, O., Stuart, J. M., Harsh, G. R., Chang, S. D., Rubin, D. L. 2016


    In previous clinical trials, antiangiogenic therapies such as bevacizumab did not show efficacy in patients with newly diagnosed glioblastoma (GBM). This may be a result of the heterogeneity of GBM, which has a variety of imaging-based phenotypes and gene expression patterns. In this study, we sought to identify a phenotypic subtype of GBM patients who have distinct tumor-image features and molecular activities and who may benefit from antiangiogenic therapies. Quantitative image features characterizing subregions of tumors and the whole tumor were extracted from preoperative and pretherapy perfusion magnetic resonance (MR) images of 117 GBM patients in 2 independent cohorts. Unsupervised consensus clustering was performed to identify robust clusters of GBM in each cohort. Cox survival and gene set enrichment analyses were conducted to characterize the clinical significance and molecular pathway activities of the clusters. The differential treatment efficacy of antiangiogenic therapy between the clusters was evaluated. A subgroup of patients with elevated perfusion features was identified and was significantly associated with poor patient survival after accounting for other clinical covariates (P values <.01; hazard ratios > 3) consistently found in both cohorts. Angiogenesis and hypoxia pathways were enriched in this subgroup of patients, suggesting the potential efficacy of antiangiogenic therapy. Patients of the angiogenic subgroups pooled from both cohorts, who had chemotherapy information available, had significantly longer survival when treated with antiangiogenic therapy (log-rank P=.022). Our findings suggest that an angiogenic subtype of GBM patients may benefit from antiangiogenic therapy with improved overall survival.

    View details for DOI 10.1093/neuonc/now270

    View details for PubMedID 28007759

  • Chromatin-Remodeling Complex SWI/SNF Controls Multidrug Resistance by Transcriptionally Regulating the Drug Efflux Pump ABCB1 CANCER RESEARCH Dubey, R., Lebensohn, A. M., Bahrami-Nejad, Z., Marceau, C., Champion, M., Gevaert, O., Sikic, B. I., Carette, J. E., Rohatgi, R. 2016; 76 (19): 5810-5821


    Anthracyclines are among the most effective yet most toxic drugs used in the oncology clinic. The nucleosome-remodeling SWI/SNF complex, a potent tumor suppressor, is thought to promote sensitivity to anthracyclines by recruiting topoisomerase IIa (TOP2A) to DNA and increasing double-strand breaks. In this study, we discovered a novel mechanism through which SWI/SNF influences resistance to the widely used anthracycline doxorubicin based on the use of a forward genetic screen in haploid human cells, followed by a rigorous single and double-mutant epistasis analysis using CRISPR/Cas9-mediated engineering. Doxorubicin resistance conferred by loss of the SMARCB1 subunit of the SWI/SNF complex was caused by transcriptional upregulation of a single gene, encoding the multidrug resistance pump ABCB1. Remarkably, both ABCB1 upregulation and doxorubicin resistance caused by SMARCB1 loss were dependent on the function of SMARCA4, a catalytic subunit of the SWI/SNF complex. We propose that residual SWI/SNF complexes lacking SMARCB1 are vital determinants of drug sensitivity, not just to TOP2A-targeted agents, but to the much broader range of cancer drugs effluxed by ABCB1.

    View details for DOI 10.1158/0008-5472.CAN-16-0716

    View details for Web of Science ID 000385625500025

    View details for PubMedID 27503929

    View details for PubMedCentralID PMC5050136

  • Transforming Big Data into Cancer-Relevant Insight: An Initial, Multi-Tier Approach to Assess Reproducibility and Relevance The Cancer Target Discovery and Development Network MOLECULAR CANCER RESEARCH Clemons, P. A., Shamji, A., Hon, C., Wagner, B. K., Schreiber, S. L., Krasnitz, A., Sordella, R., Sander, C., Lowe, S. W., Powers, S., Smith, K., Aburi, M., Lavarone, A., Lasorella, A., Silva, J., Stockwell, B., Califano, A., Boehm, J. S., Vazquez, F., Weir, B. A., Golub, T. R., Hahn, W. C., Khuri, F. R., Moreno, C. S., Du, Y., Cooper, L., Ivanov, A. A., Johns, M. A., Fu, H., Nikolova, O., Mendez, E., Gadi, V. K., Margolin, A. A., Grandori, C., Kemp, C. J., Warren, E. H., Riddell, S. R., McIntosh, M. W., Gevaert, O., Ji, H. P., Kuo, C. J., Dhruv, H., Finlay, D., Kiefer, J., Kim, S., Vuori, K., Berens, M. E., Weissman, J., Bivona, T., Bandyopadhyay, S., Hangauer, M., Boettcher, M., McManus, M., McCormick, F., Aksoy, O., Simonds, E. F., Zheng, T., Chen, J., An, Z., Balmain, A., Weiss, W. A., Chen, K., Liang, H., Scott, K. L., Mills, G. B., Posner, B. A., Macmillan, J., Minna, J., White, M. A., Roth, M. G., Jagu, S., Mazerik, J. N., Gerhard, D. S. 2016; 14 (8): 675-682
  • Predicting structured metadata from unstructured metadata DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION Posch, L., Panahiazar, M., Dumontier, M., Gevaert, O. 2016
  • CoINcIDE: A framework for discovery of patient subtypes across multiple datasets GENOME MEDICINE Planey, C. R., Gevaert, O. 2016; 8


    Patient disease subtypes have the potential to transform personalized medicine. However, many patient subtypes derived from unsupervised clustering analyses on high-dimensional datasets are not replicable across multiple datasets, limiting their clinical utility. We present CoINcIDE, a novel methodological framework for the discovery of patient subtypes across multiple datasets that requires no between-dataset transformations. We also present a high-quality database collection, curatedBreastData, with over 2,500 breast cancer gene expression samples. We use CoINcIDE to discover novel breast and ovarian cancer subtypes with prognostic significance and novel hypothesized ovarian therapeutic targets across multiple datasets. CoINcIDE and curatedBreastData are available as R packages.

    View details for DOI 10.1186/s13073-016-0281-4

    View details for Web of Science ID 000371588100001

    View details for PubMedID 26961683

  • Single Gene Prognostic Biomarkers in Ovarian Cancer: A Meta-Analysis PLOS ONE Willis, S., Villalobos, V. M., Gevaert, O., Abramovitz, M., Williams, C., Sikic, B. I., Leyland-Jones, B. 2016; 11 (2)


    To discover novel prognostic biomarkers in ovarian serous carcinomas. A meta-analysis of all single-gene probes in the TCGA and HAS ovarian cohorts was performed to identify possible biomarkers using Cox regression as a continuous variable for overall survival. Genes were ranked by p-value using Stouffer's method and selected for statistical significance with a false discovery rate (FDR) <.05 using the Benjamini-Hochberg method. Twelve genes with high mRNA expression were prognostic of poor outcome with an FDR <.05 (AXL, APC, RAB11FIP5, C19orf2, CYBRD1, PINK1, LRRN3, AQP1, DES, XRCC4, BCHE, and ASAP3). Twenty genes with low mRNA expression were prognostic of poor outcome with an FDR <.05 (LRIG1, SLC33A1, NUCB2, POLD3, ESR2, GOLPH3, XBP1, PAXIP1, CYB561, POLA2, CDH1, GMNN, SLC37A4, FAM174B, AGR2, SDR39U1, MAGT1, GJB1, SDF2L1, and C9orf82). A meta-analysis of all single genes identified thirty-two candidate biomarkers for their possible role in ovarian serous carcinoma. These genes can provide insight into the drivers or regulators of ovarian cancer and should be evaluated in future studies. Genes with high expression indicating poor outcome are possible therapeutic targets with known antagonists or inhibitors. Additionally, the genes could be combined into a prognostic multi-gene signature and tested in future ovarian cohorts.

    View details for DOI 10.1371/journal.pone.0149183

    View details for Web of Science ID 000371218400064

    View details for PubMedID 26886260

  • Development of prognostic signatures for intermediate-risk papillary thyroid cancer. BMC cancer Brennan, K., Holsinger, C., Dosiou, C., Sunwoo, J. B., Akatsu, H., Haile, R., Gevaert, O. 2016; 16 (1): 736


    The incidence of papillary thyroid carcinoma (PTC), the most common type of thyroid malignancy, has risen rapidly worldwide. PTC usually has an excellent prognosis. However, the rising incidence of PTC, due at least partially to widespread use of neck imaging studies with increased detection of small cancers, has created a clinical issue of overdiagnosis, and consequential overtreatment. We investigated how molecular data can be used to develop a prognostic signature for PTC. The Cancer Genome Atlas (TCGA) recently reported on the genomic landscape of a large cohort of PTC cases. In order to decrease unnecessary morbidity associated with overdiagnosing PTC patients with good prognosis, we used TCGA data to develop a gene expression signature to distinguish between patients with good and poor prognosis. We selected a set of clinical phenotypes to define an 'extreme poor' prognosis group and an 'extreme good' prognosis group and developed a gene signature that characterized these. We discovered a gene expression signature that distinguished the extreme good from extreme poor prognosis patients. Next, we applied this signature to the remaining intermediate risk patients, and show that they can be classified in clinically meaningful risk groups, characterized by established prognostic disease phenotypes. Analysis of the genes in the signature shows many known and novel genes involved in PTC prognosis. This work demonstrates that using a selection of clinical phenotypes and treatment variables, it is possible to develop a statistically useful and biologically meaningful gene signature of PTC prognosis, which may be developed as a biomarker to help prevent overdiagnosis.

    View details for DOI 10.1186/s12885-016-2771-6

    View details for PubMedID 27633254

  • Magnetic resonance perfusion image features uncover an angiogenic subgroup of glioblastoma patients with poor survival and better response to antiangiogenic treatment. Neuro-Oncology Liu, T. T., Achrol, A. S., Mitchell, L. A., Rodriguez, S. A., Feroze, A., Iv, M., Kim, C., Chaudhary, N., Gevaert, O., Stuart, J. M., Harsh, G. R., Chang, S. D., Rubin, D. L. 2016


    In previous clinical trials, antiangiogenic therapies such as bevacizumab did not show efficacy in patients with newly diagnosed glioblastoma (GBM). This may be a result of the heterogeneity of GBM, which has a variety of imaging-based phenotypes and gene expression patterns. In this study, we sought to identify a phenotypic subtype of GBM patients who have distinct tumor-image features and molecular activities and who may benefit from antiangiogenic therapies. Quantitative image features characterizing subregions of tumors and the whole tumor were extracted from preoperative and pretherapy perfusion magnetic resonance (MR) images of 117 GBM patients in 2 independent cohorts. Unsupervised consensus clustering was performed to identify robust clusters of GBM in each cohort. Cox survival and gene set enrichment analyses were conducted to characterize the clinical significance and molecular pathway activities of the clusters. The differential treatment efficacy of antiangiogenic therapy between the clusters was evaluated. A subgroup of patients with elevated perfusion features was identified and was significantly associated with poor patient survival after accounting for other clinical covariates (P values <.01; hazard ratios > 3) consistently found in both cohorts. Angiogenesis and hypoxia pathways were enriched in this subgroup of patients, suggesting the potential efficacy of antiangiogenic therapy. Patients of the angiogenic subgroups pooled from both cohorts, who had chemotherapy information available, had significantly longer survival when treated with antiangiogenic therapy (log-rank P=.022). Our findings suggest that an angiogenic subtype of GBM patients may benefit from antiangiogenic therapy with improved overall survival.

    View details for DOI 10.1093/neuonc/now270

  • A Rapid Segmentation-Insensitive "Digital Biopsy" Method for Radiomic Feature Extraction: Method and Pilot Study Using CT Images of Non-Small Cell Lung Cancer. Tomography : a journal for imaging research Echegaray, S., Nair, V., Kadoch, M., Leung, A., Rubin, D., Gevaert, O., Napel, S. 2016; 2 (4): 283–94


    Quantitative imaging approaches compute features within images' regions of interest. Segmentation is rarely completely automatic, requiring time-consuming editing by experts. We propose a new paradigm, called "digital biopsy," that allows for the collection of intensity- and texture-based features from these regions at least 1 order of magnitude faster than the current manual or semiautomated methods. A radiologist reviewed automated segmentations of lung nodules from 100 preoperative volume computed tomography scans of patients with non-small cell lung cancer, and manually adjusted the nodule boundaries in each section, to be used as a reference standard, requiring up to 45 minutes per nodule. We also asked a different expert to generate a digital biopsy for each patient using a paintbrush tool to paint a contiguous region of each tumor over multiple cross-sections, a procedure that required an average of <3 minutes per nodule. We simulated additional digital biopsies using morphological procedures. Finally, we compared the features extracted from these digital biopsies with our reference standard using intraclass correlation coefficient (ICC) to characterize robustness. Comparing the reference standard segmentations to our digital biopsies, we found that 84/94 features had an ICC >0.7; comparing erosions and dilations, using a sphere of 1.5-mm radius, of our digital biopsies to the reference standard segmentations resulted in 41/94 and 53/94 features, respectively, with ICCs >0.7. We conclude that many intensity- and texture-based features remain consistent between the reference standard and our method while substantially reducing the amount of operator time required.

    View details for PubMedID 28612050

    View details for PubMedCentralID PMC5466872

  • COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K SCIENTIFIC REPORTS Sweeney, T. E., Chen, A. C., Gevaert, O. 2015; 5


    In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.

    View details for DOI 10.1038/srep16971

    View details for Web of Science ID 000364936000001

    View details for PubMedID 26581809

  • The center for expanded data annotation and retrieval. Journal of the American Medical Informatics Association Musen, M. A., Bean, C. A., Cheung, K., Dumontier, M., Durante, K. A., Gevaert, O., Gonzalez-Beltran, A., Khatri, P., Kleinstein, S. H., O'Connor, M. J., Pouliot, Y., Rocca-Serra, P., Sansone, S., Wiser, J. A. 2015; 22 (6): 1148-1152


    The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments.

    View details for DOI 10.1093/jamia/ocv048

    View details for PubMedID 26112029

  • Core samples for radiomics features that are insensitive to tumor segmentation: method and pilot study using CT images of hepatocellular carcinoma. Journal of medical imaging (Bellingham, Wash.) Echegaray, S., Gevaert, O., Shah, R., Kamaya, A., Louie, J., Kothary, N., Napel, S. 2015; 2 (4): 041011


    The purpose of this study is to investigate the utility of obtaining "core samples" of regions in CT volume scans for extraction of radiomic features. We asked four readers to outline tumors in three representative slices from each phase of multiphasic liver CT images taken from 29 patients (1128 segmentations) with hepatocellular carcinoma. Core samples were obtained by automatically tracing the maximal circle inscribed in the outlines. Image features describing the intensity, texture, shape, and margin were used to describe the segmented lesion. We calculated the intraclass correlation between the features extracted from the readers' segmentations and their core samples to characterize robustness to segmentation between readers, and between human-based segmentation and core sampling. We conclude that despite the high interreader variability in manually delineating the tumor (average overlap of 43% across all readers), certain features such as intensity and texture features are robust to segmentation. More importantly, this same subset of features can be obtained from the core samples, providing as much information as detailed segmentation while being simpler and faster to obtain.

    View details for DOI 10.1117/1.JMI.2.4.041011

    View details for PubMedID 26587549

    View details for PubMedCentralID PMC4650964

  • Addition of MR imaging features and genetic biomarkers strengthens glioblastoma survival prediction in TCGA patients. Journal of neuroradiology. Journal de neuroradiologie Nicolasjilwan, M., Hu, Y., Yan, C., Meerzaman, D., Holder, C. A., Gutman, D., Jain, R., Colen, R., Rubin, D. L., Zinn, P. O., Hwang, S. N., Raghavan, P., Hammoud, D. A., Scarpace, L. M., Mikkelsen, T., Chen, J., Gevaert, O., Buetow, K., Freymann, J., Kirby, J., Flanders, A. E., Wintermark, M. 2015; 42 (4): 212-221


    The purpose of our study was to assess whether a model combining clinical factors, MR imaging features, and genomics would better predict overall survival of patients with glioblastoma (GBM) than either individual data type. The study was conducted leveraging The Cancer Genome Atlas (TCGA) effort supported by the National Institutes of Health. Six neuroradiologists reviewed MRI images from The Cancer Imaging Archive of 102 GBM patients using the VASARI scoring system. The patients' clinical and genetic data were obtained from the TCGA website. Patient outcome was measured in terms of overall survival time. The association between different categories of biomarkers and survival was evaluated using Cox analysis. The features that were significantly associated with survival were: (1) clinical factors: chemotherapy; (2) imaging: proportion of tumor contrast enhancement on MRI; and (3) genomics: HRAS copy number variation. The combination of these three biomarkers resulted in an incremental increase in the strength of prediction of survival, with the model that included clinical, imaging, and genetic variables having the highest predictive accuracy (area under the curve 0.679±0.068, Akaike's information criterion 566.7, P<0.001). A combination of clinical factors, imaging features, and HRAS copy number variation best predicts survival of patients with GBM.

    View details for DOI 10.1016/j.neurad.2014.02.006

    View details for PubMedID 24997477

  • DNA Methylation-Guided Prediction of Clinical Failure in High-Risk Prostate Cancer PLOS ONE Litovkin, K., Van Eynde, A., Joniau, S., Lerut, E., Laenen, A., Gevaert, T., Gevaert, O., Spahn, M., Kneitz, B., Gramme, P., Helleputte, T., Isebaert, S., Haustermans, K., Bollen, M. 2015; 10 (6)


    Prostate cancer (PCa) is a very heterogeneous disease with respect to clinical outcome. This study explored differential DNA methylation in a priori selected genes to diagnose PCa and predict clinical failure (CF) in high-risk patients. A quantitative multiplex, methylation-specific PCR assay was developed to assess promoter methylation of the APC, CCND2, GSTP1, PTGS2 and RARB genes in formalin-fixed, paraffin-embedded tissue samples from 42 patients with benign prostatic hyperplasia and radical prostatectomy specimens of patients with high-risk PCa, encompassing training and validation cohorts of 147 and 71 patients, respectively. Log-rank tests, univariate and multivariate Cox models were used to investigate the prognostic value of the DNA methylation. Hypermethylation of APC, CCND2, GSTP1, PTGS2 and RARB was highly cancer-specific. However, only GSTP1 methylation was significantly associated with CF in both independent high-risk PCa cohorts. Importantly, trichotomization into low, moderate and high GSTP1 methylation level subgroups was highly predictive for CF. Patients with either a low or high GSTP1 methylation level, as compared to the moderate methylation groups, were at a higher risk for CF in both the training (Hazard ratio [HR], 3.65; 95% CI, 1.65 to 8.07) and validation sets (HR, 4.27; 95% CI, 1.03 to 17.72) as well as in the combined cohort (HR, 2.74; 95% CI, 1.42 to 5.27) in multivariate analysis. Classification of primary high-risk tumors into three subtypes based on DNA methylation can be combined with clinico-pathological parameters for a more informative risk-stratification of these PCa patients.

    View details for DOI 10.1371/journal.pone.0130651

    View details for Web of Science ID 000356567500126

    View details for PubMedID 26086362

  • Combining bevacizumab and chemoradiation in rectal cancer. Translational results of the AXEBeam trial. British journal of cancer Verstraete, M., Debucquoy, A., Dekervel, J., van Pelt, J., Verslype, C., Devos, E., Chiritescu, G., Dumon, K., D'Hoore, A., Gevaert, O., Sagaert, X., Van Cutsem, E., Haustermans, K. 2015; 112 (8): 1314-1325


    This study characterises the molecular effect of bevacizumab, and explores the relation of molecular and genetic markers with response to bevacizumab combined with chemoradiotherapy (CRT). From a subset of 59 patients of 84 rectal cancer patients included in a phase II study combining bevacizumab with CRT, tumour and blood samples were collected before and during treatment, offering the possibility to evaluate changes induced by one dose of bevacizumab. We performed cDNA microarrays, stains for CD31/CD34 combined with α-SMA and CA-IX, as well as enzyme-linked immunosorbent assay (ELISA) for circulating angiogenic proteins. Markers were related with the pathological response of patients. One dose of bevacizumab changed the expression of 14 genes and led to a significant decrease in microvessel density and in the proportion of pericyte-covered blood vessels, and a small but nonsignificant increase in hypoxia. Alterations in angiogenic processes after bevacizumab delivery were only detected in responding tumours. Lower PDGFA expression and PDGF-BB levels, fewer pericyte-covered blood vessels and higher CA-IX expression were found after bevacizumab treatment only in patients with pathological complete response. We could not support the 'normalization hypothesis' and suggest a role for PDGFA, PDGF-BB, CA-IX and α-SMA. Validation in larger patient groups is needed.

    View details for DOI 10.1038/bjc.2015.93

    View details for PubMedID 25867261

  • Methylation of PITX2, HOXD3, RASSF1 and TDRD1 predicts biochemical recurrence in high-risk prostate cancer JOURNAL OF CANCER RESEARCH AND CLINICAL ONCOLOGY Litovkin, K., Joniau, S., Lerut, E., Laenen, A., Gevaert, O., Spahn, M., Kneitz, B., Isebaert, S., Haustermans, K., Beullens, M., Van Eynde, A., Bollen, M. 2014; 140 (11): 1849-1861
  • NF-κB protein expression associates with (18)F-FDG PET tumor uptake in non-small cell lung cancer: A radiogenomics validation study to understand tumor metabolism. Lung cancer Nair, V. S., Gevaert, O., Davidzon, G., Plevritis, S. K., West, R. 2014; 83 (2): 189-196


    We previously demonstrated that NF-κB may be associated with (18)F-FDG PET uptake and patient prognosis using radiogenomics in patients with non-small cell lung cancer (NSCLC). To validate these results, we assessed NF-κB protein expression in an extended cohort of NSCLC patients. We examined NF-κBp65 by immunohistochemistry (IHC) using a Tissue Microarray. Staining intensity was assessed by qualitative ordinal scoring and compared to tumor FDG uptake (SUVmax and SUVmean), lactate dehydrogenase A (LDHA) expression (as a positive control) and outcome using ANOVA, Kaplan Meier (KM), and Cox-proportional hazards (CPH) analysis. 365 tumors from 355 patients with long-term follow-up were analyzed. The average age for patients was 67±11 years, 46% were male and 67% were ever smokers. Stage I and II patients comprised 83% of the cohort and the majority had adenocarcinoma (73%). From 88 FDG PET scans available, average SUVmax and SUVmean were 8.3±6.6, and 3.7±2.4 respectively. Increasing NF-κBp65 expression, but not LDHA expression, was associated with higher SUVmax and SUVmean (p=0.03 and 0.02 respectively). Both NF-κBp65 and positive FDG uptake were significantly associated with more advanced stage, tumor histology and invasion. Higher NF-κBp65 expression was associated with death by KM analysis (p=0.06) while LDHA was strongly associated with recurrence (p=0.04). Increased levels of combined NF-κBp65 and LDHA expression were synergistic and associated with both recurrence (p=0.04) and death (p=0.03). NF-κB IHC was a modest biomarker of prognosis that associated with tumor glucose metabolism on FDG PET when compared to existing molecular correlates like LDHA, which was synergistic with NF-κB for outcome. These findings recapitulate radiogenomics profiles previously reported by our group and provide a methodology for studying tumor biology using computational approaches.

    View details for DOI 10.1016/j.lungcan.2013.11.001

    View details for PubMedID 24355259

  • Stromal architecture and periductal decorin are potential prognostic markers for ipsilateral locoregional recurrence in ductal carcinoma in situ of the breast HISTOPATHOLOGY Van Bockstal, M., Lambein, K., Gevaert, O., de Wever, O., Praet, M., Cocquyt, V., Van den Broecke, R., Braems, G., Denys, H., Libbrecht, L. 2013; 63 (4): 520-533

    View details for DOI 10.1111/his.12188

    View details for Web of Science ID 000325088600008

  • Cross-Species Functional Analysis of Cancer-Associated Fibroblasts Identifies a Critical Role for CLCF1 and IL-6 in Non-Small Cell Lung Cancer In Vivo CANCER RESEARCH Vicent, S., Sayles, L. C., Vaka, D., Khatri, P., Gevaert, O., Chen, R., Zheng, Y., Gillespie, A. K., Clarke, N., Xu, Y., Shrager, J., Hoang, C. D., Plevritis, S., Butte, A. J., Sweet-Cordero, E. A. 2012; 72 (22): 5744-5756


    Cancer-associated fibroblasts (CAF) have been reported to support tumor progression by a variety of mechanisms. However, their role in the progression of non-small cell lung cancer (NSCLC) remains poorly defined. In addition, the extent to which specific proteins secreted by CAFs contribute directly to tumor growth is unclear. To study the role of CAFs in NSCLCs, a cross-species functional characterization of mouse and human lung CAFs was conducted. CAFs supported the growth of lung cancer cells in vivo by secretion of soluble factors that directly stimulate the growth of tumor cells. Gene expression analysis comparing normal mouse lung fibroblasts and mouse lung CAFs identified multiple genes that correlate with the CAF phenotype. A gene signature of secreted genes upregulated in CAFs was an independent marker of poor survival in patients with NSCLC. This secreted gene signature was upregulated in normal lung fibroblasts after long-term exposure to tumor cells, showing that lung fibroblasts are "educated" by tumor cells to acquire a CAF-like phenotype. Functional studies identified important roles for CLCF1-CNTFR and interleukin (IL)-6-IL-6R signaling in promoting growth of NSCLCs. This study identifies novel soluble factors contributing to the CAF protumorigenic phenotype in NSCLCs and suggests new avenues for the development of therapeutic strategies.

    View details for DOI 10.1158/0008-5472.CAN-12-1097

    View details for Web of Science ID 000311141300012

    View details for PubMedID 22962265

  • Evaluation of a panel of 28 biomarkers for the non-invasive diagnosis of endometriosis HUMAN REPRODUCTION Vodolazkaia, A., El-Aalamat, Y., Popovic, D., Mihalyi, A., Bossuyt, X., Kyama, C. M., Fassbender, A., Bokor, A., SCHOLS, D., Huskens, D., Meuleman, C., Peeraer, K., Tomassetti, C., Gevaert, O., Waelkens, E., Kasran, A., De Moor, B., D'Hooghe, T. M. 2012; 27 (9): 2698-2711


    At present, the only way to conclusively diagnose endometriosis is laparoscopic inspection, preferably with histological confirmation. This contributes to the delay in the diagnosis of endometriosis which is 6-11 years. So far non-invasive diagnostic approaches such as ultrasound (US), MRI or blood tests do not have sufficient diagnostic power. Our aim was to develop and validate a non-invasive diagnostic test with a high sensitivity (80% or more) for symptomatic endometriosis patients, without US evidence of endometriosis, since this is the group most in need of a non-invasive test. A total of 28 inflammatory and non-inflammatory plasma biomarkers were measured in 353 EDTA plasma samples collected at surgery from 121 controls without endometriosis at laparoscopy and from 232 women with endometriosis (minimal-mild n = 148; moderate-severe n = 84), including 175 women without preoperative US evidence of endometriosis. Surgery was done during menstrual (n = 83), follicular (n = 135) and luteal (n = 135) phases of the menstrual cycle. For analysis, the data were randomly divided into an independent training (n = 235) and a test (n = 118) data set. Statistical analysis was done using univariate and multivariate (logistic regression and least squares support vector machines (LS-SVM)) approaches in training- and test data set separately to validate our findings. In the training set, two models of four biomarkers (Model 1: annexin V, VEGF, CA-125 and glycodelin; Model 2: annexin V, VEGF, CA-125 and sICAM-1) analysed in plasma, obtained during the menstrual phase, could predict US-negative endometriosis with a high sensitivity (81-90%) and an acceptable specificity (68-81%). The same two models predicted US-negative endometriosis in the independent validation test set with a high sensitivity (82%) and an acceptable specificity (63-75%). In plasma samples obtained during menstruation, multivariate analysis of four biomarkers (annexin V, VEGF, CA-125 and sICAM-1/or glycodelin) enabled the diagnosis of endometriosis undetectable by US with a sensitivity of 81-90% and a specificity of 63-81% in independent training- and test data set. The next step is to apply these models for preoperative prediction of endometriosis in an independent set of patients with infertility and/or pain without US evidence of endometriosis, scheduled for laparoscopy.

    View details for DOI 10.1093/humrep/des234

    View details for Web of Science ID 000307502000016

    View details for PubMedID 22736326

  • Combined mRNA microarray and proteomic analysis of eutopic endometrium of women with and without endometriosis. Human reproduction (Oxford, England) Fassbender, A., Verbeeck, N., Börnigen, D., Kyama, C. M., Bokor, A., Vodolazkaia, A., Peeraer, K., Tomassetti, C., Meuleman, C., Gevaert, O., Van de Plas, R., Ojeda, F., De Moor, B., Moreau, Y., Waelkens, E., D'Hooghe, T. M. 2012; 27 (7): 2020-2029


    An early semi-invasive diagnosis of endometriosis has the potential to allow early treatment and minimize disease progression but no such test is available at present. Our aim was to perform a combined mRNA microarray and proteomic analysis on the same eutopic endometrium sample obtained from patients with and without endometriosis. mRNA and protein fractions were extracted from 49 endometrial biopsies obtained from women with laparoscopically proven presence (n = 31) or absence (n = 18) of endometriosis during the early luteal (n = 27) or menstrual phase (n = 22) and analyzed using microarray and proteomic surface enhanced laser desorption ionization-time of flight mass spectrometry, respectively. Proteomic data were analyzed using a least squares-support vector machines (LS-SVM) model built on 70% (training set) and 30% of the samples (test set). mRNA analysis of eutopic endometrium did not show any differentially expressed genes in women with endometriosis when compared with controls, regardless of endometriosis stage or cycle phase. mRNA was differentially expressed (P < 0.05) in women with (925 genes) and without endometriosis (1087 genes) during the menstrual phase when compared with the early luteal phase. Proteomic analysis based on five peptide peaks [2072 mass/charge (m/z); 2973 m/z; 3623 m/z; 3680 m/z and 21133 m/z] using an LS-SVM model applied on the luteal phase endometrium training set allowed the diagnosis of endometriosis (sensitivity, 91; 95% confidence interval (CI): 74-98; specificity, 80; 95% CI: 66-97 and positive predictive value, 87.9%; negative predictive value, 84.8%) in the test set. mRNA expression of eutopic endometrium was comparable in women with and without endometriosis but different in menstrual endometrium when compared with luteal endometrium in women with endometriosis. Proteomic analysis of luteal phase endometrium allowed the diagnosis of endometriosis with high sensitivity and specificity in training and test sets. A potential limitation of our study is the fact that our control group included women with a normal pelvis as well as women with concurrent pelvic disease (e.g. fibroids, benign ovarian cysts, hydrosalpinges), which may have contributed to the comparable mRNA expression profile in the eutopic endometrium of women with endometriosis and controls.

    View details for DOI 10.1093/humrep/des127

    View details for PubMedID 22556377

  • Proteomics Analysis of Plasma for Early Diagnosis of Endometriosis OBSTETRICS AND GYNECOLOGY Fassbender, A., Waelkens, E., Verbeeck, N., Kyama, C. M., Bokor, A., Vodolazkaia, A., Van De Plas, R., Meuleman, C., Peeraer, K., Tomassetti, C., Gevaert, O., Ojeda, F., De Moor, B., D'Hooghe, T. 2012; 119 (2): 276-285


    To test the hypothesis that differential surface-enhanced laser desorption/ionization time-of-flight mass spectrometry protein or peptide expression in plasma can be used in infertile women with or without pelvic pain to predict the presence of laparoscopically and histologically confirmed endometriosis, especially in the subpopulation with a normal preoperative gynecologic ultrasound examination. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry analysis was performed on 254 plasma samples obtained from 89 women without endometriosis and 165 women with endometriosis (histologically confirmed) undergoing laparoscopies for infertility with or without pelvic pain. Data were analyzed using least squares support vector machines and were divided randomly (100 times) into a training data set (70%) and a test data set (30%). Minimal-to-mild endometriosis was best predicted (sensitivity 75%, 95% confidence interval [CI] 63-89; specificity 86%, 95% CI 71-94; positive predictive value 83.6%, negative predictive value 78.3%) using a model based on five peptide and protein peaks (range 4.898-14.698 m/z) in menstrual phase samples. Moderate-to-severe endometriosis was best predicted (sensitivity 98%, 95% CI 84-100; specificity 81%, 95% CI 67-92; positive predictive value 74.4%, negative predictive value 98.6%) using a model based on five other peptide and protein peaks (range 2.189-7.457 m/z) in luteal phase samples. The peak with the highest intensity (2.189 m/z) was identified as a fibrinogen β-chain peptide. Ultrasonography-negative endometriosis was best predicted (sensitivity 88%, 95% CI 73-100; specificity 84%, 95% CI 71-96) using a model based on five peptide peaks (range 2.058-42.065 m/z) in menstrual phase samples. A noninvasive test using proteomic analysis of plasma samples obtained during the menstrual phase enabled the diagnosis of endometriosis undetectable by ultrasonography with high sensitivity and specificity. Level of evidence: II.

    View details for DOI 10.1097/AOG.0b013e31823fda8d

    View details for Web of Science ID 000299604300012

    View details for PubMedID 22270279

  • Atypical Neurofibromas in Neurofibromatosis Type 1 are Premalignant Tumors GENES CHROMOSOMES & CANCER Beert, E., Brems, H., Daniels, B., De Wever, I., Van Calenbergh, F., Schoenaers, J., Debiec-Rychter, M., Gevaert, O., De Raedt, T., Van den Bruel, A., de Ravel, T., Cichowski, K., Kluwe, L., Mautner, V., Sciot, R., Legius, E. 2011; 50 (12): 1021-1032


    Benign peripheral nerve sheath tumors (PNSTs) are a characteristic feature of neurofibromatosis type I (NF1) patients. NF1 individuals have an 8-13% lifetime risk of developing a malignant PNST (MPNST). Atypical neurofibromas are symptomatic, hypercellular PNSTs, composed of cells with hyperchromatic nuclei in the absence of mitoses. Little is known about the origin and nature of atypical neurofibromas in NF1 patients. In this study, we classified the atypical neurofibromas in the spectrum of NF1-associated PNSTs by analyzing 65 tumor samples from 48 NF1 patients. We compared tumor-specific chromosomal copy number alterations between benign neurofibromas, atypical neurofibromas, and MPNSTs (low-, intermediate-, and high-grade) by karyotyping and microarray-based comparative genome hybridization (aCGH). In 15 benign neurofibromas (4 subcutaneous and 11 plexiform), no copy number alterations were found, except a single event in a plexiform neurofibroma. One highly significant recurrent aberration (15/16) was identified in the atypical neurofibromas, namely a deletion with a minimal overlapping region (MOR) in chromosome band 9p21.3, including CDKN2A and CDKN2B. Copy number loss of the CDKN2A/B gene locus was one of the most common events in the group of MPNSTs, with deletions in low-, intermediate-, and high-grade MPNSTs. In one tumor, we observed a clear transition from a benign-atypical neurofibroma toward an intermediate-grade MPNST, confirmed by both histopathology and aCGH analysis. These data support the hypothesis that atypical neurofibromas are premalignant tumors, with the CDKN2A/B deletion as the first step in the progression toward MPNST.

    View details for DOI 10.1002/gcc.20921

    View details for Web of Science ID 000296443600005

    View details for PubMedID 21987445

  • Prediction of lymph node involvement in breast cancer from primary tumor tissue using gene expression profiling and miRNAs BREAST CANCER RESEARCH AND TREATMENT Smeets, A., Daemen, A., Vanden Bempt, I., Gevaert, O., Claes, B., Wildiers, H., Drijkoningen, R., Van Hummelen, P., Lambrechts, D., De Moor, B., Neven, P., Sotiriou, C., Vandorpe, T., Paridaens, R., Christiaens, M. R. 2011; 129 (3): 767-776


    The aim of this study was to investigate whether lymph node involvement in breast cancer is influenced by gene or miRNA expression of the primary tumor. For this purpose, we selected a very homogeneous patient population to minimize heterogeneity in other tumor and patient characteristics. First, we compared gene expression profiles of primary tumor tissue from a group of 96 breast cancer patients balanced for lymph node involvement using Affymetrix Human U133 Plus 2.0 microarray chip. A model was built by weighted Least-Squares Support Vector Machines and validated on an internal and external dataset. Next, miRNA profiling was performed on a subset of 82 tumors using Human MiRNA-microarray chips (Illumina). Finally, for each miRNA the number of significant inverse correlated targets was determined and compared with 1000 sets of randomly chosen targets. A model based on 241 genes was built (AUC 0.66). The AUC for the internal dataset was 0.646 and 0.651 for the external datasets. The model includes multiple kinases, apoptosis-related, and zinc ion-binding genes. Integration of the microarray and miRNA data reveals ten miRNAs suppressing lymph node invasion and one miRNA promoting lymph node invasion. Our results provide evidence that measurable differences in gene and miRNA expression exist between node negative and node positive patients and thus that lymph node involvement is not a genetically random process. Moreover, our data suggest a general deregulation of the miRNA machinery that is potentially responsible for lymph node invasion.

    View details for DOI 10.1007/s10549-010-1265-5

    View details for Web of Science ID 000294680600010

    View details for PubMedID 21116709

  • Ectopic pregnancy: using the hCG ratio to select women for expectant or medical management ACTA OBSTETRICIA ET GYNECOLOGICA SCANDINAVICA Kirk, E., Van Calster, B., Condous, G., Papageorghiou, A. T., Gevaert, O., Van Huffel, S., De Moor, B., Timmerman, D., Bourne, T. 2011; 90 (3): 264-272


    To identify variables that can be used to select women with an ectopic pregnancy for expectant or medical management with systemic methotrexate. Cohort study. Early Pregnancy Unit of a London teaching hospital. Women with a tubal ectopic pregnancy managed non-surgically. The diagnosis of tubal ectopic pregnancy was made using transvaginal sonography. Human chorionic gonadotrophin (hCG) levels had to be taken at 0 hour and 48 hours pre-treatment. Other recorded variables include presenting complaints, gestational age, progesterone levels, size of the ectopic mass and appearance of the ectopic on transvaginal sonography. Women were followed up until the outcome (success or failure) of management was known. Univariable analysis was performed to identify the variables associated with successful management using area under curves and relative risks. Thirty-nine women underwent expectant management (overall success rate 71.8%) and 42 had medical management (overall success rate 76.2%). The pre-treatment hCG ratio (hCG 48 hours/hCG 0 hour) was related to the failure of both expectant (area under curve 0.86, 95% CI 0.67-0.94) and medical (area under curve 0.79, 95% CI 0.58-0.90) management. History of ectopic pregnancy was related to failure of expectant management only (relative risk 0.46, 95% CI 0.16-0.92). The most important variable for predicting the likelihood of successful non-surgical management was the pre-treatment hCG ratio. New studies are required to validate the use of this variable and of history of ectopic pregnancy to predict the likelihood of successful non-surgical management in clinical practice.

    View details for DOI 10.1111/j.1600-0412.2010.01053.x

    View details for Web of Science ID 000288825600010

    View details for PubMedID 21306315

  • Evaluation of endometrial biomarkers for semi-invasive diagnosis of endometriosis FERTILITY AND STERILITY Kyama, C. M., Mihalyi, A., Gevaert, O., Waelkens, E., Simsa, P., Van De Plas, R., Meuleman, C., De Moor, B., D'Hooghe, T. M. 2011; 95 (4): 1338-U173


    To test the hypothesis that specific proteins and peptides are expressed differentially in eutopic endometrium of women with and without endometriosis and at specific stages of the disease (minimal, mild, moderate, or severe) during the secretory phase. Patients with endometriosis were compared with controls. University hospital. A total of 29 patients during the secretory phase were selected for this study on the basis of cycle phase and presence or absence of endometriosis. Endometriosis was confirmed laparoscopically and histologically in 19 patients with endometriosis of revised American Society for Reproductive Medicine stages (9 minimal-mild and 10 moderate-severe), and the presence of a normal pelvis was documented by laparoscopy in 10 controls. Protein expression of endometrium was evaluated with use of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. The differential expression of protein mass peaks was analyzed with use of support vector machine algorithms and logistic regression models. Data preprocessing resulted in differential expression of 73, 30, and 131 mass peaks between controls and patients with endometriosis (all stages), with minimal-mild endometriosis, and with moderate-severe endometriosis, respectively. Endometriosis was diagnosed with high sensitivity (89.5%) and specificity (90%) with use of five down-regulated mass peaks (1.949 kDa, 5.183 kDa, 8.650 kDa, 8.659 kDa, and 13.910 kDa) obtained after support vector machine ranking and logistic regression classification. With use of a similar analysis, minimal-mild endometriosis was diagnosed with four mass peaks (two up-regulated: 35.956 kDa and 90.675 kDa and two down-regulated: 1.924 kDa and 2.504 kDa) with maximal sensitivity (100%) and specificity (100%). The 90.675-kDa and 35.956-kDa mass peaks were identified as T-plastin and annexin V, respectively. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry analysis of secretory phase endometrium combined with bioinformatics puts forward a prospective panel of potential biomarkers with sensitivity of 100% and specificity of 100% for the diagnosis of minimal to mild endometriosis.

    View details for DOI 10.1016/j.fertnstert.2010.06.084

    View details for Web of Science ID 000288010900024

    View details for PubMedID 20800833

  • TRIzol treatment of secretory phase endometrium allows combined proteomic and mRNA microarray analysis of the same sample in women with and without endometriosis REPRODUCTIVE BIOLOGY AND ENDOCRINOLOGY Fassbender, A., Simsa, P., Kyama, C. M., Waelkens, E., Mihalyi, A., Meuleman, C., Gevaert, O., Van De Plas, R., De Moor, B., D'Hooghe, T. M. 2010; 8


    According to mRNA microarray, proteomics and other studies, biological abnormalities of eutopic endometrium (EM) are involved in the pathogenesis of endometriosis, but the relationship between mRNA and protein expression in EM is not clear. We tested for the first time the hypothesis that EM TRIzol extraction allows proteomic Surface Enhanced Laser Desorption/Ionisation Time-of-Flight Mass Spectrometry (SELDI-TOF MS) analysis and that these proteomic data can be related to mRNA (microarray) data obtained from the same EM sample from women with and without endometriosis. Proteomic analysis was performed using SELDI-TOF MS of TRIzol-extracted EM obtained during secretory phase from patients without endometriosis (n = 6), patients with minimal-mild (n = 5) and with moderate-severe endometriosis (n = 5), classified according to the system of the American Society of Reproductive Medicine. Proteomic data were compared to mRNA microarray data obtained from the same EM samples. In our SELDI-TOF MS study 32 peaks were differentially expressed in endometrium of all women with endometriosis (stages I-IV) compared with all controls during the secretory phase. Comparison of proteomic results with those from microarray revealed no corresponding genes/proteins. TRIzol treatment of secretory phase EM allows combined proteomic and mRNA microarray analysis of the same sample, but comparison between proteomic and microarray data was not evident, probably due to post-translational modifications.

    View details for DOI 10.1186/1477-7827-8-123

    View details for Web of Science ID 000284485100001

    View details for PubMedID 20964823

  • Improved Microarray-Based Decision Support with Graph Encoded Interactome Data PLOS ONE Daemen, A., Signoretto, M., Gevaert, O., Suykens, J. A., De Moor, B. 2010; 5 (4)


    In the past, microarray studies have been criticized due to noise and the limited overlap between gene signatures. Prior biological knowledge should therefore be incorporated as side information in models based on gene expression data to improve the accuracy of diagnosis and prognosis in cancer. As prior knowledge, we investigated interaction and pathway information from the human interactome on different aspects of biological systems. By exploiting the properties of kernel methods, relations between genes with similar functions but active in alternative pathways could be incorporated in a support vector machine classifier based on spectral graph theory. Using 10 microarray data sets, we first reduced the number of data sources relevant for multiple cancer types and outcomes. Three sources on metabolic pathway information (KEGG), protein-protein interactions (OPHID) and miRNA-gene targeting ( outperformed the other sources with regard to the considered class of models. Both fixed and adaptive approaches were subsequently considered to combine the three corresponding classifiers. Averaging the predictions of these classifiers performed best and was significantly better than the model based on microarray data only. These results were confirmed on 6 validation microarray sets, with a significantly improved performance in 4 of them. Integrating interactome data thus improves classification of cancer outcome for the investigated microarray technologies and cancer types. Moreover, this strategy can be incorporated in any kernel method or non-linear version of a non-kernel method.

    View details for DOI 10.1371/journal.pone.0010225

    View details for Web of Science ID 000276853800015

    View details for PubMedID 20419106

  • Non-invasive diagnosis of endometriosis based on a combined analysis of six plasma biomarkers HUMAN REPRODUCTION Mihalyi, A., Gevaert, O., Kyama, C. M., Simsa, P., Pochet, N., De Smet, F., De Moor, B., Meuleman, C., Billen, J., Blanckaert, N., Vodolazkaia, A., Fulop, V., D'Hooghe, T. M. 2010; 25 (3): 654-664


    Lack of a non-invasive diagnostic test contributes to the long delay between onset of symptoms and diagnosis of endometriosis. The aim of this study was to evaluate the combined performance of six potential plasma biomarkers in the diagnosis of endometriosis. This case-control study was conducted in 294 infertile women, consisting of 93 women with a normal pelvis and 201 women with endometriosis. We measured plasma concentrations of interleukin (IL)-6, IL-8, tumour necrosis factor-alpha, high-sensitivity C-reactive protein (hsCRP), and cancer antigens CA-125 and CA-19-9. Analyses were done using the Kruskal-Wallis test, Mann-Whitney test, receiver operator characteristic, stepwise logistic regression and least squares support vector machines (LSSVM). Plasma levels of IL-6, IL-8 and CA-125 were increased in all women with endometriosis and in those with minimal-mild endometriosis, compared with controls. In women with moderate-severe endometriosis, plasma levels of IL-6, IL-8 and CA-125, but also of hsCRP, were significantly higher than in controls. Using stepwise logistic regression, moderate-severe endometriosis was diagnosed with a sensitivity of 100% (specificity 84%) and minimal-mild endometriosis was detected with a sensitivity of 87% (specificity 71%) during the secretory phase. Using LSSVM analysis, minimal-mild endometriosis was diagnosed with a sensitivity of 94% (specificity 61%) during the secretory phase and with a sensitivity of 92% (specificity 63%) during the menstrual phase. Advanced statistical analysis of a panel of six selected plasma biomarkers on samples obtained during the secretory phase or during menstruation allows the diagnosis of both minimal-mild and moderate-severe endometriosis with high sensitivity and clinically acceptable specificity.

    View details for DOI 10.1093/humrep/dep425

    View details for Web of Science ID 000274490700014

    View details for PubMedID 20007161

  • A taxonomy of epithelial human cancer and their metastases BMC MEDICAL GENOMICS Gevaert, O., Daemen, A., De Moor, B., Libbrecht, L. 2009; 2


    Microarray technology has allowed to molecularly characterize many different cancer sites. This technology has the potential to individualize therapy and to discover new drug targets. However, due to technological differences and issues in standardized sample collection no study has evaluated the molecular profile of epithelial human cancer in a large number of samples and tissues. Additionally, it has not yet been extensively investigated whether metastases resemble their tissue of origin or tissue of destination. We studied the expression profiles of a series of 1566 primary and 178 metastases by unsupervised hierarchical clustering. The clustering profile was subsequently investigated and correlated with clinico-pathological data. Statistical enrichment of clinico-pathological annotations of groups of samples was investigated using Fisher exact test. Gene set enrichment analysis (GSEA) and DAVID functional enrichment analysis were used to investigate the molecular pathways. Kaplan-Meier survival analysis and log-rank tests were used to investigate prognostic significance of gene signatures. Large clusters corresponding to breast, gastrointestinal, ovarian and kidney primary tissues emerged from the data. Chromophobe renal cell carcinoma clustered together with follicular differentiated thyroid carcinoma, which supports recent morphological descriptions of thyroid follicular carcinoma-like tumors in the kidney and suggests that they represent a subtype of chromophobe carcinoma. We also found an expression signature identifying primary tumors of squamous cell histology in multiple tissues. Next, a subset of ovarian tumors enriched with endometrioid histology clustered together with endometrium tumors, confirming that they share their etiopathogenesis, which strongly differs from serous ovarian tumors. In addition, the clustering of colon and breast tumors correlated with clinico-pathological characteristics. Moreover, a signature was developed based on our unsupervised clustering of breast tumors and this was predictive for disease-specific survival in three independent studies. Next, the metastases from ovarian, breast, lung and vulva cluster with their tissue of origin while metastases from colon showed a bimodal distribution. A significant part clusters with tissue of origin while the remaining tumors cluster with the tissue of destination. Our molecular taxonomy of epithelial human cancer indicates surprising correlations over tissues. This may have a significant impact on the classification of many cancer sites and may guide pathologists, both in research and daily practice. Moreover, these results based on unsupervised analysis yielded a signature predictive of clinical outcome in breast cancer. Additionally, we hypothesize that metastases from gastrointestinal origin either remember their tissue of origin or adapt to the tissue of destination. More specifically, colon metastases in the liver show strong evidence for such a bimodal tissue specific profile.

    View details for DOI 10.1186/1755-8794-2-69

    View details for Web of Science ID 000273595600001

    View details for PubMedID 20017941

  • Density of small diameter sensory nerve fibres in endometrium: a semi-invasive diagnostic test for minimal to mild endometriosis HUMAN REPRODUCTION Bokor, A., Kyama, C. M., Vercruysse, L., Fassbender, A., Gevaert, O., Vodolazkaia, A., De Moor, B., Fulop, V., D'Hooghe, T. 2009; 24 (12): 3025-3032


    The aim of our study was to test the hypothesis that multiple-sensory small-diameter nerve fibres are present in a higher density in endometrium from patients with endometriosis when compared with women with a normal pelvis, enabling the development of a semi-invasive diagnostic test for minimal-mild endometriosis. Secretory phase endometrium samples (n = 40), obtained from women with laparoscopically/histologically confirmed minimal-mild endometriosis (n = 20) and from women with a normal pelvis (n = 20) were selected from the biobank at the Leuven University Fertility Centre. Immunohistochemistry was performed to localize neural markers for sensory C, Adelta, adrenergic and cholinergic nerve fibres in the functional layer of the endometrium. Sections were immunostained with anti-human protein gene product 9.5 (PGP9.5), anti-neurofilament protein, anti-substance P (SP), anti-vasoactive intestinal peptide (VIP), anti-neuropeptide Y and anti-calcitonine gene-related polypeptide. Statistical analysis was done using the Mann-Whitney U-test, receiver operator characteristic analysis, stepwise logistic regression and least-squares support vector machines. The density of small nerve fibres was approximately 14 times higher in endometrium from patients with minimal-mild endometriosis (1.96 +/- 2.73) when compared with women with a normal pelvis (0.14 +/- 0.46, P < 0.0001). The combined analysis of neural markers PGP9.5, VIP and SP could predict the presence of minimal-mild endometriosis with 95% sensitivity, 100% specificity and 97.5% accuracy. To confirm our findings, prospective studies are required.

    View details for DOI 10.1093/humrep/dep283

    View details for Web of Science ID 000272069500009

    View details for PubMedID 19690351

  • Molecular Response to Cetuximab and Efficacy of Preoperative Cetuximab-Based Chemoradiation in Rectal Cancer 44th Annual Meeting of the American-Society-of-Clinical-Oncology (ASCO) Debucquoy, A., Haustermans, K., Daemen, A., Aydin, S., Libbrecht, L., Gevaert, O., De Moor, B., Tejpar, S., McBride, W. H., Penninckx, F., Scalliet, P., Stroh, C., Vlassak, S., Sempoux, C., Machiels, J. AMER SOC CLINICAL ONCOLOGY. 2009: 2751–57


    To characterize the molecular pathways activated or inhibited by cetuximab when combined with chemoradiotherapy (CRT) in rectal cancer and to identify molecular profiles and biomarkers that might improve patient selection for such treatments. Forty-one patients with rectal cancer (T3-4 and/or N+) received preoperative radiotherapy (1.8 Gy, 5 days/wk, 45 Gy) in combination with capecitabine and cetuximab (400 mg/m2 as initial dose 1 week before CRT followed by 250 mg/m2/wk for 5 weeks). Biopsies and plasma samples were taken before treatment, after cetuximab but before CRT, and at the time of surgery. Proteomics and microarrays were used to monitor the molecular response to cetuximab and to identify profiles and biomarkers to predict treatment efficacy. Cetuximab on its own downregulated genes involved in proliferation and invasion and upregulated inflammatory gene expression, with 16 genes being significantly influenced in microarray analysis. The decrease in proliferation was confirmed by immunohistochemistry for Ki67 (P = .01) and was accompanied by an increase in transforming growth factor-alpha in plasma samples (P < .001). Disease-free survival (DFS) was better in patients if epidermal growth factor receptor expression was upregulated in the tumor after the initial cetuximab dose (P = .02) and when fibro-inflammatory changes were present in the surgical specimen (P = .03). Microarray and proteomic profiles were predictive of DFS. Our study showed that a single dose of cetuximab has a significant impact on the expression of genes involved in tumor proliferation and inflammation. We identified potential biomarkers that might predict response to cetuximab-based CRT.

    View details for DOI 10.1200/JCO.2008.18.5033

    View details for Web of Science ID 000266782100005

    View details for PubMedID 19332731

  • Prediction of cancer outcome using DNA microarray technology: past, present and future. Expert opinion on medical diagnostics Gevaert, O., De Moor, B. 2009; 3 (2): 157-165


    Background: The use of DNA microarray technology to predict cancer outcome already has a history of almost a decade. Although many breakthroughs have been made, the promise of individualized therapy is still not fulfilled. In addition, new technologies are emerging that also show promise in outcome prediction of cancer patients. Objective: The impact of DNA microarray and other 'omics' technologies on the outcome prediction of cancer patients was investigated. Whether integration of omics data results in better predictions was also examined. Methods: DNA microarray technology was focused on as a starting point because this technology is considered to be the most mature technology from all omics technologies. Next, emerging technologies that may accomplish the same goals but have been less extensively studied are described. Conclusion: Besides DNA microarray technology, other omics technologies have shown promise in predicting the cancer outcome or have potential to replace microarray technology in the near future. Moreover, it is shown that integration of multiple omics data can result in better predictions of cancer outcome; but, owing to the lack of comprehensive studies, validation studies are required to verify which omics has the most information and whether a combination of multiple omics data improves predictive performance.

    View details for DOI 10.1517/17530050802680172

    View details for PubMedID 23485162

  • A kernel-based integration of genome-wide data for clinical decision support. Genome medicine Daemen, A., Gevaert, O., Ojeda, F., Debucquoy, A., Suykens, J. A., Sempoux, C., Machiels, J., Haustermans, K., De Moor, B. 2009; 1 (4): 39-?


    Although microarray technology allows the investigation of the transcriptomic make-up of a tumor in one experiment, the transcriptome does not completely reflect the underlying biology due to alternative splicing, post-translational modifications, as well as the influence of pathological conditions (for example, cancer) on transcription and translation. This increases the importance of fusing more than one source of genome-wide data, such as the genome, transcriptome, proteome, and epigenome. The current increase in the amount of available omics data emphasizes the need for a methodological integration framework. We propose a kernel-based approach for clinical decision support in which many genome-wide data sources are combined. Integration occurs within the patient domain at the level of kernel matrices before building the classifier. As supervised classification algorithm, a weighted least squares support vector machine is used. We apply this framework to two cancer cases, namely, a rectal cancer data set containing microarray and proteomics data and a prostate cancer data set containing microarray and genomics data. For both cases, multiple outcomes are predicted. For the rectal cancer outcomes, the highest leave-one-out (LOO) areas under the receiver operating characteristic curves (AUC) were obtained when combining microarray and proteomics data gathered during therapy and ranged from 0.927 to 0.987. For prostate cancer, all four outcomes had a better LOO AUC when combining microarray and genomics data, ranging from 0.786 for recurrence to 0.987 for metastasis. For both cancer sites the prediction of all outcomes improved when more than one genome-wide data set was considered. This suggests that integrating multiple genome-wide data sources increases the predictive performance of clinical decision support models. This emphasizes the need for comprehensive multi-modal data. We acknowledge that, in a first phase, this will substantially increase costs; however, this is a necessary investment to ultimately obtain cost-efficient models usable in patient tailored therapy.

    View details for DOI 10.1186/gm39

    View details for PubMedID 19356222

  • Building decision trees for diagnosing intracavitary uterine pathology. Facts, views & vision in ObGyn Van den Bosch, T., Daemen, A., Gevaert, O., De Moor, B., Timmerman, D. 2009; 1 (3): 182-188


    To build decision trees to predict intrauterine disease, based on a clinical data set, and using mathematical software. Diagnostic algorithms were built and validated using the data of 402 consecutive patients who underwent grey scale ultrasound, followed by colour Doppler, saline infusion sonography (SIS), office hysteroscopy and endometrial sampling. The "final diagnosis" was classified as "abnormal" in case of endometrial polyps, hyperplasia or malignancy or intracavitary myoma. "Pre-test parameters" included patient's age, weight, length, parity, menopausal status, bleeding symptoms and cervical cytology; "post-test parameters" included ultrasound-, color Doppler-, SIS-, hysteroscopy- findings and histology results after endometrial sampling. Decision Tree #1 was built using both "pre-test" and "post-test" parameters; Tree #2 was only based on "post-test" parameters; Tree #3 was designed without using the hysteroscopy variables. The Waikato Environment for Knowledge Analysis (Weka) software was used for the development of decision trees. All trees started with an imaging technique: hysteroscopy or SIS. The diagnostic accuracy was 88.3%, 88.3% and 84.0% for Tree #1, #2 and #3 respectively; the sensitivity and specificity were 95.5% and 82%, 97.7% and 80.0%, and 93.2% and 76.0%, respectively. The method used in this study enables the comparison between different decision trees containing multiple tests.

    View details for PubMedID 25489463

  • A kernel-based integration of genome-wide data for clinical decision support GENOME MEDICINE Daemen, A., Gevaert, O., Ojeda, F., Debucquoy, A., Suykens, J. A., Sempoux, C., Machiels, J., Haustermans, K., De Moor, B. 2009; 1

    View details for DOI 10.1186/gm39

    View details for Web of Science ID 000208627000039

  • SUPERVISED CLASSIFICATION OF ARRAY CGH DATA WITH HMM-BASED FEATURE SELECTION Pacific Symposium on Biocomputing Daemen, A., Gevaert, O., Leunen, K., Legius, E., Vergote, I., De Moor, B. WORLD SCIENTIFIC PUBL CO PTE LTD. 2009: 468–479


    For different tumour types, extended knowledge about the molecular mechanisms involved in tumorigenesis is lacking. Looking for copy number variations (CNV) by Comparative Genomic Hybridization (CGH) can help however to determine key elements in this tumorigenesis. As genome-wide array CGH gives the opportunity to evaluate CNV at high resolution, this leads to a huge amount of data, necessitating adequate mathematical methods to carefully select and interpret these data. Two groups of patients differing in cancer subtype were defined in two publicly available array CGH data sets as well as in our own data set on ovarian cancer. Chromosomal regions characterizing each group of patients were gathered using recurrent hidden Markov Models (HMM). The differential regions were reduced to a subset of features for classification by integrating different univariate feature selection methods. Weighted Least Squares Support Vector Machines (LS-SVM), a supervised classification method which takes unbalancedness of data sets into account, resulted in leave-one-out or 10-fold cross-validation accuracies ranging from 88 to 95.5%. The combination of recurrent HMMs for the detection of copy number alterations with LS-SVM classifiers offers a novel methodological approach for classification based on copy number alterations. Additionally, this approach limits the chromosomal regions that are necessary to classify patients according to cancer subtype.

    View details for Web of Science ID 000263639700045

    View details for PubMedID 19209723

  • Pain experienced during transvaginal ultrasound, saline contrast sonohysterography, hysteroscopy and office sampling: a comparative study ULTRASOUND IN OBSTETRICS & GYNECOLOGY Van den Bosch, T., Verguts, J., Daemen, A., Gevaert, O., Domali, E., Claerhout, F., Vandenbroucke, V., De Moor, B., Deprest, J., Timmerman, D. 2008; 31 (3): 346-351


    To evaluate and compare the pain experienced by women during transvaginal ultrasound, saline contrast sonohysterography (SCSH), diagnostic hysteroscopy and office sampling. This was a descriptive study of 402 consecutive patients presenting at a 'one-stop' Bleeding Clinic between October 2004 and November 2006. Thirty-nine percent of the patients were postmenopausal. The patients underwent the following examinations transvaginally: first ultrasound with color Doppler, second SCSH, third diagnostic hysteroscopy and fourth endometrial biopsy. After completion of the examinations the patients were asked to complete a questionnaire including a visual analog scale (VAS) about their subjective appreciation of all four examinations. Two-hundred and ninety-three (72%) patients returned the questionnaire. The median (range) VAS scores for transvaginal ultrasound, SCSH, diagnostic hysteroscopy and endometrial sampling were 1.0 (0-8.1), 2.2 (0-10), 2.7 (0-10) and 5.1 (0-10), respectively (P < 0.0001). The patients' answers to the other questions about the pain experienced, including comparison with other minor procedures such as venous blood sampling, were all concordant with the VAS scores. Transvaginal ultrasound was the procedure best accepted, followed by SCSH, hysteroscopy and endometrial sampling. These results suggest that patients would prefer SCSH over hysteroscopy as an initial diagnostic approach in the evaluation of abnormal uterine bleeding.

    View details for DOI 10.1002/uog.5263

    View details for Web of Science ID 000254541900019

    View details for PubMedID 18307203

  • Expression profiling to predict the clinical behaviour of ovarian cancer fails independent evaluation BMC CANCER Gevaert, O., De Smet, F., Van Gorp, T., Pochet, N., Engelen, K., Amant, F., De Moor, B., Timmerman, D., Vergote, I. 2008; 8


    In a previously published pilot study we explored the performance of microarrays in predicting clinical behaviour of ovarian tumours. For this purpose we performed microarray analysis on 20 patients and estimated that we could predict advanced stage disease with 100% accuracy and the response to platin-based chemotherapy with 76.92% accuracy using leave-one-out cross validation techniques in combination with Least Squares Support Vector Machines (LS-SVMs). In the current study we evaluate whether tumour characteristics in an independent set of 49 patients can be predicted using the pilot data set with principal component analysis or LS-SVMs. The results of the principal component analysis suggest that the gene expression data from stage I, platin-sensitive advanced stage and platin-resistant advanced stage tumours in the independent data set did not correspond to their respective classes in the pilot study. Additionally, LS-SVM models built using the data from the pilot study - although they only misclassified one of four stage I tumours and correctly classified all 45 advanced stage tumours - were not able to predict resistance to platin-based chemotherapy. Furthermore, models based on the pilot data and on previously published gene sets related to ovarian cancer outcomes, did not perform significantly better than our models. We discuss possible reasons for failure of the model for predicting response to platin-based chemotherapy and conclude that existing results based on gene expression patterns of ovarian tumours need to be thoroughly scrutinized before these results can be accepted to reflect the true performance of microarray technology.

    View details for DOI 10.1186/1471-2407-8-18

    View details for Web of Science ID 000253596800002

    View details for PubMedID 18211668

  • Integrating microarray and proteomics data to predict the response on cetuximab in patients with rectal cancer. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Daemen, A., Gevaert, O., De Bie, T., Debucquoy, A., Machiels, J., De Moor, B., Haustermans, K. 2008: 166-177


    To investigate the combination of cetuximab, capecitabine and radiotherapy in the preoperative treatment of patients with rectal cancer, forty tumour samples were gathered before treatment (T0), after one dose of cetuximab but before radiotherapy with capecitabine (T1) and at the moment of surgery (T2). The tumour and plasma samples were subjected at all timepoints to Affymetrix microarray and Luminex proteomics analysis, respectively. At surgery, the Rectal Cancer Regression Grade (RCRG) was registered. We used a kernel-based method with Least Squares Support Vector Machines to predict RCRG based on the integration of microarray and proteomics data on T0 and T1. We demonstrated that combining multiple data sources improves the predictive power. The best model was based on 5 genes and 10 proteins at T0 and T1 and could predict the RCRG with an accuracy of 91.7%, sensitivity of 96.2% and specificity of 80%.

    View details for PubMedID 18229684

  • Classification of sporadic and BRCA1 ovarian cancer based on a genome-wide study of copy number variations KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS Daemen, A., Gevaert, O., Leunen, K., Vanspauwen, V., Michils, G., Legius, E., Vergote, I., De Moor, B. 2008; 5178: 165-?
  • Integration of microarray and textual data improves the prognosis prediction of breast, lung and ovarian cancer patients. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Gevaert, O., Van Vooren, S., De Moor, B. 2008: 279-290


    Microarray data are notoriously noisy such that models predicting clinically relevant outcomes often contain many false positive genes. Integration of other data sources can alleviate this problem and enhance gene selection and model building. Probabilistic models provide a natural solution to integrate information by using the prior over model space. We investigated if the use of text information from PUBMED abstracts in the structure prior of a Bayesian network could improve the prediction of the prognosis in cancer. Our results show that prediction of the outcome with the text prior was significantly better compared to not using a prior, both on a well known microarray data set and on three independent microarray data sets.

    View details for PubMedID 18229693

  • A framework for elucidating regulatory networks based on prior information and expression data Workshop on Dialogue on Reverse Engineering Assessment and Methods Gevaert, O., Van Vooren, S., De Moor, B. WILEY-BLACKWELL. 2007: 240–248


    Elucidating regulatory networks is an intensively studied topic in bioinformatics. Integration of different sources of information could facilitate this task. We propose to incorporate these information sources in the structure prior of a Bayesian network. We are currently investigating two complementary sources of information: PubMed abstracts combined with publicly available taxonomies or ontologies, and known protein-DNA interactions. These priors, either separately or combined, have the potential of reducing the complexity of reverse-engineering regulatory networks while creating more robust and reliable models. Moreover this approach can easily be extended with other data sources. In such a way Bayesian networks provide a powerful framework for data integration and regulatory network modeling.

    View details for DOI 10.1196/annals.1407.002

    View details for Web of Science ID 000252037600017

    View details for PubMedID 17925352

  • Integration of clinical and microarray data with kernel methods 29th Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society Daemen, A., Gevaert, O., De Moor, B. IEEE. 2007: 5411–5415


    Currently, the clinical management of cancer is based on empirical data from the literature (clinical studies) or based on the expertise of the clinician. Recently microarray technology emerged and it has the potential to revolutionize the clinical management of cancer and other diseases. A microarray allows the expression levels of thousands of genes to be measured simultaneously, which may reflect diagnostic or prognostic categories and sensitivity to treatment. The objective of this paper is to investigate whether clinical data, which is the basis of day-to-day clinical decision support, can be efficiently combined with microarray data, which has yet to prove its potential to deliver patient tailored therapy, using Least Squares Support Vector Machines.

    View details for Web of Science ID 000253467004088

    View details for PubMedID 18003232

  • Molecular profiling of platinum resistant ovarian cancer: Use of the model in clinical practice INTERNATIONAL JOURNAL OF CANCER Gevaert, O., Pochet, N., De Smet, F., Van Gorp, T., De Moor, B., Timmerman, D., Amant, F., Vergote, I. 2006; 119 (6): 1511-1511

    View details for DOI 10.1002/ijc.21985

    View details for Web of Science ID 000239877200043

    View details for PubMedID 16619247

  • Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks 14th Conference on Intelligent Systems for Molecular Biology Gevaert, O., De Smet, F., Timmerman, D., Moreau, Y., De Moor, B. OXFORD UNIV PRESS. 2006: E184–E190


    Clinical data, such as patient history, laboratory analysis, ultrasound parameters--which are the basis of day-to-day clinical decision support--are often underused to guide the clinical management of cancer in the presence of microarray data. We propose a strategy based on Bayesian networks to treat clinical and microarray data on an equal footing. The main advantage of this probabilistic model is that it allows to integrate these data sources in several ways and that it allows to investigate and understand the model structure and parameters. Furthermore using the concept of a Markov Blanket we can identify all the variables that shield off the class variable from the influence of the remaining network. Therefore Bayesian networks automatically perform feature selection by identifying the (in)dependency relationships with the class variable. We evaluated three methods for integrating clinical and microarray data: decision integration, partial integration and full integration and used them to classify publicly available data on breast cancer patients into a poor and a good prognosis group. The partial integration method is most promising and has an independent test set area under the ROC curve of 0.845. After choosing an operating point the classification performance is better than frequently used indices.

    View details for DOI 10.1093/bioinformatics/btl230

    View details for Web of Science ID 000250005000023

    View details for PubMedID 16873470

  • Predicting the outcome of pregnancies of unknown location: Bayesian networks with expert prior information compared to logistic regression HUMAN REPRODUCTION Gevaert, O., De Smet, F., Kirk, E., Van Calster, B., Bourne, T., Van Huffel, S., Moreau, Y., Timmerman, D., De Moor, B., Condous, G. 2006; 21 (7): 1824-1831


    As women present at earlier gestations to early pregnancy units (EPUs), the number of women diagnosed with a pregnancy of unknown location (PUL) increases. Some of these women will have an ectopic pregnancy (EP), and it is this group in the PUL population that poses the greatest concern. The aim of this study was to develop Bayesian networks to predict EPs in the PUL population. Data were gathered in a single EPU from all women with a PUL. This data set was divided into a model-building (599 women with 44 EPs) and a validation (257 women with 22 EPs) data set and consisted of the following variables: vaginal bleeding, fluid in the pouch of Douglas, midline echo, lower abdominal pain, age, endometrial thickness, gestation days, the ratio of HCG at 48 and 0 h, progesterone levels (0 and 48 h) and the clinical outcome of the PUL. We developed Bayesian networks with expert information using this data set to predict EPs. The best Bayesian network used the gestational age, HCG ratio and the progesterone level at 48 h and had an area under the receiver operator characteristic curve (AUC) of 0.88 for predicting EPs when tested prospectively. Discrete-valued Bayesian networks are more complex to build than, for example, logistic regression. Nevertheless, we have demonstrated that such models can be used to predict EPs in a PUL population. Prospective interventional multicentre studies are needed to validate the use of such models in clinical practice.

    View details for DOI 10.1093/humrep/del083

    View details for Web of Science ID 000238907400027

    View details for PubMedID 16601010

  • Diagnostic accuracy of varying discriminatory zones for the prediction of ectopic pregnancy in women with a pregnancy of unknown location ULTRASOUND IN OBSTETRICS & GYNECOLOGY Condous, G., Kirk, E., Lu, C., Van Huffel, S., Gevaert, O., De Moor, B., De Smet, F., Timmerman, D., Bourne, T. 2005; 26 (7): 770-775


    Various serum human chorionic gonadotropin (hCG) discriminatory zones are currently used for evaluating the likelihood of an ectopic pregnancy in women classified as having a pregnancy of unknown location (PUL) following a transvaginal ultrasound examination. We evaluated the diagnostic accuracy of discriminatory zones for serum hCG levels of > 1000 IU/L, 1500 IU/L and 2000 IU/L for the detection of ectopic pregnancy in such women. This was a prospective observational study of women who were assessed in a specialized transvaginal scanning unit. All women with a PUL had serum hCG measured at presentation. Expectant management of PULs was adopted. These women were followed up with transvaginal ultrasound, monitoring of serum hormone levels and laparoscopy until a final diagnosis was established: a failing PUL, an intrauterine pregnancy (IUP), an ectopic pregnancy or a persisting PUL. The persisting PULs probably represented ectopic pregnancies which had been missed on ultrasound and these were incorporated into the ectopic pregnancy group. Three different discriminatory zones (1000 IU/L, 1500 IU/L and 2000 IU/L) were evaluated for predicting ectopic pregnancy in this PUL population. A total of 5544 consecutive women presented to the early pregnancy unit between 25 June 2001 and 14 April 2003. Of these, 569 (10.3%) women were classified as having a PUL, 42 of which were lost to follow up. Of the 527 (9.5%) cases with PUL analyzed, there were 300 (56.9%) failing PULs, 181 (34.3%) IUPs and 46 (8.7%) ectopic pregnancies. Overall, 74.6% were symptomatic and 25.4% were asymptomatic (P = 8.825E-07). The sensitivity and specificity of an hCG level of > 1000 IU/L to detect ectopic pregnancy were 21.7% (10/46) and 87.3% (420/481), respectively; for an hCG level of > 1500 IU/L these values were 15.2% (7/46) and 93.4% (449/481), respectively, and for an hCG level of > 2000 IU/L they were 10.9% (5/46) and 95.2% (458/481), respectively. Varying the discriminatory zone does not significantly improve the detection of ectopic pregnancy in a PUL population. A single measurement of serum hCG is not only potentially falsely reassuring but also unhelpful in excluding the presence of an ectopic pregnancy.

    View details for DOI 10.1002/uog.2636

    View details for Web of Science ID 000234027800015

    View details for PubMedID 16308901
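
The sensitivity and specificity figures in the abstract above are simple ratios of the counts given in parentheses. As a minimal illustration only, the Python sketch below recomputes them for the > 1000 IU/L discriminatory zone. The class and function names (`ZoneCounts`, `sensitivity`, `specificity`) are hypothetical and do not come from the paper; the counts are the ones quoted in the abstract (46 ectopic pregnancies, and 481 non-ectopic cases made up of 300 failing PULs and 181 IUPs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneCounts:
    """Counts for one hCG discriminatory zone (hypothetical helper type)."""
    detected_ectopic: int      # ectopic pregnancies with hCG above the zone
    total_ectopic: int         # all ectopic pregnancies in the PUL cohort
    excluded_non_ectopic: int  # non-ectopic cases with hCG not above the zone
    total_non_ectopic: int     # all non-ectopic cases (failing PULs + IUPs)


def sensitivity(counts: ZoneCounts) -> float:
    """Fraction of ectopic pregnancies flagged by the zone."""
    return counts.detected_ectopic / counts.total_ectopic


def specificity(counts: ZoneCounts) -> float:
    """Fraction of non-ectopic cases correctly not flagged by the zone."""
    return counts.excluded_non_ectopic / counts.total_non_ectopic


# Counts quoted in the abstract for the > 1000 IU/L zone: 10/46 and 420/481.
zone_1000 = ZoneCounts(
    detected_ectopic=10,
    total_ectopic=46,
    excluded_non_ectopic=420,
    total_non_ectopic=300 + 181,
)

print(f"sensitivity: {sensitivity(zone_1000):.1%}")
print(f"specificity: {specificity(zone_1000):.1%}")
```

The same two functions can be applied to the counts quoted for the other two zones by constructing further `ZoneCounts` values.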