Instructor, Medicine - Biomedical Informatics Research
A challenge in the diagnosis of renal cell carcinoma (RCC) is to distinguish chromophobe RCC (chRCC) from benign renal oncocytoma, because these tumor types are histologically and morphologically similar, yet they require different clinical management. Molecular biomarkers could provide a way of distinguishing oncocytoma from chRCC, which could prevent unnecessary treatment of oncocytoma. Such biomarkers could also be applied to preoperative biopsy specimens, such as needle core biopsy specimens, to avoid unnecessary surgery for oncocytoma. We profiled DNA methylation in fresh-frozen oncocytoma and chRCC tumors and adjacent normal tissue and used machine learning to identify a signature of differentially methylated cytosine-phosphate-guanine sites (CpGs) that robustly distinguish oncocytoma from chRCC. Unsupervised clustering of Stanford and preexisting RCC data from The Cancer Genome Atlas (TCGA) revealed that, of all RCC subtypes, oncocytoma is most similar to chRCC. Unexpectedly, however, oncocytoma features more extensive, overall abnormal methylation than does chRCC. We identified 79 CpGs with large methylation differences between oncocytoma and chRCC. A diagnostic model trained on 30 CpGs could distinguish oncocytoma from chRCC in 10-fold cross-validation (area under the receiver operating characteristic curve [AUC], 0.96; 95% CI, 0.88 to 1.00) and could distinguish TCGA chRCCs from an independent set of oncocytomas from a previous study (AUC, 0.87). This signature also separated oncocytoma from other RCC subtypes and normal tissue, revealing it as a standalone diagnostic biomarker for oncocytoma. This CpG signature could be developed as a clinical biomarker to support differential diagnosis of oncocytoma and chRCC in surgical samples. With improved biopsy techniques, this signature could be applied to preoperative biopsy specimens.
View details for DOI 10.1200/PO.20.00015
View details for PubMedID 33015531
View details for PubMedCentralID PMC7529536
Aberrant DNA methylation disrupts normal gene expression in cancer and broadly contributes to oncogenesis. We previously developed MethylMix, a model-based algorithmic approach to identify epigenetically regulated driver genes. MethylMix identifies genes where methylation likely executes a functional role by using transcriptomic data to select only methylation events that can be linked to changes in gene expression. However, given that proteins more closely link genotype to phenotype, recent high-throughput proteomic data provide an opportunity to more accurately identify functionally relevant abnormal methylation events. Here we present a MethylMix analysis that refines nominations for epigenetic driver genes by leveraging quantitative high-throughput proteomic data to select only genes where DNA methylation is predictive of protein abundance. Applying our algorithm across three cancer cohorts, we find that using protein abundance data narrows candidate nominations, where the effect of DNA methylation is often buffered at the protein level. Next, we find that MethylMix genes predictive of protein abundance are enriched for biological processes involved in cancer, including functions involved in epithelial and mesenchymal transition. Moreover, our results are also enriched for tumor markers that are predictive of clinical features like tumor stage, and we find that clustering using MethylMix genes predictive of protein abundance captures cancer subtypes.
View details for DOI 10.1371/journal.pcbi.1007245
View details for PubMedID 31356589
Long non-coding RNAs (lncRNAs) are emerging as important regulators of various biological processes. While many studies have exploited public resources such as RNA sequencing (RNA-Seq) data in The Cancer Genome Atlas to study lncRNAs in cancer, it is crucial to choose the optimal method for accurate expression quantification. In this study, we compared the performance of pseudoalignment methods Kallisto and Salmon, alignment-based transcript quantification method RSEM, and alignment-based gene quantification methods HTSeq and featureCounts, in combination with read aligners STAR, Subread, and HISAT2, in lncRNA quantification, by applying them to both un-stranded and stranded RNA-Seq datasets. Full transcriptome annotation, including protein-coding and non-coding RNAs, greatly improves the specificity of lncRNA expression quantification. Pseudoalignment methods and RSEM outperform HTSeq and featureCounts for lncRNA quantification at both sample- and gene-level comparison, regardless of RNA-Seq protocol type, choice of aligners, and transcriptome annotation. Pseudoalignment methods and RSEM detect more lncRNAs and correlate highly with simulated ground truth. On the contrary, HTSeq and featureCounts often underestimate lncRNA expression. Antisense lncRNAs are poorly quantified by alignment-based gene quantification methods, which can be improved using stranded protocols and pseudoalignment methods. Considering the consistency with ground truth and computational resources, pseudoalignment methods Kallisto or Salmon in combination with full transcriptome annotation is our recommended strategy for RNA-Seq analysis for lncRNAs.
View details for DOI 10.1093/gigascience/giz145
View details for PubMedID 31808800
The plasma-based methylated SEPTIN9 (mSEPT9) is a colorectal cancer (CRC) screening test for adults aged 50-75 years who are at average risk for CRC and have refused colonoscopy or faecal-based screening tests. The applicability of mSEPT9 for high-risk persons with Lynch syndrome (LS), the most common hereditary CRC condition, has not been assessed. This study sought preliminary evidence for the utility of mSEPT9 for CRC detection in LS. Firstly, SEPT9 methylation was measured in LS-associated CRC, advanced adenoma, and subject-matched normal colorectal mucosa tissues by pyrosequencing. Secondly, to detect mSEPT9 as circulating tumor DNA, the plasma-based mSEPT9 test was retrospectively evaluated in LS subjects using the Epi proColon 2.0 CE assay adapted for 1 mL plasma using the "1/1 algorithm". LS case groups included 20 peri-surgical cases with a colonoscopy-based diagnosis of CRC (stages I-IV), 13 post-surgical metastatic CRC cases, and 17 pre-diagnosis cases. The control group comprised 31 cancer-free LS subjects. Differential hypermethylation was found in 97.3% (36/37) of primary CRC and 90.0% (18/20) of advanced adenomas, showing that LS-associated neoplasia frequently produces the mSEPT9 biomarker. Sensitivity of plasma mSEPT9 to detect CRC was 70.0% (95% CI, 48%-88%) in cases with a colonoscopy-based CRC diagnosis and 92.3% (95% CI, 64%-100%) in post-surgical metastatic cases. In pre-diagnosis cases, plasma mSEPT9 was detected within two months prior to colonoscopy-based CRC diagnosis in 3/5 cases. Specificity in controls was 100% (95% CI, 89%-100%). These preliminary findings suggest mSEPT9 may demonstrate similar diagnostic performance characteristics in LS as in the average-risk population, warranting a well-powered prospective case-control study.
View details for DOI 10.1136/bmjgast-2019-000299
View details for PubMedID 31275589
View details for PubMedCentralID PMC6577308
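The sensitivity and specificity intervals quoted in the mSEPT9 abstract above are confidence intervals for a binomial proportion. The abstract does not say which interval method was used, so the following is only a minimal Python sketch that assumes the Clopper-Pearson exact interval, one common choice for small samples. The function names are hypothetical, and the counts 14/20 and 12/13 are inferred from the reported percentages and group sizes rather than stated in the abstract.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, tol=1e-12):
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2 (0 if x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2 (1 if x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    # Counts inferred from the abstract's percentages (an assumption).
    groups = [
        ("sensitivity, colonoscopy-based diagnosis", 14, 20),
        ("sensitivity, post-surgical metastatic", 12, 13),
        ("specificity, cancer-free controls", 31, 31),
    ]
    for name, x, n in groups:
        lo, hi = clopper_pearson(x, n)
        print(f"{name}: {x}/{n} = {x / n:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

With groups this small the intervals are wide, which is consistent with the abstract's call for a well-powered prospective study.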
Radiomics-based non-invasive biomarkers hold promise for facilitating the translation of therapeutically related molecular subtypes for treatment allocation of patients with head and neck squamous cell carcinoma (HNSCC). We included 113 HNSCC patients from The Cancer Genome Atlas (TCGA-HNSCC) project. Molecular phenotypes analyzed were RNA-defined HPV status, five DNA methylation subtypes, four gene expression subtypes and five somatic gene mutations. A total of 540 quantitative image features were extracted from pre-treatment CT scans. Features were selected and used in a regularized logistic regression model to build binary classifiers for each molecular subtype. Models were evaluated using the average area under the receiver operating characteristic curve (AUC) of a stratified 10-fold cross-validation procedure repeated 10 times. Next, an HPV model was trained with the TCGA-HNSCC, and tested on a Stanford cohort (N = 53). Our results show that quantitative image features are capable of distinguishing several molecular phenotypes. We obtained significant predictive performance for RNA-defined HPV+ (AUC = 0.73), DNA methylation subtypes MethylMix HPV+ (AUC = 0.79), non-CIMP-atypical (AUC = 0.77) and Stem-like-Smoking (AUC = 0.71), and mutation of NSD1 (AUC = 0.73). We externally validated the HPV prediction model (AUC = 0.76) on the Stanford cohort. When compared to clinical models, radiomic models were superior for subtypes such as NOTCH1 mutation and DNA methylation subtype non-CIMP-atypical but inferior for DNA methylation subtype CIMP-atypical and NSD1 mutation. Our study demonstrates that radiomics can potentially serve as a non-invasive tool to identify treatment-relevant subtypes of HNSCC, opening up the possibility for patient stratification, treatment allocation and inclusion in clinical trials. FUND: Dr. Gevaert reports grants from National Institute of Dental & Craniofacial Research (NIDCR) U01 DE025188, grants from National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health (NIBIB), R01 EB020527, grants from National Cancer Institute (NCI), U01 CA217851, during the conduct of the study; Dr. Huang and Dr. Zhu report grants from China Scholarship Council (Grant NO:201606320087), grants from China Medical Board Collaborating Program (Grant NO:15-216), the Cyrus Tang Foundation, and the Zhejiang University Education Foundation during the conduct of the study; Dr. Cintra reports grants from São Paulo State Foundation for Teaching and Research (FAPESP), during the conduct of the study.
View details for DOI 10.1016/j.ebiom.2019.06.034
View details for PubMedID 31255659
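The evaluation protocol named in the radiomics abstract above (mean AUC over a stratified 10-fold cross-validation repeated 10 times, with a regularized logistic regression classifier) can be illustrated with a small standard-library Python sketch. Everything here is an assumption made for illustration: the function names, the synthetic two-feature data, the L2 penalty, and the gradient-descent settings are not taken from the study, and feature selection is left out entirely.

```python
import random
from math import exp
from statistics import mean


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, spreading each class evenly."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, steps=100):
    """L2-regularized logistic regression by batch gradient descent (toy scale)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
            err = 1.0 / (1.0 + exp(-z)) - yi
            gw = [g + err * xj / n for g, xj in zip(gw, xi)]
            gb += err / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return lambda x: b + sum(wj * xj for wj, xj in zip(w, x))


def repeated_cv_auc(X, y, fit, k=10, repeats=10, seed=0):
    """Mean test-fold AUC over `repeats` reshuffled stratified k-fold splits.

    Each class needs at least k samples so that every test fold has both classes.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            aucs.append(auc([y[i] for i in test_idx],
                            [score(X[i]) for i in test_idx]))
    return mean(aucs)


def make_toy_data(n_per_class=30, seed=1):
    """Synthetic two-feature data; a stand-in, not radiomic features."""
    rng = random.Random(seed)
    X = [[rng.gauss(2.0 * c, 1.0), rng.gauss(2.0 * c, 1.0)]
         for c in (0, 1) for _ in range(n_per_class)]
    y = [c for c in (0, 1) for _ in range(n_per_class)]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean AUC over 10 x 10-fold CV: {repeated_cv_auc(X, y, fit_logistic_l2):.2f}")
```

Repeating the split with fresh shuffles averages out the dependence of the AUC estimate on any single partition, which matters when some subtypes have few positive cases.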
Summary: DNA methylation is an important mechanism regulating gene transcription, and its role in carcinogenesis has been extensively studied. Hyper- and hypomethylation of genes is a major mechanism of gene expression deregulation in a wide range of diseases. At the same time, high-throughput DNA methylation assays have been developed, generating vast amounts of genome-wide DNA methylation measurements. We developed MethylMix, an algorithm implemented in R to identify disease-specific hyper- and hypomethylated genes. Here we present a new version of MethylMix that automates the construction of DNA methylation and gene expression datasets from The Cancer Genome Atlas (TCGA). More precisely, MethylMix 2.0 incorporates two major updates: the automated downloading of DNA methylation and gene expression datasets from TCGA and the automated preprocessing of such datasets: value imputation, batch correction and CpG site clustering within each gene. The resulting datasets can subsequently be analyzed with MethylMix to identify transcriptionally predictive methylation states. We show that the Differential Methylation Values created by MethylMix can be used for cancer subtyping. Contact: email@example.com. Documentation: https://bioconductor.org/packages/release/bioc/manuals/MethylMix/man/MethylMix.pdf. Availability and implementation: MethylMix 2.0 was implemented as an R package and is available in Bioconductor.
View details for PubMedID 29668835
This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smoking and/or human papillomavirus (HPV). SCCs harbor 3q, 5p, and other recurrent chromosomal copy-number alterations (CNAs), DNA mutations, and/or aberrant methylation of genes and microRNAs, which are correlated with the expression of multi-gene programs linked to squamous cell stemness, epithelial-to-mesenchymal differentiation, growth, genomic integrity, oxidative damage, death, and inflammation. Low-CNA SCCs tended to be HPV(+) and display hypermethylation with repression of TET1 demethylase and FANCF, previously linked to predisposition to SCC, or harbor mutations affecting CASP8, RAS-MAPK pathways, chromatin modifiers, and immunoregulatory molecules. We uncovered hypomethylation of the alternative promoter that drives expression of the ΔNp63 oncogene and embedded miR944. Co-expression of immune checkpoint, T-regulatory, and myeloid suppressor cell signatures may explain reduced efficacy of immune therapy. These findings support possibilities for molecular classification and therapeutic approaches.
View details for PubMedID 29617660
The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and 'antiviral' interferon-modulated innate immune response. AMARETTO is available as an R package at https://bitbucket.org/gevaertlab/pancanceramaretto.
View details for PubMedID 29331675
Obesity results from the interaction of genetic and environmental factors, which may involve epigenetic mechanisms such as DNA methylation (DNAm). We followed the PRISMA protocol to select studies that analyzed DNAm at baseline and end point of a weight loss intervention using either candidate-locus or genome-wide approaches. Six genes displayed weight loss-associated DNAm across four out of nine genome-wide studies. Weight loss is associated with significant but small changes in DNAm across the genome, and weight loss outcome is associated with individual differences in baseline DNAm at several genomic locations. The identified weight loss-associated DNAm markers, especially those showing reproducibility across different studies, warrant validation by further studies with robust design and adequate power.
View details for DOI 10.2217/epi-2016-0182
View details for Web of Science ID 000401642200014
View details for PubMedID 28517981
Chromatin modifying enzymes are frequently mutated in cancer, resulting in widespread epigenetic deregulation. Recent reports indicate that inactivating mutations in the histone methyltransferase NSD1 define an intrinsic subtype of head and neck squamous cell carcinoma (HNSC) that features pronounced DNA hypomethylation. Here, we describe a similar hypomethylated subtype of lung squamous cell carcinoma (LUSC) that is enriched for both inactivating mutations and deletions in NSD1. The 'NSD1 subtypes' of HNSC and LUSC are highly correlated at the DNA methylation and gene expression levels, featuring ectopic expression of developmental transcription factors and genes that are also hypomethylated in Sotos syndrome, a congenital disorder caused by germline NSD1 mutations. Further, the NSD1 subtype of HNSC displays an 'immune cold' phenotype characterized by low infiltration of tumor-associated leukocytes, particularly macrophages and CD8+ T cells, as well as low expression of genes encoding the immunotherapy target PD-1 immune checkpoint receptor and its ligands. Using an in vivo model, we demonstrate that NSD1 inactivation results in reduced T cell infiltration into the tumor microenvironment, implicating NSD1 as a tumor cell-intrinsic driver of an immune cold phenotype. NSD1 inactivation therefore causes epigenetic deregulation across cancer sites, and has implications for immunotherapy.
View details for PubMedID 29213088
The incidence of papillary thyroid carcinoma (PTC), the most common type of thyroid malignancy, has risen rapidly worldwide. PTC usually has an excellent prognosis. However, the rising incidence of PTC, due at least partially to widespread use of neck imaging studies with increased detection of small cancers, has created a clinical issue of overdiagnosis and consequential overtreatment. We investigated how molecular data can be used to develop a prognostic signature for PTC. The Cancer Genome Atlas (TCGA) recently reported on the genomic landscape of a large cohort of PTC cases. In order to decrease unnecessary morbidity associated with overdiagnosing PTC patients with good prognosis, we used TCGA data to develop a gene expression signature to distinguish between patients with good and poor prognosis. We selected a set of clinical phenotypes to define an 'extreme poor' prognosis group and an 'extreme good' prognosis group and developed a gene signature that characterized these. We discovered a gene expression signature that distinguished the extreme good from extreme poor prognosis patients. Next, we applied this signature to the remaining intermediate-risk patients and showed that they can be classified into clinically meaningful risk groups, characterized by established prognostic disease phenotypes. Analysis of the genes in the signature shows many known and novel genes involved in PTC prognosis. This work demonstrates that, using a selection of clinical phenotypes and treatment variables, it is possible to develop a statistically useful and biologically meaningful gene signature of PTC prognosis, which may be developed as a biomarker to help prevent overdiagnosis.
View details for DOI 10.1186/s12885-016-2771-6
View details for PubMedID 27633254