Bachelor of Science, McMaster University (2016)
Doctor of Philosophy, University of Toronto (2020)
Christina Curtis, Postdoctoral Faculty Sponsor
Although DNA methylation is a key regulator of gene expression, the comprehensive methylation landscape of metastatic cancer has never been defined. Through whole-genome bisulfite sequencing paired with deep whole-genome and transcriptome sequencing of 100 castration-resistant prostate metastases, we discovered alterations affecting driver genes that were detectable only with integrated whole-genome approaches. Notably, we observed that 22% of tumors exhibited a novel epigenomic subtype associated with hypermethylation and somatic mutations in TET2, DNMT3B, IDH1 and BRAF. We also identified intergenic regions where methylation is associated with RNA expression of the oncogenic driver genes AR, MYC and ERG. Finally, we showed that differential methylation during progression preferentially occurs at somatic mutational hotspots and putative regulatory regions. This study is a large integrated study of whole-genome, whole-methylome and whole-transcriptome sequencing in metastatic cancer that provides a comprehensive overview of the important regulatory role of methylation in metastatic castration-resistant prostate cancer.
View details for DOI 10.1038/s41588-020-0648-8
View details for PubMedID 32661416
View details for PubMedCentralID PMC7454228
Transcriptional dysregulation is a hallmark of prostate cancer (PCa). We mapped the RNA polymerase II-associated (RNA Pol II-associated) chromatin interactions in normal prostate cells and PCa cells. We discovered thousands of enhancer-promoter, enhancer-enhancer, as well as promoter-promoter chromatin interactions. These transcriptional hubs operate within the framework set by structural proteins - CTCF and cohesins - and are regulated by the cooperative action of master transcription factors, such as the androgen receptor (AR) and FOXA1. By combining analyses from metastatic castration-resistant PCa (mCRPC) specimens, we show that AR locus amplification contributes to the transcriptional upregulation of the AR gene by increasing the total number of chromatin interaction modules comprising the AR gene and its distal enhancer. We deconvoluted the transcription control modules of several PCa genes, notably the biomarker KLK3, lineage-restricted genes (KRT8, KRT18, HOXB13, FOXA1, ZBTB16), the drug target EZH2, and the oncogene MYC. By integrating clinical PCa data, we defined a germline-somatic interplay between the PCa risk allele rs684232 and the somatically acquired TMPRSS2-ERG gene fusion in the transcriptional regulation of multiple target genes - VPS53, FAM57A, and GEMIN4. Our studies implicate changes in genome organization as a critical determinant of aberrant transcriptional regulation in PCa.
View details for DOI 10.1172/JCI134260
View details for PubMedID 32343676
View details for PubMedCentralID PMC7410051
Oncogenesis is driven by germline, environmental and stochastic factors. It is unknown how these interact to produce the molecular phenotypes of tumors. We therefore quantified the influence of germline polymorphisms on the somatic epigenome of 589 localized prostate tumors. Predisposition risk loci influence a tumor's epigenome, uncovering a mechanism for cancer susceptibility. We identified and validated 1,178 loci associated with altered methylation in tumoral but not nonmalignant tissue. These tumor methylation quantitative trait loci influence chromatin structure, as well as RNA and protein abundance. One prominent tumor methylation quantitative trait locus is associated with AKT1 expression and is predictive of relapse after definitive local therapy in both discovery and validation cohorts. These data reveal intricate crosstalk between the germ line and the epigenome of primary tumors, which may help identify germline biomarkers of aggressive disease to aid patient triage and optimize the use of more invasive or expensive diagnostic assays.
View details for DOI 10.1038/s41591-019-0579-z
View details for PubMedID 31591588
View details for PubMedCentralID PMC7418214
Multiparametric magnetic resonance imaging (mpMRI) has transformed the management of localized prostate cancer by improving identification of clinically significant disease at diagnosis. Approximately 20% of primary prostate tumors are invisible to mpMRI, and we hypothesize that this invisibility reflects fundamental molecular properties of the tumor. We therefore profiled the genomes and transcriptomes of 40 International Society of Urological Pathology grade 2 tumors: 20 mpMRI-invisible (Prostate Imaging-Reporting and Data System [PI-RADS] v2 <3) and 20 mpMRI-visible (PI-RADS v2 5) tumors. mpMRI-visible tumors were enriched in hallmarks of nimbosus, an aggressive pathological, molecular, and microenvironmental phenomenon in prostate cancer. These hallmarks included genomes with increased mutation density, a higher prevalence of intraductal carcinoma/cribriform architecture pathology, and altered abundance of 102 transcripts, including overexpression of noncoding RNAs such as SCHLAP1. Multiple small nucleolar RNAs (snoRNAs) were identified, and a snoRNA signature synergized with nimbosus hallmarks to discriminate visible from invisible tumors. These data suggest a confluence of aggressive molecular and microenvironmental phenomena underlies mpMRI visibility of localized prostate cancer. PATIENT SUMMARY: We examined the correlation between tumor biology and magnetic resonance imaging (MRI) visibility in a group of patients with low- to intermediate-risk prostate cancer. We observed that MRI findings are associated with biological features of aggressive prostate cancer.
View details for DOI 10.1016/j.eururo.2018.12.036
View details for PubMedID 30685078
Cancer is a complex collection of diseases that are to some degree unique to each patient. Precision oncology aims to identify the best drug treatment regime using molecular data on tumor samples. While omics-level data is becoming more widely available for tumor specimens, the datasets upon which computational learning methods can be trained vary in coverage from sample to sample and from data type to data type. Methods that can 'connect the dots' to leverage more of the information provided by these studies could offer major advantages for maximizing predictive potential. We introduce a multi-view machine-learning strategy called PLATYPUS that builds 'views' from multiple data sources that are all used as features for predicting patient outcomes. We show that a learning strategy that finds agreement across the views on unlabeled data increases the performance of the learning methods over any single view. We illustrate the power of the approach by deriving signatures for drug sensitivity in a large cancer cell line database. Code and additional information are available from the PLATYPUS website https://sysbiowiki.soe.ucsc.edu/platypus.
View details for PubMedID 30864317
View details for PubMedCentralID PMC6417802
The potent MYC oncoprotein is deregulated in many human cancers, including breast carcinoma, and is associated with aggressive disease. To understand the mechanisms and vulnerabilities of MYC-driven breast cancer, we have generated an in vivo model that mimics human disease in response to MYC deregulation. MCF10A cells ectopically expressing a common breast cancer mutation in the phosphoinositide 3 kinase pathway (PIK3CA H1047R) led to the development of organised acinar structures in mice. Expressing both PIK3CA H1047R and deregulated MYC led to the development of invasive ductal carcinoma. Therefore, the deregulation of MYC expression in this setting creates a MYC-dependent normal-to-tumour switch that can be measured in vivo. These MYC-driven tumours exhibit classic hallmarks of human breast cancer at both the pathological and molecular level. Moreover, tumour growth is dependent upon sustained deregulated MYC expression, further demonstrating addiction to this potent oncogene and regulator of gene transcription. We therefore provide a MYC-dependent model of breast cancer, which can be used to assay in vivo tumour signalling pathways, proliferation and transformation from normal breast acini to invasive breast carcinoma. We anticipate that this novel MYC-driven transformation model will be a useful research tool to better understand the oncogenic function of MYC and for the identification of therapeutic vulnerabilities.
View details for DOI 10.1242/dmm.038083
View details for PubMedID 31350286
View details for PubMedCentralID PMC6679384
We introduce BPG, a framework for generating publication-quality, highly-customizable plots in the R statistical environment. This open-source package includes multiple methods of displaying high-dimensional datasets and facilitates generation of complex multi-panel figures, making it suitable for complex datasets. A web-based interactive tool allows online figure customization, from which R code can be downloaded for integration with computational pipelines. BPG provides a new approach for linking interactive and scripted data visualization and is available at http://labs.oicr.on.ca/boutros-lab/software/bpg or via CRAN at https://cran.r-project.org/web/packages/BoutrosLab.plotting.general.
View details for DOI 10.1186/s12859-019-2610-2
View details for PubMedID 30665349
View details for PubMedCentralID PMC6341661
We conducted the largest investigation of predisposition variants in cancer to date, discovering 853 pathogenic or likely pathogenic variants in 8% of 10,389 cases from 33 cancer types. Twenty-one genes showed single or cross-cancer associations, including novel associations of SDHA in melanoma and PALB2 in stomach adenocarcinoma. The 659 predisposition variants and 18 additional large deletions in tumor suppressors, including ATM, BRCA1, and NF1, showed low gene expression and frequent (43%) loss of heterozygosity or biallelic two-hit events. We also discovered 33 such variants in oncogenes, including missenses in MET, RET, and PTPN11 associated with high gene expression. We nominated 47 additional predisposition variants from prioritized VUSs supported by multiple evidences involving case-control frequency, loss of heterozygosity, expression effect, and co-localization with mutations and modified residues. Our integrative approach links rare predisposition variants to functional consequences, informing future guidelines of variant classification and germline genetic testing in cancer.
View details for DOI 10.1016/j.cell.2018.03.039
View details for PubMedID 29625052
View details for PubMedCentralID PMC5949147
The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic structural variant detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic structural variant detection methods has been challenged by the lack of gold standards, extensive resource requirements, and difficulties arising from the need to share personal genomic information. To facilitate structural variant detection algorithm evaluations, we create a robust simulation framework for somatic structural variants by extending the BAMSurgeon algorithm. We then organize and enable a crowdsourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of structural variant benchmarking on three different tumors, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for new ways to aggregate somatic structural variant detection approaches. The synthetic tumors and somatic structural variant detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon .
View details for DOI 10.1186/s13059-018-1539-5
View details for PubMedID 30400818
View details for PubMedCentralID PMC6219177
The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
View details for DOI 10.1186/s12859-018-2046-0
View details for PubMedID 29385983
View details for PubMedCentralID PMC5793408
The c-MYC (MYC) oncoprotein is deregulated in over 50% of cancers, yet regulatory mechanisms controlling MYC remain unclear. To this end, we interrogated the MYC interactome using BioID mass spectrometry (MS) and identified PP1 (protein phosphatase 1) and its regulatory subunit PNUTS (protein phosphatase-1 nuclear-targeting subunit) as MYC interactors. We demonstrate that endogenous MYC and PNUTS interact across multiple cell types and that they co-occupy MYC target gene promoters. Inhibiting PP1 by RNAi or pharmacological inhibition results in MYC hyperphosphorylation at multiple serine and threonine residues, leading to a decrease in MYC protein levels due to proteasomal degradation through the canonical SCF(FBXW7) pathway. MYC hyperphosphorylation can be rescued specifically with exogenous PP1, but not other phosphatases. Hyperphosphorylated MYC retained interaction with its transcriptional partner MAX, but binding to chromatin is significantly compromised. Our work demonstrates that PP1/PNUTS stabilizes chromatin-bound MYC in proliferating cells.
View details for DOI 10.1038/s41467-018-05660-0
View details for PubMedID 30158517
View details for PubMedCentralID PMC6115416
Platform-specific error profiles necessitate confirmatory studies where predictions made on data generated using one technology are additionally verified by processing the same samples on an orthogonal technology. However, verifying all predictions can be costly and redundant, and testing a subset of findings is often used to estimate the true error profile. To determine how to create subsets of predictions for validation that maximize accuracy of global error profile inference, we developed Valection, a software program that implements multiple strategies for the selection of verification candidates. We evaluated these selection strategies on one simulated and two experimental datasets. Valection is implemented in multiple programming languages, available at: http://labs.oicr.on.ca/boutros-lab/software/valection.
View details for DOI 10.1186/s12859-018-2391-z
View details for PubMedID 30253747
View details for PubMedCentralID PMC6157051
The majority of newly diagnosed prostate cancers are slow growing, with a long natural life history. Yet a subset can metastasize with lethal consequences. We reconstructed the phylogenies of 293 localized prostate tumors linked to clinical outcome data. Multiple subclones were detected in 59% of patients, and specific subclonal architectures associate with adverse clinicopathological features. Early tumor development is characterized by point mutations and deletions followed by later subclonal amplifications and changes in trinucleotide mutational signatures. Specific genes are selectively mutated prior to or following subclonal diversification, including MTOR, NKX3-1, and RB1. Patients with low-risk monoclonal tumors rarely relapse after primary therapy (7%), while those with high-risk polyclonal tumors frequently do (61%). The presence of multiple subclones in an index biopsy may be necessary, but not sufficient, for relapse of localized prostate cancer, suggesting that evolution-aware biomarkers should be studied in prospective studies of low-risk tumors suitable for active surveillance.
View details for DOI 10.1016/j.cell.2018.03.029
View details for PubMedID 29681457
Prostate tumours are highly variable in their response to therapies, but clinically available prognostic factors can explain only a fraction of this heterogeneity. Here we analysed 200 whole-genome sequences and 277 additional whole-exome sequences from localized, non-indolent prostate tumours with similar clinical risk profiles, and carried out RNA and methylation analyses in a subset. These tumours had a paucity of clinically actionable single nucleotide variants, unlike those seen in metastatic disease. Rather, a significant proportion of tumours harboured recurrent non-coding aberrations, large-scale genomic rearrangements, and alterations in which an inversion repressed transcription within its boundaries. Local hypermutation events were frequent, and correlated with specific genomic profiles. Numerous molecular aberrations were prognostic for disease recurrence, including several DNA methylation events, and a signature comprised of these aberrations outperformed well-described prognostic biomarkers. We suggest that intensified treatment of genomically aggressive localized prostate cancer may improve cure rates.
View details for DOI 10.1038/nature20788
View details for PubMedID 28068672
2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) is the most potent congener of the dioxin class of environmental contaminants. Exposure to TCDD causes a wide range of toxic outcomes, ranging from chloracne to acute lethality. The severity of toxicity is highly dependent on the aryl hydrocarbon receptor (AHR). Binding of TCDD to the AHR leads to changes in transcription of numerous genes. Studies evaluating the transcriptional changes brought on by TCDD may provide valuable insight into the role of the AHR in human health and disease. We therefore compiled a collection of transcriptomic datasets that can be used to aid the scientific community in better understanding the transcriptional effects of ligand-activated AHR. Specifically, we have created a datasets package - TCDD.Transcriptomics - for the R statistical environment, consisting of 63 unique experiments comprising 377 samples, including various combinations of 3 species (human derived cell lines, mouse and rat), 4 tissue types (liver, kidney, white adipose tissue and hypothalamus) and a wide range of TCDD exposure times and doses. These datasets have been fully standardized using consistent preprocessing and annotation packages (available as of September 14, 2015). To demonstrate the utility of this R package, a subset of "AHR-core" genes were evaluated across the included datasets. Ahrr, Nqo1 and members of the Cyp family were significantly induced following exposure to TCDD across the studies as expected while Aldh3a1 was induced specifically in rat liver. Inmt was altered only in liver tissue and primarily by rat-AHR. Analysis of the "AHR-core" genes demonstrates a continued need for studies surrounding the impact of AHR-activity on the transcriptome; genes believed to be consistently regulated by ligand-activated AHR show surprisingly little overlap across species and tissues.
Until now, a comprehensive assessment of the transcriptome across these studies was challenging due to differences in array platforms, processing methods and annotation versions. We believe that this package, which is freely available for download ( http://labs.oicr.on.ca/boutros-lab/tcdd-transcriptomics ) will prove to be a highly beneficial resource to the scientific community evaluating the effects of TCDD exposure as well as the variety of functions of the AHR.
View details for DOI 10.1186/s12864-016-3446-z
View details for PubMedID 28086803
View details for PubMedCentralID PMC5237151
Polychlorinated dibenzodioxins are environmental contaminants commonly produced as a by-product of industrial processes. The most potent of these, 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), is highly lipophilic, leading to bioaccumulation. White adipose tissue (WAT) is a major site for energy storage, and is one of the organs in which TCDD accumulates. In laboratory animals, exposure to TCDD causes numerous metabolic abnormalities, including a wasting syndrome. We therefore investigated the molecular effects of TCDD exposure on WAT by profiling the transcriptomic response of WAT to 100 μg/kg of TCDD at 1 or 4 days in TCDD-sensitive Long-Evans (Turku/AB; L-E) rats. A comparative analysis was conducted simultaneously in identically treated TCDD-resistant Han/Wistar (Kuopio; H/W) rats one day after exposure to the same dose. We sought to identify transcriptomic changes coinciding with the onset of toxicity, while gaining additional insight into later responses. More transcriptional responses to TCDD were observed at 4 days than at 1 day post-exposure, suggesting WAT shows mostly secondary responses. Two classic AHR-regulated genes, Cyp1a1 and Nqo1, were significantly induced by TCDD in both strains, while several genes involved in the immune response, including Ms4a7 and F13a1, were altered in L-E rats alone. We compared genes affected by TCDD in rat WAT and human adipose cells, and observed little overlap. Interestingly, very few genes involved in lipid metabolism exhibited altered expression levels despite the pronounced lipid mobilization from peripheral fat pads by TCDD in L-E rats. Of these genes, the lipolysis-associated Lpin1 was induced slightly over 2-fold in L-E rat WAT on day 4.
View details for DOI 10.1016/j.taap.2015.07.018
View details for PubMedID 26232522
The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.
View details for DOI 10.1038/nmeth.3407
View details for PubMedID 25984700
View details for PubMedCentralID PMC4856034
In some mammals, halogenated aromatic hydrocarbon (HAH) exposure causes wasting syndrome, defined as significant weight loss associated with lethal outcomes. The most potent HAH in causing wasting is 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), which exerts its toxic effects through the aryl hydrocarbon receptor (AHR). Since TCDD toxicity is thought to predominantly arise from dysregulation of AHR-transcribed genes, it was hypothesized that wasting syndrome is a result of TCDD-induced dysregulation of genes involved in regulation of food-intake. Because the hypothalamus is the central nervous system's regulatory center for food-intake and energy balance, mRNA abundances were measured in hypothalamic tissue from two rat strains with widely differing sensitivities to TCDD-induced wasting syndrome, TCDD-sensitive Long-Evans rats and TCDD-resistant Han/Wistar rats, 23 h after exposure to TCDD (100 μg/kg) or corn oil vehicle. TCDD exposure caused minimal transcriptional dysregulation in the hypothalamus, with only 6 genes significantly altered in Long-Evans rats and 15 genes in Han/Wistar rats. Two of the most dysregulated genes were Cyp1a1 and Nqo1, which are induced by TCDD across a wide range of tissues and are considered sensitive markers of TCDD exposure. The minimal response of the hypothalamic transcriptome to a lethal dose of TCDD at an early time-point suggests that the hypothalamus is not the predominant site of initial events leading to hypophagia and associated wasting. TCDD may affect feeding behaviour via events upstream or downstream of the hypothalamus, and further work is required to evaluate this at the level of individual hypothalamic nuclei and subregions.
View details for DOI 10.1016/j.tox.2014.12.016
View details for PubMedID 25529477
Research on the aryl hydrocarbon receptor (AHR) has largely focused on variations in toxic outcomes resulting from its activation by halogenated aromatic hydrocarbons. But the AHR also plays key roles in regulating pathways critical for development, and after decades of research the mechanisms underlying physiological regulation by the AHR remain poorly characterized. Previous studies identified several core genes that respond to xenobiotic AHR ligands across a broad range of species and tissues. However, only limited inferences have been made regarding its role in regulating constitutive gene activity, i.e. in the absence of exogenous ligands. To address this, we profiled transcriptomic variations between AHR-active and AHR-less-active animals in the absence of an exogenous agonist across five tissues, three of which came from rats (hypothalamus, white adipose and liver) and two of which came from mice (kidney and liver). Because AHR status alone has been shown sufficient to alter transcriptomic responses, we reason that by contrasting profiles amongst AHR-variant animals, we may elucidate effects of the AHR on constitutive mRNA abundances. We found significantly more overlap in constitutive mRNA abundances amongst tissues within the same species than from tissues between species and identified 13 genes (Agt, Car3, Creg1, Ctsc, E2f6, Enpp1, Gatm, Gstm4, Kcnj8, Me1, Pdk1, Slc35a3, and Sqrdl) that are affected by AHR-status in four of five tissues. One gene, Creg1, was significantly up-regulated in all AHR-less-active animals. We also find greater overlap between tissues at the pathway level than at the gene level, suggesting coherency to the AHR signalling response within these processes.
Analysis of regulatory motifs suggests that the AHR mostly mediates transcriptional regulation via direct binding to response elements. These findings, though preliminary, present a platform for further evaluating the role of the AHR in regulation of constitutive mRNA levels and physiologic function.
View details for DOI 10.1186/1471-2164-15-1053
View details for PubMedID 25467400
View details for PubMedCentralID PMC4301818