BSc (Hons) Physics, University of Manchester, UK
PhD Theoretical particle physics, University of Southampton, UK
Computational systems biology of human disease. Particular focus on integration of high-throughput datasets with each other, and with phenotypic information and clinical outcomes.
The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO identifies potential cancer driver genes through integration of copy number, DNA methylation and gene expression data. Then AMARETTO connects these driver genes with co-expressed target genes that they control, defined as regulatory modules. Thirdly, we connect AMARETTO modules identified from different cancer sites into a pancancer network to identify cancer driver genes. Here we applied AMARETTO in a pancancer study comprising eleven cancer sites and confirmed that AMARETTO captures hallmarks of cancer. We also demonstrated that AMARETTO enables the identification of novel pancancer driver genes. In particular, our analysis led to the identification of pancancer driver genes of smoking-induced cancers and 'antiviral' interferon-modulated innate immune response. AMARETTO is available as an R package at https://bitbucket.org/gevaertlab/pancanceramaretto.
View details for DOI 10.1016/j.ebiom.2017.11.028
View details for PubMedID 29331675
Purpose To create a radiogenomic map linking computed tomographic (CT) image features and gene expression profiles generated by RNA sequencing for patients with non-small cell lung cancer (NSCLC). Materials and Methods A cohort of 113 patients with NSCLC diagnosed between April 2008 and September 2014 who had preoperative CT data and tumor tissue available was studied. For each tumor, a thoracic radiologist recorded 87 semantic image features, selected to reflect radiologic characteristics of nodule shape, margin, texture, tumor environment, and overall lung characteristics. Next, total RNA was extracted from the tissue and analyzed with RNA sequencing technology. Ten highly coexpressed gene clusters, termed metagenes, were identified, validated in publicly available gene-expression cohorts, and correlated with prognosis. Next, a radiogenomics map was built that linked semantic image features to metagenes by using the t statistic and the Spearman correlation metric with multiple testing correction. Results RNA sequencing analysis resulted in 10 metagenes that capture a variety of molecular pathways, including the epidermal growth factor (EGF) pathway. A radiogenomic map was created with 32 statistically significant correlations between semantic image features and metagenes. For example, nodule attenuation and margins are associated with the late cell-cycle genes, and a metagene that represents the EGF pathway was significantly correlated with the presence of ground-glass opacity and irregular nodules or nodules with poorly defined margins. Conclusion Radiogenomic analysis of NSCLC showed multiple associations between semantic image features and metagenes that represented canonical molecular pathways, and it can result in noninvasive identification of molecular properties of NSCLC. Online supplemental material is available for this article.
View details for DOI 10.1148/radiol.2017161845
View details for PubMedID 28727543
View details for PubMedCentralID PMC5749594
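The radiogenomic map in the NSCLC abstract above pairs image features with metagenes using the Spearman correlation and a multiple testing correction. The following is a minimal, standard-library sketch of those two ingredients. The abstract does not name the correction method, so the Benjamini-Hochberg procedure used here is an assumption, and all function names are hypothetical rather than taken from the study's code.

```python
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for pos in range(m - 1, -1, -1):
        idx = order[pos]
        running_min = min(running_min, pvalues[idx] * m / (pos + 1))
        adjusted[idx] = running_min
    return adjusted
```

In a setting like the one described, each (image feature, metagene) pair would contribute one p-value, and only pairs whose adjusted value falls below a chosen threshold would be drawn on the map.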
Understanding the relative contributions of genetic and epigenetic abnormalities to acute myeloid leukemia (AML) should assist integrated design of targeted therapies. In this study, we generated induced pluripotent stem cells (iPSCs) from AML patient samples harboring MLL rearrangements and found that they retained leukemic mutations but reset leukemic DNA methylation/gene expression patterns. AML-iPSCs lacked leukemic potential, but when differentiated into hematopoietic cells, they reacquired the ability to give rise to leukemia in vivo and reestablished leukemic DNA methylation/gene expression patterns, including an aberrant MLL signature. Epigenetic reprogramming was therefore not sufficient to eliminate leukemic behavior. This approach also allowed us to study the properties of distinct AML subclones, including differential drug susceptibilities of KRAS mutant and wild-type cells, and predict relapse based on increased cytarabine resistance of a KRAS wild-type subclone. Overall, our findings illustrate the value of AML-iPSCs for investigating the mechanistic basis and clonal properties of human AML.
View details for DOI 10.1016/j.stem.2016.11.018
View details for PubMedID 28089908
Head and neck squamous cell carcinoma (HNSCC) is broadly classified into HNSCC associated with human papilloma virus (HPV) infection, and HPV negative HNSCC, which is typically smoking-related. A subset of HPV negative HNSCCs occur in patients without smoking history, however, and these etiologically 'atypical' HNSCCs disproportionately occur in the oral cavity, and in female patients, suggesting a distinct etiology. To investigate the determinants of clinical and molecular heterogeneity, we performed unsupervised clustering to classify 528 HNSCC patients from The Cancer Genome Atlas (TCGA) into putative intrinsic subtypes based on their profiles of epigenetically (DNA methylation) deregulated genes. HNSCCs clustered into five subtypes, including one HPV positive subtype, two smoking-related subtypes, and two atypical subtypes. One atypical subtype was particularly genomically stable, but featured widespread gene silencing associated with the 'CpG island methylator phenotype' (CIMP). Further distinguishing features of this 'CIMP-Atypical' subtype include an antiviral gene expression profile associated with pro-inflammatory M1 macrophages and CD8+ T cell infiltration, CASP8 mutations, and a well-differentiated state corresponding to normal SOX2 copy number and SOX2OT hypermethylation. We developed a gene expression classifier for the CIMP-Atypical subtype that could classify atypical disease features in two independent patient cohorts, demonstrating the reproducibility of this subtype. Taken together, these findings provide unprecedented evidence that atypical HNSCC is molecularly distinct, and postulate the CIMP-Atypical subtype as a distinct clinical entity that may be caused by chronic inflammation.
View details for DOI 10.1016/j.ebiom.2017.02.025
View details for PubMedID 28314692
View details for PubMedCentralID PMC5360591
Chromatin modifying enzymes are frequently mutated in cancer, resulting in widespread epigenetic deregulation. Recent reports indicate that inactivating mutations in the histone methyltransferase NSD1 define an intrinsic subtype of head and neck squamous cell carcinoma (HNSC) that features pronounced DNA hypomethylation. Here, we describe a similar hypomethylated subtype of lung squamous cell carcinoma (LUSC) that is enriched for both inactivating mutations and deletions in NSD1. The 'NSD1 subtypes' of HNSC and LUSC are highly correlated at the DNA methylation and gene expression levels, featuring ectopic expression of developmental transcription factors and genes that are also hypomethylated in Sotos syndrome, a congenital disorder caused by germline NSD1 mutations. Further, the NSD1 subtype of HNSC displays an 'immune cold' phenotype characterized by low infiltration of tumor-associated leukocytes, particularly macrophages and CD8+ T cells, as well as low expression of genes encoding the immunotherapy target PD-1 immune checkpoint receptor and its ligands. Using an in vivo model, we demonstrate that NSD1 inactivation results in reduced T cell infiltration into the tumor microenvironment, implicating NSD1 as a tumor cell-intrinsic driver of an immune cold phenotype. NSD1 inactivation therefore causes epigenetic deregulation across cancer sites, and has implications for immunotherapy.
View details for DOI 10.1038/s41598-017-17298-x
View details for PubMedID 29213088
Gastric adenocarcinomas are associated with a poor prognosis due to the fact that the tumor has often metastasized by the time of diagnosis and prognostic markers are urgently needed to tailor treatment. We examined the expression of the mitotic spindle checkpoint protein BUB1 (budding uninhibited by benzimidazoles 1) and Ki-67 protein expression by immunohistochemistry in 218 patients with primary gastric adenocarcinomas. Tumors with low frequency of BUB1 expression were associated with larger tumor size (pT) (p < 0.001), higher incidence of lymph node metastases (pN) (p = 0.027), distant metastases (pM) (p = 0.006) and higher UICC stage (p < 0.001). Furthermore, BUB1 expression was inversely correlated with residual tumor stage (p = 0.038). Abundant BUB1 protein expression correlated with frequent Ki-67 protein expression (p < 0.001) and low BUB1 expression was associated with shorter survival (p < 0.001). Univariate and multivariate analyses confirmed BUB1 to be an independent prognostic marker in gastric cancer (p = 0.021).
View details for DOI 10.18632/oncotarget.19357
View details for PubMedID 29100315
View details for PubMedCentralID PMC5652709
In a recently published article in Genome Biology, Li and colleagues introduced TIMER, a gene expression deconvolution approach for studying tumor-infiltrating leukocytes (TILs) in 23 cancer types profiled by The Cancer Genome Atlas. Methods to characterize TIL biology are increasingly important, and the authors offer several arguments in favor of their strategy. Several of these claims warrant further discussion and highlight the critical importance of data normalization in gene expression deconvolution applications. Please see related Li et al correspondence: www.dx.doi.org/10.1186/s13059-017-1256-5 and Zheng correspondence: www.dx.doi.org/10.1186/s13059-017-1258-3.
View details for DOI 10.1186/s13059-017-1257-4
View details for PubMedID 28679399
Lung squamous cell carcinoma (LSCC) pathogenesis remains incompletely understood, and biomarkers predicting treatment response remain lacking. Here, we describe novel murine LSCC models driven by loss of Trp53 and Keap1, both of which are frequently mutated in human LSCCs. Homozygous inactivation of Keap1 or Trp53 promoted airway basal stem cell (ABSC) self-renewal, suggesting that mutations in these genes lead to expansion of mutant stem cell clones. Deletion of Trp53 and Keap1 in ABSCs, but not more differentiated tracheal cells, produced tumors recapitulating histologic and molecular features of human LSCCs, indicating that they represent the likely cell of origin in this model. Deletion of Keap1 promoted tumor aggressiveness, metastasis, and resistance to oxidative stress and radiotherapy (RT). KEAP1/NRF2 mutation status predicted risk of local recurrence after RT in patients with non-small cell lung cancer (NSCLC) and could be noninvasively identified in circulating tumor DNA. Thus, KEAP1/NRF2 mutations could serve as predictive biomarkers for personalization of therapeutic strategies for NSCLCs. We developed an LSCC mouse model involving Trp53 and Keap1, which are frequently mutated in human LSCCs. In this model, ABSCs are the cell of origin of these tumors. KEAP1/NRF2 mutations increase radioresistance and predict local tumor recurrence in radiotherapy patients. Our findings are of potential clinical relevance and could lead to personalized treatment strategies for tumors with KEAP1/NRF2 mutations. Cancer Discov; 7(1); 86-101. ©2016 AACR. This article is highlighted in the In This Issue feature, p. 1.
View details for PubMedID 27663899
View details for PubMedCentralID PMC5222718
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma (NHL), yet 40-50% of patients will eventually succumb to their disease, demonstrating a pressing need for novel therapeutic options. Gene expression profiling has identified messenger RNAs that lead to transformation, but critical events transforming cells are normally executed by kinases. Therefore, we hypothesized that previously unrecognized kinases may contribute to DLBCL pathogenesis. We performed the first comprehensive analysis of global kinase activity in DLBCL, to identify novel therapeutic targets, and discovered that Germinal Center Kinase (GCK) was extensively activated. GCK RNA interference and small molecule inhibition induced cell cycle arrest and apoptosis in DLBCL cell lines and primary tumors in vitro and decreased the tumor growth rate in vivo, resulting in a significantly extended lifespan of mice bearing DLBCL xenografts. GCK expression was also linked to adverse clinical outcome in a cohort of 151 primary DLBCL patients. These studies demonstrate, for the first time, that GCK is a molecular therapeutic target in DLBCL tumors and that inhibiting GCK may significantly extend DLBCL patient survival. Since the majority of DLBCL tumors (~80%) exhibit activation of GCK, this therapy may be applicable to most patients.
View details for DOI 10.1182/blood-2016-02-696856
View details for PubMedID 27151888
We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred on the basis of mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about accuracy of the specific edges inferred in a particular network. DISCERN identifies perturbed genes more accurately in synthetic data than existing methods to identify perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis, in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.
View details for DOI 10.1371/journal.pcbi.1004888
View details for Web of Science ID 000379348100011
View details for PubMedID 27145341
View details for PubMedCentralID PMC4856318
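The DISCERN abstract above argues that conditional dependence separates direct from indirect interactions better than pairwise correlation. The sketch below illustrates only that general idea with a first-order partial correlation on simulated data; it is not the DISCERN scoring function, and the function names and the simulation are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def partial_corr(x: Sequence[float], y: Sequence[float],
                 z: Sequence[float]) -> float:
    """Correlation of x and y after conditioning on a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def simulate(n: int = 2000, seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Simulate one regulator driving two genes that do not interact directly."""
    rng = random.Random(seed)
    regulator = [rng.gauss(0, 1) for _ in range(n)]
    gene_a = [r + 0.5 * rng.gauss(0, 1) for r in regulator]
    gene_b = [r + 0.5 * rng.gauss(0, 1) for r in regulator]
    return regulator, gene_a, gene_b


regulator, gene_a, gene_b = simulate()
# The two genes look strongly correlated pairwise...
print("pairwise:", round(pearson(gene_a, gene_b), 3))
# ...but the dependence largely vanishes once the shared regulator is conditioned on.
print("conditional:", round(partial_corr(gene_a, gene_b, regulator), 3))
```

The two genes share a regulator but have no direct link, so the pairwise correlation is high while the conditional one is close to zero, which is the distinction the abstract relies on.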
Accurate survival stratification in early-stage non-small cell lung cancer (NSCLC) could inform the use of adjuvant therapy. We developed a clinically implementable mortality risk score incorporating distinct tumor microenvironmental gene expression signatures and clinical variables. Gene expression profiles from 1106 nonsquamous NSCLCs were used for generation and internal validation of a nine-gene molecular prognostic index (MPI). A quantitative polymerase chain reaction (qPCR) assay was developed and validated on an independent cohort of formalin-fixed paraffin-embedded (FFPE) tissues (n = 98). A prognostic score using clinical variables was generated using Surveillance, Epidemiology, and End Results data and combined with the MPI. All statistical tests for survival were two-sided. The MPI stratified stage I patients into prognostic categories in three microarray and one FFPE qPCR validation cohorts (HR = 2.99, 95% CI = 1.55 to 5.76, P < .001 in stage IA patients of the largest microarray validation cohort; HR = 3.95, 95% CI = 1.24 to 12.64, P = .01 in stage IA of the qPCR cohort). Prognostic genes were expressed in distinct tumor cell subpopulations, and genes implicated in proliferation and stem cells portended poor outcomes, while genes involved in normal lung differentiation and immune infiltration were associated with superior survival. Integrating the MPI with clinical variables conferred greatest prognostic power (HR = 3.43, 95% CI = 2.18 to 5.39, P < .001 in stage I patients of the largest microarray cohort; HR = 3.99, 95% CI = 1.67 to 9.56, P < .001 in stage I patients of the qPCR cohort). Finally, the MPI was prognostic irrespective of somatic alterations in EGFR, KRAS, TP53, and ALK. The MPI incorporates genes expressed in the tumor and its microenvironment and can be implemented clinically using qPCR assays on FFPE tissues. A composite model integrating the MPI with clinical variables provides the most accurate risk stratification.
View details for DOI 10.1093/jnci/djv211
View details for PubMedID 26286589
Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ∼18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
View details for DOI 10.1038/nm.3909
View details for Web of Science ID 000359181000022
View details for PubMedID 26193342
We introduce CIBERSORT, a method for characterizing cell composition of complex tissues from their gene expression profiles. When applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen and fixed tissues, including solid tumors, CIBERSORT outperformed other methods with respect to noise, unknown mixture content and closely related cell types. CIBERSORT should enable large-scale analysis of RNA mixtures for cellular biomarkers and therapeutic targets (http://cibersort.stanford.edu/).
View details for DOI 10.1038/nmeth.3337
View details for PubMedID 25822800
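The CIBERSORT abstract above concerns estimating cell-type proportions from a bulk expression profile. The abstract does not describe the algorithm, so the sketch below shows only the generic linear mixture idea behind expression deconvolution: a mixture profile is modeled as a signature matrix times a vector of non-negative fractions. The solver here is a plain non-negative least-squares fit by projected gradient descent, substituted for illustration; it is not CIBERSORT's method, and the signature values and names are made up.

```python
from typing import List, Sequence


def deconvolve(signature: Sequence[Sequence[float]],
               mixture: Sequence[float],
               steps: int = 2000,
               learning_rate: float = 0.1) -> List[float]:
    """Estimate non-negative cell-type fractions that sum to one.

    signature[g][k] is the expression of gene g in pure cell type k;
    mixture[g] is the expression of gene g in the bulk sample.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    fractions = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual of the linear model: signature @ fractions - mixture.
        residual = [
            sum(signature[g][k] * fractions[k] for k in range(n_types)) - mixture[g]
            for g in range(n_genes)
        ]
        # Gradient of half the squared error with respect to each fraction.
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [
            max(0.0, fractions[k] - learning_rate * gradient[k])
            for k in range(n_types)
        ]
    total = sum(fractions)
    if total == 0.0:
        return fractions
    return [value / total for value in fractions]


# Hypothetical signature: three genes (rows) by two cell types (columns).
toy_signature = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Bulk profile built from 70% of type 0 and 30% of type 1.
toy_mixture = [0.7, 0.3, 0.5]
print([round(f, 3) for f in deconvolve(toy_signature, toy_mixture)])
```

The fixed step size is tuned to this toy signature; a real signature matrix would need a step size matched to its scale, or a dedicated solver.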
BCR-ABL1(+) precursor B-cell acute lymphoblastic leukemia (BCR-ABL1(+) B-ALL) is an aggressive hematopoietic neoplasm characterized by a block in differentiation due in part to the somatic loss of transcription factors required for B-cell development. We hypothesized that overcoming this differentiation block by forcing cells to reprogram to the myeloid lineage would reduce the leukemogenicity of these cells. We found that primary human BCR-ABL1(+) B-ALL cells could be induced to reprogram into macrophage-like cells by exposure to myeloid differentiation-promoting cytokines in vitro or by transient expression of the myeloid transcription factor C/EBPα or PU.1. The resultant cells were clonally related to the primary leukemic blasts but resembled normal macrophages in appearance, immunophenotype, gene expression, and function. Most importantly, these macrophage-like cells were unable to establish disease in xenograft hosts, indicating that lineage reprogramming eliminates the leukemogenicity of BCR-ABL1(+) B-ALL cells, and suggesting a previously unidentified therapeutic strategy for this disease. Finally, we determined that myeloid reprogramming may occur to some degree in human patients by identifying primary CD14(+) monocytes/macrophages in BCR-ABL1(+) B-ALL patient samples that possess the BCR-ABL1(+) translocation and clonally recombined VDJ regions.
View details for DOI 10.1073/pnas.1413383112
View details for Web of Science ID 000351914500070
View details for PubMedID 25775523
View details for PubMedCentralID PMC4386392
Follicular lymphoma (FL) is incurable with conventional therapies and has a clinical course typified by multiple relapses after therapy. These tumors are genetically characterized by B-cell leukemia/lymphoma 2 (BCL2) translocation and mutation of genes involved in chromatin modification. By analyzing purified tumor cells, we identified additional novel recurrently mutated genes and confirmed mutations of one or more chromatin modifier genes within 96% of FL tumors and two or more in 76% of tumors. We defined the hierarchy of somatic mutations arising during tumor evolution by analyzing the phylogenetic relationship of somatic mutations across the coding genomes of 59 sequentially acquired biopsies from 22 patients. Among all somatically mutated genes, CREBBP mutations were most significantly enriched within the earliest inferable progenitor. These mutations were associated with a signature of decreased antigen presentation characterized by reduced transcript and protein abundance of MHC class II on tumor B cells, in line with the role of CREBBP in promoting class II transactivator (CIITA)-dependent transcriptional activation of these genes. CREBBP mutant B cells stimulated less proliferation of T cells in vitro compared with wild-type B cells from the same tumor. Transcriptional signatures of tumor-infiltrating T cells were indicative of reduced proliferation, and this corresponded to decreased frequencies of tumor-infiltrating CD4 helper T cells and CD8 memory cytotoxic T cells. These observations therefore implicate CREBBP mutation as an early event in FL evolution that contributes to immune evasion via decreased antigen presentation.
View details for DOI 10.1073/pnas.1501199112
View details for PubMedID 25713363
View details for PubMedCentralID PMC4364211
We define a new category of candidate tumor drivers in cancer genome evolution: 'selected expression regulators' (SERs), genes driving dysregulated transcriptional programs in cancer evolution. The SERs are identified from genome-wide tumor expression data with a novel method, namely SPARROW (SPARse selected expRessiOn regulators identified With penalized regression). SPARROW uncovers a previously unknown connection between cancer expression variation and driver events, by using a novel sparse regression technique. Our results indicate that SPARROW is a powerful complementary approach to identify candidate genes containing driver events that are hard to detect from sequence data, due to a large number of passenger mutations and lack of comprehensive sequence information from a sufficiently large number of samples. SERs identified by SPARROW reveal known driver mutations in multiple human cancers, along with known cancer-associated processes and survival-associated genes, better than popular methods for inferring gene expression networks. We demonstrate that when applied to acute myeloid leukemia expression data, SPARROW identifies an apoptotic biomarker (PYCARD) for an investigational drug obatoclax. The PYCARD and obatoclax association is validated in 30 AML patient samples.
View details for DOI 10.1093/nar/gku1290
View details for PubMedID 25583238
View details for PubMedCentralID PMC4330344
Acute myeloid leukemia (AML) is associated with deregulation of DNA methylation; however, many cases do not bear mutations in known regulators of CpG methylation. We found that mutations in WT1, IDH2, and CEBPA were strongly linked to DNA hypermethylation in AML using a novel integrative analysis of TCGA data based on Boolean implications, if-then rules that identify all individual CpG sites that are hypermethylated in the presence of a mutation. Introduction of mutant WT1 (WT1mut) into wildtype AML cells induced DNA hypermethylation, confirming mutant WT1 to be causally associated with DNA hypermethylation. Methylated genes in WT1mut primary patient samples were highly enriched for polycomb repressor complex 2 (PRC2) targets, implicating PRC2 dysregulation in WT1mut leukemogenesis. We found that PRC2 target genes were aberrantly repressed in WT1mut AML, and that expression of mutant WT1 in CD34+ cord blood cells induced myeloid differentiation block. Treatment of WT1mut AML cells with shRNA or pharmacologic PRC2/EZH2 inhibitors promoted myeloid differentiation, suggesting EZH2 inhibitors may be active in this AML subtype. Our results highlight a strong association between mutant WT1 and DNA hypermethylation in AML, and demonstrate that Boolean implications can be used to decipher mutation-specific methylation patterns that may lead to therapeutic insights.
View details for DOI 10.1182/blood-2014-03-566018
View details for PubMedID 25398938
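The AML abstract above describes Boolean implications as if-then rules that flag CpG sites hypermethylated whenever a mutation is present. The sketch below is a deliberately simplified check of one such rule, "mutated implies hypermethylated": it counts samples that violate the rule and accepts the rule when the violation rate is small. The threshold, the function names, and the CpG site labels are assumptions for illustration; the published analysis uses its own statistical tests, which are not reproduced here.

```python
from typing import Dict, List, Sequence


def implication_holds(mutated: Sequence[bool],
                      hypermethylated: Sequence[bool],
                      max_violation_rate: float = 0.05) -> bool:
    """Check the rule 'if mutated then hypermethylated' across samples."""
    n_mutated = sum(1 for m in mutated if m)
    if n_mutated == 0:
        return False  # No mutated samples, so there is no evidence for the rule.
    # A violation is a mutated sample whose site is not hypermethylated.
    violations = sum(
        1 for m, h in zip(mutated, hypermethylated) if m and not h
    )
    return violations / n_mutated <= max_violation_rate


def sites_implied_by_mutation(mutated: Sequence[bool],
                              methylation: Dict[str, Sequence[bool]]) -> List[str]:
    """Return the CpG sites for which the implication holds."""
    return [
        site for site, calls in methylation.items()
        if implication_holds(mutated, calls)
    ]


# Hypothetical data: six samples, the first three carrying the mutation.
mutation_calls = [True, True, True, False, False, False]
methylation_calls = {
    "cpg_site_a": [True, True, True, True, False, False],   # rule holds
    "cpg_site_b": [True, False, True, False, False, True],  # one violation
}
print(sites_implied_by_mutation(mutation_calls, methylation_calls))
```

The rule is one-directional: a site may also be hypermethylated in unmutated samples, as for the first site here, without violating the implication.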
Idiotypes (Ids), the unique portions of tumor immunoglobulins, can serve as targets for passive and active immunotherapies for lymphoma. We performed a multicenter, randomized trial comparing a specific vaccine (MyVax), comprising Id chemically coupled to keyhole limpet hemocyanin (KLH) plus granulocyte macrophage colony-stimulating factor (GM-CSF) to a control immunotherapy with KLH plus GM-CSF. Patients with previously untreated advanced-stage follicular lymphoma (FL) received eight cycles of chemotherapy with cyclophosphamide, vincristine, and prednisone. Those achieving sustained partial or complete remission (n=287 [44%]) were randomly assigned at a ratio of 2:1 to receive one injection per month for 7 months of MyVax or control immunotherapy. Anti-Id antibody responses (humoral immune responses [IRs]) were measured before each immunization. The primary end point was progression-free survival (PFS). Secondary end points included IR and time to subsequent antilymphoma therapy. At a median follow-up of 58 months, no significant difference was observed in either PFS or time to next therapy between the two arms. In the MyVax group (n=195), anti-Id IRs were observed in 41% of patients, with a median PFS of 40 months, significantly exceeding the median PFS observed in patients without such Id-induced IRs and in those receiving control immunotherapy. This trial failed to demonstrate clinical benefit of specific immunotherapy. The subset of vaccinated patients mounting specific anti-Id responses had superior outcomes. Whether this reflects a therapeutic benefit or is a marker for more favorable underlying prognosis requires further study.
View details for DOI 10.1200/JCO.2012.43.9273
View details for PubMedID 24799467
View details for PubMedCentralID PMC4039868
Follicular lymphoma (FL) is currently incurable using conventional chemotherapy or immunotherapy regimes, compelling new strategies. Advances in high-throughput sequencing technologies that can reveal oncogenic pathways have stimulated interest in tailoring therapies toward actionable somatic mutations. However, for mutation-directed therapies to be most effective, the mutations must be uniformly present in evolved tumor cells as well as in the self-renewing tumor-cell precursors. Here, we show striking intratumoral clonal diversity within FL tumors in the representation of mutations in the majority of genes as revealed by whole exome sequencing of subpopulations. This diversity captures a clonal hierarchy, resolved using immunoglobulin somatic mutations and IGH-BCL2 translocations as a frame of reference and by comparing diagnosis and relapse tumor pairs, allowing us to distinguish early versus late genetic events during lymphomagenesis. We provide evidence that IGH-BCL2 translocations and CREBBP mutations are early events, whereas MLL2 and TNFRSF14 mutations probably represent late events during disease evolution. These observations provide insight into which of the genetic lesions represent suitable candidates for targeted therapies.
View details for DOI 10.1182/blood-2012-09-457283
View details for PubMedID 23297126
View details for PubMedCentralID PMC3587323
View details for Web of Science ID 000314049601071
View details for Web of Science ID 000313838900311
Infiltration of specialized immune cells regulates the growth and survival of neoplasia. Here, in a survey of public whole genome expression datasets we found that the gene for chemerin, a widely expressed endogenous chemoattractant protein, is down-regulated in melanoma as well as other human tumors. Moreover, high chemerin messenger RNA expression in tumors correlated with improved outcome in human melanoma. In experiments using the B16 transplantable mouse melanoma, tumor-expressed chemerin inhibited in vivo tumor growth without altering in vitro proliferation. Growth inhibition was associated with an altered profile of tumor-infiltrating cells with an increase in natural killer (NK) cells and a relative reduction in myeloid-derived suppressor cells and putative immune inhibitory plasmacytoid dendritic cells. Tumor inhibition required host expression of CMKLR1 (chemokine-like receptor 1), the chemoattractant receptor for chemerin, and was abrogated by NK cell depletion. Intratumoral injection of chemerin also inhibited tumor growth, suggesting the potential for therapeutic application. These results show that chemerin, whether expressed by tumor cells or within the tumor environment, can recruit host immune defenses that inhibit tumorigenesis and suggest that down-regulation of chemerin may be an important mechanism of tumor immune evasion.
View details for DOI 10.1084/jem.20112124
View details for Web of Science ID 000307016500006
View details for PubMedID 22753924
View details for PubMedCentralID PMC3409495
LMO2 regulates gene expression by facilitating the formation of multipartite DNA-binding complexes. In B cells, LMO2 is specifically up-regulated in the germinal center (GC) and is expressed in GC-derived non-Hodgkin lymphomas. LMO2 is one of the most powerful prognostic indicators in diffuse large B-cell lymphoma (DLBCL) patients. However, its function in GC B cells and DLBCL is currently unknown. In this study, we characterized the LMO2 transcriptome and transcriptional complex in DLBCL cells. LMO2 regulates genes implicated in kinetochore function, chromosome assembly, and mitosis. Overexpression of LMO2 in DLBCL cell lines results in centrosome amplification. In DLBCL, the LMO2 complex contains some of the traditional partners, such as LDB1, E2A, HEB, Lyl1, ETO2, and SP1, but not TAL1 or GATA proteins. Furthermore, we identified novel LMO2 interacting partners: ELK1, nuclear factor of activated T-cells (NFATc1), and lymphoid enhancer-binding factor1 (LEF1) proteins. Reporter assays revealed that LMO2 increases transcriptional activity of NFATc1 and decreases transcriptional activity of LEF1 proteins. Overall, our studies identified a novel LMO2 transcriptome and interactome in DLBCL and provide a platform for future elucidation of LMO2 function in GC B cells and DLBCL pathogenesis.
View details for DOI 10.1182/blood-2012-01-403154
View details for Web of Science ID 000307391400022
View details for PubMedID 22517897
View details for PubMedCentralID PMC3369683
View details for Web of Science ID 000318009803142
CD47, a "don't eat me" signal for phagocytic cells, is expressed on the surface of all human solid tumor cells. Analysis of patient tumor and matched adjacent normal (nontumor) tissue revealed that CD47 is overexpressed on cancer cells. CD47 mRNA expression levels correlated with a decreased probability of survival for multiple types of cancer. CD47 is a ligand for SIRPα, a protein expressed on macrophages and dendritic cells. In vitro, blockade of CD47 signaling using targeted monoclonal antibodies enabled macrophage phagocytosis of tumor cells that were otherwise protected. Administration of anti-CD47 antibodies inhibited tumor growth in orthotopic immunodeficient mouse xenotransplantation models established with patient tumor cells and increased the survival of the mice over time. Anti-CD47 antibody therapy initiated on larger tumors inhibited tumor growth and prevented or treated metastasis, but initiation of the therapy on smaller tumors was potentially curative. The safety and efficacy of targeting CD47 was further tested and validated in immune competent hosts using an orthotopic mouse breast cancer model. These results suggest all human solid tumor cells require CD47 expression to suppress phagocytic innate immune surveillance and elimination. These data, taken together with similar findings with other human neoplasms, show that CD47 is a commonly expressed molecule on all cancers, its function to block phagocytosis is known, and blockade of its function leads to tumor cell phagocytosis and elimination. CD47 is therefore a validated target for cancer therapies.
View details for DOI 10.1073/pnas.1121623109
View details for Web of Science ID 000303249100065
View details for PubMedID 22451913
View details for PubMedCentralID PMC3340046
View details for Web of Science ID 000299597100439
The suppression of oncogenic levels of MYC is sufficient to induce sustained tumor regression associated with proliferative arrest, differentiation, cellular senescence, and/or apoptosis, a phenomenon known as oncogene addiction. However, after prolonged inactivation of MYC in a conditional transgenic mouse model of Eμ-tTA/tetO-MYC T-cell acute lymphoblastic leukemia, some of the tumors recur, recapitulating what is frequently observed in human tumors in response to targeted therapies. Here we report that these recurring lymphomas express either transgenic or endogenous Myc, albeit in many cases at levels below those in the original tumor, suggesting that tumors continue to be addicted to MYC. Many of the recurring lymphomas (76%) harbored mutations in the tetracycline transactivator, resulting in expression of the MYC transgene even in the presence of doxycycline. Some of the remaining recurring tumors expressed high levels of endogenous Myc, which was associated with a genomic rearrangement of the endogenous Myc locus or activation of Notch1. By gene expression profiling, we confirmed that the primary and recurring tumors have highly similar transcriptomes. Importantly, shRNA-mediated suppression of the high levels of MYC in recurring tumors elicited both suppression of proliferation and increased apoptosis, confirming that these tumors remain oncogene addicted. These results suggest that tumors induced by MYC remain addicted to overexpression of this oncogene.
View details for DOI 10.1073/pnas.1107303108
View details for Web of Science ID 000295975300044
View details for PubMedID 21969595
View details for PubMedCentralID PMC3198348
The AACR-NCI Conference "Systems Biology: Confronting the Complexity of Cancer" took place from February 27 to March 2, 2011, in San Diego, CA. Several themes resonated during the meeting, notably (i) the need for better methods to distill insights from large-scale networks, (ii) the importance of integrating multiple data types in constructing more realistic models, (iii) challenges in translating insights about tumorigenic mechanisms into therapeutic interventions, and (iv) the role of the tumor microenvironment, at the physical, cellular, and molecular levels. The meeting highlighted concrete applications of systems biology to cancer, and the value of collaboration between interdisciplinary researchers in attacking formidable problems.
View details for DOI 10.1158/0008-5472.CAN-11-1569
View details for Web of Science ID 000294843600019
View details for PubMedID 21896642
View details for PubMedCentralID PMC3174325
Several gene-expression signatures predict survival in diffuse large B-cell lymphoma (DLBCL), but the lack of practical methods for genome-scale analysis has limited translation to clinical practice. We built and validated a simple model using one gene expressed by tumor cells and another expressed by host immune cells, assessing added prognostic value to the clinical International Prognostic Index (IPI). LIM domain only 2 (LMO2) was validated as an independent predictor of survival and the "germinal center B cell-like" subtype. Expression of tumor necrosis factor receptor superfamily member 9 (TNFRSF9) from the DLBCL microenvironment was the best gene in bivariate combination with LMO2. Study of TNFRSF9 tissue expression in 95 patients with DLBCL showed expression limited to infiltrating T cells. A model integrating these 2 genes was independent of "cell-of-origin" classification, "stromal signatures," IPI, and added to the predictive power of the IPI. A composite score integrating these genes with IPI performed well in 3 independent cohorts of 545 DLBCL patients, as well as in a simple assay of routine formalin-fixed specimens from a new validation cohort of 147 patients with DLBCL. We conclude that the measurement of a single gene expressed by tumor cells (LMO2) and a single gene expressed by the immune microenvironment (TNFRSF9) powerfully predicts overall survival in patients with DLBCL.
View details for DOI 10.1182/blood-2011-03-345272
View details for Web of Science ID 000293510000028
View details for PubMedID 21670469
View details for PubMedCentralID PMC3152499
RS-EPI has been suggested as an alternative approach to EPI for high-resolution DWI with reduced distortions. To determine whether RS-EPI is a useful approach for routine clinical use, we implemented GRAPPA-accelerated RS-EPI DWI at our pediatric hospital and graded the images alongside standard accelerated (ASSET) EPI DWI used routinely for clinical studies. GRAPPA-accelerated RS-EPI DWIs and ASSET EPI DWIs were acquired in 35 pediatric patients using a 3T system. The images were graded alongside each other by using a 7-point Likert scale as follows: 1, nondiagnostic; 2, poor; 3, acceptable; 4, standard; 5, above average; 6, good; and 7, outstanding. The following were the average scores for EPI and RS-EPI, respectively: resolution, 3.5/5.2; distortion level, 2.9/6.0; SNR, 3.4/4.1; lesion conspicuity, 3.3/5.9; and diagnostic confidence, 3.2/6.0. Overall, the RS-EPI had significantly improved diagnostic confidence and more reliably defined the extent and structure of several lesions. Although ASSET EPI scans had better SNR per scanning time, the higher spatial resolution as well as reduced blurring and distortions on RS-EPI scans helped to better reveal important anatomic details at the cortical-subcortical levels, brain stem, temporal and inferior frontal lobes, skull base, sinonasal cavity, cranial nerves, and orbits. This work shows the importance of both resolution and decreased distortions in the clinic, which can be accomplished by a combination of parallel imaging and alternative k-space trajectories such as RS-EPI.
View details for DOI 10.3174/ajnr.A2481
View details for Web of Science ID 000294275100023
View details for PubMedID 21596809
In biological systems that undergo processes such as differentiation, a clear concept of progression exists. We present a novel computational approach, called Sample Progression Discovery (SPD), to discover patterns of biological progression underlying microarray gene expression data. SPD assumes that individual samples of a microarray dataset are related by an unknown biological process (i.e., differentiation, development, cell cycle, disease progression), and that each sample represents one unknown point along the progression of that process. SPD aims to organize the samples in a manner that reveals the underlying progression and to simultaneously identify subsets of genes that are responsible for that progression. We demonstrate the performance of SPD on a variety of microarray datasets that were generated by sampling a biological process at different points along its progression, without providing SPD any information of the underlying process. When applied to a cell cycle time series microarray dataset, SPD was not provided any prior knowledge of samples' time order or of which genes are cell-cycle regulated, yet SPD recovered the correct time order and identified many genes that have been associated with the cell cycle. When applied to B-cell differentiation data, SPD recovered the correct order of stages of normal B-cell differentiation and the linkage between preB-ALL tumor cells with their cell origin preB. When applied to mouse embryonic stem cell differentiation data, SPD uncovered a landscape of ESC differentiation into various lineages and genes that represent both generic and lineage specific processes. When applied to a prostate cancer microarray dataset, SPD identified gene modules that reflect a progression consistent with disease stages. 
SPD may be best viewed as a novel tool for synthesizing biological hypotheses because it provides a likely biological progression underlying a microarray dataset and, perhaps more importantly, the candidate genes that regulate that progression.
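The sample-ordering idea can be illustrated with a minimal sketch. The abstract does not say how SPD arranges samples, so the approach below is an assumption made for illustration only: build a minimum spanning tree over samples using Euclidean distance and read a candidate progression off its longest path. The function names `mst_edges` and `progression_order` are hypothetical, and the simultaneous selection of gene subsets described in the abstract is not attempted.

```python
import math


def mst_edges(points):
    """Prim's algorithm on Euclidean distances; returns a list of (i, j) edges."""
    n = len(points)
    # best[j] = (distance to the tree, closest tree node) for nodes not yet in the tree
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        _, parent = best.pop(j)
        edges.append((parent, j))
        for k in best:
            d = math.dist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def progression_order(points):
    """Order samples along the longest path (by edge count) of the MST."""
    adj = {i: [] for i in range(len(points))}
    for i, j in mst_edges(points):
        adj[i].append(j)
        adj[j].append(i)

    def farthest_path(src):
        # Depth-first walk of the tree, keeping the longest path seen so far.
        stack, longest = [(src, [src])], [src]
        while stack:
            node, path = stack.pop()
            if len(path) > len(longest):
                longest = path
            for nxt in adj[node]:
                if len(path) < 2 or nxt != path[-2]:  # do not step back to the parent
                    stack.append((nxt, path + [nxt]))
        return longest

    end = farthest_path(0)[-1]   # one end of the tree's longest path
    return farthest_path(end)    # walk from that end to the other


# Toy data: five samples (rows) measured on two genes, listed in shuffled order.
samples = [[3.0, 3.1], [0.0, 0.1], [2.0, 1.9], [1.0, 1.0], [4.0, 4.2]]
order = progression_order(samples)
```

The recovered order is only defined up to reversal, since nothing in the distances says which end of the path is the start of the process.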
View details for DOI 10.1371/journal.pcbi.1001123
View details for Web of Science ID 000289973600007
View details for PubMedID 21533210
View details for PubMedCentralID PMC3077357
Hematopoietic tissues in acute myeloid leukemia (AML) patients contain both leukemia stem cells (LSC) and residual normal hematopoietic stem cells (HSC). The ability to prospectively separate residual HSC from LSC would enable important scientific and clinical investigation including the possibility of purged autologous hematopoietic cell transplants. We report here the identification of TIM3 as an AML stem cell surface marker more highly expressed on multiple specimens of AML LSC than on normal bone marrow HSC. TIM3 expression was detected in all cytogenetic subgroups of AML, but was significantly higher in AML-associated with core binding factor translocations or mutations in CEBPA. By assessing engraftment in NOD/SCID/IL2Rγ-null mice, we determined that HSC function resides predominantly in the TIM3-negative fraction of normal bone marrow, whereas LSC function from multiple AML specimens resides predominantly in the TIM3-positive compartment. Significantly, differential TIM3 expression enabled the prospective separation of HSC from LSC in the majority of AML specimens with detectable residual HSC function.
View details for DOI 10.1073/pnas.1100551108
View details for Web of Science ID 000288712200061
View details for PubMedID 21383193
View details for PubMedCentralID PMC3064328
In many cancers, specific subpopulations of cells appear to be uniquely capable of initiating and maintaining tumors. The strongest support for this cancer stem cell model comes from transplantation assays in immunodeficient mice, which indicate that human acute myeloid leukemia (AML) is driven by self-renewing leukemic stem cells (LSCs). This model has significant implications for the development of novel therapies, but its clinical relevance has yet to be determined. To identify an LSC gene expression signature and test its association with clinical outcomes in AML. Retrospective study of global gene expression (microarray) profiles of LSC-enriched subpopulations from primary AML and normal patient samples, which were obtained at a US medical center between April 2005 and July 2007, and validation data sets of global transcriptional profiles of AML tumors from 4 independent cohorts (n = 1047). Identification of genes discriminating LSC-enriched populations from other subpopulations in AML tumors; and association of LSC-specific genes with overall, event-free, and relapse-free survival and with therapeutic response. Expression levels of 52 genes distinguished LSC-enriched populations from other subpopulations in cell-sorted AML samples. An LSC score summarizing expression of these genes in bulk primary AML tumor samples was associated with clinical outcomes in the 4 independent patient cohorts. High LSC scores were associated with worse overall, event-free, and relapse-free survival among patients with either normal karyotypes or chromosomal abnormalities. For the largest cohort of patients with normal karyotypes (n = 163), the LSC score was significantly associated with overall survival as a continuous variable (hazard ratio [HR], 1.15; 95% confidence interval [CI], 1.08-1.22; log-likelihood P < .001). 
The absolute risk of death by 3 years was 57% (95% CI, 43%-67%) for the low LSC score group compared with 78% (95% CI, 66%-86%) for the high LSC score group (HR, 1.9 [95% CI, 1.3-2.7]; log-rank P = .002). In another cohort with available data on event-free survival for 70 patients with normal karyotypes, the risk of an event by 3 years was 48% (95% CI, 27%-63%) in the low LSC score group vs 81% (95% CI, 60%-91%) in the high LSC score group (HR, 2.4 [95% CI, 1.3-4.5]; log-rank P = .006). In multivariate Cox regression including age, mutations in FLT3 and NPM1, and cytogenetic abnormalities, the HRs for LSC score in the 3 cohorts with data on all variables were 1.07 (95% CI, 1.01-1.13; P = .02), 1.10 (95% CI, 1.03-1.17; P = .005), and 1.17 (95% CI, 1.05-1.30; P = .005). High expression of an LSC gene signature is independently associated with adverse outcomes in patients with AML.
View details for Web of Science ID 000285518000015
View details for PubMedID 21177505
Under normal physiological conditions, cellular homeostasis is partly regulated by a balance of pro- and anti-phagocytic signals. CD47, which prevents cancer cell phagocytosis by the innate immune system, is highly expressed on several human cancers including acute myeloid leukemia, non-Hodgkin's lymphoma, and bladder cancer. Blocking CD47 with a monoclonal antibody results in phagocytosis of cancer cells and leads to in vivo tumor elimination, yet normal cells remain mostly unaffected. Thus, we postulated that cancer cells must also display a potent pro-phagocytic signal. Here, we identified calreticulin as a pro-phagocytic signal that was highly expressed on the surface of several human cancers, but was minimally expressed on most normal cells. Increased CD47 expression correlated with high amounts of calreticulin on cancer cells and was necessary for protection from calreticulin-mediated phagocytosis. Blocking the interaction of target cell calreticulin with its receptor, low-density lipoprotein receptor-related protein, on phagocytic cells prevented anti-CD47 antibody-mediated phagocytosis. Furthermore, increased calreticulin expression was an adverse prognostic factor in diverse tumors including neuroblastoma, bladder cancer, and non-Hodgkin's lymphoma. These findings identify calreticulin as the dominant pro-phagocytic signal on several human cancers, provide an explanation for the selective targeting of tumor cells by anti-CD47 antibody, and highlight the balance between pro- and anti-phagocytic signals in the immune evasion of cancer.
View details for DOI 10.1126/scitranslmed.3001375
View details for Web of Science ID 000288444900003
View details for PubMedID 21178137
Deletions of chromosome 1p36 are one of the most frequently encountered subtelomeric alterations. Clinical features of monosomy 1p36 include neurocognitive impairment, hearing loss, seizures, cardiac defects, and characteristic facial features. The majority of cases have occurred sporadically, implying that genomic instability plays a role in the prevalence of the syndrome. Here, we report two siblings with mild phenotypic features of the deletion syndrome, including developmental delay, hearing loss, and left ventricular non-compaction (LVNC). Microarray analysis using bacterial artificial chromosome and oligonucleotide microarrays indicated the deletions were identical, suggesting germline mosaicism. Parental phenotypes were normal, and analysis by fluorescence in situ hybridization (FISH) did not show mosaicism. These small interstitial deletions were not detectable by conventional subtelomeric FISH analysis. To investigate the mechanism of deletion further, the breakpoints were cloned and sequenced, demonstrating the presence of a complex rearrangement. Sequence analysis of genes in the deletion interval did not reveal any mutations on the intact homologue that may have contributed to the LVNC seen in both children. This is the first report of apparent germline mosaicism for this disorder. Thus, our findings have important implications for diagnostic approaches and for recurrence risk counseling in families with a child with monosomy 1p36. In addition, our results further refine the minimal critical region for LVNC and hearing loss.
View details for DOI 10.1002/ajmg.a.33733
View details for Web of Science ID 000285251800019
View details for PubMedID 21108392
View details for PubMedCentralID PMC3058890
View details for Web of Science ID 000289662202229
Primary effusion lymphoma (PEL) is an aggressive B-cell lymphoma most commonly diagnosed in HIV-positive patients and universally associated with Kaposi's sarcoma-associated herpesvirus (KSHV). Chemotherapy treatment of PEL yields only short-term remissions in the vast majority of patients, but efforts to develop superior therapeutic approaches have been impeded by lack of animal models that accurately mimic human disease. To address this issue, we developed a direct xenograft model, UM-PEL-1, by transferring freshly isolated human PEL cells into the peritoneal cavities of NOD/SCID mice without in vitro cell growth to avoid the changes in KSHV gene expression evident in cultured cells. We used this model to show that bortezomib induces PEL remission and extends overall survival of mice bearing lymphomatous effusions. The proapoptotic effects of bortezomib are not mediated by inhibition of the prosurvival NF-kappaB pathway or by induction of a terminal unfolded protein response. Transcriptome analysis by genomic arrays revealed that bortezomib down-regulated cell-cycle progression, DNA replication, and Myc-target genes. Furthermore, we demonstrate that in vivo treatment with either bortezomib or doxorubicin induces KSHV lytic reactivation. These reactivations were temporally distinct, and this difference may help elucidate the therapeutic window for use of antivirals concurrently with chemotherapy. Our findings show that this direct xenograft model can be used for testing novel PEL therapeutic strategies and also can provide a rational basis for evaluation of bortezomib in clinical trials.
View details for DOI 10.1073/pnas.1002985107
View details for Web of Science ID 000280144500066
View details for PubMedID 20615981
View details for PubMedCentralID PMC2919898
Information theoretic approaches are increasingly being used for reconstructing regulatory networks from microarray data. These approaches start by computing the pairwise mutual information (MI) between all gene pairs. The resulting MI matrix is then manipulated to identify regulatory relationships. A barrier to these approaches is the time-consuming step of computing the MI matrix. We present a method to reduce this computation time. We apply spectral analysis to re-order the genes, so that genes that share regulatory relationships are more likely to be placed close to each other. Then, using a "sliding window" approach with appropriate window size and step size, we compute the MI for the genes within the sliding window, and the MI of all remaining gene pairs is assumed to be zero. Using both simulated data and microarray data, we demonstrate that our method does not incur performance loss in regions of high precision and low recall, while the computation time is significantly lowered. The proposed method can be used with any method that relies on the mutual information to reconstruct networks.
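The sliding-window step can be sketched as follows, under stated assumptions. The genes are taken to be already re-ordered (the spectral re-ordering is not shown), and the simple histogram MI estimator, the bin count, and the names `binned_mi` and `sliding_window_mi` are illustrative choices, not details given in the abstract.

```python
import math
from collections import Counter


def binned_mi(x, y, bins=4):
    """Histogram estimate of mutual information (in nats) between two profiles."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # constant profile: fall back to one bin
        return [min(int((a - lo) / width), bins - 1) for a in v]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in pxy.items()
    )


def sliding_window_mi(expr, window, step):
    """expr: list of gene profiles, assumed to be in the re-ordered gene order.
    Returns {(i, j): MI} for gene pairs that share at least one window;
    any pair missing from the dict is treated as having MI = 0."""
    if step < 1:
        raise ValueError("step must be at least 1")
    n_genes = len(expr)
    mi = {}
    start = 0
    while True:
        end = min(start + window, n_genes)
        for i in range(start, end):
            for j in range(i + 1, end):
                if (i, j) not in mi:  # overlapping windows revisit some pairs
                    mi[(i, j)] = binned_mi(expr[i], expr[j])
        if end == n_genes:
            break
        start += step
    return mi


# Six toy genes over eight samples, window of 3 genes moved 2 genes at a time.
ramp = [0, 1, 2, 3, 4, 5, 6, 7]
expr = [ramp, ramp[::-1], [1, 1, 5, 5, 2, 2, 7, 7], ramp, [3] * 8, ramp]
mi = sliding_window_mi(expr, window=3, step=2)
```

With a window of w genes, the number of MI evaluations grows roughly linearly in the number of genes instead of quadratically, which is where the time saving comes from; the cost is that any relationship between genes placed far apart in the ordering is missed.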
View details for DOI 10.1089/cmb.2009.0052
View details for Web of Science ID 000279271200005
View details for PubMedID 20078227
View details for PubMedCentralID PMC3148830
Interleukin-21 (IL-21), a member of the IL-2 cytokine family, has diverse regulatory effects on natural killer (NK), T, and B cells. In contrast to other cytokines that are usually immunostimulatory, IL-21 can induce apoptosis of murine B cells at specific activation-differentiation stages. This effect may be used for treatment of B-cell malignancies. Herein we report that diffuse large B-cell lymphoma (DLBCL) cell lines exhibit widespread expression of the IL-21 receptor (IL-21R) and that IL-21 stimulation leads to cell-cycle arrest and caspase-dependent apoptosis. IL-21 also induces apoptosis in de novo DLBCL primary tumors but does not affect viability of human healthy B cells. Furthermore, IL-21 promotes tumor regression and prolongs survival of mice harboring xenograft DLBCL tumors. The antilymphoma effects of this cytokine are dependent on a mechanism involving IL-21-activated signal transducer and activator of transcription 3 (STAT3) up-regulating expression of c-Myc. This up-regulation promotes a decrease in expression of antiapoptotic Bcl-2 and Bcl-X(L) proteins triggering cell death. Our results represent one of the first examples in which the STAT3-c-Myc signaling pathway, which can promote survival and oncogenesis, can induce apoptosis in neoplastic cells. Moreover, based on IL-21's potency in vitro and in animal models, our findings indicate that this cytokine should be examined in clinical studies of DLBCL.
View details for DOI 10.1182/blood-2009-08-239996
View details for Web of Science ID 000273820600018
View details for PubMedID 19965678
View details for PubMedCentralID PMC2810990
View details for Web of Science ID 000272725800623
View details for Web of Science ID 000272725803506
Histologic transformation (HT) of follicular lymphoma to diffuse large B-cell lymphoma (DLBCL-t) is associated with accelerated disease course and drastically worse outcome, yet the underlying mechanisms are poorly understood. We show that a network of gene transcriptional modules underlies HT. Central to the network hierarchy is a signature strikingly enriched for pluripotency-related genes. These genes are typically expressed in embryonic stem cells (ESCs), including MYC and its direct targets. This core ESC-like program was independent of proliferation/cell-cycle and overlapped but was distinct from normal B-cell transcriptional programs. Furthermore, we show that the ESC program is correlated with transcriptional programs maintaining tumor phenotype in transgenic MYC-driven mouse models of lymphoma. Although our approach was to identify HT mechanisms rather than to derive an optimal survival predictor, a model based on ESC/differentiation programs stratified patient outcomes in 2 independent patient cohorts and was predictive of propensity of follicular lymphoma tumors to transform. Transformation was associated with an expression signature combining high expression of ESC transcriptional programs with reduced expression of stromal programs. Together, these findings suggest a central role for an ESC-like signature in the mechanism of HT and provide new clues for potential therapeutic targets.
View details for DOI 10.1182/blood-2009-02-202465
View details for Web of Science ID 000270595700007
View details for PubMedID 19636063
View details for PubMedCentralID PMC2759646
The mechanisms involved in the formation of subtelomeric rearrangements are now beginning to be elucidated. Breakpoint sequencing analysis of 1p36 rearrangements has made important contributions to this line of inquiry. Despite the unique architecture of segmental duplications inherent to human subtelomeres, no common mechanism has been identified thus far and different nonexclusive recombination-repair mechanisms seem to predominate. In order to gain further insights into the mechanisms of chromosome breakage, repair, and stabilization mediating subtelomeric rearrangements in humans, we investigated the constitutional rearrangements of 1p36. Cloning of the breakpoint junctions in a complex rearrangement and three non-reciprocal translocations revealed similarities at the junctions, such as microhomology of up to three nucleotides, along with no significant sequence identity in close proximity to the breakpoint regions. All the breakpoints appeared to be unique and their occurrence was limited to non-repetitive, unique DNA sequences. Several recombination- or cleavage-associated motifs that may promote non-homologous recombination were observed in close proximity to the junctions. We conclude that NHEJ is likely the mechanism of DNA repair that generates these rearrangements. Additionally, two apparently pure terminal deletions were also investigated, and the refinement of the breakpoint regions identified two distinct genomic intervals ~25-kb apart, each containing a series of 1p36 specific segmental duplications with 90-98% identity. Segmental duplications can serve as substrates for ectopic homologous recombination or stimulate genomic rearrangements.
View details for DOI 10.1007/s00439-009-0650-9
View details for Web of Science ID 000266261600006
View details for PubMedID 19271239
We present a new software implementation to more efficiently compute the mutual information for all pairs of genes from gene expression microarrays. Computation of the mutual information is a necessary first step in various information theoretic approaches for reconstructing gene regulatory networks from microarray data. When the mutual information is estimated by kernel methods, computing the pairwise mutual information is quite time-consuming. Our implementation significantly reduces the computation time. For an example data set of 336 samples consisting of normal and malignant B-cells, with 9563 genes measured per sample, the current available software for ARACNE requires 142 hours to compute the mutual information for all gene pairs, whereas our algorithm requires 1.6 hours. The increased efficiency of our algorithm improves the feasibility of applying mutual information based approaches for reconstructing large regulatory networks.
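To show why the kernel-based computation is expensive, here is a minimal sketch of a Gaussian-kernel MI estimate for a single gene pair. It is a naive baseline, not the accelerated implementation described above, whose details the abstract does not give. The fixed bandwidth `h` and the names `kernel_mi` and `pairwise_cost` are assumptions for illustration.

```python
import math


def kernel_mi(x, y, h=0.3):
    """Gaussian-kernel estimate of mutual information (nats) for one gene pair.
    h is a hypothetical fixed bandwidth; profiles are assumed to be on a comparable scale."""
    n = len(x)
    # Kernel weights between every pair of samples, computed per gene.
    kx = [[math.exp(-((a - b) ** 2) / (2 * h * h)) for b in x] for a in x]
    ky = [[math.exp(-((a - b) ** 2) / (2 * h * h)) for b in y] for a in y]
    total = 0.0
    for i in range(n):
        joint = sum(kx[i][j] * ky[i][j] for j in range(n))
        # Ratio of joint to product-of-marginal density estimates at sample i;
        # the kernel normalization constants cancel.
        total += math.log(n * joint / (sum(kx[i]) * sum(ky[i])))
    return total / n


def pairwise_cost(n_genes, n_samples):
    """Kernel products needed by the naive approach over all gene pairs."""
    return n_genes * (n_genes - 1) // 2 * n_samples ** 2


x = [i / 10 for i in range(20)]
mi_self = kernel_mi(x, x)          # a profile against itself: clearly positive
mi_const = kernel_mi(x, [1.0] * 20)  # against a constant profile: zero
```

Each pair costs on the order of the squared sample count, and the example data set above has about 45.7 million gene pairs (9563 × 9562 / 2), which is why the all-pairs step dominates the running time.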
View details for DOI 10.1016/j.cmpb.2008.11.003
View details for Web of Science ID 000264951300007
View details for PubMedID 19167129
Characterization of patient-specific disease features at a molecular level is an important emerging field. Patients may be characterized by differences in the level and activity of relevant biomolecules in diseased cells. When high throughput, high dimensional data is available, it becomes possible to characterize differences not only in the level of the biomolecules, but also in the molecular interactions among them. We propose here a novel approach to characterize patient-specific signaling, which augments high throughput single cell data with state nodes corresponding to patient and disease states, and learns a Bayesian network based on this data. Features distinguishing individual patients emerge as downstream nodes in the network. We illustrate this approach with a six phospho-protein, 30,000 cell-per-patient dataset characterizing three comparably diagnosed follicular lymphoma patients, and show that our approach elucidates signaling differences among them.
View details for Web of Science ID 000280543605113
View details for PubMedID 19963681
View details for PubMedCentralID PMC3124088
Approximately one in 500 individuals carries a reciprocal translocation. Balanced translocations are usually associated with a normal phenotype unless the translocation breakpoints disrupt a gene(s) or cause a position effect. We investigated breakpoint junctions at the sequence level in phenotypically normal balanced translocation carriers. Eight breakpoint junctions derived from four nonrelated subjects with apparently balanced translocation t(1;22)(p36;q13) were examined. Additions of nucleotides, deletions, duplications, and a triplication identified at the breakpoints demonstrate high complexity at the breakpoint junctions and indicate involvement of multiple mechanisms in the DNA breakage and repair process during translocation formation. Possible detailed nonhomologous end-joining scenarios for t(1;22) cases are presented. We propose that cryptic imbalances in phenotypically normal, balanced translocation carriers may be more common than currently appreciated.
View details for DOI 10.1101/gr.077453.108
View details for Web of Science ID 000260536100006
View details for PubMedID 18765821
View details for PubMedCentralID PMC2577863
MYC overexpression has been implicated in the pathogenesis of most types of human cancers. MYC is likely to contribute to tumorigenesis by its effects on global gene expression. Previously, we have shown that the loss of MYC overexpression is sufficient to reverse tumorigenesis. Here, we show that there is a precise threshold level of MYC expression required for maintaining the tumor phenotype, whereupon there is a switch from a gene expression program of proliferation to a state of proliferative arrest and apoptosis. Oligonucleotide microarray analysis and quantitative PCR were used to identify changes in expression in 3,921 genes, of which 2,348 were down-regulated and 1,573 were up-regulated. Critical changes in gene expression occurred at or near the MYC threshold, including genes implicated in the regulation of the G(1)-S and G(2)-M cell cycle checkpoints and death receptor/apoptosis signaling. Using two-dimensional protein analysis followed by mass spectrometry, phospho-flow fluorescence-activated cell sorting, and antibody arrays, we also identified changes at the protein level that contributed to MYC-dependent tumor regression. Proteins involved in mRNA translation decreased below threshold levels of MYC. Thus, at the MYC threshold, there is a loss of its ability to maintain tumorigenesis, with associated shifts in gene and protein expression that reestablish cell cycle checkpoints, halt protein translation, and promote apoptosis.
View details for DOI 10.1158/0008-5472.CAN-07-6192
View details for Web of Science ID 000257415300024
View details for PubMedID 18593912
We describe a method for extracting Boolean implications (if-then relationships) in very large amounts of gene expression microarray data. A meta-analysis of data from thousands of microarrays for humans, mice, and fruit flies finds millions of implication relationships between genes that would be missed by other methods. These relationships capture gender differences, tissue differences, development, and differentiation. New relationships are discovered that are preserved across all three species.
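As a rough illustration of what a Boolean implication between two genes means, the sketch below classifies each sample as high or low for genes A and B and reports the if-then relationships whose violating quadrant is nearly empty. The fixed thresholds, the `max_violation` cutoff, and the function name `boolean_implications` are assumptions for this example; the abstract does not describe the thresholding or the statistical test actually used.

```python
def boolean_implications(a, b, thr_a, thr_b, max_violation=0.05):
    """Return the if-then relationships between genes A and B whose
    violating quadrant holds at most max_violation of the samples."""
    counts = {(ha, hb): 0 for ha in (False, True) for hb in (False, True)}
    for va, vb in zip(a, b):
        counts[(va > thr_a, vb > thr_b)] += 1
    n = len(a)
    # Each implication is falsified by samples in exactly one quadrant,
    # given here as (A is high, B is high).
    rules = {
        "A high => B high": (True, False),
        "A high => B low": (True, True),
        "A low => B high": (False, False),
        "A low => B low": (False, True),
    }
    return [name for name, bad in rules.items() if counts[bad] <= max_violation * n]


# Toy expression values for two genes across eight arrays.
gene_a = [1, 1, 1, 9, 9, 2, 1, 8]
gene_b = [1, 9, 2, 9, 8, 9, 1, 9]
found = boolean_implications(gene_a, gene_b, thr_a=5, thr_b=5)
```

In this toy data B is high whenever A is high, but B is also high in some samples where A is low, so the relationship is asymmetric. A symmetric measure such as correlation can score such a pair weakly, which is one way relationships of this kind could be missed.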
View details for Web of Science ID 000260587300020
View details for PubMedID 18973690
View details for PubMedCentralID PMC2760884
Short INterspersed Elements (SINEs) are non-autonomous retrotransposons, usually between 100 and 500 base pairs (bp) in length, which are ubiquitous components of eukaryotic genomes. Their activity, distribution, and evolution can be highly informative on genomic structure and evolutionary processes. To determine recent activity, we amplified more than one hundred SINE1 loci in a panel of 43 M. domestica individuals derived from five diverse geographic locations. The SINE1 family has expanded recently enough that many loci were polymorphic, and the SINE1 insertion-based genetic distances among populations reflected geographic distance. Genome-wide comparisons of SINE1 densities and GC content revealed that high SINE1 density is associated with high GC content in a few long and many short spans. Young SINE1s, whether fixed or polymorphic, showed an unbiased GC content preference for insertion, indicating that the GC preference accumulates over long time periods, possibly in periodic bursts. SINE1 evolution is thus broadly similar to human Alu evolution, although it has an independent origin. High GC content adjacent to SINE1s is strongly correlated with bias towards higher AT to GC substitutions and lower GC to AT substitutions. This is consistent with biased gene conversion, and also indicates that like chickens, but unlike eutherian mammals, GC content heterogeneity (isochore structure) is reinforced by substitution processes in the M. domestica genome. Nevertheless, both high and low GC content regions are apparently headed towards lower GC content equilibria, possibly due to a relative shift to lower recombination rates in the recent Monodelphis ancestral lineage. Like eutherians, metatherian (marsupial) mammals have evolved high CpG substitution rates, but this is apparently a convergence in process rather than a shared ancestral state.
View details for DOI 10.1016/j.gene.2007.02.028
View details for Web of Science ID 000247479000005
View details for PubMedID 17442506
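The comparison of SINE1 density with GC content described above can be illustrated with a minimal sketch. It assumes fixed-size, non-overlapping windows and 0-based insertion coordinates; the function names and window scheme are hypothetical and are not taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # window made only of N or other ambiguity codes
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int) -> list:
    """GC content for each complete, non-overlapping window of the sequence."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def insertions_per_window(positions, seq_len: int, window: int) -> list:
    """Count insertion sites (0-based coordinates) in each complete window."""
    counts = [0] * (seq_len // window)
    for pos in positions:
        index = pos // window
        if 0 <= index < len(counts):
            counts[index] += 1
    return counts


if __name__ == "__main__":
    sequence = "GGGGCCCCAAAATTTT"   # toy sequence, not real data
    sine_sites = [1, 5, 6, 13]      # hypothetical insertion coordinates
    print(windowed_gc(sequence, 8))
    print(insertions_per_window(sine_sites, len(sequence), 8))
```

Pairing the two lists window by window gives the inputs for a correlation between insertion density and GC content.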
The genome of the gray short-tailed opossum Monodelphis domestica is notable for its large size (approximately 3.6 Gb). We characterized nearly 500 families of interspersed repeats from Monodelphis. They cover approximately 52% of the genome, higher than in any other amniotic lineage studied to date, and may account for the unusually large genome size. In comparison to other mammals, Monodelphis is significantly rich in non-LTR retrotransposons from the LINE-1, CR1, and RTE families, with >29% of the genome sequence composed of copies of these elements. Monodelphis has at least four families of RTE, and we report support for horizontal transfer of this non-LTR retrotransposon. In addition to short interspersed elements (SINEs) mobilized by L1, we found several families of SINEs that appear to use RTE elements for mobilization. In contrast to L1-mobilized SINEs, the RTE-mobilized SINEs in Monodelphis appear to shift from G+C-rich to G+C-low regions with time. Endogenous retroviruses have colonized approximately 10% of the opossum genome. We found that their density is enhanced in centromeric and/or telomeric regions of most Monodelphis chromosomes. We identified 83 new families of ancient repeats that are highly conserved across amniotic lineages, including 14 LINE-derived repeats; and a novel SINE element, MER131, that may have been exapted as a highly conserved functional noncoding RNA, and whose emergence dates back to approximately 300 million years ago. Many of these conserved repeats are also present in human, and are highly over-represented in predicted cis-regulatory modules. Seventy-six of the 83 families are present in chicken in addition to mammals.
View details for DOI 10.1101/gr.6070707
View details for Web of Science ID 000247701600004
View details for PubMedID 17495012
View details for PubMedCentralID PMC1899126
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.
View details for DOI 10.1038/nature05805
View details for Web of Science ID 000246338700035
View details for PubMedID 17495919
Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a Java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including PubMed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).
View details for DOI 10.1186/1471-2105-7-474
View details for Web of Science ID 000241763200001
View details for PubMedID 17064419
View details for PubMedCentralID PMC1634758
Human processed pseudogenes are copies of cellular RNAs reverse transcribed and inserted into the nuclear genome by the enzymatic machinery of L1 (LINE1) non-LTR retrotransposons. Although it is generally accepted that germline expression is crucial for the heritable retroposition of cellular mRNAs, little is known about the influences of RNA stability, mRNA quality control and compartmentalization of translation on the retroposition of processed pseudogenes. We found that frequently retroposed human mRNAs are derived from stable transcripts with translation-competent functional reading frames that are resistant to nonsense-mediated RNA decay. They are preferentially translated on free cytoplasmic ribosomes and encode soluble proteins. Our results indicate that interactions between mRNAs and L1 proteins seem to occur at free cytoplasmic ribosomes.
View details for DOI 10.1016/j.tig.2005.11.005
View details for Web of Science ID 000235576900003
View details for PubMedID 16356584
View details for PubMedCentralID PMC1379630
We analyze minisatellites derived from Alu fragments corresponding approximately to the first 44 bases of human Alu consensus sequences from different subfamilies. The origin of Alu-derived minisatellites appears to have been mediated by short flanking repeats, as first proposed by Haber and Louis [Haber, J.E., Louis, E.J., 1998. Minisatellite origins in yeast and humans. Genomics 48, 132-135.]. We also present evidence for base substitutions and deletions introduced to minisatellites by gene conversion with partially similar but unrelated flanking regions. Segments flanked by short direct repeats are relatively common in different regions of Alu and other repetitive sequences. Our analysis shows that they can be effectively used in comparative studies of the overall sequence context which may contribute to instability of DNA segments flanked by short direct repeats.
View details for DOI 10.1016/j.gene.2005.09.029
View details for Web of Science ID 000236312100004
View details for PubMedID 16343813
Velo-cardio-facial syndrome/DiGeorge syndrome results from unequal crossing-over events between two 240-kb low-copy repeats termed LCR22 (LCR22-2 and LCR22-4) on Chromosome 22q11.2, composed of modules, each of which is >99% identical in sequence. To delineate regions in the LCR22s that might contain hotspots for 22q11.2 rearrangements, we scanned the interval for increased rates of recombination with the hypothesis that these regions might be more prone to breakage. We generated an algorithm to detect sites of altered recombination by searching for single nucleotide polymorphic positions in BAC clones from different libraries mapped to LCR22-2 and LCR22-4. This method distinguishes single nucleotide polymorphisms from paralogous sequence variants and complex polymorphic positions. Sites of shared polymorphism are considered potential sites of gene conversion or double cross-over between the two LCR22s. We found an inverse correlation between regions of paralogous sequence variants that are unique to a given position within one LCR22 and clusters of shared polymorphic sites, suggesting that these clusters depict altered recombination and not remnants of ancestral single nucleotide polymorphisms. We postulate that most shared polymorphic sites are products of past transfers of DNA information between the LCR22s, suggesting that frequent traffic of genetic material may induce genomic instability in the two LCR22s. We also found that gaps up to 1.5 kb long can be transferred between LCR22s.
View details for DOI 10.1101/gr.4281205
View details for Web of Science ID 000232889400004
View details for PubMedID 16251458
View details for PubMedCentralID PMC1310636
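The distinction drawn above between single nucleotide polymorphisms, paralogous sequence variants, shared polymorphic sites and complex positions can be sketched as a per-column classification. This is only an illustration under simplifying assumptions: each aligned position is summarized by the bases seen among clones of each repeat copy, and the category rules and names below are hypothetical, not the published algorithm.

```python
def classify_site(copy_a: str, copy_b: str) -> str:
    """Classify one aligned position.

    copy_a, copy_b: bases observed at this position among clones mapped to
    each of the two repeat copies (e.g. "AAG" for three clones).
    The rules are an assumed simplification, not the paper's method.
    """
    a, b = set(copy_a.upper()), set(copy_b.upper())
    if len(a) == 1 and len(b) == 1:
        # Fixed within each copy: identical, or a fixed paralog difference.
        return "invariant" if a == b else "paralogous_sequence_variant"
    if len(a) > 1 and len(b) > 1:
        # Polymorphic in both copies: shared only if the same alleles segregate.
        return "shared_polymorphism" if a == b else "complex"
    # Polymorphic in exactly one copy.
    fixed, variable = (a, b) if len(a) == 1 else (b, a)
    return "snp" if fixed <= variable else "complex"


if __name__ == "__main__":
    columns = [("AAA", "AAA"), ("AAA", "GGG"), ("AAG", "AGG"), ("AAG", "AAA")]
    for col_a, col_b in columns:
        print(col_a, col_b, classify_site(col_a, col_b))
```

Under these assumed rules, clusters of `shared_polymorphism` calls would be the candidate regions for past sequence transfer between the two copies.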
Short interspersed elements (SINEs) make up a significant fraction of total DNA in mammalian genomes, providing a rich substrate for chromosomal rearrangements by SINE-SINE recombinations. Proliferation of mammalian SINEs is mediated primarily by long interspersed element 1 (L1) non-long terminal repeat retrotransposons that preferentially integrate at DNA sequence targets with an average length of approximately 15 bp and containing conserved endonucleolytic nicking signals at both ends. We report that sequence variations in the first of the two nicking signals, represented by a 5'-TT-AAAA consensus sequence, affect the position of the second signal thus leading to target site duplications (TSDs) of different lengths. The length distribution of TSDs appears to be affected also by L1-encoded enzyme variants because targets with the same 5' nicking site can be of different average lengths in different mammalian species. Taking this into account, we reanalyzed the second nicking site and found that it is larger and includes more conserved sites than previously appreciated, with a consensus of 5'-ANTNTN-AA. We also studied potential involvement of the nicking sites in stimulating recombinations between SINEs. We determined that SINEs retaining TSDs with perfect 5'-TT-AAAA nicking sites appear to be lost relatively rapidly from the human and rat genomes and less rapidly from dog. We speculate that the introduction of DNA breaks induced by recurring endonucleolytic attacks at these sites, combined with the ubiquitousness of SINEs, may significantly promote recombination between repetitive elements, leading to the observed losses. At the same time, new L1 subfamilies may be selected for "incompatibility" with preexisting targets. This provides a possible driving force for the continual emergence of new L1 subfamilies which, in turn, may affect selection of L1-dependent SINE subfamilies.
View details for DOI 10.1093/molbev/msi188
View details for Web of Science ID 000231826500005
View details for PubMedID 15944437
View details for PubMedCentralID PMC1400617
As we enter the post-genomic era, with the accelerating availability of complete genome sequences, new theoretical approaches and new experimental techniques, our ability to dissect cellular processes at the molecular level continues to expand. Recent advances include the application of RNA interference methods to characterize loss-of-function phenotype genes in higher eukaryotes, comparative analysis of the human and mouse genome sequences, and methods for reconciling contradictory phylogenetic reconstructions. New developments feed into the increasingly rich content of databases such as the COG database. The next phase of research will be increasingly dominated by efforts to integrate the deluge of data into our understanding of biological systems.
View details for DOI 10.1016/S0959-440X(03)00073-3
View details for Web of Science ID 000184112300011
View details for PubMedID 12831886
Overlapping gene groups (OGGs) arise when exons of one gene are contained within the introns of another. Typically, the two overlapping genes are encoded on opposite DNA strands. OGGs are often associated with specific disease phenotypes. In this report, we identify genes with OGG architecture and genes encoding multiple long amino acid runs and examine their relations to diseases. OGGs appear to be susceptible to genomic rearrangements as happens commonly with the loci of the DiGeorge syndrome on human chromosome 22. We also examine the degree of conservation of OGGs between human and mouse. Our analyses suggest that (i) a high proportion of genes in OGG regions are disease-associated, (ii) genomic rearrangements are likely to occur within OGGs, possibly as a consequence of anomalous sequence features prevalent in these regions, and (iii) multiple amino acid runs are also frequently associated with pathologies.
View details for DOI 10.1073/pnas.262658799
View details for Web of Science ID 000180101600090
View details for PubMedID 12473749
View details for PubMedCentralID PMC139260
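The overlapping-gene-group architecture defined above, in which exons of one gene sit inside an intron of another, amounts to an interval containment check. The sketch below assumes exons are given as half-open `(start, end)` coordinates on a shared reference and ignores strand; the function names are hypothetical.

```python
def introns(exons):
    """Return intron intervals as the gaps between consecutive sorted exons."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


def has_nested_exon(host_exons, guest_exons) -> bool:
    """True if at least one guest exon lies entirely within a host intron."""
    return any(
        intron_start <= exon_start and exon_end <= intron_end
        for intron_start, intron_end in introns(host_exons)
        for exon_start, exon_end in guest_exons
    )


if __name__ == "__main__":
    host = [(0, 100), (500, 600), (1000, 1100)]   # toy coordinates
    guest = [(200, 250), (300, 350)]
    print(introns(host))
    print(has_nested_exon(host, guest))
```

A stricter variant could additionally require that the two genes lie on opposite strands, which the abstract describes as the typical case.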
Human chromosomes 21 and 22 (mainly the q-arms) were the first complete parts of the human genome released. Our analysis of genes, pseudogenes (Psig), and Alu repeats across these chromosomes includes the following findings: The number of gene structures containing untranslated exons exceeds 25%; the terminal exon tends to be the largest among exons, whereas the initial intron tends to be the largest among introns; single-exon gene length is approximately the mean gene exon number times the mean internal exon length; processed Psig lengths are on average approximately the same as single-exon gene length; and the G+C content and length of genes are uncorrelated. The counts and distribution of genes, Psig, and Alu sequences and G+C variation are evaluated with respect to clusters and overdispersions. Other assessments concern comparisons of intergenic lengths, properties of Psig sequences, and correlations between Alu and Psig sequences.
View details for DOI 10.1073/pnas.052692099
View details for Web of Science ID 000174284600062
View details for PubMedID 11867739
View details for PubMedCentralID PMC122450
We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs, charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include long glutamine runs that induce neurological disorders, various cancers, categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca2+ and K+ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are Drosophila homeotic homologs. A large number of these proteins are expressed in the nervous system. More than 80% of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences occur for glutamine, alanine, and serine, whereas human sequences highlight glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.
View details for Web of Science ID 000173233300061
View details for PubMedID 11782551
View details for PubMedCentralID PMC117561
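Detecting single amino acid runs of the kind analyzed above can be sketched with a back-referencing regular expression. The minimum run length of 5 used here is an arbitrary assumption for illustration; the abstract does not state the threshold the authors applied, and the function names are hypothetical.

```python
import re


def find_runs(protein: str, min_len: int = 5):
    """Return (residue, start, length) for each run of one repeated residue.

    min_len is an assumed threshold, not a value taken from the paper.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([A-Z]) captures a residue; \1{n,} requires n or more further copies.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (match.group(1), match.start(), match.end() - match.start())
        for match in pattern.finditer(protein.upper())
    ]


def has_multiple_runs(protein: str, min_len: int = 5) -> bool:
    """True if the sequence contains two or more qualifying runs."""
    return len(find_runs(protein, min_len)) >= 2


if __name__ == "__main__":
    toy = "MQQQQQQAKSSSSSL"  # invented sequence, not a real protein
    print(find_runs(toy))
    print(has_multiple_runs(toy))
```

Charge or hydrophobic runs could be handled the same way by first recoding residues into classes and then searching the recoded string.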
We examined dinucleotide relative abundances and their biases in recent sequences of eukaryotic genomes and chromosomes, including human chromosomes 21 and 22, Saccharomyces cerevisiae, Arabidopsis thaliana, and Drosophila melanogaster. We found that dinucleotide relative abundances are remarkably constant across human chromosomes and within the DNA of a particular species. The dinucleotide biases differ between species, providing a genome signature that is characteristic of the bulk properties of an organism's DNA. We detail the relations between species genome signatures and suggest possible mechanisms for their origin and maintenance.
View details for Web of Science ID 000167885400005
View details for PubMedID 11282969
View details for PubMedCentralID PMC311039
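The abstract above does not give a formula, but dinucleotide relative abundance is commonly defined as the odds ratio rho_XY = f_XY / (f_X f_Y), where f denotes frequencies in the sequence, often pooled with its reverse complement. The sketch below assumes that definition, together with a mean absolute difference over the 16 dinucleotides as a distance between two signatures; function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def relative_abundances(seq: str, both_strands: bool = True) -> dict:
    """Odds ratios rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    seq = seq.upper()
    strands = [seq]
    if both_strands:
        # Pool counts with the reverse complement (assumed symmetrization).
        strands.append(seq.translate(COMPLEMENT)[::-1])
    mono, di = Counter(), Counter()
    for strand in strands:
        for i, base in enumerate(strand):
            if base not in "ACGT":
                continue
            mono[base] += 1
            nxt = strand[i + 1:i + 2]
            if nxt and nxt in "ACGT":
                di[base + nxt] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no countable dinucleotides")
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def signature_distance(rho_a: dict, rho_b: dict) -> float:
    """Mean absolute difference between two 16-value signatures."""
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16


if __name__ == "__main__":
    sig1 = relative_abundances("AACGTTTGCAGGCTA")  # toy sequences
    sig2 = relative_abundances("ATATATGCGCGCTTA")
    print(round(signature_distance(sig1, sig2), 3))
```

Values of rho near 1 indicate a dinucleotide occurring about as often as its base composition predicts; values well below or above 1 indicate under- or over-representation.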