Bio

Current Role at Stanford


Senior Data Scientist

Honors & Awards


  • 2nd place in Stanford HealthAI hackathon, Stanford University (2019 Jan)
  • Travel grants, AACR (2015 April)
  • Travel Fellowship, International Society for Computational Biology (ISMB) meeting (2015)
  • Best poster award in Oncology retreat, Stanford University (2014 October)
  • Travel fellowships of School of Life Sciences, Arizona State University (2009 September)
  • GPSA Conference Travel Grant, Arizona State University (2008)
  • Travel Fellowship, RECOM Computational Cancer Biology (2007)
  • Dr. John and Rose Maher Alumni Scholarship for students involved in cancer research, Arizona State University (2006)

Education & Certifications


  • PhD, Arizona State University, Molecular & Cellular Biology with Bioinformatics (2012)
  • Master of Science, Arizona State University, Computational Biology (2005)
  • Bachelor of Science, Yonsei University, Biology (2002)

Patents


  • HoJoon Lee (co-inventor). "United States Patent US PTO 62/200,904 High Resolution STR analysis using Next Generation Sequencing", Leland Stanford Junior University

Professional

Professional Interests


My primary research interest is "Cancer Treatment and Prevention through Precision Medicine": analyzing the genomic data of cancer patients so that genomic information can guide clinical decisions. During my B.S. in biology, I was captivated by the fact that DNA, a form of digital information, encodes life. To gain computational skills, I joined a computational biology program for my master's degree. I learned the principles and techniques of sequence analysis, such as sequence alignment models, molecular phylogenetics, and motif searching, while estimating the neutral mutation rate by comparing the human and mouse genomes. To have an impact on real life, I then applied this expertise to cancer sequencing data during my Ph.D. study in order to identify neo-antigens for breast cancer vaccine development. I learned how to analyze RNA-seq data with my own algorithm and how to organize and manage the large data sets generated by the project using MySQL. As a post-doc at Stanford, I expanded my research to investigate the clinical implications of genomic features. I applied regularized regression (elastic-net) to integrate multiple, heterogeneous genomic assay data from The Cancer Genome Atlas (TCGA) project and to identify known and novel candidate driver mutations that predict tumor stage and other clinical parameters. As a research scientist, I led a team to develop an analysis pipeline to identify clonal neoantigens for a phase 1 clinical trial of personalized immune therapy. Currently, I am building a bioinformatics pipeline to examine the landscape of T cell receptors (TCRs) using single cell sequencing data. These tools will characterize the immune phenotype in addition to clinical phenotypes.

Work Experience


  • Senior Research Engineer, Stanford University (3/16/2019 - Present)

    -Manage bioinformatics analysis pipeline for all genomics data
    -Lead a team to develop bioinformatics analysis pipeline for single cell immunogenomics data
    -Lead a team to develop a dynamic representation of the human reference genomes, which enables population-level sequencing analysis
    -Lead a team to develop new cancer immune therapy targets (under IP process)

    Location

    Stanford, CA

  • Project Leader, Stanford University (3/16/2017 - 3/15/2019)

    -Developed bioinformatics pipeline to analyze single cell sequencing of T cell receptors (TCRs)
    -Developed bioinformatics pipeline to identify personalized neo-antigens for a phase 1 clinical trial of immune therapy by pLADD (licensing to Aduro)
    -Developed new algorithms to analyze sequencing reads for substitution, indels, gene fusions, copy number variation, and Cas9 mutagenesis
    -Developed analysis pipeline for whole genome sequencing from clinical samples with Intermountain Healthcare

    Location

    Stanford, CA

  • Post-doctoral fellow, Stanford University (3/16/2012 - 3/15/2017)

    -Set up bioinformatics tools on Amazon Web Services through Seven Bridges for immune-genomics analysis of The Cancer Genome Atlas (TCGA)
    -Identified clinically relevant genomic/proteomic changes from >10,000 samples of >32 cancers in TCGA by integrative analysis
    -Developed a web portal for the exploration of the clinical associations of the TCGA data; http://genomeportal.stanford.edu/pan-tcga
    -Designed the optimal probes for targeted sequencing such as STR-OS seq and digital droplet PCR

    Location

    Stanford, CA

  • Research Associate, Arizona State University (8/2005 - 2/2012)

    Worked on the cancer vaccine project funded by the Department of Defense (DoD) and a Keck grant.
    -Developed an algorithm to identify tumor-specific frame-shifted mutations derived from gene fusions, alternative splicing and insertions/deletions as neo-antigens that could be used in a vaccine.
    -Validated these putative candidates by molecular biology methods such as RT-PCR and cloning. Validated candidates were tested in a mouse model.
    -Constructed a MySQL database to organize all data with all available information, such as epitopes, MHC binders, gene expression, exon structure and GO annotation, to evaluate candidates as vaccine antigens.

    Location

    Tempe, AZ

Publications

All Publications


  • CRISPRpic: fast and precise analysis for CRISPR-induced mutations via prefixed index counting. NAR genomics and bioinformatics Lee, H., Chang, H. Y., Cho, S. W., Ji, H. P. 2020; 2 (2): lqaa012

    Abstract

    Analysis of CRISPR-induced mutations at a targeted locus can be achieved by polymerase chain reaction amplification followed by massively parallel sequencing. We developed a novel algorithm, named CRISPRpic, to analyze the sequencing reads from CRISPR experiments via counting exact-matching and pattern-searching. Compared to other methods based on sequence alignment, CRISPRpic provides precise mutation calling and ultrafast analysis of the sequencing results. The Python script of CRISPRpic is available at https://github.com/compbio/CRISPRpic.

    View details for DOI 10.1093/nargab/lqaa012

    View details for PubMedID 32118203

  • Whole genome analysis identifies the association of TP53 genomic deletions with lower survival in Stage III colorectal cancer. Scientific reports Xia, L. C., Van Hummelen, P., Kubit, M., Lee, H., Bell, J. M., Grimes, S. M., Wood-Bouwens, C., Greer, S. U., Barker, T., Haslem, D. S., Ford, J. M., Fulde, G., Ji, H. P., Nadauld, L. D. 2020; 10 (1): 5009

    Abstract

    DNA copy number aberrations (CNA) are frequently observed in colorectal cancers (CRC). There is an urgent need for CNA-based biomarkers in clinics. For Stage III CRC, if combined with imaging or pathologic evidence, these markers promise more precise care. We conducted this Stage III specific biomarker discovery with a cohort of 134 CRCs, and with a newly developed high-efficiency CNA profiling protocol. Specifically, we developed the profiling protocol for tumor-normal matched tissue samples based on low-coverage clinical whole-genome sequencing (WGS). We demonstrated the protocol's accuracy and robustness by a systematic benchmark with microarray, high-coverage whole-exome and -genome approaches, where the low-coverage WGS-derived CNA segments were highly accordant (PCC >0.95) with those derived from microarray, and they were substantially less variable if compared to exome-derived segments. A lasso-based model and multivariate cox regression analysis identified a chromosome 17p loss, containing the TP53 tumor suppressor gene, that was significantly associated with reduced survival (P = 0.0139, HR = 1.688, 95% CI = [1.112-2.562]), which was validated by an independent cohort of 187 Stage III CRCs. In summary, this low-coverage WGS protocol has high sensitivity, high resolution and low cost and the identified 17p-loss is an effective poor prognosis marker for Stage III patients.

    View details for DOI 10.1038/s41598-020-61643-6

    View details for PubMedID 32193467

  • Author Correction: RNA Transcription and Splicing Errors as a Source of Cancer Frameshift Neoantigens for Vaccines. Scientific reports Shen, L., Zhang, J., Lee, H., Batista, M. T., Johnston, S. A. 2020; 10 (1): 6251

    Abstract

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    View details for DOI 10.1038/s41598-020-63114-4

    View details for PubMedID 32253381

  • RNA Transcription and Splicing Errors as a Source of Cancer Frameshift Neoantigens for Vaccines. Scientific reports Shen, L., Zhang, J., Lee, H., Batista, M. T., Johnston, S. A. 2019; 9 (1): 14184

    Abstract

    The success of checkpoint inhibitors in cancer therapy is largely attributed to activating the patient's immune response to their tumor's neoantigens arising from DNA mutations. This realization has motivated the interest in personal cancer vaccines based on sequencing the patient's tumor DNA to discover neoantigens. Here we propose an additional, unrecognized source of tumor neoantigens. We show that errors in transcription of microsatellites (MS) and mis-splicing of exons create highly immunogenic frameshift (FS) neoantigens in tumors. The sequences of these FS neoantigens are predictable, allowing creation of a peptide array representing all possible neoantigen FS peptides. This array can be used to detect the antibody response in a patient to the FS peptides. A survey of 5 types of cancers reveals peptides that are personally reactive for each patient. This source of neoantigens and the method to discover them may be useful in developing cancer vaccines.

    View details for DOI 10.1038/s41598-019-50738-4

    View details for PubMedID 31578439

  • Author Correction: RNA Transcription and Splicing Errors as a Source of Cancer Frameshift Neoantigens for Vaccines. Scientific reports Shen, L., Zhang, J., Lee, H., Batista, M. T., Johnston, S. A. 2019; 9 (1): 17815

    Abstract

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

    View details for DOI 10.1038/s41598-019-54300-0

    View details for PubMedID 31767927

  • Targeted short read sequencing and assembly of re-arrangements and candidate gene loci provide megabase diplotypes. Nucleic acids research Shin, G., Greer, S. U., Xia, L. C., Lee, H., Zhou, J., Boles, T. C., Ji, H. P. 2019

    Abstract

    The human genome is composed of two haplotypes, otherwise called diplotypes, which denote phased polymorphisms and structural variations (SVs) that are derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, context of SV, regulation of gene expression and other features which are informative for understanding human genetics. Successful diplotyping with short read whole genome sequencing generally requires either a large population or parent-child trio samples. To overcome these limitations, we developed a targeted sequencing method for generating megabase (Mb)-scale haplotypes with short reads. One selects specific 0.1-0.2 Mb high molecular weight DNA targets with custom-designed Cas9-guide RNA complexes followed by sequencing with barcoded linked reads. To test this approach, we designed three assays, targeting the BRCA1 gene, the entire 4-Mb major histocompatibility complex locus and 18 well-characterized SVs, respectively. Using an integrated alignment- and assembly-based approach, we generated comprehensive variant diplotypes spanning the entirety of the targeted loci and characterized SVs with exact breakpoints. Our results were comparable in quality to long read sequencing.

    View details for DOI 10.1093/nar/gkz661

    View details for PubMedID 31350896

  • Therapeutic Monitoring of Circulating DNA Mutations in Metastatic Cancer with Personalized Digital PCR. The Journal of molecular diagnostics : JMD Wood-Bouwens, C. M., Haslem, D., Moulton, B., Almeda, A. F., Lee, H., Heestand, G. M., Nadauld, L. D., Ji, H. P. 2019

    Abstract

    As a high-performance solution for longitudinal monitoring of patients being treated for metastatic cancer, we developed a single-color digital PCR (dPCR) assay that detects and quantifies specific cancer mutations present in circulating tumor DNA (ctDNA). This customizable assay has a high sensitivity of detection. One can detect a mutation allelic fraction of 0.1%, equivalent to three mutation-bearing DNA molecules among 3,000 genome equivalents. The objective of this study was to validate the use of personalized dPCR mutation assays to monitor patients with metastatic cancer. We compared our digital PCR results to serum biomarkers indicating disease progression or response. Patients had metastatic colorectal, biliary, breast, lung and melanoma cancers. Mutations occurred in essential cancer drivers such as BRAF, KRAS and PIK3CA. We monitored patients over multiple cycles of treatment up to a year. All patients had detectable ctDNA mutations. Our results correlated with serum markers of metastatic cancer burden including CEA, CA-19-9, and CA-15-3, and qualitatively corresponded to imaging studies. We observed corresponding trends among these patients receiving active treatment with chemotherapy or targeted agents. For example, in one patient under active treatment, we detected increasing quantities of ctDNA molecules over time, indicating recurrence of tumor. Our study demonstrates that personalized digital PCR enables longitudinal monitoring of patients with metastatic cancer and may be a useful indicator for treatment response.

    View details for DOI 10.1016/j.jmoldx.2019.10.008

    View details for PubMedID 31837432

  • SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. GigaScience Xia, L. C., Ai, D., Lee, H., Andor, N., Li, C., Zhang, N. R., Ji, H. P. 2018

    Abstract

    Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, there are only a few data simulators that provide structural variants in silico and even fewer that provide variants with different allelic fraction and haplotypes. Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK) as input. Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design process enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions and translocations. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. To improve the compute speed, SVEngine is highly parallelized to reduce the simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime comparisons with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogenous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use at: https://bitbucket.org/charade/svengine.

    View details for PubMedID 29982625

  • Mapping the comprehensive landscape of missense-mutation neoantigens across the human genome Lee, H., Greer, S. U., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • Improved detection and identification of microsatellite instability features in colorectal cancer: Implications for immunotherapy Shin, G., Lee, H., Grimes, S. M., Kubit, M. A., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • High-quality CNV segments from low-coverage whole genome sequencing from FFPE cancer biopsies based on an evaluation of multiple CNV tools Lee, H., Xia, L., Greer, S., Bell, J., Grimes, S. M., Bouwens, C., Shin, G., Lau, B. C., Johnson, L., Andor, N., Day, K., Miller, M., Escobar, H., Nadauld, L., Ji, H. P., Van Hummelen, P. AMER ASSOC CANCER RESEARCH. 2018
  • CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis NATURE COMMUNICATIONS Shin, G., Grimes, S. M., Lee, H., Lau, B. T., Xia, L. C., Ji, H. P. 2017; 8

    Abstract

    Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.

    View details for DOI 10.1038/ncomms14291

    View details for PubMedID 28169275

  • Single-Color Digital PCR Provides High-Performance Detection of Cancer Mutations from Circulating DNA. The Journal of molecular diagnostics : JMD Wood-Bouwens, C., Lau, B. T., Handy, C. M., Lee, H., Ji, H. P. 2017; 19 (5): 697-710

    Abstract

    We describe a single-color digital PCR assay that detects and quantifies cancer mutations directly from circulating DNA collected from the plasma of cancer patients. This approach relies on a double-stranded DNA intercalator dye and paired allele-specific DNA primer sets to determine an absolute count of both the mutation and wild-type-bearing DNA molecules present in the sample. The cell-free DNA assay uses an input of 1 ng of nonamplified DNA, approximately 300 genome equivalents, and has a molecular limit of detection of three mutation DNA genome-equivalent molecules per assay reaction. When using more genome equivalents as input, we demonstrated a sensitivity of 0.10% for detecting the BRAF V600E and KRAS G12D mutations. We developed several mutation assays specific to the cancer driver mutations of patients' tumors and detected these same mutations directly from the nonamplified, circulating cell-free DNA. This rapid and high-performance digital PCR assay can be configured to detect specific cancer mutations unique to an individual cancer, making it a potentially valuable method for patient-specific longitudinal monitoring.

    View details for PubMedID 28818432

  • The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations GENOME MEDICINE Lee, H., Palm, J., Grimes, S. M., Ji, H. P. 2015; 7

    Abstract

    The Cancer Genome Atlas (TCGA) project has generated genomic data sets covering over 20 malignancies. These data provide valuable insights into the underlying genetic and genomic basis of cancer. However, exploring the relationship among TCGA genomic results and clinical phenotype remains a challenge, particularly for individuals lacking formal bioinformatics training. Overcoming this hurdle is an important step toward the wider clinical translation of cancer genomic/proteomic data and implementation of precision cancer medicine. Several websites such as the cBio portal or University of California Santa Cruz genome browser make TCGA data accessible but lack interactive features for querying clinically relevant phenotypic associations with cancer drivers. To enable exploration of the clinical-genomic driver associations from TCGA data, we developed the Cancer Genome Atlas Clinical Explorer. The Cancer Genome Atlas Clinical Explorer interface provides a straightforward platform to query TCGA data using one of the following methods: (1) searching for clinically relevant genes, micro RNAs, and proteins by name, cancer types, or clinical parameters; (2) searching for genomic/proteomic profile changes by clinical parameters in a cancer type; or (3) testing two-hit hypotheses. SQL queries run in the background and results are displayed on our portal in an easy-to-navigate interface according to user's input. To derive these associations, we relied on elastic-net estimates of optimal multiple linear regularized regression and clinical parameters in the space of multiple genomic/proteomic features provided by TCGA data. Moreover, we identified and ranked gene/micro RNA/protein predictors of each clinical parameter for each cancer. The robustness of the results was estimated by bootstrapping. Overall, we identify associations of potential clinical relevance among genes/micro RNAs/proteins using our statistical analysis from 25 cancer types and 18 clinical parameters that include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypothesis regarding genomic/proteomic alterations across a broad spectrum of malignancies.

    View details for DOI 10.1186/s13073-015-0226-3

    View details for Web of Science ID 000363619100002

    View details for PubMedID 26507825

    View details for PubMedCentralID PMC4624593

  • Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis BMC MEDICAL GENOMICS Lee, H., Flaherty, P., Ji, H. P. 2013; 6

    Abstract

    Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression.

    View details for DOI 10.1186/1755-8794-6-54

    View details for Web of Science ID 000328897400001

    View details for PubMedID 24308539
