Health Research and Policy


DATE: April 18, 2013
TIME: 1:15 - 3:00 pm
LOCATION: Medical School Office Building, Rm x303
TITLE: Normalization and Differential Expression in RNA-Seq
SPEAKER: Sandrine Dudoit
Professor of Biostatistics and Statistics, Chair and Head Graduate Advisor,
Graduate Group in Biostatistics, University of California, Berkeley

This talk concerns statistical methods for the analysis of RNA abundance by sequencing (RNA-Seq). We first present exploratory data analysis (EDA) approaches for quality assessment/control (QA/QC) of RNA-Seq reads. Next, we propose within-sample normalization methods to adjust for sample-specific gene-level effects such as length and GC-content. We also provide between-sample normalization procedures to account for distributional differences such as lane sequencing depth. Finally, we consider the quantitation of (differential) gene expression levels using generalized linear models (GLM). Our exploratory data analysis and normalization methods are implemented in the open-source Bioconductor R package EDASeq.

Suggested readings:
JH Bullard, E Purdom, KD Hansen and S Dudoit. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 2010, 11:94.

D Risso, K Schwartz, G Sherlock and S Dudoit. GC-Content Normalization for RNA-Seq Data. BMC Bioinformatics 2011, 12:480.

MD Robinson, DJ McCarthy and GK Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26(1):139-140.
