Health Research and Policy

Abstract

DATE: January 23, 2014
TIME: 1:15 - 3:00 pm
LOCATION: Medical School Office Building, Rm x303
TITLE: Modeling and Accounting for Coverage Bias in High-throughput Sequencing
SPEAKER: Yuval Benjamini, Stein Fellow
Department of Statistics, Stanford

High-throughput sequencing is rapidly becoming the technology of choice for estimating the abundance of templates in a DNA sample. Important examples include quantifying gene expression (RNA-seq), identifying aberrations in the genome of tumor cells (copy number estimation), and finding candidate protein binding sites (ChIP-seq). These estimates depend on the sequencing coverage: if all fragments have an equal chance of being sequenced and mapped, the number of fragments covering each base should reflect the proportions of DNA in the sample (one-sample approach). Alternatively, the ratio between coverage under two different conditions should reflect the ratio of template abundance in those samples (two-sample approach).
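
The two estimation strategies can be sketched as follows. This is a minimal illustration, assuming fragment counts per template (e.g. per gene or genomic bin) and template lengths are already available; the function names and inputs are hypothetical. The one-sample estimate relies on the equal-chance assumption stated above, while in the two-sample ratio the template length cancels, so only library-size normalization is needed.

```python
from typing import Dict


def one_sample_proportions(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Relative template abundance from one sample (hypothetical helper).

    Assumes every fragment has the same chance of being sequenced and mapped,
    so coverage per base is proportional to the template's abundance.
    """
    per_base = {t: counts[t] / lengths[t] for t in counts}
    total = sum(per_base.values())
    return {t: v / total for t, v in per_base.items()}


def two_sample_ratios(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> Dict[str, float]:
    """Ratio of library-size-normalized coverage, sample A over sample B.

    Template length appears in both numerator and denominator and cancels.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return {t: (counts_a[t] / total_a) / (counts_b[t] / total_b) for t in counts_a}


if __name__ == "__main__":
    print(one_sample_proportions({"geneA": 200, "geneB": 100}, {"geneA": 2000, "geneB": 500}))
    print(two_sample_ratios({"x": 50, "y": 50}, {"x": 25, "y": 75}))
```

Coverage bias breaks exactly the equal-chance assumption behind the first function; the second is robust only if the bias pattern is the same in both samples.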

Unfortunately, we now know that the distribution of mapped fragments is not uniform, and, furthermore, the patterns of this coverage bias often vary across samples. It is therefore necessary to understand the nature of these biases in order to minimize them experimentally, normalize for them, and modify the estimation procedures to account for them.

In this workshop I will describe different forms of coverage bias in the contexts of copy number estimation and RNA-seq, focusing on GC-content bias: the observed dependence between sequencing coverage and the proportion of G and C bases in the template. I will discuss modeling techniques for such biases, estimation methods, and ideas for their correction and normalization.
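
To make the GC-content dependence concrete, the sketch below computes the GC fraction of a template and applies a simple stratified-median rescaling: bins are grouped by GC fraction, and each bin's coverage is multiplied by the overall median coverage divided by the median coverage of its GC stratum. This is an illustrative correction of the general kind used for read-depth data, not the model presented in the talk or in Benjamini and Speed (2012); the function names and the `n_strata` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N is ignored.

    Returns NaN for a sequence with no A/C/G/T; such bins should be dropped
    before correction.
    """
    seq = seq.upper()
    gc = sum(1 for b in seq if b in "GC")
    acgt = sum(1 for b in seq if b in "ACGT")
    return gc / acgt if acgt else float("nan")


def gc_stratified_correction(coverage: List[float], gc: List[float], n_strata: int = 20) -> List[float]:
    """Rescale coverage so every GC stratum has the same median (hypothetical helper).

    `gc` values must lie in [0, 1]; each is assigned to one of `n_strata`
    equal-width strata. Bins in a stratum whose median coverage is zero get NaN.
    """
    strata = [min(int(g * n_strata), n_strata - 1) for g in gc]
    by_stratum = defaultdict(list)
    for s, c in zip(strata, coverage):
        by_stratum[s].append(c)
    stratum_median = {s: median(v) for s, v in by_stratum.items()}
    overall = median(coverage)
    corrected = []
    for s, c in zip(strata, coverage):
        m = stratum_median[s]
        corrected.append(c * overall / m if m > 0 else float("nan"))
    return corrected


if __name__ == "__main__":
    print(gc_fraction("ACGTNN"))
    print(gc_stratified_correction([10, 12, 20, 22], [0.25, 0.25, 0.65, 0.65], n_strata=10))
```

Rescaling by stratum medians removes the GC trend only on average; it assumes most bins share the same underlying copy number (or expression level) within each stratum, which is one reason more careful models of the bias are needed.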

Suggested readings:
D. Bentley et al. (2008), "Accurate whole human genome sequencing using reversible terminator chemistry". Nature 456:53-59.

S. Yoon, Z. Xuan, V. Makarov, K. Ye and J. Sebat (2009), "Sensitive and accurate detection of copy number variants using read depth of coverage". Genome Research 19(9):1586-1592.

A. Roberts, C. Trapnell, J. Donaghey, J.L. Rinn and L. Pachter (2011), "Improving RNA-Seq expression estimates by correcting for fragment bias". Genome Biology 12(3):R22.

Y. Benjamini and T.P. Speed (2012), "Summarizing and correcting the GC-content bias in high throughput sequencing". Nucleic Acids Research 40(10):e72.
