DATE: October 27, 2016
TIME: 1:30 - 2:50 pm
LOCATION: Medical School Office Building, Rm x303
TITLE: Capture-recapture models for DNA sequencing experiments
Timothy Patrick Daley
Current DNA sequencing technologies involve sampling genomic fragments from a large pool, called a library. The library is constructed from a small initial amount of DNA using amplification procedures, so each original fragment exists in thousands or millions of copies. This amplification, although necessary to produce enough material for the experiment, can introduce large biases and implies that the properties of the library cannot be known beforehand. Our goal is to infer properties of the experiment based on a small initial sample of the library. The capture-recapture framework naturally fits this scenario; however, next-generation sequencing experiments produce data several orders of magnitude larger than traditional capture-recapture experiments. This gives rise to challenges in extrapolation, but also opportunities for methods that exploit the size of the data for highly accurate inferences. We will discuss the application of non-parametric empirical Bayes models to predict critical aspects of sequencing experiments, allowing for optimal allocation of sequencing resources in large-scale experiments.
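To give a flavor of the capture-recapture estimation problem, here is a minimal sketch of the classical Good-Toulmin estimator, which underlies the Efron & Thisted paper in the reading list below. It predicts, from the counts-of-counts in an initial sample, how many new distinct fragments would be observed if sequencing were extended by a factor t. This is an illustrative example only, not the speaker's actual method; the function name and inputs are chosen here for exposition.

```python
def good_toulmin(counts_of_counts, t):
    """Good-Toulmin estimate of the number of NEW distinct fragments
    observed if the sample were extended t-fold.

    counts_of_counts: dict mapping j -> n_j, the number of distinct
        fragments observed exactly j times in the initial sample.
    t: relative size of the additional sample (the alternating series
        is well-behaved for t <= 1).
    """
    return sum(((-1) ** (j + 1)) * (t ** j) * n_j
               for j, n_j in counts_of_counts.items())

# Example: 100 fragments seen once, 50 twice, 10 three times.
# Doubling the sample (t = 1) predicts 100 - 50 + 10 = 60 new fragments.
print(good_toulmin({1: 100, 2: 50, 3: 10}, 1.0))
```

For t > 1 the alternating series becomes unstable, which is one of the extrapolation challenges the abstract alludes to; the referenced work on predicting molecular complexity addresses this with more robust estimators.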
"Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know?" by Efron & Thisted,
Predicting the molecular complexity of sequencing libraries (http://www.nature.com/nmeth/journal/v10/n4/abs/nmeth.2375.html),
Modeling genome coverage in single-cell sequencing (http://bioinformatics.oxfordjournals.org/content/30/22/3159), &
Applications of species accumulation curves in large-scale biological data analysis (http://link.springer.com/article/10.1007/s40484-015-0049-7).