Workshop in Biostatistics

Medical School Office Building (MSOB)
Rm x303

DATE: November 16, 2017
TIME: 1:30 - 2:50 pm
TITLE: Column subset selection for single-cell RNA-Seq clustering
SPEAKER: Shannon McCurdy
Postdoctoral Scholar
California Institute for Quantitative Biosciences (QB3), UC Berkeley



The first step in the analysis of single-cell RNA sequencing (scRNA-Seq) data is dimensionality reduction, which reduces noise and simplifies data visualization. However, techniques such as principal components analysis (PCA) fail to preserve the non-negativity and sparsity structure of the original matrices, and the coordinates of the projected cells are not easily interpretable. Commonly used thresholding methods avoid those pitfalls, but they ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of both PCA and common thresholding methods, while avoiding the pitfalls of each. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare it to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to the clusters produced from the full set. The resulting clusters are informative for cell type.
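
To make the idea of column subset selection concrete, here is a minimal Python sketch. It is not taken from the talk or the preprint. It assumes a leverage-score variant of deterministic selection: score each column (gene) by its squared weight in the top k right singular vectors, then keep the highest-scoring columns until their scores total more than k - eps. The function name dcss_columns, the parameters k and eps, and the simulated Poisson count matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def dcss_columns(X, k, eps=0.1):
    """Deterministically pick columns of X (cells x genes) by rank-k leverage score.

    Illustrative sketch only; assumes k <= min(X.shape) and 0 < eps < 1.
    """
    # Rows of Vt are the right singular vectors of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k, :]                           # shape (k, n_genes)
    leverage = np.sum(Vk ** 2, axis=0)       # one score per column; scores sum to k
    order = np.argsort(-leverage, kind="stable")
    cumulative = np.cumsum(leverage[order])
    # Smallest number of top-scoring columns whose scores total more than k - eps.
    c = int(np.searchsorted(cumulative, k - eps, side="right")) + 1
    c = min(c, X.shape[1])
    return order[:c], leverage


# Hypothetical data: 50 cells by 200 genes of simulated counts.
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(50, 200)).astype(float)

cols, scores = dcss_columns(counts, k=5, eps=0.1)
subset = counts[:, cols]  # actual gene columns, so still non-negative counts
```

Because the reduced matrix consists of original columns, each retained coordinate is a gene rather than a linear combination of genes, which is the interpretability advantage over PCA that the abstract describes.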

Suggested readings:

Wagner, Allon, Aviv Regev, and Nir Yosef. “Revealing the Vectors of Cellular Identity with Single-Cell Genomics.” Nature Biotechnology 34, no. 11 (November 2016): 1145–60. doi:10.1038/nbt.3711.

Mahoney, Michael W., and Petros Drineas. “CUR Matrix Decompositions for Improved Data Analysis.” Proceedings of the National Academy of Sciences 106, no. 3 (January 20, 2009): 697–702. doi:10.1073/pnas.0803205106.

McCurdy, Shannon, Vasilis Ntranos, and Lior Pachter. “Column Subset Selection for Single-Cell RNA-Seq Clustering.” bioRxiv, July 3, 2017, 159079. doi:10.1101/159079.