Bio: David Van Valen is an Assistant Professor in the Division of Biology and Bioengineering at Caltech. Before becoming faculty, he studied mathematics (B.S. 2003) and physics (B.S. 2003) at the Massachusetts Institute of Technology, applied physics (Ph.D. 2011) at Caltech, medicine (M.D. 2013) at UCLA, and bioengineering as a postdoctoral fellow at Stanford University. At Caltech, his research group develops new technologies at the intersection of imaging, genomics, and machine learning to produce quantitative measurements of living systems with single-cell resolution. David is the recipient of several awards, including a Hertz Fellowship (2005), a Rita Allen Scholar award (2020), a Pew-Stewart Cancer Research Scholar award (2021), a Heritage Medical Research Investigator award (2021), a Moore Inventor Fellowship (2021), and the NIH New Innovator award (2022).

Abstract: Biological systems are difficult to study because they consist of tens of thousands of parts, vary in space and time, and their fundamental unit—the cell—displays remarkable variation in its behavior. These challenges have spurred the development of genomics and imaging technologies over the past 30 years that have revolutionized our ability to capture information about biological systems in the form of images. Excitingly, these advances are poised to place the microscope back at the center of the modern biologist’s toolkit. Because we can now access temporal, spatial, and “parts list” variation via imaging, images have the potential to be a standard data type for biology.

For this vision to become reality, biology needs a new data infrastructure. Imaging methods are of little use if it is too difficult to convert the resulting data into quantitative, interpretable information. New deep learning methods are proving to be essential to reliable interpretation of imaging data. These methods differ from conventional algorithms in that they learn how to perform tasks from labeled data; they have demonstrated immense promise, but they are challenging to use in practice. The expansive training data required to power them are sorely lacking, as are easy-to-use software tools for creating and deploying new models. Solving these challenges through open software is a key goal of the Van Valen lab. In this talk, I describe DeepCell, a collection of software tools that meet the data, model, and deployment challenges associated with deep learning. These include tools for distributed labeling of biological imaging data, a collection of modern deep learning architectures tailored for biological image analysis tasks, and cloud-native software for making deep learning methods accessible to the broader life science community. I discuss how we have used DeepCell to label large-scale imaging datasets to power deep learning methods that achieve human-level performance and enable new experimental designs for imaging-based experiments.


