Current Research and Scholarly Interests
Only about 1.5% of the human genome codes for proteins. What is the rest of the genome doing? How does it interact with the human epigenome? Does it affect disease?
The genome can be thought of as hardware specifying genes, and the epigenome as the complex software that governs how genes turn "on" and "off". This software is driven by regulatory proteins such as transcription factors (TFs) and their interactions with DNA. These interactions can be assayed experimentally by biochemically targeting TFs bound to DNA, sequencing the bound DNA, and finding the location of each sequenced fragment in the full genome.
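The last step above, locating a sequenced fragment in the genome, can be illustrated with a minimal Python sketch. It assumes exact string matching on a short made-up sequence; the function name and the toy data are hypothetical, and real analyses rely on dedicated read aligners that use indexed genomes and tolerate mismatches.

```python
def find_read_positions(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy data for illustration only, not real genomic sequence.
toy_genome = "ACGTTGACCTGACGTTGA"
toy_read = "ACGTTGA"

print(find_read_positions(toy_genome, toy_read))
```

Positions where many such fragments pile up suggest sites where the targeted protein was bound.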
A common assay for TF binding is Chromatin Immunoprecipitation followed by Sequencing (ChIP-seq). ChIP-seq measures the genome-wide locations of a single TF, requires ~1 million cells, and takes >1 month of experimental work. There are hundreds of TFs, and they activate and repress genes differently in hundreds of human cell types. It is not feasible to experimentally assay all of these combinations with ChIP-seq, and ChIP-seq would not work in precious, clinically relevant samples from which we cannot gather ~1 million cells. An alternative assay called ATAC-seq can map the locations of many TFs in one experiment using <50,000 cells and only a few days of work. But ATAC-seq does not reveal which of the many proteins is present at each of the sites it detects. Working with my advisor Anshul Kundaje, I designed deep learning models that overcome this limitation and predict the binding sites of tens of regulatory proteins from a single ATAC-seq experiment. Thus, our models can effectively infer the main results of many costly ChIP-seq experiments from a single cost-effective ATAC-seq experiment and enable these interrogations in precious samples.
The ability to infer specific mechanisms that regulate gene expression from cost-effective data such as ATAC-seq is opening the door to a wide range of interrogations in precious samples. Our collaborators are typically experimentalists who 1) want to gain more from their cost-effective data and/or 2) already have data from ChIP-seq or a similar assay and want to use our deep learning expertise to mine that data to the fullest extent. For collaborations on these fronts, please contact me at firstname.lastname@example.org or my advisor, Anshul Kundaje, at email@example.com.