11/17/22 1:30PM-2:50PM
Lorin Crawford
Principal Researcher, Microsoft Research New England; Associate Professor of Biostatistics, Brown University


Machine Learning for Human Genetics: A Multi-Scale View on Complex Traits and Disease


A common goal in genome-wide association (GWA) studies is to characterize the relationship between genotypic and phenotypic variation. Linear models are widely used tools in GWA analyses, in part because they provide significance measures that detail how individual single nucleotide polymorphisms (SNPs) are statistically associated with a trait or disease of interest. However, traditional linear regression largely ignores non-additive genetic variation, and the univariate SNP-level mapping approach has been shown to be underpowered and challenging to interpret for certain trait architectures. While machine learning (ML) methods such as neural networks are well known to account for complex data structures, these same algorithms have also been criticized as “black boxes” because they do not naturally carry out statistical hypothesis testing like classic linear models. This limitation has kept ML approaches from being used for association mapping tasks in GWA applications. In this talk, we present flexible and scalable classes of Bayesian feedforward models that provide interpretable probabilistic summaries, such as posterior inclusion probabilities and credible sets, which allow researchers to simultaneously perform (i) fine-mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Analyzing real data assayed in diverse self-identified human ancestries from the UK Biobank, the Biobank Japan, and the PAGE consortium, we demonstrate that interpretable ML has the power to increase the return on investment in multi-ancestry biobanks. Furthermore, we highlight that by prioritizing biological mechanism we can identify associations that are robust across ancestries, suggesting that ML can play a key role in making personalized medicine a reality for all.


A.R. Martin, M. Kanai, Y. Kamatani, Y. Okada, B.M. Neale, and M.J. Daly (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nature Genetics. 51: 584–591.

S.P. Smith, S. Shahamatdar, W. Cheng, S. Zhang, J. Paik, M. Graff, C. Haiman, T.C. Matise, K.E. North, U. Peters, E. Kenny, C. Gignoux, G. Wojcik, L. Crawford, and S. Ramachandran (2022). Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. American Journal of Human Genetics. 109: 871–884.

P. Demetci, W. Cheng, G. Darnell, X. Zhou, S. Ramachandran, and L. Crawford (2021). Multi-scale inference of genetic architecture using biologically annotated neural networks. PLOS Genetics. 17(8): e1009754.

Zoom link:
Password: 705300

DBDS on Diversity

We are committed to our historical and ongoing mission to use biomedical data science to improve human health. A cornerstone of this mission is diversity, reflected in embracing a breadth of complementary research interests, research styles, and a diverse and inclusive community. DBDS recognizes that we have significant work to do in shaping our future as we work towards achieving justice, equity, diversity and inclusion throughout our work and operations, our research and activities, and our professional relationships and partnerships.

Stanford's Land Acknowledgment Statement

Stanford sits on the ancestral land of the Muwekma Ohlone Tribe. This land was and continues to be of great importance to the Ohlone people. Consistent with our values of community and inclusion, we have a responsibility to acknowledge, honor, and make visible the University’s relationship to Native peoples.

This acknowledgment has been developed in collaboration with the Muwekma Ohlone Tribe.