I am a physicist by training and a biotechnologist by profession. I believe that with the explosion of data in healthcare and with new methods to analyze such large amounts of data, we will see massive changes in how human diseases are addressed via novel drugs, large-scale genomics, wearable sensors, and software to tie it all together. I want to drive part of this revolution.

Prior to joining Stanford in 2012, I spent a dozen years at various biotechs in the Bay Area. This included roles as technology lead at Life Technologies (now Thermo Fisher) and as a founding team member of Verseon, a drug discovery company. Along the way, I have had fantastic opportunities to work alongside some of the smartest people in the field, learn from some of the most brilliant minds of our times, solve some fundamental technological problems, and deliver business impact.

Current Role at Stanford

I am currently the Director of Research IT at the School of Medicine (SoM) Information Resources and Technology (IRT). This is a newly created division in support of Stanford's Precision Health Strategy. Research IT exists to supply infrastructure, tools, and services used by researchers, patients/participants, and clinicians to collect and combine data to make discoveries and to improve human health and wellness. A major effort of our group will be the creation of a large-scale Data Commons that strives to aggregate, link, and provide access to a wide variety of data sets - clinical, imaging, omics, wearable, population health - to propel Big Data-driven biomedical research and translation. This will be done in close collaboration with many groups throughout Stanford.

I joined Stanford in October 2012 as the Director of Bioinformatics at the Stanford Center for Genomics and Personalized Medicine (SCGPM). My responsibility at the Center was to develop and lead the bioinformatics team and establish a genomics data analysis facility. Currently, the SCGPM bioinformatics team comprises a dozen scientists and software engineers. The team has a wide range of skill sets, including various omics, computational biology, machine learning, software engineering, data management, databases, visualization, high-performance computing, IT, and cloud DevOps. The team is currently supporting several large-scale research and clinical programs at Stanford, including prestigious consortium efforts and interdisciplinary collaborations.

Among our various efforts at SCGPM is the Genetics Bioinformatics Service Center, a Big Data Biomedical and Bioinformatics Core Facility, created in 2013 to streamline the availability of infrastructure to the wider biomedical community at Stanford and our research affiliates. The Core facility provides best-in-class high-performance computational systems, scalable cloud computing, and cutting-edge bioinformatics services for the Stanford community.

Education & Certifications

  • PhD, Boston University, MA, USA, Computational Physics (thesis on non-equilibrium statistical mechanics) (2000)
  • MSc, Indian Institute of Technology, Madras (aka Chennai), India, Physics (thesis on stochastic systems) (1994)
  • BSc, Jadavpur University, Calcutta (aka Kolkata), India, Physics, Math (1992)


All Publications

  • Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information. PLoS biology Li, X., Dunn, J., Salins, D., Zhou, G., Zhou, W., Schüssler-Fiorenza Rose, S. M., Perelman, D., Colbert, E., Runge, R., Rego, S., Sonecha, R., Datta, S., McLaughlin, T., Snyder, M. P. 2017; 15 (1)


    A new wave of portable biosensors allows frequent measurement of health-related physiology. We investigated the use of these devices to monitor human physiological changes during various activities and their role in managing health and diagnosing and analyzing disease. By recording over 250,000 daily measurements for up to 43 individuals, we found personalized circadian differences in physiological parameters, replicating previous physiological findings. Interestingly, we found striking changes in particular environments, such as airline flights (decreased peripheral capillary oxygen saturation [SpO2] and increased radiation exposure). These events are associated with physiological macro-phenotypes such as fatigue, providing a strong association between reduced pressure/oxygen and fatigue on high-altitude flights. Importantly, we combined biosensor information with frequent medical measurements and made two important observations: First, wearable devices were useful in identification of early signs of Lyme disease and inflammatory responses; we used this information to develop a personalized, activity-based normalization framework to identify abnormal physiological signals from longitudinal data for facile disease detection. Second, wearables distinguish physiological differences between insulin-sensitive and -resistant individuals. Overall, these results indicate that portable biosensors provide useful information for monitoring personal activities and physiology and are likely to play an important role in managing health and enabling affordable health care access to groups traditionally limited by socioeconomic class or remote geography.

    View details for DOI 10.1371/journal.pbio.2001402

    View details for PubMedID 28081144

  • Cloud-based Interactive Analytics for Terabytes of Genomic Variants Data Bioinformatics Pan, C., McInnes, G., Deflaux, N., Snyder, M. P., Bingham, J., Datta, S., Tsao, P. S. 2017
  • Secure cloud computing for genomic data Nature Biotechnology Datta, S., Bettinger, K., Snyder, M. 2016; 34 (6): 588-91

    View details for DOI 10.1038/nbt.3496

  • Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data PLOS GENETICS Dewey, F. E., Grove, M. E., Priest, J. R., Waggott, D., Batra, P., Miller, C. L., Wheeler, M., Zia, A., Pan, C., Karzcewski, K. J., Miyake, C., Whirl-Carrillo, M., Klein, T. E., Datta, S., Altman, R. B., Snyder, M., Quertermous, T., Ashley, E. A. 2015; 11 (10)


    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework.

    View details for DOI 10.1371/journal.pgen.1005496

    View details for Web of Science ID 000364401600008

    View details for PubMedID 26448358

    View details for PubMedCentralID PMC4598191

  • The Integrative Human Microbiome Project: Dynamic Analysis of Microbiome-Host Omics Profiles during Periods of Human Health and Disease CELL HOST & MICROBE Proctor, L. M. 2014; 16 (3): 276-289


    Much has been learned about the diversity and distribution of human-associated microbial communities, but we still know little about the biology of the microbiome, how it interacts with the host, and how the host responds to its resident microbiota. The Integrative Human Microbiome Project (iHMP), the second phase of the NIH Human Microbiome Project, will study these interactions by analyzing microbiome and host activities in longitudinal studies of disease-specific cohorts and by creating integrated data sets of microbiome and host functional properties. These data sets will serve as experimental test beds to evaluate new models, methods, and analyses on the interactions of host and microbiome. Here we describe the three models of microbiome-associated human conditions, on the dynamics of preterm birth, inflammatory bowel disease, and type 2 diabetes, and their underlying hypotheses, as well as the multi-omic data types to be collected, integrated, and distributed through public repositories as a community resource.

    View details for DOI 10.1016/j.chom.2014.08.014

    View details for Web of Science ID 000342057000006

    View details for PubMedID 25211071