The VA Big Data Genomics Group is a collaboration between GBSC, the VA Palo Alto Epidemiology Research and Information Center (ERIC), and the Palo Alto Veterans Institute for Research (PAVIR) to apply big data technologies and algorithms to drawing medical insights from large-scale genomic data.

As part of the VA Million Veteran Program (MVP), the Big Data Genomics Group is leading the effort to process, quality-control, and release 150,000 whole-genome sequences to the MVP research community.

MVP Whole Genome Sequencing Data Release 1

Our initial release of whole-genome sequencing data to the MVP research community is coming in the first quarter of 2022. 

  • Quality Control Pipeline and Dataset Description: PowerPoint, PDF
  • Hail GWAS Notebook: Coming soon!

Trellis for Efficient Data and Task Management in the VA Million Veteran Program

To handle the petabytes of genomic data produced while analyzing whole-genome sequencing data from 150,000 participants in the Million Veteran Program, we designed Trellis to automatically launch workflows and track the data and jobs they generate.

Short Variant Calling for Whole-Genome Sequencing Data

We are using the GATK Best Practices workflow to call short variants (SNVs and indels) in 150,000 MVP whole genomes. The workflow is run by Cromwell on Google Cloud Platform. To check on the progress of our variant calling, follow the link to the variant calling dashboard.

Decoding the Genomics of Abdominal Aortic Aneurysm

We developed a machine-learning framework that integrates personal genomes with electronic health record (EHR) data, and used this framework to study abdominal aortic aneurysm (AAA), a prevalent cardiovascular disease with unclear etiology. By performing whole-genome sequencing on AAA patients and controls, we demonstrated that AAA can be predicted precisely from personal genomes alone.

Cloud-based interactive analytics for terabytes of genomic variants data

We demonstrated that big data computing paradigms can deliver turnaround times orders of magnitude faster for common genomic analyses, turning long-running batch jobs into questions that can be answered from a web browser in seconds. Using this approach, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distributions, variant density across the genome, and pharmacogenomic information.
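Several of the metrics named above reduce to simple per-site counts. The sketch below is a minimal single-machine illustration, not the group's cloud implementation: it assumes VCF-style diploid genotype strings ("0/0", "0/1", "1/1", phased "0|1") for one biallelic site, treats any genotype containing "." as a missing call, and uses a hypothetical helper name, site_summary, chosen for illustration.

```python
from collections import Counter


def site_summary(genotypes):
    """Summarize one biallelic site from diploid genotype strings.

    Illustrative assumption: genotypes use VCF-style notation ("0/0",
    "0/1", "1/1", phased "0|1"); any genotype containing "." is missing.
    """
    total = len(genotypes)
    # Normalize phased separators and allele order so "1|0" counts as "0/1".
    called = [
        "/".join(sorted(g.replace("|", "/").split("/")))
        for g in genotypes
        if "." not in g
    ]
    # Call rate: fraction of samples with a non-missing genotype.
    call_rate = len(called) / total if total else 0.0

    # Allele number: two alleles per called diploid genotype.
    allele_number = 2 * len(called)
    alt_count = sum(g.split("/").count("1") for g in called)
    alt_freq = alt_count / allele_number if allele_number else 0.0

    return {
        "call_rate": call_rate,
        "alt_allele_freq": alt_freq,
        "genotype_counts": dict(Counter(called)),
    }


if __name__ == "__main__":
    print(site_summary(["0/0", "0/1", "1/1", "./.", "1|0"]))
```

Because each site is summarized independently, the same counts can in principle be computed in parallel across variants and then aggregated into genome-wide distributions, which is the kind of workload that distributed query engines handle well.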

Epidemiology Research and Information Center (ERIC) for Genomics

VA Palo Alto Health Care System (VAPAHCS) is a long-term partner and affiliate of Stanford University. The Stanford Center for Genomics and Personalized Medicine (SCGPM) and the VA Palo Alto Epidemiology Research and Information Center (ERIC) for Genomics share a common mission in genomics-driven precision health. The ERIC is directed by Philip Tsao, PhD, and co-directed by Lawrence Leung, MD. Both directors also hold appointments at the Stanford School of Medicine.

Established in 2015, the Palo Alto ERIC for Genomics is located on the campus of the VA Palo Alto Health Care System. The ERIC is tasked with taking advantage of recent advances in obtaining data from a person's genes and combining those data with the rich clinical information in the electronic medical record.