Genome Technology Center

Statistical methods for analyzing human genetic variation

Sriram Sankararaman
UC Berkeley

Advances in genotyping and sequencing technologies are enabling a deeper understanding of human genetic variation. At the same time, the impact of the availability of such data on the privacy of individuals has been a growing concern. In this talk, I will discuss statistical problems that arise from these two developments:

  1. Inferring locus-specific ancestry in admixed populations: Characterizing the genetic variation of recently admixed populations is an important step in the detection of SNPs associated with diseases through association studies and admixture mapping. Locus-specific ancestries are crucial to our understanding of the genetic variation of such populations. I will describe LAMP, a fast and accurate method for inference of locus-specific ancestries. LAMP can infer the ancestries even when the genotypes from the pure ancestral populations are unknown or unavailable. Empirical results show that LAMP is both accurate and efficient, enabling it to handle whole-genome datasets. I will also show how the approach underlying LAMP can be extended to accurately infer ancestries in admixtures of closely related populations (e.g., admixtures of Europeans).
  2. Genomic privacy: Methods for detecting an individual's genotype in summary data from genome-wide association studies have recently been shown to have sufficient power to jeopardize the privacy of the study's subjects. I will present an analytical and empirical study of the statistical power of such methods. The analysis provides an upper bound on the statistical power achievable by any detection method and yields quantitative guidelines for researchers wishing to make a limited number of SNPs publicly available without compromising privacy. Further, the analytical and empirical results show that the maximum power attained when SNPs in whole-genome datasets are exposed is relatively low. Our work provides guidelines on how genomic data can be shared while preserving the privacy of individuals.
