Morning Session: Current Practice and Future Opportunities in Patient Privacy
8:30 - 9:00 AM
Prof Michael Snyder, Department of Genetics, Stanford University
Big Data for Managing Health
We are using multiomics profiling that combines different state-of-the-art technologies to build detailed, longitudinal, integrative personal omics profiles comprising billions of measurements per person. These measurements include whole genome sequences, transcriptome levels (the transcriptome measures gene expression), multi-site microbiome sequencing (to measure changes in the skin, saliva, and gut microbiome and virome), proteome levels (the proteome is the collection of proteins), and metabolomics (the metabolome is the collection of metabolites). We can zoom into a single individual’s data and observe longitudinal changes as individuals undergo weight gain and loss. These detailed studies reveal individual health profiles, along with the biomolecules and pathways that are present during health and those that change during disease and other perturbations (such as weight gain and loss). It is now also possible to integrate highly granular sensor data (e.g. activities, sleep, nutrition, heart rate, skin temperature, blood pressure) to study how the environment and individual lifestyle affect omics profiles. The ability to bring multiple data sources and many individuals together lends power to precision health.
9:00 - 9:30 AM
Prof. Jean-Pierre Hubaux, School of Computer and Communication Sciences, EPFL, Switzerland
Protecting Genomic Data
In this talk, we will describe the unique portfolio of solutions we have developed to protect genomic data, in tight collaboration with the Lausanne University Hospital (CHUV), the Swiss HIV cohort clinicians, the EPFL School of Life Science, a Lausanne-headquartered company called Sophia Genetics, and the GA4GH Security Working Group. We will first introduce a few relevant cryptographic techniques, including deterministic, semantically-secure, property-preserving, and (partially) homomorphic encryption, as well as secure multi-party computation (we commit to making them fully understandable to a biomedical audience). We will then discuss possible architectures for genomic (and phenotypic) data generation, processing, and protection, and present the solutions we are currently deploying along the data pipeline.
We will then address the benefits (and limitations) of using partially homomorphic encryption for the protection of VCF files, and mention the pros and cons of the Paillier and ElGamal schemes. We will also discuss the potential of lattice-based encryption. Then, we will detail our solution for the protection of the Swiss HIV cohort and report on the related survey that we conducted among the clinicians involved. With the goal of reconciling utility and privacy in genome-wide association studies, we will present a way to apply differential privacy by leveraging bounded priors. Finally, we will discuss the protection of BAM files.
The community Web site we have set up on the topic of genome privacy and security can be found here.
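For readers unfamiliar with the partially homomorphic encryption mentioned in this abstract, the following is a minimal sketch of the additive property of the Paillier scheme. It is not the speakers' implementation: the function names, the tiny hard-coded primes, and the allele-count example are assumptions chosen for illustration, and the parameters are far too small to be secure.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key generation. These primes are illustrative only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With generator g = n + 1, the decryption constant is lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


# Hypothetical example: two sites each encrypt an allele count, and a third
# party sums them without ever seeing the individual counts.
n, priv = keygen()
c_total = add_encrypted(n, encrypt(n, 17), encrypt(n, 25))
print(decrypt(n, priv, c_total))  # 42
```

The scheme supports addition and multiplication by a known constant, but not multiplication of two encrypted values, which is why it is called partially homomorphic.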
9:30 - 10:00 AM
Prof Bradley Malin, Department of Biomedical Informatics, Vanderbilt University
Measuring and Managing Risks in Genomic Data Sharing
The past decade has led to dramatic advances in our ability to collect, share, and analyze genomic data. Historically, technical approaches to privacy in such settings have relied on removing explicit identifiers and publishing only summary statistics from association studies. Yet a growing collection of demonstrations has shown that simple “de-identification” can be cracked, calling into question the extent to which such protections should be relied upon. This issue has moved to the forefront of the debate on human subjects protections as life science researchers evolve from small scale studies to “big data” investigations that incorporate an array of resources ranging from longitudinal medical records to sensor-based mobile computing platforms. However, there is a substantial difference between what is possible and what is probable. As such, the goal of this talk is to investigate how such attacks on privacy transpire, how risks can be measured in the context of reasonable adversarial frameworks, and how we can mitigate them using a combination of social pressures and computational strategies that model, measure and mitigate such risks. This talk will draw upon examples from the speaker’s experiences with establishing the country’s largest de-identified biorepository tied to an electronic medical record system and directing a privacy research program for the NIH-sponsored Electronic Medical Records and Genomics (eMERGE) consortium.
David Maher, Intertrust
Data Sharing and Genomic Privacy in Practice
Healthcare depends integrally on data sharing, especially in genomics. Studies have shown that approximately 96% of variants predicted to be functionally important are rare, with an allele frequency below 0.5%. As a result, it is unlikely that any single institution will hold enough samples to achieve sufficient statistical power. In this context, collaboration is a necessity, not an option.
In our enthusiasm to share data, however, we must not neglect the rights of patients to control the disclosure and use of their data. The inability of current systems to protect these rights is one of the primary impediments to data sharing. It is no exaggeration to say that until we begin to address the "privacy problem," we will not be able to realize the true potential of the data we are collecting. We must begin to adopt pragmatic solutions today, even as our theoretical understanding of privacy problems is developing.
In this talk, I will describe some of these pragmatic solutions as they are being developed and deployed by members of the Global Alliance for Genomics and Health, as well as in my own project, Genecloud. Even when these techniques fail to provide for perfect privacy protection, they allow us to mitigate the potential damage and provide a means for forensic investigation of violations. Many of these measures can take advantage of emerging cryptographic technologies as they become more computationally feasible.
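As a rough illustration of the rare-variant argument in this abstract, the sketch below estimates how many carriers of a variant a cohort is expected to contain. It assumes diploid individuals and Hardy-Weinberg proportions, and the function names and the target of 100 carriers are hypothetical. It is a back-of-the-envelope calculation, not a statistical power analysis.

```python
import math


def carrier_probability(allele_frequency):
    """Chance that a diploid individual carries at least one copy of the
    allele, assuming Hardy-Weinberg proportions (an assumption)."""
    return 1.0 - (1.0 - allele_frequency) ** 2


def expected_carriers(cohort_size, allele_frequency):
    """Expected number of carriers in a cohort of the given size."""
    return cohort_size * carrier_probability(allele_frequency)


def cohort_needed(target_carriers, allele_frequency):
    """Smallest cohort whose expected carrier count reaches the target."""
    return math.ceil(target_carriers / carrier_probability(allele_frequency))


# At the 0.5% frequency threshold quoted in the abstract, a 1,000-person
# cohort is expected to hold only about 10 carriers.
print(expected_carriers(1000, 0.005))
# Cohort size needed to expect 100 carriers at that frequency.
print(cohort_needed(100, 0.005))
```

Variants rarer than the 0.5% threshold push the required cohort size up roughly in inverse proportion to the allele frequency, which is the practical case for pooling samples across institutions.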
Morning Session (cont.): Current Practice and Future Opportunities in Patient Privacy
11:00 - 11:30 AM
Prof Carl Gunter, Department of Computer Science, University of Illinois
Genomic Personal Health Records (GPHRs)
Personal Health Records (PHRs) offer patients the chance to see key parts of their medical record, such as lab results and diagnoses. They are made available through portals on web pages and via downloadable documents and XML files. There is little experience with including genomic data in PHRs from medical systems, but this will be an important area in the future. PHRs that include or manage genomic data (GPHRs) can draw on ideas from Direct-To-Consumer genomics, where there is more experience with sharing such data with individuals, but will need to deal with their own special issues and opportunities concerning permissions for primary use in treatment and secondary use in research. This talk explores some of the issues in giving added control to individuals via GPHRs while maintaining medical value and mitigating privacy risks.
11:30 - 12:00 PM
Robert Shelton, Private Access
Privacy and Access to Genomic Data in Perfect Balance
Individual privacy concerns and broad access by researchers to genomic data are commonly thought of as being in conflict. But are they truly irreconcilable, or is the conflict perhaps simply the consequence of attempting to employ out-of-date approaches to address a far more complex problem in the age of genomics? What if a technology could instead enable individually-mediated, fine-grained, dynamic access to otherwise confidential information within a broadly networked system, and thereby harmonize informational self-determination by patients and their clinicians with incredibly fast, highly intuitive, and clearly permissible accessibility by properly authorized researchers?
Panel Discussion: How do privacy concerns complicate genomic studies?
1:00 - 2:00 PM
Moderator: Frank Nothaft, UC Berkeley
- Phil Tsao, Epidemiological Center for Research and Information for Genomics, VA Palo Alto Health Care System
- Nathaniel Pearson, New York Genome Center
- Alexander (Sasha) Wait Zaranek, Curoverse
- Devin Locke, Seven Bridges Genomics
- Michael Snyder, Stanford University
Afternoon Session: Computational Techniques for Enhancing Privacy
2:00 - 2:30 PM
Cynthia Dwork, Microsoft Research
Privacy, Accuracy, and Validity: When Right is Wrong and Wrong is Right
Beginning with a summary of advances on the class of privacy attacks famously identified by Homer et al. (2008), we discuss differential privacy -- a definition of privacy tailored to statistical analysis of large datasets (2006). Signal properties of differential privacy include its resilience to arbitrary side information and the ability to understand cumulative privacy loss over multiple statistical analyses. Finally, we describe a tight connection between differential privacy and statistical validity under adaptive (exploratory) data analysis (2014).
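To make the idea of differential privacy and cumulative privacy loss more concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query, together with basic sequential composition. It is an illustration under stated assumptions rather than anything taken from the talk: the function names and the example count are hypothetical, and the standard-library random generator used here is not suitable for a real deployment.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when one person is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)


def total_epsilon(epsilons):
    """Basic sequential composition: privacy losses of successive
    analyses on the same data add up."""
    return sum(epsilons)


# Hypothetical example: number of cases carrying a given variant.
rng = random.Random(2024)
noisy = private_count(130, epsilon=0.5, rng=rng)
print(round(noisy, 2))
print(total_epsilon([0.5, 0.25]))  # 0.75
```

Smaller values of epsilon give stronger privacy but noisier answers, and each additional query on the same data spends more of the overall privacy budget.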
2:30 - 3:00 PM
Kristin Lauter, Microsoft Research
How to keep your genome secret while allowing computation
Over the last 10 years, the cost of sequencing the human genome has come down to around $1,000 per person. Human genomic data is a gold mine of information, potentially unlocking the secrets to human health and longevity. As a society, we face ethical and privacy questions related to how to handle human genomic data. Should it be aggregated and made available for medical research? What are the risks to individuals' privacy? This talk will describe a mathematical solution for securely handling computation on genomic data, and highlight the results of a recent international contest in this area. The solution uses “Homomorphic Encryption”, based on hard problems in number theory related to lattices.
3:00 - 3:30 PM
Prof XiaoFeng Wang, School of Informatics and Computing at Indiana University, Bloomington
Genome Privacy Challenges: Bringing Security Technologies to Biomedical Users
The growth of genome data and computational requirements overwhelm the capacity of servers. Many institutions, and the NIH, are considering cloud computing services as a cost-effective alternative for scaling up research. Privacy and security are the major concerns when deploying cloud-based data analysis tools. In the past few years, progress has been made on privacy-preserving data sharing and computation technologies, but there is still a lack of understanding about the gap between what they can provide and what is expected by the biomedical community. In the past two years, the genome privacy team at Indiana University, working together with the iDASH NIH NCBC center, organized two genome privacy competitions. In this talk, I will provide information about these challenges and what we have learnt.
Panel Discussion: How do we cross the theoretician/practitioner divide?
4:00 - 5:00 PM
Moderator: Frank Nothaft, UC Berkeley
- Brad Malin, Vanderbilt
- David Maher, Intertrust
- Kristin Lauter, Microsoft Research
- XiaoFeng Wang, Indiana University
- Suyash Shringarpure, Stanford University
- Narges Bani Asadi, Bina (Roche)
- Robert Shelton, Private Access
Panel Discussion: Opportunities in Genomic Data Management and Privacy Considerations
5:00 - 6:00 PM
Moderator: Ursheet Parikh, Mayfield Fund
What kinds of products are emerging from the new developments in genomic data management? Will they drive the adoption of privacy best practices or hinder it? What innovation will be driven by large companies, and where are the opportunities for startups?
- Andro Hsu, Syapse
- Todd Ferris, Stanford School of Medicine
- Jonathan Sheffi, Curoverse
- Ramon Felciano, Ingenuity (Qiagen)
- Sanjay Joshi, EMC
- James Williams, Google
- John Shon, Illumina
- Scott Burke, Helix