Workshop in Biostatistics

DATE: May 26, 2016
TIME: 1:30 - 3:00 pm
LOCATION: Medical School Office Building, Rm x303
TITLE: Differentially Private Data Analysis
SPEAKER: Cynthia Dwork
Distinguished Scientist, Microsoft Research


Differential privacy (2006), a mathematically rigorous definition of privacy tailored to situations in which data are plentiful, has provided a theoretically sound and powerful framework for privacy-preserving data analysis, and given rise to an explosion of research. Signal properties of differential privacy include its resilience to arbitrary side information and the ability to understand and control cumulative privacy loss over multiple statistical analyses. These properties thwart both the linkage attack of Sweeney (1997) on the medical records of William Weld and the GWAS attacks of Homer et al. (2008) that resulted in changes to NIH policy regarding the release of aggregate statistics. Of greater scientific interest, the composition property permits us to "program" using differentially private "building blocks" while tracking and controlling the cumulative privacy loss suffered by any member of the dataset, allowing the construction of differentially private methods to carry out complex analytic tasks.
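As a rough illustration of the "building block" idea, the sketch below pairs one standard primitive, the Laplace mechanism for counting queries, with a simple accountant that adds up the privacy parameter epsilon across queries (basic sequential composition). It is not taken from the talk: the names PrivacyBudget, laplace_noise, and private_count, the toy data, and the choice of epsilon values are all assumptions made for the example. Floating-point noise of this kind is also known to be unsafe for real deployments, so this should be read as a teaching sketch only.

```python
import random


class PrivacyBudget:
    """Hypothetical accountant: tracks cumulative epsilon under basic composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        # Basic composition: running k mechanisms costs the sum of their epsilons.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, budget, rng):
    """Epsilon-differentially private count via the Laplace mechanism."""
    budget.charge(epsilon)  # refuse the query if it would exceed the budget
    true_count = sum(1 for r in records if predicate(r))
    # A count changes by at most 1 when one record changes (sensitivity 1),
    # so Laplace noise with scale 1/epsilon suffices.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data (assumed for illustration): ages of seven study participants.
rng = random.Random(0)
budget = PrivacyBudget(total_epsilon=1.0)
ages = [34, 51, 29, 62, 45, 38, 57]

over_40 = private_count(ages, lambda a: a > 40, 0.5, budget, rng)
over_60 = private_count(ages, lambda a: a > 60, 0.5, budget, rng)
print(over_40, over_60, budget.spent)
```

After the two queries the accountant has spent the full budget of 1.0, so a third call to private_count raises an error rather than silently leaking more. More refined composition theorems give tighter accounting than this simple sum.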

Suggested readings:

The first three chapters of Dwork and Roth, The Algorithmic Foundations of Differential Privacy, available here:

A very recent negative result: Dwork, Smith, Steinke, and Ullman, Robust Traceability from Trace Amounts, available here:

A differentially private version of the LASSO: Smith and Thakurta, Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso, available here:

A use of differential privacy when privacy is not itself a concern: Dwork, Hardt, Feldman, Pitassi, Reingold, and Roth, The Reusable Holdout: Preserving Validity in Adaptive Data Analysis, available here: