Stanford researcher’s cryptography can preserve genetic privacy in criminal DNA profiling

Crime scene DNA analysis can help identify perpetrators, but current methods may divulge the genetic information of innocent people. Cryptography can protect genetic privacy without hampering law enforcement, Stanford researchers say.

Any armchair detective knows that DNA left at a crime scene can be a valuable way to identify a perpetrator — either by comparing the collected DNA with criminal DNA databases or with genetic samples collected from suspects or “persons of interest.”

Some criminal databases, however, retain DNA profiles found at the crime scene or collected from suspects even if they are eventually not linked to any crime. In some states, genetic samples are also collected and stored from people who have been arrested, but not necessarily convicted, for certain types of crime.

This process leads to the de facto genetic profiling of many thousands of innocent Americans, say researchers at Stanford Medicine, the University of Virginia and the Broad Institute. They add that this practice may be a significant civil rights infringement.

“We Americans consider our genomic signature to be private,” said Gill Bejerano, PhD, professor of developmental biology, of computer science and of pediatrics. “It’s clear that DNA analysis is an extremely effective and wonderful tool for crime solving. But, if you bank the DNA profile of every person you’ve questioned, you are in effect profiling whole sectors of the population by economic status, geographic location and race.”

Anonymous interaction

Bejerano and his colleagues have devised advanced cryptographic techniques to search for matches while maintaining the genetic privacy of the suspect. After the comparison is completed, checked DNA profiles that don’t match any in the database can be immediately discarded.

The research was published in Nature Computational Science on April 26. Bejerano is a senior author along with David Wu, PhD, a former doctoral student at Stanford who is now an assistant professor of computer science at the University of Virginia. The first authors of the study are Jacob Blindenbach, an undergraduate student at the University of Virginia, and Karthik Jagadeesh, PhD, a postdoctoral scholar at the Broad Institute.

“With this technique, we can query the database with an individual’s genetic profile without depositing the information into the database,” Bejerano said. “It’s an anonymous interaction, and that profile lives only on the device on which it was collected. When the agent hits ‘delete,’ that profile is gone.”

The use of DNA to investigate crimes began in the 1980s, when scientists developed the genetic tools to discern one individual from another. It works because each of us leaves behind microcapsules of genetic information — unique to each individual — in the form of hair, skin flakes and saliva as we go about our daily life. Perpetrators of violent or sexual crimes also sometimes leave behind blood or semen. Cheek swabs from suspects or people arrested for a crime can also be used to generate DNA profiles.

In 1994, the Federal Bureau of Investigation was authorized to establish the National DNA Index System, which contains DNA profiles gathered by nearly 200 public law enforcement laboratories at the federal, state and local level. (CODIS, or the Combined DNA Index System, refers both to the FBI’s support of criminal justice DNA databases and to the software used to run them.) Increasingly, however, cities and states also maintain their own DNA databases that are not linked to the FBI’s system and are less regulated.

Police can acquire information about DNA from crime scene samples or via cheek swabs using a suitcase-sized instrument. Over the course of about two hours, the instrument isolates and analyzes the DNA at 13-20 highly variable regions throughout the genome, generating a sequence of about 40 numbers that uniquely identifies every individual like a kind of genetic bar code. This code can then be used to quickly search for matches among thousands or millions of other similar codes in CODIS or in the state or municipal DNA databases.
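To make the "genetic bar code" idea concrete, here is a minimal Python sketch of a profile as a fixed-length sequence of numbers and of an ordinary, non-private exact-match lookup. It assumes, purely for illustration, 20 regions with two numbers each (40 numbers in total). The names make_profile and find_match, the region count and the record labels are hypothetical and are not taken from the study or from CODIS.

```python
from typing import Dict, List, Optional, Tuple

Profile = Tuple[int, ...]  # a fixed-length sequence of numbers

NUM_REGIONS = 20        # assumption: upper end of the 13-20 regions in the text
VALUES_PER_REGION = 2   # assumption: two numbers per region, about 40 in total


def make_profile(values: List[int]) -> Profile:
    """Validate the length and freeze the numbers into a hashable tuple."""
    expected = NUM_REGIONS * VALUES_PER_REGION
    if len(values) != expected:
        raise ValueError(f"expected {expected} numbers, got {len(values)}")
    return tuple(values)


def find_match(query: Profile, database: Dict[Profile, str]) -> Optional[str]:
    """Plain exact-match lookup: return the stored label, or None if absent."""
    return database.get(query)


# Hypothetical sample data, invented for this sketch.
known = make_profile(list(range(40)))
database = {known: "record-A"}
unknown = make_profile([7] * 40)
```

Because each profile is an immutable tuple, a dictionary lookup is enough to test whether an identical code is already on file. Note that this plain lookup requires one side to see the other's data, which is the privacy problem the researchers set out to solve.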

Currently there are two ways to search for a match between an individual’s DNA profile and the contents of a large DNA database: share the sample of interest (designated in this case as a series of numbers) directly with the database or download the entire database onto the instrument. If the profile is shared with the database, it could be retained even if the person is innocent. But repeatedly downloading a large database of sensitive genetic information also isn’t ideal due to security concerns.

To combat these problems, Bejerano and his colleagues have written a short piece of software code that can be installed on the database computers as well as on the instrument. The code allows the two sides to “talk” to one another and conduct the comparison indirectly as a series of “if this, then that” instructions.

'Like playing 20 questions'

“It’s like playing a version of 20 questions,” Bejerano said. “The central database says ‘If you have this number at this location, go here, but if you have that number, go there.’ The field device follows the series of instructions matching the profile it has, and at the end the law enforcement agent learns whether their profile matches one in the database and nothing else. And the database learns nothing about the profile in the agent’s hands.”
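The "if this, then that" walk can be pictured with a small Python sketch. This is only a plaintext illustration of the branching logic: it contains no cryptography and hides nothing from either side, so it is not the researchers' protocol, whose cryptographic details are not described in this article. The function names build_tree and follow_instructions and the MATCH marker are hypothetical.

```python
from typing import Dict, Iterable, Sequence

MATCH = "match"  # marker key; profile values are ints, so it cannot collide


def build_tree(profiles: Iterable[Sequence[int]]) -> Dict:
    """Database side: turn stored profiles into nested branching instructions."""
    root: Dict = {}
    for profile in profiles:
        node = root
        for value in profile:
            # "If you have this number at this location, go here."
            node = node.setdefault(value, {})
        node[MATCH] = True  # the end of a complete stored profile
    return root


def follow_instructions(tree: Dict, profile: Sequence[int]) -> bool:
    """Device side: follow the branch for each number; report match or no match."""
    node = tree
    for value in profile:
        if value not in node:
            return False  # no instruction for this number: no match
        node = node[value]
    return MATCH in node


# Hypothetical short profiles, invented for this sketch.
tree = build_tree([(1, 2, 3), (1, 2, 4)])
```

In this sketch the device ends with a single yes-or-no answer, mirroring the outcome described in the quote. In the plaintext version, however, the device can read the whole tree; keeping both sides blind to each other's data is the part that requires the cryptographic techniques.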

It’s the ultimate blind date: Each party learns the other’s identity only if several specific parameters match exactly (Night owl or early bird? Red chowder or white? Dog person or cat person?). It’s also fast, efficient and, now, available free to any agency.

“It takes 40 seconds and just 180 MB to query a million profiles,” Bejerano said. “And it’s easy to implement. We are releasing the code widely, and law enforcement agencies could start using it tomorrow if they wanted.”
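As a rough back-of-the-envelope reading of those figures, the short Python calculation below divides the quoted totals by the number of profiles. It assumes 1 MB means 10^6 bytes and that the cost is spread evenly across profiles; neither assumption comes from the study, and the variable names are illustrative only.

```python
# Figures quoted in the article.
profiles = 1_000_000
total_seconds = 40
total_megabytes = 180

# Assumption: 1 MB = 10**6 bytes (decimal megabytes).
total_bytes = total_megabytes * 10**6
total_microseconds = total_seconds * 10**6

# Average cost per database profile, using integer arithmetic.
bytes_per_profile = total_bytes // profiles
microseconds_per_profile = total_microseconds // profiles
```

Under these assumptions, the query averages about 180 bytes of data and about 40 microseconds per database profile.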

Bejerano and his colleagues didn’t start out focusing on the criminal justice system. Bejerano’s past research looked into helping clinicians better diagnose severe childhood diseases like cystic fibrosis while protecting patients’ genetic privacy. But recent events changed the trajectory of the research.

“The George Floyd murder got us thinking and researching,” Bejerano said. “CODIS databases exist at the national, state and even local levels. Law enforcement agents tend to profile disadvantaged neighborhoods, and sometimes they bank all the profiles, regardless of whether those individuals have been convicted of a crime. We realized that the technology we’d been working on was ideal to combat this type of discrimination.”

Because other, similar criminal justice DNA databases are in use in many other countries, the implementation of the team’s algorithm could have sweeping implications.

“We have to decide,” Bejerano said. “If genomic privacy is a value we cherish as a society, we have to start thinking about and implementing these types of tools. This approach allows us to have our cake and eat it too — keeping the powerful tool for criminal justice while also preserving the freedom of law-abiding citizens to keep their genetic information private.”

The research was supported by the National Science Foundation, a University of Virginia SEAS Research Innovation Award, the Joint University Microelectronics Program Undergraduate Research Initiative and the Stanford Artificial Intelligence Lab.



Stanford Medicine integrates research, medical education and health care at its three institutions - Stanford University School of Medicine, Stanford Health Care (formerly Stanford Hospital & Clinics), and Lucile Packard Children's Hospital Stanford. For more information, please visit the Office of Communication & Public Affairs site at http://mednews.stanford.edu.
