5 Questions: Genevieve Wojcik on the need for diversity in genome-based studies

Data scientist Genevieve Wojcik speaks about the lack of diversity in genomewide association studies, why it’s a problem and how increasing diversity in these studies can elevate the entire population.

- By Hanae Armitage


If you’ve ever had genetic testing done, you might have noticed a few lines of fine print in the accompanying paperwork acknowledging that the results may not accurately represent individuals of non-European descent. That’s not unusual, said Genevieve Wojcik, PhD, a postdoctoral scholar at the School of Medicine who studies biomedical data science, but it is a problem.

This caveat is often seen in something called a genomewide association study, or GWAS, which harnesses technology to read DNA and look for genetic variants that show up in association with specific diseases. The idea is to look at the DNA of a population and spot common variants that seem to correspond to an illness, such as heart disease, cancer or diabetes. But giving these studies the power they need to uncover these links requires data from thousands of individuals. 
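
As a rough illustration of the association step described above, the sketch below tests a single variant by comparing allele counts in people with a disease (cases) and without it (controls). The function name and the counts are hypothetical, made up for illustration. Real GWAS pipelines repeat tests like this across a very large number of variants and also adjust for ancestry and multiple testing, which this sketch does not attempt.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference allele.
    Returns the test statistic and its p-value.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more common among cases.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")  # chi2 = 8.00, p = 0.0047
```

The small p-value here only reflects the invented counts; with few individuals, a real difference of this size could easily go undetected, which is why these studies need data from thousands of people.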

The problem, Wojcik said, is that often the available genetic data comes from individuals of European descent, and results from one racial or ethnic group don’t always apply to another. In a study published June 19 in Nature, Wojcik and her colleagues dig into that problem and show how diversity in genetic analyses improves our understanding of complex traits and disease risk. 

Science writer Hanae Armitage spoke with Wojcik about the study, discussing the lack of diversity in GWAS research, how diversity can elevate it and what scientists can do to better include a broad range of populations when conducting these kinds of genomic analyses.

1. What are a couple of key factors behind the lack of diversity in GWAS?

Wojcik: A GWAS requires a large number of samples to get a strong signal, and a lot of the large cohorts and health-related studies are done in groups that are mostly of European descent. This is for a variety of reasons, including patient participation, the geographic location of these studies and the overall recruitment efforts.

Historically, genomic studies tended to favor somewhat homogeneous populations to minimize confounding factors and make sure that any signal you pick up on is genuine. So scientists would often organize their studies by the different racial or ethnic groups, and the European-descent group was usually the largest. Eventually, that became the norm since there was simply more data. But now things are changing. We have much larger studies occurring in metropolitan areas and on a national level with more diverse populations. We also have computational methods that help us analyze less-homogeneous data more effectively, and we can gain more information by having everyone together versus focusing on the largest group.

2. Why is it important to increase diversity in genetic studies?

Wojcik: If a GWAS only incorporates data from one racial or ethnic group, the results are typically most applicable to that one population. The effect of that variant in other groups may be smaller or nonexistent. Non-European descent groups have a disproportionately higher burden of disease, especially in the United States. When GWASes focus on data from just individuals of European descent, you can exacerbate that disparity. That’s one of the things we’re trying to address by having diverse groups in our genomic analyses. By increasing minority representation, you’re ensuring that these groups don’t get left behind.

3. What is the PAGE study and how is it different from most genomewide studies?

Wojcik: The PAGE study, which stands for Population Architecture Using Genomics and Epidemiology, was originally developed by the National Human Genome Research Institute in an effort to address the very problem we’ve been talking about: A large majority of genomic research is done in European-descent groups.

The PAGE study is bringing together numerous institutes and existing studies to pool genomic data from individuals who have historically been sidelined in genomewide association studies — namely, racial and ethnic minorities. The study’s overall goal is to investigate a diverse population to better understand how genetic factors affect susceptibility to disease. PAGE has been around for about a decade, and their recent focus has been on gathering and analyzing gene information from 50,000 individuals of non-European descent. Our role at Stanford is to help coordinate and analyze the data.

4. How does a diverse cohort add to the power of a genomewide screen?

Wojcik: Variants in or near genes can be associated with risk for a particular condition; that’s why we do GWASes. But it’s not always clear-cut. For instance, a GWAS usually points to several variants within one gene that all seem to correspond to that particular condition. Adding more diverse data can actually help zero in on which variants are most likely to be causal for the condition, and which just coincidentally pop up in our GWASes because they are nearby. We demonstrated this using data from the PAGE study, in which we looked at genetic associations with height. We already had a lot of data from a previous GWAS of 250,000 people, who were predominantly of European descent. For one of the identified genes, there were four genetic variants associated with height, but we weren’t able to narrow it down to the likely causal variant. We tried adding data from another 50,000 European-descent participants, but still weren’t able to narrow down our list. But when we added genomic data from the 50,000 multi-ethnic participants from the PAGE study, we found that there was only one genetic variant that was consistently involved with height.
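
The narrowing-down effect described above can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the method used in the PAGE analysis: the genotypes are invented, the trait is determined entirely by one hypothetical variant (`v2`) with no noise, and a plain correlation stands in for a full association test. In cohort A the four nearby variants are always inherited together, so they cannot be told apart; in cohort B that pattern is broken.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical allele counts (0, 1 or 2 copies) at four nearby variants.
# Cohort A: all four variants travel together, so their genotypes are identical.
cohort_a = {name: [0, 1, 2, 0, 1, 2] for name in ("v1", "v2", "v3", "v4")}
# Cohort B: the variants are no longer inherited as a block.
cohort_b = {
    "v1": [0, 0, 1, 1, 2, 2],
    "v2": [0, 1, 2, 2, 1, 0],
    "v3": [2, 2, 1, 1, 0, 0],
    "v4": [1, 0, 2, 0, 2, 1],
}

CAUSAL = "v2"  # assumption for this toy example only
trait_a = list(cohort_a[CAUSAL])
trait_b = list(cohort_b[CAUSAL])


def correlations(genotypes, trait):
    """Correlation of each variant's genotypes with the trait."""
    return {name: pearson(values, trait) for name, values in genotypes.items()}


single_cohort = correlations(cohort_a, trait_a)
pooled_genotypes = {name: cohort_a[name] + cohort_b[name] for name in cohort_a}
pooled = correlations(pooled_genotypes, trait_a + trait_b)

for name in sorted(pooled):
    print(f"{name}: cohort A only = {single_cohort[name]:.2f}, pooled = {pooled[name]:.2f}")
```

In this invented data, all four variants track the trait equally well in cohort A alone, while only `v2` still does once cohort B is pooled in. Adding more individuals with the same inheritance pattern as cohort A would not change that tie, which mirrors the point about the additional 50,000 European-descent participants.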

5. What can researchers do to ensure that their genetic studies are more diverse and inclusive?

Wojcik: I think first and foremost, there needs to be a shift in mindset. In addition to the social justice merits of this work, there’s true scientific cause for increasing diversity in GWASes. This paper does a great job of laying down a framework to show that. We’re saying, “You really have no excuse; look at the scientific rationale for including diverse groups in genomic studies. Just because it’s harder to include broad diversity doesn’t mean you don’t have to do it.”

Pooling data from diverse groups actually enhances precision in disease risk predictions or associations when properly analyzed. Nowadays, we have computer methods that can handle immense complexity in your data, so it actually doesn’t always make a whole lot of sense to run separate GWASes for different populations. Genetic diversity is a spectrum, not a box you check on a survey, and it’s crucial that we adjust our approach to genomic studies to treat it as such.

