Men's fertility specialist mines medical insurance data for health biomarkers

Michael Eisenberg has been using medical insurance data to support the hypothesis that male fertility may be a predictor of overall patient health.

Photo by Steve Fisch

Early in his practice, Michael Eisenberg, MD, an associate professor of urology specializing in men’s health, had a hunch that male infertility could be a predictor of future health issues. To explore this hypothesis, he began conducting a series of retrospective studies in which he analyzed the medical histories of hundreds of thousands of men over eight years. Now, with the help of the Data Core at the Stanford Center for Population Health Sciences, he is building a compelling case that male fertility may serve as a predictive biomarker of overall health.

In one 2014 study, for example, Eisenberg analyzed data from a large medical insurance claims database to see whether infertile men had an increased risk of cancer. First, he and his team downloaded the deidentified health records of 76,083 infertile men, 112,655 men who had undergone vasectomies and 760,830 male controls from the IBM® MarketScan® Research Databases, a large dataset that includes person-specific clinical expenditures segmented into patient, prescription drug and service categories. After the data was adjusted for subject age, evaluation year, comorbidity and follow-up care, they compared the risk of cancer in infertile men to the risk in the controls using the Cox proportional hazards regression model, a survival-analysis technique that allowed them to identify statistically significant relationships between health variables.
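For readers curious about what a Cox proportional hazards comparison involves, the following is a minimal sketch, not the study's actual code. It assumes a single yes/no exposure (for example, an infertility diagnosis) with no adjustment for age, comorbidity or other factors, and it fits the model by Newton-Raphson on the partial likelihood. The function name cox_hazard_ratio and the eight-person dataset are hypothetical and invented purely for illustration.

```python
import math

def cox_hazard_ratio(times, events, exposed, iterations=25):
    """Estimate the hazard ratio for one binary covariate.

    times   -- follow-up time for each person
    events  -- 1 if the outcome occurred at that time, 0 if censored
    exposed -- 1 for the exposed group, 0 for controls
    """
    beta = 0.0  # log hazard ratio, starting from "no effect"
    for _ in range(iterations):
        gradient = 0.0
        information = 0.0
        for t_i, d_i, x_i in zip(times, events, exposed):
            if not d_i:
                continue  # censored people contribute only to risk sets
            s0 = s1 = s2 = 0.0
            # Risk set: everyone still under observation at time t_i
            for t_j, x_j in zip(times, exposed):
                if t_j >= t_i:
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            gradient += x_i - mean
            information += s2 / s0 - mean * mean
        if information == 0.0:
            break
        step = gradient / information
        beta += step  # Newton-Raphson update
        if abs(step) < 1e-9:
            break
    return math.exp(beta)

# Invented toy data (not from the study): years of follow-up,
# whether the outcome occurred, and exposure status.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 0, 1, 1, 0, 1, 0]
exposed = [1, 1, 1, 0, 1, 0, 0, 0]

hr = cox_hazard_ratio(times, events, exposed)
print(f"Estimated hazard ratio: {hr:.2f}")
```

A hazard ratio above 1 means the outcome occurs at a higher rate in the exposed group than in the controls. A real analysis would use an established statistics package, include the adjustment variables and report confidence intervals.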

Bottom line, the team discovered that there was an increased risk of testicular cancer in infertile men. The data also suggested that there were higher risks for all cancers in the years after a diagnosis of infertility. These findings now provide useful information to clinicians and patients, as well as to researchers looking for biochemical pathways relevant in the development of cancers.

Since then, Eisenberg has used the IBM® MarketScan® Research Databases to conduct similar analyses for other chronic medical and autoimmune disorders, and these studies are adding scientific support to his original premise — that male infertility may provide a window onto the overall health of a patient.

Other PHS data resources for researchers

This is just one example of how the almost 200 datasets in the Population Health Data Core can be used by researchers to make clinical discoveries and to better understand the social determinants of health, including factors such as poverty, inequality and climate change.

The Stanford Center for Population Health Sciences, founded in 2015, offers researchers a central hub for efficiently accessing, visualizing and analyzing data from a wide variety of sources. The Center recently integrated its data assets into a single PHS Data Portal, powered by Redivis, providing users with an easier interface for discovering datasets, requesting data access and leveraging high-performance compute tools.

“We made big investments early on in technology that would be able to handle really large datasets and keep them secure without driving researchers crazy with hoops to jump through. We now have hundreds of researchers using our datasets on our secure PHS Data Portal,” says Isabella Chu, associate director of the Stanford PHS Data Core. “As an administrator, our toolkit helps me quickly evaluate researchers for data access and makes it easier for researchers to work and collaborate with the datasets.”

Some of PHS’s other popular data sources include:

  • Optum Clinformatics Data Mart stores administrative health claims for more than 72 million members of a large national managed-care company affiliated with Optum. It includes data from Medicare+Choice enrollees.
  • The Health Inequality Project has data on the differences in life expectancy by income and identifies strategies to improve health outcomes for low-income Americans.
  • Data from the Centers for Medicare and Medicaid Services allow researchers to evaluate geographic variations in the use and quality of health-care services for the Medicare fee-for-service population, based on a 20 percent sample of the national Medicare population.
  • The Integrated Public Use Microdata Series, which stores individual and household census records, is a good source for research on social and economic changes.
  • The “Born in Bradford” cohort study has data on 12,500 pregnant women from 2007 to 2010 and subsequent data on 13,500 offspring, all from a resource-poor town in the United Kingdom.


To help you get up to speed on using this data, the group offers extensive online documentation, a Slack user channel and support office hours. For more information, visit the Data Core website.