American Family Cohort (AFC)
Dataset Website:
https://americanfamilycohort.org/
Vital Statistics
Sample size: More than 8 million
Sampling frame: Patients in primary care clinics from throughout the United States
Years of data: 2010 - 2024
Type of data: Electronic Health Records
Dataset Description
The American Family Cohort (AFC) data are derived from the American Board of Family Medicine (ABFM) PRIME Registry. The PRIME Registry is a Qualified Clinical Data Registry (QCDR) for primary care. It provides tools to evaluate primary care practice performance, support population health and risk stratification, improve primary care practice and patient outcomes, and reduce the burden of reporting to Centers for Medicare and Medicaid Services (CMS) payment programs. Certified by CMS in 2016, the PRIME Registry covers over 3,000 active clinicians across 50 states and holds data on over eight million patients, dating back to 2010. It is the largest clinical registry for primary care in the nation.
PRIME includes both structured and unstructured data, as is typical of data drawn from disparate EHRs. Data elements include patient demographics, diagnoses, interventions such as medications and therapies, encounter-specific data, patient-reported outcomes (PROs), and some limited clinician-specific details. All data are collected during routine clinical care of patients, and the main goals of collection are to support practice-specific quality improvement activities and CMS-specific quality reporting for payment. The PRIME Registry includes National Quality Forum (NQF)-endorsed measures and a PRO measure tool that aids in tracking practice performance.
The AFC is a research dataset derived from the PRIME Registry, with several versions curated for different research use cases. The data have been transformed into the Observational Medical Outcomes Partnership (OMOP) Common Data Model.
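Because the data follow the OMOP Common Data Model, a typical cohort query joins the standard person and condition_occurrence tables. The sketch below illustrates that pattern on a tiny in-memory SQLite database. The rows are invented for illustration, the helper names are hypothetical, and the concept ID 201826 is assumed here to be the standard concept for type 2 diabetes mellitus (verify it against the OMOP vocabulary). The actual AFC environment, table versions, and access tooling may differ.

```python
import sqlite3


def build_toy_omop():
    """Create a tiny in-memory database with two OMOP-style tables (toy data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            gender_concept_id INTEGER,
            year_of_birth INTEGER
        );
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id INTEGER,
            condition_concept_id INTEGER,
            condition_start_date TEXT
        );
        -- Invented rows; these are not AFC records.
        INSERT INTO person VALUES (1, 8532, 1960), (2, 8507, 1975), (3, 8532, 2012);
        INSERT INTO condition_occurrence VALUES
            (10, 1, 201826, '2015-03-01'),
            (11, 1, 201826, '2018-07-15'),
            (12, 2, 201826, '2020-01-09'),
            (13, 3, 317009, '2019-05-20');
        """
    )
    return conn


def count_patients_with_condition(conn, concept_id):
    """Count distinct persons with at least one record of the given condition concept."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS co ON co.person_id = p.person_id
        WHERE co.condition_concept_id = ?
        """,
        (concept_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_toy_omop()
    # Person 1 has two records for the same concept but is counted once.
    print(count_patients_with_condition(conn, 201826))
```

Counting distinct person_id values rather than rows matters in EHR data, where the same diagnosis is often recorded at many visits.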
Strengths and limitations of the dataset
These data are ideal for:
Questions that require granular geographic information over time, overlays by granular geography, or linkage of individuals within a primary care practice.
Research in primary care, including health services, patterns of care (e.g. treatment and diagnosis), and clinical and service utilization-related outcomes.
Research on social determinants of health (e.g. social deprivation index, neighborhood characteristics, race/ethnicity, healthcare resources within a geographic location), including at the family or household level.
Research on services that are usually performed in primary care, with clinical depth afforded by the free-text notes.
Comparison of quality measures across practices in the United States, with attention to primary care.
Health equity studies with a focus on racial/ethnic disparities.
Race imputation.
Limitations:
These data do not currently include inpatient data or data from specialists; therefore, the information is limited to the activities of primary care practitioners that take place in primary care clinics.
These data may not have sufficient sample size for rare diseases.
Although the data overall have good representation from many populations, there may be sparse representation in some geographic areas.
These data only represent patients seen within a specific primary care practice; therefore, care that patients seek outside the practice where their visits are documented (e.g. other primary care practices, emergency departments, inpatient settings) cannot be tracked.
Patient Demographics
Populations from all 50 states are well represented. AFC includes patients on private insurance plans, Medicaid, and Medicare, which increases the representation of vulnerable populations and, in turn, the generalizability of the sample to the overall US population.
AFC is racially and ethnically diverse: the data include over 540,000 Black patients, 150,000 Asian patients, 51,000 Native American and Alaska Native patients, and 16,000 Native Hawaiian and Pacific Islander patients. The remaining 4.8 million patients are White, and 758,000 patients have identified as Hispanic or Latino. This racial/ethnic diversity is a major strength for research on underserved and marginalized populations.
The data include nearly 1 million children whom we are able to link to parent health records, including a large number from minority populations from practices in rural areas of the U.S., allowing for analyses across the life course, and across generations in diverse groups.
Selected Publications
Cheng, Lingwei, Isabel O. Gallegos, Derek Ouyang, Jacob Goldin, and Dan Ho. 2023. “How Redundant Are Redundant Encodings? Blindness in the Wild and Racial Disparity When Race Is Unobserved.” Pp. 667–86 in 2023 ACM Conference on Fairness, Accountability, and Transparency. Chicago IL USA: ACM.
Ganguli, Ishani, Kathleen L. Mulligan, Robert L. Phillips, and Sanjay Basu. 2022. “How the Gender Wage Gap for Primary Care Physicians Differs by Compensation Approach: A Microsimulation Study.” Annals of Internal Medicine 175(8):1135–42. doi: 10.7326/M22-0664.
Hao, Shiying, David H. Rehkopf, Esther Velasquez, Ayin Vala, Andrew W. Bazemore, and Robert L. Phillips. 2023. “COVID-19 Vaccine Strategy Left Small Primary Care Practices On The Sidelines: Study Examines COVID-19 Vaccine Strategy and the Impact on Small Primary Care Practices.” Health Affairs 42(8):1147–51. doi: 10.1377/hlthaff.2023.00114.
Kremers, Mark S., Stephen C. Hammill, Charles I. Berul, Christina Koutras, Jeptha S. Curtis, Yongfei Wang, Jim Beachy, Laura Blum Meisnere, Del M. Conyers, Matthew R. Reynolds, Paul A. Heidenreich, Sana M. Al-Khatib, Ileana L. Pina, Kathleen Blake, Mary Norine Walsh, Bruce L. Wilkoff, Alaa Shalaby, Frederick A. Masoudi, and John Rumsfeld. 2013. “The National ICD Registry Report: Version 2.1 Including Leads and Pediatrics for Years 2010 and 2011.” Heart Rhythm 10(4):e59–65. doi: 10.1016/j.hrthm.2013.01.035.
Yang, Zhou, Christina Silcox, Mark Sendak, Sherri Rose, David Rehkopf, Robert Phillips, Lars Peterson, Miguel Marino, John Maier, Steven Lin, Winston Liaw, Ioannis A. Kakadiaris, John Heintzman, Isabella Chu, and Andrew Bazemore. 2022. “Advancing Primary Care with Artificial Intelligence and Machine Learning.” Healthcare 10(1):100594. doi: 10.1016/j.hjdsi.2021.100594.
Process for accessing the data and associated costs
The process to access the data can be found here: AFC Data Access Instructions. Access takes about 3 months. Costs depend on the funding source and the nature of access.
Data Support
PHS offers AFC office hours, Slack channels, AFC data documentation, and recorded trainings.