Zipcode in de-identified OMOP data
Social determinants of health
De-identified STARR-OMOP data has zip5 for 78.5% of patients
May 17, 2021:
In collaboration with the University Privacy Office (UPO), Research IT has made the five digit zipcode available in the de-identified STARR-OMOP data. The de-identified OMOP data is designed to reduce the startup barrier, and our goal has been to retain the data richness that is typically expected from identified data. For example, the clinical notes are de-identified and dates are jittered so that richness and longitudinal order are preserved. Now 78.5% of patients retain their full five digit zipcode. These are zipcodes with more than 20,000 residents. We currently use 2010 census population counts and will move to 2020 as soon as the newer census data is available. For the rest of the patients, we retain the first three digits and set the remaining two digits to zero (***00).
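The retention rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual STARR de-identification code: the function name `generalize_zip`, the `population_by_zip` lookup, and the sample populations are all hypothetical. Only the 20,000 threshold and the zero-padded three digit fallback come from the text.

```python
from typing import Mapping

POPULATION_THRESHOLD = 20_000  # from the text: retained zipcodes have more than 20,000 residents


def generalize_zip(zip5: str, population_by_zip: Mapping[str, int]) -> str:
    """Return zip5 unchanged if its area is populous enough, else first three digits + '00'."""
    # Assumption: an area missing from the lookup is treated as small and gets generalized.
    population = population_by_zip.get(zip5, 0)
    if population > POPULATION_THRESHOLD:
        return zip5
    return zip5[:3] + "00"


# Invented populations, for illustration only.
sample_population = {"12345": 25_000, "12399": 1_200}
for code in ("12345", "12399"):
    print(code, "->", generalize_zip(code, sample_population))
```

In this sketch the populous code is returned as-is, while the small one is reduced to its three digit prefix followed by two zeros.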
With the zipcode, you can look at social determinants of health. Since OMOP is in Google BigQuery, one of the first places to look for useful datasets is the Google marketplace. You can further filter by Social Determinants of Health (SDOH). There are some interesting datasets, such as county level income, county level enrollment in the Supplemental Nutrition Assistance Program (SNAP), federally designated areas with a shortage of healthcare workers, and more.
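When linking patients to a zipcode level dataset, the generalized codes need separate handling, because a three digit prefix cannot be matched to a single five digit area. The sketch below shows one way to do this in plain Python. The function names, the dictionary keys (`person_id`, `zip`, `indicator`), and the sample values are hypothetical and do not reflect the STARR-OMOP schema or any specific marketplace dataset. Note also that the `endswith("00")` test is only a heuristic: it would misclassify a genuinely retained zipcode that happens to end in 00.

```python
from typing import Dict, List, Optional


def is_truncated(zip_code: str) -> bool:
    """Heuristic: a code ending in '00' is assumed to be a generalized three digit prefix."""
    return zip_code.endswith("00")


def attach_indicator(
    patients: List[Dict[str, str]],
    indicator_by_zip: Dict[str, float],
) -> List[Dict[str, object]]:
    """Attach a zipcode level indicator to each patient; generalized zipcodes get None."""
    linked = []
    for patient in patients:
        zip_code = patient["zip"]
        value: Optional[float] = None
        if not is_truncated(zip_code):
            # Zipcodes absent from the indicator table also yield None.
            value = indicator_by_zip.get(zip_code)
        linked.append(
            {"person_id": patient["person_id"], "zip": zip_code, "indicator": value}
        )
    return linked


# Invented rows, for illustration only.
patients = [
    {"person_id": "1", "zip": "12345"},
    {"person_id": "2", "zip": "12300"},
]
indicator_by_zip = {"12345": 0.42}
for row in attach_indicator(patients, indicator_by_zip):
    print(row)
```

Patients left with `None` could still be analyzed at a coarser geography, such as the three digit prefix, if the study design allows it.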
An October 2019 report titled "2019 Edition — Health Disparities by Race and Ethnicity, The California Landscape" from the California Health Care Foundation (link) presents these key facts:
- Life expectancy at birth in California was 80.8 years. It was lowest for Blacks, at 75.1 years, and highest for Asians, at 86.3 years, an 11-year gap.
- Latinos were more likely to report being in fair/poor health, to have incomes below the federal poverty level, and to be uninsured. About one in five Latinos did not have a usual source of care, and one in six Latinos reported difficulty finding a specialist.
- Blacks had the highest rates of new prostate, colorectal, and lung cancer cases, and highest death rates for breast, colorectal, lung, and prostate cancer.
- About 1 in 5 multiracial, Black, and white adults reported being told they have depression compared to about 1 in 10 Asian adults.
- Blacks fare worse on maternal/childbirth measures, with higher rates of low-risk, first-birth cesareans, preterm births, low-birthweight births, infant mortality, and maternal mortality.
Addendum (Sep 14, 2021):
Note that the patient's five digit zipcode (zip5) in the section above refers to the ZIP Code Tabulation Area (ZCTA). During de-identification, we use the ZCTA to look up the census population. Where we claim to retain zip5, we are essentially retaining the ZCTA. In most cases the zip5 and ZCTA are geographically close, but in a small number of cases they are geographically dissimilar.
Note further that a zipcode represents a postal delivery route and can change over the course of a decade. The zip+4 changes frequently. The five digit zipcode changes infrequently. These zipcode changes happen due to post office openings, closures and boundary changes. So, for a small number of patients, the older five digit zipcode stored in the EHR may not match the current zipcode for the street address.
Also note that for social determinants of health studies, the ZCTA is often too large an area and too socio-economically diverse. A smaller, more homogeneous subdivision is likely to be the census tract. But a census tract often has a very small population, e.g. ~4,000.