Many historical administrative documents, such as the 1940 census, have been digitized and could therefore be merged with geographic data. Merged data could reveal social determinants of health, the health and social policy milieu, life course events, and selection effects otherwise masked in longitudinal datasets. However, the exact boundaries of most 1940 census enumeration districts have not yet been georeferenced. These boundaries could aid analyses of redlining and other geographic and social contextual factors that matter for health outcomes today. Our objective is to locate and map a large set of 1940 enumeration districts. We use online resources and algorithmic methods to locate and georeference previously unmapped 1940 enumeration districts. We geocode addresses using the OpenCage API and construct "virtual" enumeration districts by applying a convex hull algorithm to the geocoded addresses of each district. We also merge in Home Owners' Loan Corporation (HOLC) redlining maps from the 1930s to demonstrate how 1940 enumeration districts could be used in future work to examine the association between historic redlining and current health. We geocode 7,228,656 1940 census addresses from the Geographic Reference File for the 191 largest US cities in 1940, which together contained 84% of the 1940 US urban population, and construct 34,472 virtual enumeration districts in areas covered by HOLC redlining maps. Of these, 18,340 virtual enumeration districts were previously unmapped; they cover cities that contained an additional 40% of the 1940 US urban population. Where virtual enumeration districts can be matched to previously mapped districts, 96.8% of the paired districts share the same HOLC redlining category. Researchers can use algorithmic methods to quickly process, geocode, merge, and analyze large-scale repositories of historical documents that provide important data on social determinants of health.
These 1940 enumeration district maps could be linked to longitudinal studies such as the Health and Retirement Study, the Panel Study of Income Dynamics, and the Wisconsin Longitudinal Study.
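The "virtual" enumeration district idea described above can be illustrated with a minimal sketch. It assumes the addresses have already been geocoded (for example, through the OpenCage API) into (longitude, latitude) pairs, each tagged with an enumeration district identifier. The identifiers below (`"ED-1"`, `"ED-2"`) and the helper names `convex_hull` and `virtual_districts` are hypothetical, and Andrew's monotone chain is used here as one standard convex hull algorithm; this is not necessarily the implementation the authors used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (longitude, latitude) as returned by a geocoder; treated as planar coordinates,
# which is an approximation that is usually acceptable at city scale.
Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product OA x OB; > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts  # degenerate: too few distinct points to form a polygon
    lower: List[Point] = []
    for p in pts:
        # Pop points that would make a clockwise turn or be collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point; it duplicates the first point of the other chain.
    return lower[:-1] + upper[:-1]


def virtual_districts(records: Iterable[Tuple[str, Point]]) -> Dict[str, List[Point]]:
    """Group geocoded addresses by (hypothetical) district ID and hull each group."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for ed_id, point in records:
        groups[ed_id].append(point)
    return {ed_id: convex_hull(pts) for ed_id, pts in groups.items()}


if __name__ == "__main__":
    hulls = virtual_districts([
        ("ED-1", (0.0, 0.0)), ("ED-1", (2.0, 0.0)), ("ED-1", (2.0, 2.0)),
        ("ED-1", (0.0, 2.0)), ("ED-1", (1.0, 1.0)),  # interior address
        ("ED-2", (5.0, 5.0)),
    ])
    print(hulls["ED-1"])  # [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

In this toy input the interior address at (1, 1) falls inside the hull and is dropped, so the virtual district for `"ED-1"` is the square formed by its four corner addresses. A district with fewer than three distinct geocoded points yields a degenerate hull (a point or segment), which a real pipeline would need to flag or handle separately.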
Copyright: © 2025 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.