Death ORACL: An Algorithm to Predict Death Using Insurance Claims Data

Am J Epidemiol. 2024 Sep 11:kwae348. doi: 10.1093/aje/kwae348. Online ahead of print.

Abstract

The inability to identify dates of death in insurance claims data in the United States is a major limitation to retrospective claims-based research. While deaths result in disenrollment, disenrollment can also occur due to changes in insurance providers. We created an algorithm to differentiate between disenrollment from health plans due to death and disenrollment for other reasons. We identified 5,259,735 adults who disenrolled from private insurance between 2007 and 2018. Using death dates ascertained from the Social Security Death Index, inpatient discharge status, and death indicators in the administrative data, 7.6% of all disenrollments were classified as resulting from death. We used elastic net regression to build an algorithm using claims data in the year prior to disenrollment; candidate predictors included medical conditions, individual demographic characteristics, treatment utilization, and structural factors related to health insurance eligibility and coding. Using a predicted probability threshold of 0.9 (selected to reflect the corresponding known prevalence of mortality), internal validation found that the algorithm classified death at disenrollment with a positive predictive value of 0.815, sensitivity of 0.721, and specificity of 0.986 (AUC=0.97). Independent data sources were used for external validation and for an applied example. Code for implementation is publicly available.
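As an illustration of how the reported validation metrics relate to the probability threshold, the following is a minimal sketch and not the authors' published code. It assumes predicted probabilities of death are already available for each disenrollment; the function name `classification_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(
    probs: Sequence[float],
    died: Sequence[bool],
    threshold: float = 0.9,  # threshold value reported in the abstract
) -> Dict[str, float]:
    """Classify each disenrollment as a death when its predicted probability
    meets the threshold, then compute PPV, sensitivity, and specificity.

    Hypothetical helper for illustration only.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, died):
        predicted_death = p >= threshold
        if predicted_death and y:
            tp += 1
        elif predicted_death and not y:
            fp += 1
        elif not predicted_death and y:
            fn += 1
        else:
            tn += 1

    nan = float("nan")
    return {
        "ppv": tp / (tp + fp) if (tp + fp) else nan,          # TP / predicted deaths
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # TP / true deaths
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # TN / true non-deaths
    }


# Toy data (invented for illustration, not from the study).
probs = [0.95, 0.92, 0.40, 0.91, 0.10, 0.05, 0.85, 0.30]
died = [True, True, True, False, False, False, False, False]
metrics = classification_metrics(probs, died)
print(metrics)
```

A high threshold such as 0.9 trades sensitivity for positive predictive value and specificity, which is consistent with the pattern of metrics reported above.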

Keywords: claims-based mortality algorithm; competing risks; insurance claims; machine learning.