Precisely and efficiently identifying subgroups with heterogeneous treatment effects (HTEs) in real-world evidence studies remains a challenge. Based on the causal forest (CF) method, we developed an iterative CF (iCF) algorithm to identify HTEs in subgroups defined by important variables. Our method iteratively grows CFs of different depths using important effect modifiers, performs plurality votes to obtain a decision tree (subgroup decision) from each CF in this family, and then uses cross-validation to select the subgroup decision that best predicts the treatment effect as the final subgroup decision. We simulated 12 different scenarios and showed that the iCF outperformed other machine learning methods for interaction/subgroup identification in the majority of scenarios assessed. Using a 20% random sample of fee-for-service Medicare beneficiaries initiating sodium-glucose cotransporter-2 inhibitors or glucagon-like peptide-1 receptor agonists, we implemented the iCF to identify subgroups with HTEs for hospitalized heart failure. Consistent with previous studies suggesting that patients with heart failure benefit more from sodium-glucose cotransporter-2 inhibitors, the iCF identified such a subpopulation with HTEs and additive interactions. The iCF is a promising method for identifying subgroups with HTEs in real-world data where the potential for unmeasured confounding can be limited by study design.
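The voting and selection steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that each tree's subgroup decision has already been reduced to a tuple of split variables, and the variable names (`hf`, `ckd`, `age`), the helper functions, and the cross-validation errors are hypothetical toy values chosen only to show the mechanics.

```python
from collections import Counter
from statistics import mean


def plurality_vote(tree_structures):
    """Return the most frequent tree structure and its share of the votes."""
    counts = Counter(tree_structures)
    structure, votes = counts.most_common(1)[0]
    return structure, votes / len(tree_structures)


def select_subgroup_decision(decisions_by_depth, cv_errors_by_depth):
    """Pick the depth whose voted decision has the lowest mean CV error."""
    best_depth = min(
        cv_errors_by_depth, key=lambda depth: mean(cv_errors_by_depth[depth])
    )
    return best_depth, decisions_by_depth[best_depth]


# Hypothetical input: for each forest depth, the split variables used by
# each tree (toy data, not results from the study).
forests = {
    2: [("hf",), ("hf",), ("age",)],
    3: [("hf", "ckd"), ("hf", "age"), ("hf", "ckd"), ("hf", "ckd")],
}

# Step 1: one plurality-voted subgroup decision per depth.
decisions = {depth: plurality_vote(trees)[0] for depth, trees in forests.items()}

# Step 2: assumed per-fold errors in predicting the treatment effect.
cv_errors = {2: [0.30, 0.28, 0.33], 3: [0.21, 0.25, 0.22]}

best_depth, final_decision = select_subgroup_decision(decisions, cv_errors)
print(best_depth, final_decision)
```

In this toy setup the depth-3 decision wins because its mean cross-validated error is lower; in practice the candidate trees and the error estimates would come from fitted causal forests rather than from hand-written values.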
Keywords: causal forest; heterogeneous treatment effect; iterative causal forest; pharmacoepidemiology; precision medicine; subgroup identification.
© The Author(s) 2023. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.