Predicting serious postoperative complications and evaluating racial fairness in machine learning algorithms for metabolic and bariatric surgery

Dong-Won Kang; Shouhao Zhou; Russell Torres; Abhinandan Chowdhury; Suman Niranjan; Ann Rogers; Chan Shen

doi:10.1016/j.soard.2024.08.008

Surg Obes Relat Dis. 2024 Nov;20(11):1056-1064. doi: 10.1016/j.soard.2024.08.008. Epub 2024 Aug 13.

Authors

Dong-Won Kang¹, Shouhao Zhou², Russell Torres³, Abhinandan Chowdhury⁴, Suman Niranjan⁵, Ann Rogers¹, Chan Shen⁶

Affiliations

¹ Department of Surgery, Penn State College of Medicine, Hershey, Pennsylvania.
² Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania.
³ Department of Information Technology and Decision Sciences, University of North Texas, Denton, Texas.
⁴ Department of Mathematics, Savannah State University, Savannah, Georgia.
⁵ Department of Logistics and Operations Management, G. Brint Ryan College of Business, University of North Texas, Denton, Texas.
⁶ Department of Surgery, Penn State College of Medicine, Hershey, Pennsylvania; Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania. Electronic address: chanshen@psu.edu.

PMID: 39232870
DOI: 10.1016/j.soard.2024.08.008

Abstract

Background: Predicting the risk of complications is critical in metabolic and bariatric surgery (MBS).

Objectives: To develop machine learning (ML) models to predict serious postoperative complications of MBS and evaluate racial fairness of the models.

Setting: Metabolic and Bariatric Surgery Accreditation and Quality Improvement Program (MBSAQIP) national database, United States.

Methods: We developed logistic regression, random forest (RF), gradient-boosted tree (GBT), and XGBoost models using the MBSAQIP Participant Use Data File from 2016 to 2020. To address class imbalance, we randomly undersampled the complication-negative class to match the size of the complication-positive class. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), precision, recall, and F1 score. Fairness across White and non-White patient groups was assessed using the equal opportunity difference and disparate impact metrics.
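The undersampling step described in the Methods can be illustrated with a minimal sketch. It assumes a 1:1 match between classes and binary labels (1 = serious complication, 0 = none). The function name `undersample_negatives`, the seed, and the toy labels are hypothetical and are not taken from the study.

```python
import random


def undersample_negatives(labels, seed=0):
    """Keep every positive row and an equal-sized random sample of negative rows.

    Returns the sorted row indices of the balanced subset.
    Assumption: a 1:1 ratio; the abstract only says the classes were matched.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) < len(pos):
        raise ValueError("negative class is smaller than positive class")
    kept_neg = rng.sample(neg, len(pos))  # sampling without replacement
    return sorted(pos + kept_neg)


# Toy data (illustrative only): 5 complications among 100 patients.
labels = [1] * 5 + [0] * 95
balanced_idx = undersample_negatives(labels, seed=42)
balanced_labels = [labels[i] for i in balanced_idx]
print(len(balanced_idx), sum(balanced_labels))
```

In this sketch the balanced subset keeps all positive cases, so its size is twice the number of positives; the models would then be trained on the rows selected by `balanced_idx`.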

Results: A total of 40,858 patients were included after undersampling the complication-negative class. The XGBoost model was the best-performing model in terms of AUROC; however, the difference was not statistically significant. While the F1 score and precision did not vary significantly across models, the RF exhibited better recall compared to the logistic regression. Surgery type was the most important feature to predict complications, followed by operative time. The logistic regression model had the best fairness metrics for race.
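The two fairness metrics reported here can be sketched as follows. The abstract does not give formulas, so this assumes the commonly used definitions: equal opportunity difference is the true positive rate (recall) of the unprivileged group minus that of the privileged group, and disparate impact is the ratio of positive-prediction rates (unprivileged over privileged). Treating White patients as the privileged group, the group codes "W" and "N", and all data below are illustrative assumptions.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("group has no positive cases")
    return sum(preds_for_positives) / len(preds_for_positives)


def selection_rate(y_pred):
    """Share of cases predicted positive."""
    if not y_pred:
        raise ValueError("group is empty")
    return sum(y_pred) / len(y_pred)


def fairness_metrics(y_true, y_pred, group, privileged):
    """Return (equal opportunity difference, disparate impact).

    Assumed convention: unprivileged minus privileged for the difference,
    unprivileged over privileged for the ratio.
    """
    rows = list(zip(y_true, y_pred, group))
    priv = [(t, p) for t, p, g in rows if g == privileged]
    unpriv = [(t, p) for t, p, g in rows if g != privileged]
    tpr_priv = true_positive_rate(*zip(*priv))
    tpr_unpriv = true_positive_rate(*zip(*unpriv))
    rate_priv = selection_rate([p for _, p in priv])
    rate_unpriv = selection_rate([p for _, p in unpriv])
    if rate_priv == 0:
        raise ValueError("privileged group has no positive predictions")
    return tpr_unpriv - tpr_priv, rate_unpriv / rate_priv


# Toy example (not study data): "W" = White, "N" = non-White.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["W", "W", "W", "W", "N", "N", "N", "N"]
eod, di = fairness_metrics(y_true, y_pred, group, privileged="W")
print(eod, di)
```

Under these conventions, an equal opportunity difference near 0 and a disparate impact near 1 indicate similar treatment of the two groups; a model with the "best fairness metrics" would be the one closest to those values.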

Conclusions: The XGBoost model achieved the highest AUROC, albeit without a statistically significant difference. The RF may be useful when recall is the primary concern. Undersampling of the privileged group may improve the fairness of boosted tree models.

Keywords: Complication; Machine learning; Metabolic and bariatric surgery; Roux-en-Y gastric bypass; Sleeve gastrectomy.

MeSH terms

Adult
Algorithms
Bariatric Surgery*
Female
Humans
Machine Learning*
Male
Middle Aged
Obesity, Morbid / surgery
Postoperative Complications* / epidemiology
Postoperative Complications* / ethnology
Postoperative Complications* / etiology
Risk Assessment / methods
United States