Robust diabetic prediction using ensemble machine learning models with synthetic minority over-sampling technique

Sci Rep. 2024 Nov 22;14(1):28984. doi: 10.1038/s41598-024-78519-8.

Abstract

This paper addresses diabetes, a widespread condition affecting a large population worldwide. As cells become less responsive to insulin or fail to produce it adequately, blood sugar levels rise. This can cause severe health complications, including kidney disease, vision impairment and heart conditions. Early diagnosis is paramount in mitigating the risk and severity of diabetes-related complications. To this end, we propose a robust framework for diabetes prediction that combines the Synthetic Minority Over-sampling Technique (SMOTE) with ensemble machine learning techniques. Our approach incorporates imputation of missing values, outlier rejection, feature selection using correlation analysis and class distribution balancing using SMOTE. Extensive experimentation shows that the proposed combination of AdaBoost and XGBoost achieves an AUC of 0.968 ± 0.015. This outperforms both the alternative methodologies presented in our study and current state-of-the-art results. We anticipate that our model will significantly improve diabetes prediction, offering a promising avenue for improved healthcare outcomes in diabetes management.
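
To illustrate the class-balancing step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is drawn on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation. The function names (smote, balance), the default k=5, the fixed seed and the assumption that the minority class is labelled 1 are all hypothetical choices made for this example; the abstract does not state the settings used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative sketch only; k and seed are assumed values, not the paper's.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_new, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)
```

Calling balance(X, y) on a list of feature vectors X and binary labels y returns new lists in which the minority class has been padded with synthetic samples; the inputs are left unmodified. In practice a maintained implementation, such as the SMOTE class in the imbalanced-learn library, would normally be used instead of a hand-written version.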

Keywords: AdaBoost; Diabetic; Machine learning; Outlier detection; SMOTE; XGBoost.

MeSH terms

  • Algorithms
  • Blood Glucose / analysis
  • Diabetes Mellitus* / diagnosis
  • Diabetes Mellitus* / epidemiology
  • Humans
  • Machine Learning*

Substances

  • Blood Glucose