Development and construction of a cataract risk prediction model based on biochemical indices: the National Health and Nutrition Examination Survey, 2005-2008

Front Med (Lausanne). 2024 Oct 21:11:1452756. doi: 10.3389/fmed.2024.1452756. eCollection 2024.

Abstract

Purpose: This study aimed to develop and validate a multivariable prediction model that estimates the probability of cataract development from blood biochemical markers and age.

Design: This population-based cross-sectional study comprised 9,566 participants drawn from the National Health and Nutrition Examination Survey (NHANES) across the 2005-2008 cycles.

Methods: Demographic information and laboratory test results from the participants were collected and analyzed using LASSO regression and multivariate logistic regression to capture the influence of biochemical indicators on the outcome. SHAP (Shapley Additive Explanations) values were used to assess the importance of each clinical feature other than age. A multivariate logistic regression model was then developed and visualized as a nomogram. The model's discrimination, calibration, and clinical utility were evaluated using receiver operating characteristic (ROC) curves, 10-fold cross-validation, Hosmer-Lemeshow calibration curves, and decision curve analysis (DCA).
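As an illustration of the discrimination measure named above, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted risks and observed outcomes. It uses the rank-based (Mann-Whitney) formulation rather than any particular statistics package; the function name `roc_auc` and the toy labels and scores are hypothetical and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cataract present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; for a cohort of several thousand participants a sort-based implementation from an established library would be the more practical choice.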

Results: Logistic regression analysis identified age, erythrocyte folate (nmol/L), blood glucose (mmol/L), and blood urea nitrogen (mmol/L) as independent risk factors for cataract, and these variables were incorporated into a multivariate logistic regression-based nomogram for cataract risk prediction. The area under the ROC curve (AUC) for cataract risk prediction was 0.917 (95% CI: 0.9067-0.9273) in the training cohort and 0.9148 (95% CI: 0.8979-0.9316) in the validation cohort. The Hosmer-Lemeshow calibration curve showed a good fit, indicating that the model was well calibrated. Ten-fold cross-validation confirmed that the logistic regression model's predictive performance was robust and stable during internal validation. Decision curve analysis (DCA) showed that the nomogram prediction model provided a net clinical benefit for predicting cataract risk at threshold probabilities from 0.10 to 0.90.
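For readers unfamiliar with decision curve analysis, the quantity plotted at each threshold probability is the net benefit. The sketch below shows the standard net-benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), alongside the usual "treat all" reference strategy. The function names and toy values are hypothetical and are not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: classify every participant as high risk."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Hypothetical toy data: 1 = cataract present, 0 = absent.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.2]

print(net_benefit(toy_labels, toy_probs, 0.5))     # model: 0.5
print(net_benefit_treat_all(toy_labels, 0.5))      # treat all: 0.0
```

A decision curve repeats this calculation over a grid of thresholds; a model is considered useful over the range where its net benefit exceeds both the "treat all" line and the "treat none" line, which is zero by definition.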

Conclusion: This study identified blood urea nitrogen (mmol/L), serum glucose (mmol/L), and erythrocyte folate (nmol/L) as significant risk factors for cataract. A risk prediction model was developed that demonstrated strong predictive accuracy and clinical utility, offering clinicians a reliable tool for early and effective diagnosis. Cataract development may be delayed by reducing levels of blood urea nitrogen, serum glucose, and erythrocyte folate through lifestyle improvements and dietary modifications.

Keywords: blood biochemical indicators; cataract; machine learning; nomogram; prediction model.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was funded by a key project of the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China (reference number 2022D01D68).