A comparative hydrochemical assessment of groundwater quality for drinking and irrigation purposes using different statistical and ML models in lower Gangetic alluvial plain, eastern India

Chemosphere. 2025 Jan 13:372:144074. doi: 10.1016/j.chemosphere.2025.144074. Online ahead of print.

Abstract

Groundwater toxicity and water-level depletion are serious concerns today. Assessing groundwater quality (GWQ) is crucial for effective planning and management in the face of increasing demand for drinking and irrigation water. This study therefore analyzes groundwater hydrochemistry, its variability, and the factors influencing its quality for drinking and irrigation purposes using indices and models. Groundwater from 107 sampling sites was analyzed for 14 parameters. To assess suitability for irrigation, nine irrigation indices were applied: magnesium hazard, sodium adsorption ratio, residual sodium carbonate, residual sodium bicarbonate, sodium percentage, potential salinity, Kelly's index, total hardness, and permeability index. A Shannon entropy-based water quality index (SEWQI) and statistical techniques, namely Pearson correlation, principal component analysis, and hierarchical cluster analysis, were used to assess the selected parameters. Six machine learning models, both conventional and ensemble, were employed for predictive analysis: AdaBoost, decision tree (DT), multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), and XGBoost. The SEWQI shows that 38% of samples are excellent to good, while 62% are poor to unsuitable, covering an area of 5905.64 km². The irrigation indices confirm that most samples are unsuitable. According to the Gibbs and USSL diagrams, groundwater chemistry is primarily controlled by rock dominance, and the samples are suitable for irrigation despite high salinity and low sodium hazard (C3S1 class, 43.99% of samples). Overall, hydrochemistry in the rock-dominance zone is shaped by silicate and carbonate mineral dissolution and by human activities, which in turn affect GWQ. Hyperparameter optimization using grid search improved the predictive accuracy of the XGBoost model, which achieved an R² of 0.999 and an RMSE of 0.269. These results can support appropriate management and monitoring strategies and provide insights for ensuring safe drinking water in the future.
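The abstract names the nine irrigation indices but not their formulas. The sketch below uses common textbook definitions with all ions in meq/L; it is an illustration under that assumption, not the paper's code. In particular, whether sodium percentage includes K⁺, and the 50 mg CaCO₃/meq factor used for total hardness, are assumed choices that the paper may handle differently. The class name `IonsMeqL` and the function `irrigation_indices` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class IonsMeqL:
    """Major-ion concentrations of one groundwater sample, all in meq/L (assumed units)."""
    ca: float
    mg: float
    na: float
    k: float
    hco3: float
    co3: float
    cl: float
    so4: float


def irrigation_indices(s: IonsMeqL) -> dict[str, float]:
    """Nine irrigation indices in common textbook forms (the paper's exact forms may differ)."""
    ca_mg = s.ca + s.mg
    return {
        "SAR": s.na / math.sqrt(ca_mg / 2),                        # sodium adsorption ratio
        "Na%": 100 * (s.na + s.k) / (ca_mg + s.na + s.k),          # sodium percentage (Na + K form)
        "RSC": (s.co3 + s.hco3) - ca_mg,                            # residual sodium carbonate
        "RSBC": s.hco3 - s.ca,                                      # residual sodium bicarbonate
        "MH": 100 * s.mg / ca_mg,                                   # magnesium hazard
        "KI": s.na / ca_mg,                                         # Kelly's index
        "PS": s.cl + 0.5 * s.so4,                                   # potential salinity
        "PI": 100 * (s.na + math.sqrt(s.hco3)) / (ca_mg + s.na),    # permeability index (Doneen)
        "TH": 50.0 * ca_mg,                                         # total hardness, mg/L as CaCO3 (~50 mg per meq)
    }


sample = IonsMeqL(ca=2.0, mg=2.0, na=4.0, k=0.0, hco3=4.0, co3=0.0, cl=2.0, so4=2.0)
print(round(irrigation_indices(sample)["SAR"], 3))  # 2.828
```

Keeping every input in meq/L lets all nine indices share one data structure; the only unit conversion hidden inside is the hardness factor.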
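The SEWQI weights each parameter by how much information (variability) it carries across samples. A minimal sketch of the usual entropy-weight method follows: min-max normalisation, per-sample shares p_ij, entropy e_j = −(1/ln m) Σ p_ij ln p_ij, and weights proportional to 1 − e_j. Each sample's index is then the weighted sum of quality ratings q_j = 100·C_j/S_j. The abstract does not spell out these steps. The standard limits S_j, the class cut-offs in `classify`, and all function names are assumptions for illustration.

```python
import math


def entropy_weights(samples: list[list[float]]) -> list[float]:
    """Shannon-entropy weight for each parameter (column) of an m x n data matrix (m >= 2)."""
    m = len(samples)
    n = len(samples[0])
    info = []  # 1 - e_j (degree of diversification) per parameter
    for j in range(n):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        if hi == lo:
            info.append(0.0)  # constant parameter: e_j = 1, so it gets zero weight
            continue
        y = [(x - lo) / (hi - lo) for x in col]  # min-max normalisation
        total = sum(y)
        p = [v / total for v in y]  # share of each sample in this parameter
        e = -sum(pj * math.log(pj) for pj in p if pj > 0) / math.log(m)
        info.append(1.0 - e)
    s = sum(info)
    return [d / s for d in info]


def sewqi(sample: list[float], weights: list[float], standards: list[float]) -> float:
    """Weighted sum of quality ratings q_j = 100 * C_j / S_j."""
    return sum(w * 100.0 * c / s for w, c, s in zip(weights, sample, standards))


def classify(index: float) -> str:
    # Assumed class limits (classical WQI scheme); the paper's cut-offs may differ.
    if index < 50:
        return "excellent"
    if index < 100:
        return "good"
    if index < 200:
        return "poor"
    if index < 300:
        return "very poor"
    return "unsuitable"


# Toy data: 3 samples x 2 parameters; standards [3.0, 5.0] are hypothetical guideline limits.
data = [[1.0, 5.0], [2.0, 6.0], [4.0, 8.0]]
w = entropy_weights(data)  # identical normalised columns -> [0.5, 0.5]
print(classify(sewqi([1.5, 10.0], w, [3.0, 5.0])))  # SEWQI = 125.0 -> poor
```

A parameter that barely varies across samples gets an entropy near 1 and therefore little weight. This is the design choice that distinguishes entropy weighting from fixed, expert-assigned WQI weights.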
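The reported R² and RMSE, and the grid search used to tune XGBoost, can be illustrated in plain Python. R² is assumed here to be the coefficient of determination, 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation. In practice a grid search over an XGBoost model is commonly run with scikit-learn's `GridSearchCV` wrapped around `xgboost.XGBRegressor`. The `grid_search` helper and `toy_score` below are hypothetical stand-ins that show only the exhaustive-search logic. `max_depth` and `learning_rate` are real XGBoost hyperparameter names, but the grid values are invented.

```python
import math
from itertools import product


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def grid_search(param_grid: dict, score_fn) -> tuple[dict, float]:
    """Try every combination in param_grid; keep the one with the lowest score.

    score_fn(params) should return a validation error such as RMSE.
    """
    names = sorted(param_grid)
    best_params, best_score = {}, math.inf
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


print(rmse([1, 2, 3, 4], [1, 2, 3, 5]), r2([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.5 0.8


# Hypothetical stand-in for "train XGBoost with these settings, return validation RMSE".
def toy_score(p: dict) -> float:
    return abs(p["max_depth"] - 4) + abs(p["learning_rate"] - 0.1)


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
print(grid_search(grid, toy_score))  # ({'learning_rate': 0.1, 'max_depth': 4}, 0.0)
```

Grid search evaluates every combination, so its cost grows multiplicatively with each added hyperparameter. This is why grids are usually kept to a few values per parameter.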

Keywords: Partial dependence plot; Principal component analysis; Shannon entropy; USSL diagram; XGBoost.