Hybrid modeling techniques for predicting chemical oxygen demand in wastewater treatment: a stacking ensemble learning approach with neural networks

Environ Monit Assess. 2024 Nov 27;196(12):1259. doi: 10.1007/s10661-024-13390-8.

Abstract

To ensure operational efficiency, promote sustainable wastewater treatment practices, and maintain compliance with environmental regulations, it is crucial to evaluate the parameters of treated effluent in wastewater treatment plants (WWTPs). Artificial neural network (ANN) analysis is a promising tool for predicting wastewater characteristics as a substitute for tedious laboratory techniques. It enables proactive decision-making and contributes to the overall effectiveness of the treatment processes. The primary aim of this work is to develop a robust model for predicting chemical oxygen demand (COD), a key parameter for evaluating the operational efficiency of WWTPs. The research employed a dataset consisting of 527 samples and 22 features, derived from daily sensor readings in an urban WWTP. To enhance the efficiency of the ANN framework for regression modeling, the dataset was augmented to 1054 samples using a generative adversarial network (GAN) as a synthetic data generator. K-means clustering combined with principal component analysis (PCA) is employed for feature selection and anomaly detection, and the regression model's performance is enhanced by incorporating advanced ANN extensions, including polynomial, additive, and radial basis networks. Moreover, the ANNs are optimized using heuristic techniques: genetic algorithms (GA), ant colony optimization (ACO), and particle swarm optimization (PSO). The performance of the models was assessed using R² (coefficient of determination), MSE (mean squared error), and loss value metrics, along with visual performance indicators. The proposed stacked ensemble model achieved an MSE of 0.0012 and an R² score of 0.95. The GA_ANN model, despite undergoing optimization, achieved an R² score of only 0.70, indicating considerable potential for improvement. In contrast, the ACO_ANN and PSO_ANN models showed significant performance boosts, with near-perfect R² scores of about 0.98 and 0.99, respectively, making them the top-performing models overall. The results outlined in this article introduce an ANN stacking ensemble regression model framework that can aid researchers in improving the operational efficiency of treatment plants.
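The two headline metrics reported above, MSE and R² (coefficient of determination), can be illustrated with a minimal sketch. This is not the authors' code: the function names and the five sample values are hypothetical, and the values are assumed to be COD readings scaled to the range 0 to 1, since MSE depends on the scale of the target.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical scaled COD values, for illustration only (not from the study).
observed = [0.20, 0.35, 0.50, 0.65, 0.80]
predicted = [0.22, 0.33, 0.52, 0.63, 0.81]

print("MSE:", mse(observed, predicted))
print("R2: ", r2_score(observed, predicted))
```

An R² of 1.0 corresponds to predictions that match the observations exactly, so scores such as 0.95 or 0.99 mean that most of the variance in the observed COD is captured by the model, whereas 0.70 leaves a substantial share unexplained.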

Keywords: Artificial neural network; Chemical oxygen demand; Generative adversarial networks; Optimization; Stacking ensemble regression; Wastewater treatment plant.

MeSH terms

  • Biological Oxygen Demand Analysis*
  • Environmental Monitoring / methods
  • Machine Learning
  • Neural Networks, Computer*
  • Principal Component Analysis
  • Waste Disposal, Fluid* / methods
  • Wastewater* / chemistry
  • Water Pollutants, Chemical / analysis

Substances

  • Wastewater
  • Water Pollutants, Chemical