Household electricity consumption prediction using database combinations, ensemble and hybrid modeling techniques

Sci Rep. 2024 Oct 2;14(1):22891. doi: 10.1038/s41598-024-57550-9.

Abstract

Household electricity consumption (HEC) changes over time and depends on multiple factors, both of which affect the accuracy of prediction models. The objective of this work is to propose a novel methodology for improving HEC prediction accuracy. This study uses two original datasets, a questionnaire survey (QS) and monthly consumption (MC), which contain data from 225 consumers in Maharashtra, India. The original datasets are combined to create three additional datasets: QS + MC, QS equation (QsEq) + next month's consumption, and QsEq + MC. HEC prediction accuracy is then improved by applying several approaches: correlation methods, feature engineering techniques, data quality assessment, heterogeneous ensemble prediction (HEP), and a hybrid model. Five HEP models are created using dataset combinations and machine learning algorithms. On the MC dataset, random forest provides the best prediction, with an RMSE of 36.18 kWh, an MAE of 25.73 kWh, and an R² of 0.76. Similarly, on the QsEq + MC dataset, adaptive boosting provides the better prediction, with an RMSE of 36.77 kWh, an MAE of 26.18 kWh, and an R² of 0.76. The proposed hybrid model further improves prediction accuracy to an RMSE of 22.02 kWh, an MAE of 13.04 kWh, and an R² of 0.92. This research can help researchers, policymakers, and utility companies obtain accurate prediction models and understand HEC.
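
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how RMSE, MAE, and R2 (the coefficient of determination) are conventionally computed from actual and predicted monthly consumption. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the study's datasets or code.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the inputs (e.g. kWh)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs (e.g. kWh)."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

def r2(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical monthly consumption values in kWh (illustrative only).
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(rmse(actual, predicted), mae(actual, predicted), r2(actual, predicted))
```

RMSE and MAE are expressed in kWh, so lower values indicate better predictions, whereas R2 is unitless and values closer to 1 indicate that the model explains more of the variance in consumption.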

Keywords: Data quality assessment; Heterogeneous ensemble; Household electricity consumption; Hybrid model; Monthly prediction.