Remotely sensed estimates of long-term biochemical oxygen demand over Hong Kong marine waters using machine learning enhanced by imbalanced label optimisation

Sci Total Environ. 2024 Sep 15:943:173748. doi: 10.1016/j.scitotenv.2024.173748. Epub 2024 Jun 8.

Abstract

In many coastal cities around the world, continuing water degradation threatens the living environment of humans and aquatic organisms. To assess and control water pollution, this study estimated the Biochemical Oxygen Demand (BOD) concentration of Hong Kong's marine waters using remote sensing and an improved machine learning (ML) method. The scheme was derived from four ML algorithms (RBF, SVR, RF, XGB) and calibrated using a large set (N > 1000) of in-situ BOD5 measurements. Three types of models were trained and evaluated on labeled datasets with different preprocessing: the original BOD5, log10(BOD5), and label distribution smoothing (LDS). The results highlight the superior potential of the LDS-based model to improve BOD5 estimation by handling the imbalanced training dataset. Additionally, XGB and RF outperformed RBF and SVR when the model was developed using log10(BOD5) or LDS(BOD5). Over two decades, the BOD5 concentration of Hong Kong marine waters in autumn (Sep. to Nov.) showed a downward trend, with significant decreases in Deep Bay, Western Buffer, Victoria Harbour, Eastern Buffer, Junk Bay, Port Shelter, and the Tolo Harbour and Channel. Principal component analysis revealed that nutrient levels were the predominant factor in Victoria Harbour and the interior of Deep Bay, while chlorophyll-related and physical parameters were dominant in Southern, Mirs Bay, Northwestern, and the outlet of Deep Bay. LDS provides a new perspective for improving ML-based water quality estimation by alleviating the imbalance in the labeled dataset. Overall, the remotely sensed BOD5 can offer insight into the spatial-temporal distribution of organic matter in Hong Kong coastal waters and valuable guidance for pollution control.
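As an illustration of the LDS idea mentioned in the abstract, the sketch below shows one common way to turn a skewed label distribution into per-sample training weights: bin the labels, smooth the bin counts with a Gaussian kernel, and weight each sample by the inverse of the smoothed density. The abstract does not state the bin count, kernel shape, or weighting scheme used in the study, so the function name `lds_weights` and the parameters `n_bins`, `sigma`, and `kernel_half_width` are assumptions made for this example only.

```python
import numpy as np


def lds_weights(labels, n_bins=50, sigma=2.0, kernel_half_width=5):
    """Hypothetical helper: inverse-density sample weights from a smoothed label histogram.

    All parameter defaults are illustrative assumptions, not values from the study.
    """
    labels = np.asarray(labels, dtype=float)

    # Empirical label density: number of samples per bin.
    counts, edges = np.histogram(labels, bins=n_bins)

    # Symmetric Gaussian kernel, normalised to sum to 1.
    offsets = np.arange(-kernel_half_width, kernel_half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    # Smoothed ("effective") label density; neighbouring bins share counts.
    smoothed = np.convolve(counts, kernel, mode="same")

    # Map each sample to its bin using the interior bin edges.
    bin_idx = np.clip(np.digitize(labels, edges[1:-1]), 0, n_bins - 1)

    # Rare labels get larger weights; rescale so the mean weight is 1.
    weights = 1.0 / np.maximum(smoothed[bin_idx], 1e-12)
    return weights * len(weights) / weights.sum()


if __name__ == "__main__":
    # Synthetic, right-skewed labels standing in for BOD5 values (not real data).
    rng = np.random.default_rng(0)
    synthetic_bod5 = rng.lognormal(mean=0.0, sigma=0.6, size=1000)
    w = lds_weights(synthetic_bod5)
    print(w.min(), w.mean(), w.max())
```

Weights of this kind could then be passed as `sample_weight` when fitting a regressor such as a random forest or gradient-boosted trees, so that sparsely sampled high-BOD5 values contribute more to the loss. Whether the study applied LDS this way is not stated in the abstract.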

Keywords: BOD5; Downward trend; Hong Kong marine water; Label distribution smoothing (LDS); Machine learning.

MeSH terms

  • Biological Oxygen Demand Analysis
  • Environmental Monitoring* / methods
  • Hong Kong
  • Machine Learning*
  • Remote Sensing Technology
  • Seawater* / chemistry
  • Water Pollutants, Chemical / analysis
  • Water Pollution / analysis
  • Water Pollution / statistics & numerical data

Substances

  • Water Pollutants, Chemical