Prediction of the fluoride contents of different crop species via the random forest algorithm

Environ Geochem Health. 2024 Sep 9;46(10):418. doi: 10.1007/s10653-024-02206-w.

Abstract

Fluoride (F) is a trace element that is essential to the human body and occurs naturally in the environment. However, either a deficiency or an excess of environmental F can lead to human health problems. The pseudototal F content of soil often does not correlate directly with the F content of plants; plant F content correlates more strongly with the bioavailable F in soil. Large-scale soil surveys typically measure only the pseudototal elemental content of soils, which may not be a reliable basis for agricultural zoning plans. Plants also differ significantly in their ability to accumulate F from soil, and because the mechanisms of elemental uptake from soil vary among species, the uptake mechanisms of each crop typically have to be studied separately when multiple crops are grown in one area. To address these issues, we examined the factors influencing F bioaccumulation coefficients in different crops based on 1:50,000 soil geochemical survey data. Using the random forest algorithm, four indicators (bioavailable P, bioavailable Zn, leachable Pb, and Sr) were selected from among 29 parameters to predict the F content of crops in place of soil bioavailable F. Compared with the multivariate linear regression (MLR) model, the random forest (RF) model provided more accurate and reliable predictions of crop F content, improving prediction accuracy by approximately 95.23%. The partial least squares regression (PLSR) model also improved on MLR, but the RF model still outperformed PLSR in prediction accuracy and robustness. The RF approach also maximized the use of existing geochemical survey data, enabling cross-species studies for the first time and avoiding redundant evaluations of different types of agricultural products in the same region.
In this investigation, we selected the Xining-Ledu region of Qinghai Province, China, as the study area and employed a random forest model to predict crop F content from soil geochemical data, providing a new methodological framework for crop production that effectively enhances agricultural quality and efficiency.
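
The workflow summarized above (rank 29 candidate soil parameters with a random forest, keep four, then compare RF with MLR) can be illustrated with a minimal sketch. It is not the authors' pipeline or data: the 29 columns are synthetic random numbers, the response is an invented nonlinear function of the first four columns, and the choices of scikit-learn, impurity-based importances, a 75/25 split, and R² as the score are all assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a survey table: 400 samples x 29 soil parameters.
# (Hypothetical data; the real study used 1:50,000 geochemical survey data.)
rng = np.random.default_rng(0)
n_samples, n_params = 400, 29
X = rng.normal(size=(n_samples, n_params))

# Invented nonlinear response driven by columns 0-3 only, plus small noise.
y = (X[:, 0] ** 2 + 2 * np.abs(X[:, 1]) + X[:, 2] ** 2
     + 2 * np.abs(X[:, 3]) + rng.normal(scale=0.1, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: rank all 29 parameters by RF importance, using training data only.
ranker = RandomForestRegressor(n_estimators=300, random_state=0)
ranker.fit(X_train, y_train)
selected = np.argsort(ranker.feature_importances_)[::-1][:4]

# Step 2: fit RF and MLR on the four selected indicators.
rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X_train[:, selected], y_train)
mlr = LinearRegression().fit(X_train[:, selected], y_train)

# Step 3: compare held-out R^2 of the two models.
r2_rf = r2_score(y_test, rf.predict(X_test[:, selected]))
r2_mlr = r2_score(y_test, mlr.predict(X_test[:, selected]))

print("selected columns:", sorted(selected.tolist()))
print(f"RF  R^2: {r2_rf:.3f}")
print(f"MLR R^2: {r2_mlr:.3f}")
```

Because the synthetic response is deliberately nonlinear, the linear model has little to fit and the forest scores higher; with real survey data the gap depends on how nonlinear the soil-crop relationship actually is.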

Keywords: Bioavailability; Crop management; Fluoride; Plant fluoride; Random forest model; Soil–plant system.

MeSH terms

  • Algorithms*
  • Crops, Agricultural* / chemistry
  • Crops, Agricultural* / metabolism
  • Environmental Monitoring / methods
  • Fluorides* / analysis
  • Linear Models
  • Random Forest
  • Soil / chemistry
  • Soil Pollutants* / analysis

Substances

  • Fluorides
  • Soil Pollutants
  • Soil