Performance of Conditional Random Forest and Regression Models at Predicting Human Fecal Contamination of Produce Irrigation Ponds in the Southeastern United States

ACS ES&T Water. 2024 Nov 27;4(12):5844-5855. doi: 10.1021/acsestwater.4c00839.

Abstract

Irrigating fresh produce with contaminated water contributes to the burden of foodborne illness. Identifying fecal contamination of irrigation waters and characterizing fecal sources and associated environmental factors can help inform fresh produce safety and health hazard management. Using two previously collected data sets, we developed and evaluated the performance of logistic regression and conditional random forest models for predicting general and human-specific fecal contamination of ponds in southwest Georgia used for fresh produce irrigation. Generic Escherichia coli served as a general fecal indicator, and human-associated Bacteroides (HF183), crAssphage, and F+ coliphage genogroup II were used as indicators of human fecal contamination. Increased rainfall in the previous 7 days and the presence of a building within 152 m (a proxy for proximity to septic systems) were associated with increased odds of human fecal contamination in the training data set. However, the models did not accurately predict the presence of human-associated fecal indicators in a second data set collected from nearby irrigation ponds in different years. Predictive statistical models should be used with caution to assess produce irrigation water quality as models may not reliably predict fecal contamination at other locations and times, even within the same growing region.

Keywords: agricultural water; conditional random forest; dead-end ultrafiltration (DEUF); foodborne illness; fresh produce safety; microbial source tracking; predictive modeling; quantitative polymerase chain reaction (qPCR).
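
Illustrative sketch (not the authors' analysis): the abstract reports associations as increased odds of human fecal contamination and judges the models by how well they predict indicator presence in a second data set. The Python below shows, under stated assumptions, how those two quantities are commonly computed. It uses an unadjusted 2x2 odds ratio in place of the fitted logistic regression described in the abstract, and every count and label in it is hypothetical and chosen only for illustration. The function names `odds_ratio` and `sensitivity_specificity` are likewise assumptions, not names from the study.

```python
from typing import Sequence, Tuple


def odds_ratio(exp_pos: int, exp_neg: int, unexp_pos: int, unexp_neg: int) -> float:
    """Unadjusted odds ratio from a 2x2 table.

    exp_*   : samples with the exposure (e.g. a building within 152 m)
    unexp_* : samples without the exposure
    *_pos   : human-associated indicator detected
    *_neg   : human-associated indicator not detected
    """
    if exp_neg == 0 or unexp_pos == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (exp_pos * unexp_neg) / (exp_neg * unexp_pos)


def sensitivity_specificity(
    observed: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity of binary predictions (1 = detected)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed positive and one observed negative")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical training-set counts (NOT from the study).
training_or = odds_ratio(exp_pos=12, exp_neg=8, unexp_pos=5, unexp_neg=15)

# Hypothetical second data set: observed detections vs. model predictions.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 0, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(observed, predicted)

print(f"training odds ratio: {training_or}")
print(f"external sensitivity: {sens}, specificity: {spec}")
```

An odds ratio above 1 in the training data combined with low sensitivity on the second data set would mirror the pattern the abstract describes: an association that holds where the model was fitted but does not transfer to other ponds and years.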