Machine learning approach as an early warning system to prevent foodborne Salmonella outbreaks in northwestern Italy

Vet Res. 2024 Jun 5;55(1):72. doi: 10.1186/s13567-024-01323-9.

Abstract

Salmonellosis, one of the most common foodborne infections in Europe, is monitored by food safety surveillance programmes, resulting in the generation of extensive databases. By leveraging tree-based machine learning (ML) algorithms, we exploited data from food safety audits to predict spatiotemporal patterns of salmonellosis in northwestern Italy. Data on human cases confirmed in 2015–2018 (n = 1969) and food surveillance data collected in 2014–2018 were used to develop ML algorithms. We integrated the monthly municipal human incidence with 27 potential predictors, including the observed prevalence of Salmonella in food. We applied the tree regression, random forest and gradient boosting algorithms under different scenarios and evaluated their predictive performance in terms of the mean absolute percentage error (MAPE) and R². Using a similar dataset from the year 2019, we obtained spatiotemporal predictions and their sensitivities and specificities. Random forest and gradient boosting (R² = 0.55, MAPE = 7.5%) outperformed the tree regression algorithm (R² = 0.42, MAPE = 8.8%). Salmonella prevalence in food; spatial features; and monitoring efforts in ready-to-eat milk, fruits and vegetables, and pig meat products contributed the most to the models' predictivity, reducing the variance by 90.5%. Conversely, the number of positive samples obtained for specific food matrices minimally influenced the predictions (2.9%). Spatiotemporal predictions for 2019 showed sensitivity and specificity levels of 46.5% (low because some infection hotspots were missed) and 78.5%, respectively. This study demonstrates the added value of integrating data from human and veterinary health services to develop predictive models of human salmonellosis occurrence, providing early warnings useful for mitigating foodborne disease impacts on public health.
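
The evaluation measures named in the abstract (MAPE, R², sensitivity and specificity) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names are hypothetical, the definitions are the standard textbook ones, and the choice to skip zero-valued observations when computing MAPE is an assumption made here only to avoid division by zero, since the abstract does not say how zero-incidence months were handled.

```python
from typing import Sequence, Tuple


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: observations equal to zero are skipped, because the
    percentage error is undefined for them.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all observations are zero")
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    return 1.0 - ss_res / ss_tot


def sensitivity_specificity(
    observed_flags: Sequence[int], predicted_flags: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity for binary flags (1 = cases occurred).

    Each element stands for one hypothetical municipality-month.
    """
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed_flags, predicted_flags):
        if obs and pred:
            tp += 1  # hotspot correctly predicted
        elif obs and not pred:
            fn += 1  # hotspot missed; lowers sensitivity
        elif not obs and not pred:
            tn += 1  # quiet unit correctly predicted
        else:
            fp += 1  # false alarm; lowers specificity
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative unit")
    return tp / (tp + fn), tn / (tn + fp)
```

Under these definitions, every missed hotspot is a false negative, which is why missed hotspots reduce sensitivity without affecting specificity.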

Keywords: Supervised learning; decision tree algorithms; disease surveillance; food products; salmonellosis; transdisciplinarity.

MeSH terms

  • Animals
  • Disease Outbreaks* / prevention & control
  • Disease Outbreaks* / veterinary
  • Food Microbiology
  • Foodborne Diseases / epidemiology
  • Foodborne Diseases / microbiology
  • Foodborne Diseases / prevention & control
  • Humans
  • Italy / epidemiology
  • Machine Learning*
  • Prevalence
  • Salmonella / physiology
  • Salmonella Food Poisoning* / epidemiology
  • Salmonella Food Poisoning* / prevention & control
  • Salmonella Infections / epidemiology
  • Salmonella Infections / prevention & control