Bacillus strains are ubiquitous in the environment and are widely used in the microbiological industry as valuable sources of enzymes, as well as in agriculture to stimulate plant growth. The Bacillus genus comprises several groups of closely related species, and rapid classification of these species remains challenging with existing methods. Techniques based on the analysis of MALDI-TOF MS data hold significant promise for fast and precise classification of microbial strains at both the genus and species levels. In previous work, we proposed a geometric approach to Bacillus strain classification based on mass spectra analysis via the centroid method (CM). One limitation of such methods is their sensitivity to noise in MS spectra. In this study, we used a denoising autoencoder (DAE) to improve the accuracy of bacterial classification from noisy MS spectra. The DAE converts noisy MS spectra into latent variables that represent molecular patterns in the original MS data, and a Random Forest (RF) classifier then assigns bacterial strains on the basis of these latent variables. A comparison of DAE-RF with CM on artificially noisy test samples showed that DAE-RF is more robust to noise. Hence, the DAE-RF method could be used for noise-robust, fast, and accurate classification of Bacillus species from MALDI-TOF MS data.
Keywords: MALDI-TOF; classification of Bacillus species; denoising autoencoder; random forest.
© 2023 the author(s), published by De Gruyter, Berlin/Boston.
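
The two-stage DAE-RF idea summarized in the abstract (a denoising autoencoder that maps noisy spectra to latent variables, followed by a Random Forest on those latent variables) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the spectra are synthetic, the bin count, latent size, and noise level are arbitrary, and scikit-learn's `MLPRegressor` with a single hidden layer stands in for the autoencoder, trained to reconstruct clean spectra from noisy inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N_BINS, N_CLASSES, N_LATENT = 60, 3, 8  # hypothetical sizes, not from the paper

# Synthetic "species" prototypes: a few peaks at class-specific m/z bins.
prototypes = np.zeros((N_CLASSES, N_BINS))
for k in range(N_CLASSES):
    prototypes[k, 20 * k + np.array([2, 5, 9, 13, 17])] = rng.uniform(0.7, 1.0, size=5)


def make_spectra(n_per_class, noise_sd):
    """Return (clean, noisy, labels) synthetic spectra for all classes."""
    labels = np.repeat(np.arange(N_CLASSES), n_per_class)
    # Per-sample intensity scaling mimics run-to-run variation.
    clean = prototypes[labels] * rng.uniform(0.8, 1.2, size=(labels.size, 1))
    # Additive Gaussian noise, clipped so intensities stay non-negative.
    noisy = np.clip(clean + rng.normal(0.0, noise_sd, clean.shape), 0.0, None)
    return clean, noisy, labels


clean_train, noisy_train, y_train = make_spectra(n_per_class=60, noise_sd=0.2)
_, noisy_test, y_test = make_spectra(n_per_class=30, noise_sd=0.2)

# Stage 1: denoising autoencoder (noisy input -> clean target, narrow hidden layer).
dae = MLPRegressor(hidden_layer_sizes=(N_LATENT,), activation="tanh",
                   max_iter=2000, random_state=0)
dae.fit(noisy_train, clean_train)


def encode(spectra):
    """Hidden-layer activations of the trained network, used as latent variables."""
    return np.tanh(spectra @ dae.coefs_[0] + dae.intercepts_[0])


# Stage 2: Random Forest trained on the latent variables.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(encode(noisy_train), y_train)

latent_test = encode(noisy_test)
accuracy = rf.score(latent_test, y_test)
print(f"Accuracy on noisy synthetic test spectra: {accuracy:.2f}")
```

To probe noise robustness in the spirit of the comparison described above, one could regenerate the test set with increasing `noise_sd` values and track how the accuracy degrades; the numbers obtained on this toy data say nothing about the results reported in the paper.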