In the everyday routine of an analytical laboratory, one is often faced with identifying an unknown microbial sample without any prior information to constrain the search. In the present work, we propose a workflow that uses the spectral diversity of a commercial database (SARAMIS) to narrow the search down to a given taxonomic level, followed by a refined classification by supervised modelling. As the supervised learning algorithm, we chose a shrinkage discriminant analysis approach, which takes the collinearity of the data into account and provides a scoring system for biomarker ranking. This ranking can be used to tailor specific biomarker subsets that optimize discrimination between subgroups and allow misclassifications to be weighted. The suitability of the approach was verified on a dataset containing the mass spectra of three Yersinia species: Yersinia enterocolitica, Y. pseudotuberculosis and Y. pestis. We placed particular emphasis on discriminating between the closely related species Y. pseudotuberculosis and Y. pestis. All three species were correctly identified at the genus level by the commercial database. Whereas Y. enterocolitica was also correctly identified at the species level, discrimination between the closely related Y. pseudotuberculosis and Y. pestis strains was ambiguous. With the supervised modelling approach, we were able to discriminate all three species accurately, even when the strains were grown under different culture conditions.
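To make the supervised step more concrete, the following is a minimal NumPy sketch of a shrinkage linear discriminant analysis with a feature-ranking score. It is an illustration under stated assumptions, not the authors' implementation: the helper names `fit_shrinkage_da` and `predict` are hypothetical; the spectra are assumed to be already preprocessed into a peak-intensity matrix; the shrinkage intensity `lam` is fixed by hand rather than estimated from the data; and the ranking uses correlation-adjusted t-scores (one possible scoring system for shrinkage discriminant analysis), in which a peak's class t-scores are decorrelated by the inverse square root of the shrunken correlation matrix so that redundant, collinear peaks do not all rank highly.

```python
import numpy as np


def fit_shrinkage_da(X, y, lam=0.3):
    """Hypothetical helper: shrinkage LDA fit plus peak ranking (sketch only).

    X   : (n_samples, n_peaks) peak-intensity matrix (assumed preprocessed)
    y   : (n_samples,) class labels
    lam : shrinkage intensity in [0, 1]; an assumed, hand-picked value here,
          whereas James-Stein-type methods estimate it from the data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                        # sorted class labels
    n, p = X.shape
    K = len(classes)
    idx = np.searchsorted(classes, y)             # label -> row index in `means`
    means = np.array([X[idx == k].mean(axis=0) for k in range(K)])
    n_k = np.bincount(idx, minlength=K)
    priors = n_k / n

    # Pooled within-class residuals, standard deviations and correlations.
    resid = X - means[idx]
    sd = np.sqrt((resid ** 2).sum(axis=0) / (n - K))
    R = np.corrcoef(resid, rowvar=False)
    # Shrink the correlation matrix toward the identity so that it stays
    # invertible even when peaks are collinear or there are more peaks than spectra.
    R_shrunk = (1.0 - lam) * R + lam * np.eye(p)

    # Correlation-adjusted t-scores: class-centroid t-scores multiplied by
    # R^{-1/2} (computed via an eigendecomposition of the symmetric matrix).
    w, V = np.linalg.eigh(R_shrunk)
    R_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    m_k = np.sqrt(1.0 / n_k - 1.0 / n)
    t = (means - X.mean(axis=0)) / (m_k[:, None] * sd)   # K x p
    cat = t @ R_inv_sqrt                                  # K x p
    score = (cat ** 2).sum(axis=0)                        # one score per peak

    # Covariance Sigma = D R D with D = diag(sd), so Sigma^-1 = D^-1 R^-1 D^-1.
    Sigma_inv = np.linalg.inv(R_shrunk) / np.outer(sd, sd)
    return {
        "classes": classes, "means": means, "priors": priors,
        "Sigma_inv": Sigma_inv, "score": score,
        "ranking": np.argsort(-score),            # peak indices, best first
    }


def predict(model, X_new):
    """Assign each row of X_new to the class with the largest discriminant score."""
    X_new = np.asarray(X_new, dtype=float)
    M, S = model["means"], model["Sigma_inv"]
    # d_k(x) = x' S mu_k - 0.5 mu_k' S mu_k + log(pi_k)
    lin = X_new @ S @ M.T                                 # n x K
    const = -0.5 * np.einsum("kp,pq,kq->k", M, S, M) + np.log(model["priors"])
    return model["classes"][np.argmax(lin + const, axis=1)]
```

A biomarker subset can then be tried by refitting on the top-ranked peaks only, e.g. `fit_shrinkage_da(X[:, model["ranking"][:k]], y)` for some chosen `k`. Weighting misclassifications is not modelled here; adjusting the class priors or adding explicit costs to the discriminant scores would be one possible (assumed) way to do so.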
Copyright © 2010 Elsevier GmbH. All rights reserved.