Supervised Statistical Learning Prediction of Soybean Varieties and Cultivation Sites Using Rapid UPLC-MS Separation, Method Validation, and Targeted Metabolomic Analysis of 31 Phenolic Compounds in the Leaves

Metabolites. 2021 Dec 17;11(12):884. doi: 10.3390/metabo11120884.

Abstract

Soybean (Glycine max; SB) leaf (SL) is an abundant non-conventional edible resource that contains value-adding bioactive compounds. We predicted the attributes of SB from the metabolome of SLs using targeted metabolomics. SB was planted in two cities, and SLs were collected from the plants at regular intervals. Nine flavonol glycosides were purified from SLs, and a rapid separation by ultrahigh-performance liquid chromatography-mass detection was established and validated as a simultaneous quantification method. Changes in 31 targeted compounds were monitored, and the resulting profiles were analyzed with various supervised machine learning (ML) models. Using the combined criteria of these supervised ML models, isoflavones, quercetin derivatives, and flavonol derivatives were identified as discriminators for cultivation days, varieties, and cultivation sites, respectively. The neural network model showed higher predictive power for these factors, with high fitness and low misclassification rates, whereas the other models performed less well. We propose that a panel of SL phytochemicals is a useful predictor for discriminating the characteristics of edible plants.
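The abstract compares supervised models partly by their misclassification rates. As a minimal sketch, assuming misclassification rate means the fraction of samples assigned to the wrong class (the study's exact definition and software are not stated here), such a comparison could be computed as follows. The site labels, model names, and predictions are hypothetical placeholders, not data from the study.

```python
from typing import Sequence


def misclassification_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Hypothetical cultivation-site labels and model predictions (illustrative only)
true_sites = ["site_A", "site_A", "site_B", "site_B", "site_B"]
pred_model_1 = ["site_A", "site_A", "site_B", "site_B", "site_A"]
pred_model_2 = ["site_A", "site_B", "site_A", "site_B", "site_A"]

rates = {
    "model_1": misclassification_rate(true_sites, pred_model_1),
    "model_2": misclassification_rate(true_sites, pred_model_2),
}
# The model with the lowest misclassification rate is preferred on this criterion
best = min(rates, key=rates.get)
print(rates, best)  # {'model_1': 0.2, 'model_2': 0.6} model_1
```

In the study this criterion was combined with model fitness, so a single rate like this is only one part of the comparison.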

Keywords: chemometrics; flavonoid; machine learning; multivariate analysis; non-conventional edible plants; soybean leaf; targeted metabolomics.