Rapid, robust, and accurate compositional analyses are required in the bioenergy industry to determine the chemical composition of biomass feedstocks. A near-infrared (NIR) spectroscopic method based on a stacked regression ensemble was developed for the quantitative determination of glucan, xylan, lignin, ash, and extractives in biomass feedstocks. The performance of various machine learning techniques was compared on the training set (n = 188). These techniques were support vector regression (linear and radial kernels), least absolute shrinkage and selection operator (LASSO), ridge regression, elastic net, partial least squares, random forests, recursive partitioning and regression trees, gradient boosting, and Gaussian process regression. Their predictive performance was then compared with that of stacked regression, an ensemble learning algorithm that combines the predictions of the regression models listed above. In the validation set (n = 81), stacked regression generally outperformed the other machine learning techniques across all five constituents (average root mean square error of prediction (RMSEP) = 1.660 wt%, R² = 0.907). In addition, the RMSEP of the stacked ensemble differed significantly from that of the partial least squares (PLS) approach in predicting the glucan, ash, lignin, and extractives contents of biomass samples. The stacked ensemble learning approach therefore offers a more accurate alternative to the traditional PLS technique for predicting biomass composition.
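The abstract does not describe how the stack was implemented. The following is a minimal, dependency-free sketch of the general stacking recipe, not the authors' code. It makes several assumptions. Two hypothetical base learners (ridge regression and k-nearest neighbours) stand in for the ten techniques listed. The meta-learner is an unconstrained linear least-squares model fitted on out-of-fold base predictions, whereas some stacking formulations constrain the weights to be non-negative. The data are synthetic rather than NIR spectra. RMSEP and R² follow their standard definitions.

```python
import math
import random


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression on centred data (intercept unpenalised); returns a predictor."""
    p = len(X[0])
    xm = [sum(r[j] for r in X) / len(X) for j in range(p)]
    ym = sum(y) / len(y)
    Xc = [[r[j] - xm[j] for j in range(p)] for r in X]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * (t - ym) for r, t in zip(Xc, y)) for i in range(p)]
    w = solve(A, b)
    return lambda x: ym + sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, xm))


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regression with squared Euclidean distance."""
    def predict(x):
        d = sorted((sum((a - b) ** 2 for a, b in zip(x, r)), t) for r, t in zip(X, y))
        return sum(t for _, t in d[:k]) / k
    return predict


def fit_stack(X, y, base_fitters, n_folds=5, seed=0):
    """Stacking: fit a linear meta-learner on out-of-fold base-learner predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    Z = [[0.0] * len(base_fitters) for _ in X]  # out-of-fold prediction matrix
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                Z[i][m] = model(X[i])
    meta = fit_ridge(Z, y, lam=1e-6)  # near-OLS linear combiner with intercept
    bases = [fit(X, y) for fit in base_fitters]  # refit base learners on all training data
    return lambda x: meta([b(x) for b in bases])


def demo(seed=42):
    """Hypothetical synthetic data: 3 features, noise-free linear response."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(60)]
    y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
    X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
    models = {
        "ridge": fit_ridge(X_train, y_train),
        "knn": fit_knn(X_train, y_train),
        "stack": fit_stack(X_train, y_train, [fit_ridge, fit_knn]),
    }
    return {name: rmsep(y_test, [f(x) for x in X_test]) for name, f in models.items()}


if __name__ == "__main__":
    for name, err in demo().items():
        print(f"{name:>5}: RMSEP = {err:.4f}")
```

The meta-learner is trained only on out-of-fold predictions, so it learns how much to trust each base model on data that model did not see. On this noise-free linear toy problem it should lean almost entirely on the ridge model. The example illustrates the mechanics only and says nothing about the relative performance reported above.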
Keywords: Biomass; Chemometrics; Near infrared spectroscopy; Partial least squares; Stacking.
Copyright © 2022 Elsevier B.V. All rights reserved.