[A novel metabolomic data scaling method based on K-L divergence]

Guang Pu Xue Yu Guang Pu Fen Xi. 2014 Oct;34(10):2868-72.
[Article in Chinese]

Abstract

This study proposes a new scaling method for NMR metabolomic data based on Kullback-Leibler (K-L) divergence. The proposed method, called K-L scaling, is a supervised scaling method because group information is incorporated into the scaling procedure. Notably, K-L divergence measures the difference between two datasets through their probability distributions, so it can be applied to data that follow either Gaussian or non-Gaussian distributions. In K-L scaling, all variables are first standardized to unit variance, and their variance is then adjusted using the K-L divergence to highlight the significant variables. K-L scaling effectively detects differences in spectral data points between two experimental groups, increasing the weights of biologically relevant variables while reducing the weights of noise and uninformative variables. The method was applied to a ¹H-NMR metabolomic dataset acquired from human urine. Analysis of this dataset showed that the new scaling method is efficient in suppressing the contribution of noise to the resulting multivariate model. In addition, it increases the weights of important variables and improves the interpretability and predictability of subsequent principal component regression (PCR) and partial least squares discriminant analysis (PLS-DA). Furthermore, the scaling method facilitated the identification of metabolic signatures. These results suggest that K-L scaling may become a useful alternative for the preprocessing of NMR-based metabolomic data.
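The two-step procedure described above (standardize to unit variance, then reweight each variable by a between-group K-L divergence) can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting formula, so everything specific here is an assumption made for illustration: the function names `symmetric_kl` and `kl_scale`, the histogram-based estimate of a symmetrised K-L divergence, the pseudo-count smoothing, the 0/1 group coding, and the normalization of weights by their maximum. It should not be read as the authors' implementation.

```python
import numpy as np


def symmetric_kl(a, b, bins=10, pseudo=0.5):
    """Histogram estimate of the symmetrised K-L divergence between two samples.

    Assumption: shared equal-width bins and a pseudo-count to avoid log(0).
    """
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    if hi == lo:  # the variable is constant in both groups
        return 0.0
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = (p + pseudo) / (p + pseudo).sum()
    q = (q + pseudo) / (q + pseudo).sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q))
    return float(np.sum((p - q) * np.log(p / q)))


def kl_scale(X, y, bins=10):
    """Hypothetical K-L scaling: unit-variance scaling followed by K-L weighting.

    X: (samples, variables) array of spectral intensities.
    y: group labels, assumed here to be coded 0 and 1.
    Returns the scaled matrix and the per-variable weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Step 1: standardize every variable to zero mean and unit variance.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant variables unscaled
    Z = (X - X.mean(axis=0)) / std

    # Step 2: estimate the between-group divergence for each variable.
    g0, g1 = Z[y == 0], Z[y == 1]
    kl = np.array(
        [symmetric_kl(g0[:, j], g1[:, j], bins) for j in range(Z.shape[1])]
    )

    # Assumed weighting: divergence relative to the most discriminating variable.
    w = kl / kl.max() if kl.max() > 0 else np.ones_like(kl)
    return Z * w, w
```

Calling `kl_scale(X, y)` on a samples-by-variables matrix returns the reweighted data together with the weights, so variables whose distributions differ between the two groups keep a larger share of the variance passed on to a downstream model, while variables with similar distributions in both groups are shrunk toward zero.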

MeSH terms

  • Discriminant Analysis
  • Least-Squares Analysis
  • Magnetic Resonance Spectroscopy
  • Metabolomics / methods*