On overfitting, generalization, and randomly expanded training sets

IEEE Trans Neural Netw. 2000;11(5):1050-7. doi: 10.1109/72.870038.

Abstract

An algorithmic procedure is developed for the random expansion of a given training set to combat overfitting and improve the generalization ability of backpropagation trained multilayer perceptrons (MLPs). The training set is K-means clustered and locally most entropic colored Gaussian joint input-output probability density function (pdf) estimates are formed per cluster. The number of clusters is chosen such that the resulting overall colored Gaussian mixture exhibits minimum differential entropy upon global cross-validated shaping. Numerical studies on real data and synthetic data examples drawn from the literature illustrate and support these theoretical developments.