Ensemble of linear models for predicting drug properties

J Chem Inf Model. 2006 Jan-Feb;46(1):416-23. doi: 10.1021/ci050375+.

Abstract

We propose a new classification method for predicting drug properties, called random feature subset boosting for linear discriminant analysis (LDA). Its main novelty is that it overcomes the difficulties of constructing ensembles of linear discriminant models based on generalized eigenvectors of covariance matrices. Such linear models are popular for building classification-based structure-activity relationships. Ensembles of LDA models make it possible to analyze more complex problems than a single LDA model can handle, for example, problems involving multiple mechanisms of action. Using four data sets, we show experimentally that the method is competitive with other recently studied chemoinformatic methods, including support vector machines and decision-tree-based models. We present a simple scheme for interpreting the model despite its apparent sophistication. We also outline theoretical evidence for why this method, unlike the conventional AdaBoost ensemble algorithm, is able to increase the accuracy of LDA models.
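The abstract gives no algorithmic details, so the following is only a minimal sketch of the general idea, assuming that "random feature subset boosting" means discrete AdaBoost-style sample reweighting in which each round fits a weighted two-class LDA on a randomly drawn subset of descriptors. The subset size, the weighted-LDA formulation, the midpoint threshold, and the names `weighted_lda`, `rfs_boost_lda`, and `predict` are all hypothetical and not taken from the paper. For two classes, the generalized eigenvector of the between-class and within-class scatter matrices is proportional to the within-class scatter inverse applied to the difference of class means, which is what `weighted_lda` computes directly. One plausible reading of the abstract's argument is that LDA is a stable learner, so plain AdaBoost produces nearly identical models, whereas random feature subsets inject the diversity an ensemble needs.

```python
import numpy as np


def weighted_lda(X, y, w, reg=1e-6):
    """Hypothetical two-class Fisher LDA with sample weights; y in {-1, +1}.

    Returns (direction, bias). The direction S_W^{-1} (mu_pos - mu_neg) is the
    two-class solution of the generalized eigenproblem S_B v = lambda S_W v.
    """
    pos, neg = y > 0, y < 0
    wp = w[pos] / w[pos].sum()  # normalized weights within each class
    wn = w[neg] / w[neg].sum()
    mu_p = wp @ X[pos]  # weighted class means
    mu_n = wn @ X[neg]
    dp, dn = X[pos] - mu_p, X[neg] - mu_n
    # Pooled weighted within-class scatter, lightly regularized for stability
    S = (dp * wp[:, None]).T @ dp + (dn * wn[:, None]).T @ dn
    S += reg * np.eye(X.shape[1])
    d = np.linalg.solve(S, mu_p - mu_n)
    b = -0.5 * d @ (mu_p + mu_n)  # midpoint threshold (equal priors assumed)
    return d, b


def rfs_boost_lda(X, y, n_rounds=25, subset_size=None, seed=0):
    """Hypothetical AdaBoost loop whose weak learners are LDA models fitted
    on random feature subsets (subset size defaults to half the features)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = subset_size or max(1, p // 2)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        feats = rng.choice(p, size=k, replace=False)  # random descriptor subset
        d, b = weighted_lda(X[:, feats], y, w)
        pred = np.sign(X[:, feats] @ d + b)
        pred[pred == 0] = 1
        err = w[pred != y].sum()
        if err >= 0.5:
            continue  # no better than chance on the weighted sample: skip
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((feats, d, b, alpha))
    return ensemble


def predict(ensemble, X):
    """Weighted vote of the member LDA models; returns labels in {-1, +1}."""
    score = np.zeros(X.shape[0])
    for feats, d, b, alpha in ensemble:
        score += alpha * np.sign(X[:, feats] @ d + b)
    return np.where(score >= 0, 1, -1)
```

Skipping rounds whose weighted error is at least 0.5 follows standard discrete AdaBoost. Because each member sees only its own descriptor subset, the interpretation can be read off per member by inspecting which features received large coefficients in `d`, though whether this matches the paper's interpretation scheme is an assumption.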

MeSH terms

  • ATP Binding Cassette Transporter, Subfamily B, Member 1 / metabolism
  • Algorithms
  • Biological Transport, Active
  • Computer Simulation*
  • Humans
  • Linear Models
  • Models, Chemical*
  • Pharmaceutical Preparations / chemistry*
  • Pharmaceutical Preparations / metabolism*
  • Structure-Activity Relationship
  • Substrate Specificity

Substances

  • ATP Binding Cassette Transporter, Subfamily B, Member 1
  • Pharmaceutical Preparations