Analyzing the topology of active sites: on the prediction of pockets and subpockets

J Chem Inf Model. 2010 Nov 22;50(11):2041-52. doi: 10.1021/ci100241y. Epub 2010 Oct 14.

Abstract

Automated prediction of protein active sites is essential for large-scale protein function prediction, classification, and druggability estimates. In this work, we present DoGSite, a new structure-based method to predict active sites in proteins based on a Difference of Gaussian (DoG) approach, which originates from image processing. In contrast to existing methods, DoGSite splits predicted pockets into subpockets, yielding a more detailed description of active-site topology. DoGSite correctly predicts binding pockets for over 92% of the PDBBind and scPDB data sets, on par with the best-performing methods available. For 63% of the PDBBind data set, the detected pockets can be subdivided into smaller subpockets. The cocrystallized ligand is contained in exactly one subpocket in 87% of the predictions. Furthermore, we introduce a more precise prediction performance measure that takes both the ligand coverage and the pocket coverage of each ligand-pocket pair into account. In 90% of the cases, DoGSite predicts a pocket that contains at least half of the ligand. In 70% of the cases, the cocrystallized ligand additionally covers more than a quarter of that pocket. Considering subpockets increases coverage, raising the success rate for this latter measure to 83%.
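The combined success criterion above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the ligand and each pocket or subpocket are represented as sets of occupied grid points; the abstract does not state how DoGSite actually computes coverage. Only the thresholds (at least half of the ligand, more than a quarter of the pocket) come from the abstract. The names `coverages`, `is_success`, `any_success`, and `box` are illustrative, not part of DoGSite, and the toy boxes are invented data, not results from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a ligand or a (sub)pocket is a set of
# occupied grid points (i, j, k). DoGSite's actual coverage computation may differ.
GridPoint = Tuple[int, int, int]


def coverages(ligand: Set[GridPoint], pocket: Set[GridPoint]) -> Tuple[float, float]:
    """Return (ligand coverage, pocket coverage) for one ligand-pocket pair."""
    if not ligand or not pocket:
        return 0.0, 0.0
    overlap = len(ligand & pocket)
    # Fraction of the ligand inside the pocket, and fraction of the pocket filled by the ligand.
    return overlap / len(ligand), overlap / len(pocket)


def is_success(ligand: Set[GridPoint], pocket: Set[GridPoint],
               min_ligand_cov: float = 0.5, min_pocket_cov: float = 0.25) -> bool:
    """Thresholds from the abstract: at least half of the ligand, more than a quarter of the pocket."""
    ligand_cov, pocket_cov = coverages(ligand, pocket)
    return ligand_cov >= min_ligand_cov and pocket_cov > min_pocket_cov


def any_success(ligand: Set[GridPoint], candidates: Iterable[Set[GridPoint]]) -> bool:
    """Assumption: a prediction counts as correct if any candidate (sub)pocket passes."""
    return any(is_success(ligand, c) for c in candidates)


def box(nx: int, ny: int, nz: int) -> Set[GridPoint]:
    """Toy helper: all grid points of an nx x ny x nz box anchored at the origin."""
    return {(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}


ligand = box(2, 2, 2)      # 8 points
pocket = box(5, 2, 4)      # 40 points, contains the whole ligand
subpocket = box(2, 2, 4)   # 16 points, a smaller region that still contains the ligand

print(coverages(ligand, pocket))                  # (1.0, 0.2)
print(any_success(ligand, [pocket]))              # False
print(any_success(ligand, [pocket, subpocket]))   # True
```

In this toy case the full pocket contains the entire ligand but is too large: the ligand fills only 20% of it, so the pocket-coverage test fails. The subpocket shrinks the denominator to 16 points, raising pocket coverage to 50%. This is one plausible reading of why considering subpockets raises the reported success rate.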

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Catalytic Domain*
  • Computational Biology / methods*
  • Databases, Protein
  • Humans
  • Models, Molecular
  • Normal Distribution
  • Pattern Recognition, Automated
  • Proteins / chemistry*
  • Proteins / metabolism*

Substances

  • Proteins