Background: Chemical similarity searching enables the retrieval of preferred screening molecules from a compound database. Candidates are ranked according to their similarity to a reference compound (query). Assessing the statistical significance of chemical similarity scores helps to prioritize significant hits and to identify cases where the database does not contain any promising compounds.
Method: Our text-based similarity measure, Pharmacophore Alignment Search Tool (PhAST), employs pairwise sequence alignment. We adapted the concept of E-values as significance estimates and employed a sampling technique that incorporates the principle of importance sampling into a Markov chain Monte Carlo simulation to generate distributions of random alignment scores. These distributions were used to compute significance estimates for similarity scores in a preliminary prospective virtual screen for inhibitors of Aurora A kinase.
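The idea of turning an alignment score into an E-value can be illustrated with a minimal sketch. The abstract does not specify PhAST's string encoding, scoring scheme, or sampler, so everything below is an assumption: the pharmacophore strings and the match/mismatch/gap values are hypothetical, the score is a plain Smith-Waterman local alignment score with a linear gap penalty, random scores come from simple shuffles of the candidate string (not the MCMC importance sampling described above), and the null scores are assumed to follow a Gumbel extreme value distribution fitted by the method of moments. The E-value is then the expected number of database entries that would reach the observed score by chance.

```python
import math
import random


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def fit_gumbel(scores: list[float]) -> tuple[float, float]:
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)   # scale parameter lambda
    mu = mean - 0.5772156649 / lam         # location; Euler-Mascheroni constant
    return mu, lam


def e_value(score: float, mu: float, lam: float, db_size: int) -> float:
    """Expected number of database entries scoring >= score by chance."""
    # P(S >= score) = 1 - exp(-exp(-lam * (score - mu))), computed stably
    p = -math.expm1(-math.exp(-lam * (score - mu)))
    return db_size * p


# Usage with hypothetical pharmacophore strings (one symbol per feature).
rng = random.Random(0)
query = "HAADPNHRRA"
candidate = "HAADPNHRRD"
observed = local_alignment_score(query, candidate)

# Null model: plain random shuffles of the candidate string (not MCMC).
null_scores = [
    local_alignment_score(query, "".join(rng.sample(candidate, len(candidate))))
    for _ in range(1000)
]
mu, lam = fit_gumbel(null_scores)
print(observed, e_value(observed, mu, lam, db_size=100_000))
```

A small E-value means that few database entries are expected to score this high by chance, so the hit is worth prioritizing. Naive shuffling rarely produces the high-scoring tail that matters most, which is the motivation for an importance-sampling approach such as the one used for PhAST.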
Conclusion: Assessing the significance of compound similarity computed with PhAST allows for a statistically motivated identification of candidate screening compounds. Inhibitors of Aurora A kinase were retrieved from a large compound library.