Motivation: Most bioactive molecules exert their effects by interacting with proteins or other macromolecules. However, for a significant fraction of them, the primary target remains unknown. In addition, the majority of bioactive molecules have more than one target, many of which are poorly characterized. Computational predictions of bioactive molecule targets based on similarity with known ligands are a powerful way to narrow down the number of potential targets and to rationalize side effects of known molecules.
Results: Using a reference set of 224 412 molecules active on 1700 human proteins, we show that accurate target prediction can be achieved by combining different measures of chemical similarity based on both chemical structure and molecular shape. Our results indicate that the combined approach is especially effective when no ligand with the same scaffold or from the same chemical series has yet been discovered. We also observe that the optimal combination of similarity measures depends on the properties of the query molecule, such as its number of heavy atoms. This further highlights the importance of considering different classes of similarity measures between new molecules and known ligands to accurately predict their targets.
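To illustrate the general idea of combining a structure-based and a shape-based similarity for target ranking, a minimal sketch follows. It is not the method used in this work: the weighted average in `combined_score`, the weight `w_2d`, the `Ligand` record layout and the toy data are all assumptions made for illustration. Fingerprints are represented as sets of on-bits, and shape similarities are assumed to be precomputed by an external tool.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record for a known ligand: (ligand_id, target_name, fingerprint on-bits).
Ligand = Tuple[str, str, Set[int]]


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def combined_score(sim_2d: float, sim_3d: float, w_2d: float = 0.5) -> float:
    """Weighted average of a 2D (structure) and a 3D (shape) similarity.

    Assumption: a simple linear combination; the weighting scheme is illustrative only.
    """
    if not 0.0 <= w_2d <= 1.0:
        raise ValueError("w_2d must lie in [0, 1]")
    return w_2d * sim_2d + (1.0 - w_2d) * sim_3d


def rank_targets(
    query_fp: Set[int],
    shape_sims: Dict[str, float],
    ligands: List[Ligand],
    w_2d: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank targets by the best combined similarity between the query and any known ligand.

    shape_sims maps ligand_id to a precomputed shape similarity with the query
    (assumed to lie in [0, 1]); ligands missing from the map get 0.0.
    """
    best: Dict[str, float] = {}
    for ligand_id, target, fp in ligands:
        score = combined_score(
            tanimoto(query_fp, fp), shape_sims.get(ligand_id, 0.0), w_2d
        )
        if score > best.get(target, float("-inf")):
            best[target] = score
    # Highest-scoring target first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (entirely made up) to show the call pattern.
known_ligands: List[Ligand] = [
    ("L1", "TargetA", {1, 2, 3}),
    ("L2", "TargetB", {7, 8, 9}),
    ("L3", "TargetA", {1, 2}),
]
query_fingerprint = {1, 2, 3, 4}
query_shape_sims = {"L1": 0.6, "L2": 0.9, "L3": 0.2}

ranking = rank_targets(query_fingerprint, query_shape_sims, known_ligands)
for target, score in ranking:
    print(f"{target}\t{score:.3f}")
```

In this toy example, a ligand with no fingerprint overlap (`L2`) still contributes a non-zero score through its shape similarity, which is the kind of case where a combined measure can help when no ligand with the same scaffold is known. Varying `w_2d` with a molecular property such as the number of heavy atoms would be one hypothetical way to make the combination property-dependent.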