Identifying promising compounds during the early stages of drug development is a major challenge for both academia and the pharmaceutical industry. The difficulties are even more pronounced in multi-target pharmacology, where compounds often target more than one protein or multiple compounds are used together. Here, we address this problem by using machine learning and network analysis to process sequence and interaction data from human proteins to identify promising compounds. We used this strategy to identify properties that make certain proteins more likely to cause harmful effects when targeted; such proteins usually have domains commonly found throughout the human proteome. Additionally, since currently marketed drugs hit multiple targets simultaneously, we combined the information from individual proteins to devise a score that quantifies the likelihood of a compound being harmful to humans. This approach enabled us to distinguish between approved and problematic drugs with an accuracy of 60–70%. Moreover, our approach can be applied as soon as candidate drugs are available, as demonstrated with predictions for more than 5000 experimental drugs. These resources are available at http://sourceforge.net/projects/psin/.
Keywords: drug safety; machine learning; multi-target drugs; protein networks; supervised learning; target validation.