A Deep Learning-Based Method for Identification of Bacteriophage-Host Interaction

IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):1801-1810. doi: 10.1109/TCBB.2020.3017386. Epub 2021 Oct 7.

Abstract

Multi-drug resistance (MDR) has become one of the greatest threats to human health worldwide, and novel treatments for infections caused by MDR bacteria are urgently needed. Phage therapy is a promising alternative, and its key step is correctly matching each target pathogenic bacterium with a corresponding therapeutic phage. Deep learning is a powerful way to mine complex patterns and generate accurate predictions. In this study, we develop PredPHI (Predicting Phage-Host Interactions), a deep learning-based tool capable of predicting the hosts of phages from sequence data. We collect >3000 phage-host pairs along with their protein sequences from the PhagesDB and GenBank databases and extract a set of features. We then select high-quality negative samples using the K-Means clustering method and construct a balanced training set. Finally, we employ a deep convolutional neural network to build the predictive model. The results indicate that PredPHI achieves an area under the receiver operating characteristic curve of 81 percent on the test set, and that the clustering-based method is significantly more robust than random selection of negative samples. These results highlight that PredPHI is a useful and accurate tool for identifying phage-host interactions from sequence data.
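The abstract does not say how K-Means is used to choose negative samples, so the following is only a minimal sketch of one plausible reading: cluster the candidate (unlabeled) phage-host pairs in feature space, then draw negatives round-robin across clusters so the negative set is spread over the feature space and matches the number of positives. The function names `kmeans` and `select_negatives`, the 2-D toy features, and the choice of `k=3` are all hypothetical and are not taken from PredPHI.

```python
import random
from math import dist


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so stop early
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def select_negatives(candidates, n_needed, k=3, seed=0):
    """Pick up to n_needed candidate indices, round-robin across clusters.

    Assumption (not from the paper): spreading negatives over clusters
    gives a more representative negative set than uniform random draws.
    """
    labels = kmeans(candidates, k, seed=seed)
    rng = random.Random(seed)
    buckets = {}
    for idx, label in enumerate(labels):
        buckets.setdefault(label, []).append(idx)
    for members in buckets.values():
        rng.shuffle(members)
    chosen = []
    # Take one index per cluster in turn until enough negatives are chosen.
    while len(chosen) < n_needed and any(buckets.values()):
        for members in buckets.values():
            if members and len(chosen) < n_needed:
                chosen.append(members.pop())
    return chosen


# Toy data: 60 hypothetical candidate pairs with 2-D feature vectors.
_rng = random.Random(1)
candidates = [
    (_rng.gauss(cx, 0.3), _rng.gauss(cy, 0.3))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(20)
]
n_positives = 12  # balanced set: as many negatives as positives
negatives = select_negatives(candidates, n_needed=n_positives, k=3)
```

Here `negatives` holds indices into `candidates`; pairing them with an equal number of known positive pairs would give a balanced training set of the kind the abstract describes. A production version would more likely use an established clustering library and real sequence-derived features.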

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Bacteriophages / genetics*
  • Computational Biology / methods*
  • DNA, Bacterial / genetics
  • DNA, Viral / genetics
  • Deep Learning*
  • Drug Resistance, Bacterial / genetics
  • Microbial Interactions / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • DNA, Bacterial
  • DNA, Viral