GPS-pPLM: A Language Model for Prediction of Prokaryotic Phosphorylation Sites

Cells. 2024 Nov 8;13(22):1854. doi: 10.3390/cells13221854.

Abstract

In prokaryotes, protein phosphorylation is one of the most important posttranslational modifications (PTMs) and orchestrates a broad spectrum of biological processes. Here, we report an updated online server, the group-based prediction system for prokaryotic phosphorylation language model (GPS-pPLM), for predicting phosphorylation sites (p-sites) in prokaryotes. For model training, two deep learning methods, a transformer and a deep neural network, were employed, and a total of 10 sequence features and contextual features were integrated. Using 44,839 nonredundant p-sites in 16,041 proteins from 95 prokaryotes, two general models for the prediction of O-phosphorylation and N-phosphorylation were first pretrained and then fine-tuned to construct 6 predictors, each specific to a phosphorylatable residue type, as well as 134 species-specific predictors. Compared with other existing tools, GPS-pPLM exhibits higher accuracy in predicting prokaryotic O-phosphorylation p-sites. Users can submit protein sequences in FASTA format or UniProt accession numbers, and the predicted results are displayed in tabular form. In addition, we annotate the predicted p-sites with knowledge from 22 public resources, including experimental evidence, 3D structures, and disorder tendencies. The online service of GPS-pPLM is freely accessible for academic research.
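
As a rough illustration of the kind of input a site-level predictor works from, the sketch below parses FASTA text and lists candidate O-phosphorylation residues (Ser, Thr, Tyr) with their flanking sequence. It is not the GPS-pPLM pipeline: the function names `parse_fasta` and `candidate_windows`, the flank width of 7, and the `-` padding character are all hypothetical choices made for this example, and the abstract does not specify how the server encodes sequence context.

```python
from typing import Dict, Iterator, List, Tuple

# O-phosphorylation targets hydroxyl-bearing residues: Ser (S), Thr (T), Tyr (Y).
O_PHOSPHO_RESIDUES = frozenset("STY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Use the first header token as the ID; fall back to a counter.
            header = fields[0] if fields else f"seq{len(records) + 1}"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def candidate_windows(
    seq: str, flank: int = 7, pad: str = "-"
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for each S/T/Y in seq.

    The flank width and padding character are illustrative assumptions,
    not values taken from GPS-pPLM.
    """
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in O_PHOSPHO_RESIDUES:
            # Residue i sits at index i + flank in the padded string,
            # so the centered window starts at index i.
            yield i + 1, residue, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo example protein\nMSTAY\n"
    for name, seq in parse_fasta(demo).items():
        for pos, residue, window in candidate_windows(seq, flank=2):
            print(name, pos, residue, window)
```

Each yielded window would then be scored by a trained model; padding keeps windows the same length for residues near either terminus.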

Keywords: deep learning; language model; phosphorylation; posttranslational modification; prokaryote.

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods
  • Deep Learning
  • Phosphorylation
  • Prokaryotic Cells* / metabolism
  • Protein Processing, Post-Translational
  • Software