SIMLIN: a bioinformatics tool for prediction of S-sulphenylation in the human proteome based on multi-stage ensemble-learning models

Xiaochuan Wang; Chen Li; Fuyi Li; Varun S Sharma; Jiangning Song; Geoffrey I Webb

doi:10.1186/s12859-019-3178-6

BMC Bioinformatics. 2019 Nov 21;20(1):602. doi: 10.1186/s12859-019-3178-6.

Authors

Xiaochuan Wang¹ ², Chen Li³ ⁴, Fuyi Li¹ ⁴, Varun S Sharma³, Jiangning Song⁵ ⁶ ⁷, Geoffrey I Webb⁸

Affiliations

¹ Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia.
² Division of Cancer Epidemiology, Cancer Council Victoria, Melbourne, VIC, 3004, Australia.
³ Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093, Zürich, Switzerland.
⁴ Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.
⁵ Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia. Jiangning.Song@monash.edu.
⁶ Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia. Jiangning.Song@monash.edu.
⁷ ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Melbourne, VIC, 3800, Australia. Jiangning.Song@monash.edu.
⁸ Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, 3800, Australia. Geoff.Webb@monash.edu.

Abstract

Background: S-sulphenylation is a ubiquitous protein post-translational modification (PTM) in which an S-hydroxyl (-SOH) bond is formed by reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. Advances in bioinformatics have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation.

Results: In this study, we propose a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage, neural-network-based ensemble-learning model that integrates both protein sequence-derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. On the independent testing dataset, SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, outperforming existing methods.
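
For readers less familiar with the two metrics reported in the Results, the following is a minimal sketch of how prediction accuracy and AUC can be computed from per-site labels and predicted scores. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of sites whose thresholded score matches the true label.

    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive site outscores a
    random negative site, with ties counted as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy, made-up labels (1 = modified cysteine) and scores; not study data.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the two are usually reported together.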

Conclusions: In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy, thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.

Keywords: Bioinformatics software; Ensemble learning; Machine learning; Protein post-translational modification; S-sulphenylation.

MeSH terms

Algorithms*
Amino Acid Motifs
Amino Acid Sequence
Area Under Curve
Computational Biology / methods*
Conserved Sequence
Databases, Protein
Gene Ontology
Humans
Neural Networks, Computer
Proteome / metabolism*
ROC Curve
Software
Sulfamerazine / metabolism*

Substances

Proteome
sulfaperine
Sulfamerazine

Grants and funding

R01 AI111965/AI/NIAID NIH HHS/United States