Prediction of zinc-binding sites using multiple sequence profiles and machine learning methods

Renxiang Yan; Xiaofeng Wang; Yarong Tian; Jing Xu; Xiaoli Xu; Juan Lin

doi:10.1039/c9mo00043g

Mol Omics. 2019 Jun 1;15(3):205-215. doi: 10.1039/c9mo00043g. Epub 2019 May 2.

Authors

Renxiang Yan¹, Xiaofeng Wang², Yarong Tian³, Jing Xu¹, Xiaoli Xu⁴, Juan Lin¹

Affiliations

¹ School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China, and Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou 350002, China. Email: yanrenxiang@fzu.edu.cn, ljuan@fzu.edu.cn.
² College of Mathematics and Computer Science, Shanxi Normal University, Linfen 041004, China.
³ Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530, Sweden.
⁴ School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China.

PMID: 31046040
DOI: 10.1039/c9mo00043g

Abstract

The zinc (Zn²⁺) cofactor is involved in numerous biological mechanisms, and zinc binding is recognized as one of the most important post-translational modifications in proteins. Accurate knowledge of zinc ions in protein structures can therefore provide clues for elucidating protein folding and function. However, determining zinc-binding residues experimentally is usually labor-intensive and costly. In this context, computational tools for identifying zinc-binding sites are highly desirable, especially in the current post-genomic era. In this work, we developed a novel zinc-binding site prediction method by combining several intensively trained machine learning models. To establish an accurate and generalizable method, we downloaded all zinc-binding proteins from the Protein Data Bank and prepared a non-redundant dataset; a well-prepared dataset from other groups was also used. Effective and complementary features were then extracted from the sequences and three-dimensional structures of these proteins, and several well-designed machine learning models were trained on them. To assess performance, the resulting predictors were stringently benchmarked on diverse zinc-binding sites, and several state-of-the-art in silico methods developed specifically for zinc-binding sites were evaluated for comparison. The results confirmed that our method is very competitive in real-world applications and could become a complementary tool to wet-lab experiments. To facilitate research in the community, a web server and a stand-alone program implementing our method were constructed and are publicly available at . The downloadable program can be easily used for the high-throughput screening of potential zinc-binding sites across proteomes.
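The abstract does not specify how the sequence-profile features are encoded. As an illustration only, the following minimal Python sketch shows one common way such features could be prepared for a per-residue classifier (for example, a support vector machine): a sliding window over a per-position profile, centered on each candidate residue. The candidate set (C, H, D, E), the window size, the zero-padding at the termini, and all function names are assumptions of this sketch, not details of the published method.

```python
from typing import List, Sequence, Tuple

# Residue types that commonly coordinate zinc. Restricting candidates to these
# four is an assumption of this sketch, not a detail stated in the abstract.
CANDIDATE_RESIDUES = frozenset("CHDE")


def window_features(
    profile: Sequence[Sequence[float]], center: int, half_width: int = 2
) -> List[float]:
    """Flatten the profile rows in [center - half_width, center + half_width].

    Positions that fall outside the sequence are zero-padded so that every
    residue yields a feature vector of the same length.
    """
    n_cols = len(profile[0])
    features: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            features.extend(profile[pos])
        else:
            features.extend([0.0] * n_cols)
    return features


def candidate_examples(
    sequence: str, profile: Sequence[Sequence[float]], half_width: int = 2
) -> List[Tuple[int, List[float]]]:
    """Return (residue index, feature vector) for each candidate residue."""
    if len(sequence) != len(profile):
        raise ValueError("profile must have one row per residue")
    return [
        (i, window_features(profile, i, half_width))
        for i, residue in enumerate(sequence)
        if residue in CANDIDATE_RESIDUES
    ]


# Toy input: a 4-residue sequence with a hypothetical 2-column profile.
# A real profile (e.g. a PSSM) would have 20 columns per position.
toy_sequence = "ACHK"
toy_profile = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
toy_examples = candidate_examples(toy_sequence, toy_profile, half_width=1)
```

Each resulting vector, paired with a binding or non-binding label, could then be passed to a classifier of choice; structure-derived features, which the abstract also mentions, would be appended to the same vector.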

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Binding Sites
Computational Biology / methods*
Computer Simulation
Databases, Protein
Machine Learning*
Protein Binding
Protein Conformation
Protein Folding
Software
Support Vector Machine
Zinc / chemistry*

Substances

Zinc