Computational prediction of N-linked glycosylation incorporating structural properties and patterns

Bioinformatics. 2012 Sep 1;28(17):2249-55. doi: 10.1093/bioinformatics/bts426. Epub 2012 Jul 10.

Abstract

Motivation: N-linked glycosylation occurs predominantly at the N-X-T/S motif, where X is any amino acid except proline. Not all N-X-T/S sequons are glycosylated, and a number of web servers for predicting N-linked glycan occupancy using sequence and/or residue pattern information have been developed. None of the currently available servers, however, utilizes protein structural information for the prediction of N-glycan occupancy.
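
As an illustration of the sequon definition above, the following minimal sketch scans a protein sequence for N-X-T/S sites where X is not proline. The function name `find_sequons` and the example sequence are hypothetical and are not taken from NGlycPred. The sketch only locates candidate sequons; it says nothing about whether a site is actually occupied.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead keeps the match one residue wide, so overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-T/S sequons (X != Pro)."""
    seq = sequence.upper()
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence: N-G-T and N-K-S qualify, N-P-S does not.
    print(find_sequons("MNGTNPSANKS"))
```

Positions are reported 1-based, following the usual convention for residue numbering.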

Results: Here, we describe a novel classifier algorithm, NGlycPred, for the prediction of glycan occupancy at the N-X-T/S sequons. The algorithm uses both structural and residue pattern information and was trained on a set of glycosylated protein structures using the Random Forest algorithm. The best predictor achieved a balanced accuracy of 0.687 under 10-fold cross-validation on a curated dataset of 479 N-X-T/S sequons and outperformed sequence-based predictors when evaluated on the same dataset. The incorporation of structural information, including local contact order, surface accessibility/composition and secondary structure, thus improves the prediction accuracy of glycan occupancy at the N-X-T/S consensus sequon.
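
The abstract does not define balanced accuracy. The sketch below assumes the common definition, the mean of sensitivity and specificity, which is often preferred over plain accuracy when occupied and unoccupied sequons are unevenly represented. The function name and the toy labels are hypothetical and are not data from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = occupied)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of occupied sequons recovered
    specificity = tn / (tn + fp)  # fraction of unoccupied sequons recovered
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical toy labels: 3 occupied and 5 unoccupied sequons.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(round(balanced_accuracy(y_true, y_pred), 3))
```

Under this definition, a classifier that labels every sequon as occupied scores 0.5 regardless of class proportions, which is the baseline against which a value such as 0.687 would be read.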

Availability and implementation: NGlycPred is freely available to non-commercial users as a web-based server at http://exon.niaid.nih.gov/nglycpred/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Amino Acids / genetics
  • Amino Acids / metabolism
  • Glycoproteins / chemistry*
  • Glycoproteins / metabolism
  • Glycosylation
  • Models, Biological*
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism
  • Structure-Activity Relationship

Substances

  • Amino Acids
  • Glycoproteins
  • Proteins