Geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny

PLoS One. 2013 Aug 15;8(8):e72343. doi: 10.1371/journal.pone.0072343. eCollection 2013.

Abstract

Integrative genomics predictors, which score highly in predicting bacterial essential genes, are infeasible for most species because the required data sources are limited. We developed a universal approach and tool, designated Geptop, that uses orthology and phylogeny to provide gene essentiality annotations. In a series of tests, Geptop yielded higher area under the receiver operating characteristic curve (AUC) scores than the integrative approaches. In ten-fold cross-validation on randomly shuffled samples, Geptop yielded an AUC of 0.918, and in cross-organism predictions for 19 organisms it yielded AUC scores between 0.569 and 0.959. A test on the very recently determined essential gene dataset from Porphyromonas gingivalis, which belongs to a different phylum from all 19 of the above bacteria, gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes identified only by lethal screening, the essential genes predicted only by Geptop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further support the reliability and feasibility of our method. The web server and standalone version of Geptop are available free of charge at http://cefg.uestc.edu.cn/geptop/. The tool has been run on 968 bacterial genomes, and the results are accessible at the website.
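
For readers unfamiliar with the metric, the AUC scores quoted in the abstract can be read as the probability that a randomly chosen essential gene receives a higher prediction score than a randomly chosen non-essential gene. The following is a minimal sketch of that calculation, not Geptop's own code: the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for essential, 0 for non-essential (hypothetical encoding).
    scores: predicted essentiality scores, higher means more likely essential.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a rank-based implementation would scale better to whole genomes.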

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Area Under Curve
  • Bacterial Proteins / genetics
  • Genes, Essential*
  • Genome, Bacterial*
  • Gram-Negative Bacteria / classification
  • Gram-Negative Bacteria / genetics*
  • Gram-Positive Bacteria / classification
  • Gram-Positive Bacteria / genetics*
  • Molecular Sequence Annotation
  • Phylogeny
  • Protein Interaction Mapping
  • ROC Curve
  • Reproducibility of Results
  • Software*

Substances

  • Bacterial Proteins

Grants and funding

This study was supported by the Program for New Century Excellent Talents in University (NCET-11-0059, http://www.moe.gov.cn/), the National Natural Science Foundation of China (grants 31071109 and 60801058, http://www.nsfc.gov.cn/), and the special fund of the China Postdoctoral Science Foundation (grant 201104687, http://res.chinapostdoctor.org.cn/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.