Unlocking the mystery of the hard-to-sequence phage genome: PaP1 methylome and bacterial immunity

BMC Genomics. 2014 Sep 19;15(1):803. doi: 10.1186/1471-2164-15-803.

Abstract

Background: Whole-genome sequencing is an important method for understanding the genetic information, gene function, biological characteristics and survival mechanisms of organisms. Sequencing large genomes is now straightforward. However, we encountered a hard-to-sequence genome, that of Pseudomonas aeruginosa phage PaP1. The shotgun sequencing method failed to complete the sequence of this genome.

Results: After persevering for 10 years and working through three generations of sequencing techniques, we successfully completed the sequence of the PaP1 genome, which has a length of 91,715 bp. Single-molecule real-time sequencing results revealed that this genome contains 51 N-6-methyladenines and 152 N-4-methylcytosines. Three significant modified sequence motifs were predicted, but not every occurrence of these motifs in the genome was methylated. Further investigations revealed a novel immune mechanism of bacteria, in which host bacteria recognise and reject, on a large scale, inserts that contain modified bases. This mechanism could account for the failure of the shotgun method in PaP1 genome sequencing. The problem was resolved by using the nfi- mutant of Escherichia coli DH5α as the host bacterium to construct a shotgun library.
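
The counts above imply a modification density, and the observation that not every motif occurrence is methylated can be expressed as a per-motif fraction. The following is a minimal sketch of both calculations, not the authors' pipeline. The genome length and modification counts come from the abstract; the toy sequence, the motif `GATC`, the offset of the modified base and the set of methylated positions are hypothetical placeholders, since the abstract does not list the three predicted motifs.

```python
import re

# Values stated in the abstract.
GENOME_LENGTH_BP = 91_715
N6_METHYLADENINES = 51
N4_METHYLCYTOSINES = 152

# Average spacing between modified bases across the genome.
total_modified = N6_METHYLADENINES + N4_METHYLCYTOSINES
bp_per_modified_base = GENOME_LENGTH_BP / total_modified


def motif_sites(sequence: str, motif: str, offset: int) -> list[int]:
    """Return 0-based positions of the candidate modified base for each
    forward-strand occurrence of a literal motif (overlaps included)."""
    # A lookahead matches without consuming, so overlapping hits are found.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() + offset for m in pattern.finditer(sequence)]


def methylated_fraction(sites: list[int], methylated: set[int]) -> float:
    """Fraction of candidate sites that were actually detected as modified."""
    if not sites:
        return 0.0
    return sum(1 for s in sites if s in methylated) / len(sites)


# Hypothetical toy data, for illustration only (not PaP1 sequence or motifs).
toy_sequence = "TTGATCAAGATCCGATCA"
toy_motif = "GATC"           # placeholder motif
toy_offset = 1               # placeholder: the A within the motif
toy_methylated = {3, 14}     # placeholder positions reported as modified

toy_sites = motif_sites(toy_sequence, toy_motif, toy_offset)
toy_fraction = methylated_fraction(toy_sites, toy_methylated)

print(f"{total_modified} modified bases, one per {bp_per_modified_base:.0f} bp on average")
print(f"toy motif sites: {toy_sites}, methylated fraction: {toy_fraction:.2f}")
```

A fraction below 1.0 for a motif corresponds to the partial methylation described above. A real analysis would also need to scan the reverse strand and handle degenerate (IUPAC) motif symbols, which this sketch omits.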

Conclusions: This work provides insights into the hard-to-sequence phage PaP1 genome and reveals a new mechanism of bacterial immunity. The methylome of phage PaP1 is responsible for the failure of shotgun sequencing and for the bacterial immunity mediated by the activity of the enzyme Endo V; this methylome also provides a valuable resource for future studies on PaP1 genome replication and modification, as well as on gene regulation and host interaction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA Methylation
  • Genome, Viral*
  • Genomic Library
  • High-Throughput Nucleotide Sequencing
  • Molecular Sequence Data
  • Pancreatitis-Associated Proteins
  • Pseudomonas Phages / genetics*
  • Pseudomonas Phages / immunology*
  • Pseudomonas aeruginosa / enzymology
  • Pseudomonas aeruginosa / immunology
  • Pseudomonas aeruginosa / virology
  • Sequence Analysis, DNA

Associated data

  • GENBANK/HQ832595
  • GEO/GSE50100