Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data

PLoS One. 2015 Apr 28;10(4):e0126289. doi: 10.1371/journal.pone.0126289. eCollection 2015.

Abstract

A hybrid de novo assembly pipeline was constructed to use MiSeq and SOLiD short read data together. The short read data were converted to the pipeline's standard format and supplied to pipeline components such as ABySS and SOAPdenovo. The assembly proceeded through several stages, and MiSeq paired-end data, SOLiD mate-paired data, or both could be specified as input at each stage separately. The pipeline was evaluated on the filamentous fungus Aspergillus oryzae RIB40 by aligning the assembly results against the reference sequences. When both the MiSeq and the SOLiD data were used in the hybrid assembly, the alignment length improved by a factor of 3 to 8 compared with assemblies using either data type alone. The hybrid assemblies also reproduced more of the gene cluster regions encoding secondary metabolite biosynthesis (SMB). These results imply that the MiSeq data, with their long read length, are essential for constructing accurate nucleotide sequences, while the SOLiD mate-paired reads, with their long insert length, improve the long-range arrangement of the sequences. The pipeline was also tested on the actinomycete Streptomyces avermitilis MA-4680, whose genome is known to have high GC content. Although the quality of the SOLiD reads was too low to produce any meaningful assembly on their own, adding them improved the alignment length to the reference by a factor of 2 compared with the assembly using only the MiSeq data.
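
The per-stage choice of input data described in the abstract can be pictured with a minimal Python sketch. It is an illustration only, not the authors' implementation: the stage names (`contig_assembly`, `scaffolding`, `gap_closing`), the `ReadSet` and `Stage` types, and the assignment of libraries to stages are all assumptions made for this example, since the abstract names neither the stages nor their inputs.

```python
from dataclasses import dataclass
from enum import Flag, auto


class ReadSet(Flag):
    """Read libraries that a stage may consume (names are hypothetical)."""
    MISEQ_PE = auto()            # MiSeq paired-end reads
    SOLID_MP = auto()            # SOLiD mate-paired reads
    BOTH = MISEQ_PE | SOLID_MP   # hybrid input


@dataclass(frozen=True)
class Stage:
    name: str        # hypothetical stage label, not taken from the paper
    inputs: ReadSet  # libraries selected for this stage


def libraries_for(stage: Stage) -> list[str]:
    """Return the names of the individual libraries selected for a stage."""
    singles = (ReadSet.MISEQ_PE, ReadSet.SOLID_MP)
    return [lib.name for lib in singles if lib in stage.inputs]


# Assumed plan: one possible way to assign libraries to stages.
plan = [
    Stage("contig_assembly", ReadSet.MISEQ_PE),
    Stage("scaffolding", ReadSet.SOLID_MP),
    Stage("gap_closing", ReadSet.BOTH),
]

for stage in plan:
    print(stage.name, libraries_for(stage))
```

Modelling the selection as a flag set keeps "MiSeq only", "SOLiD only", and "both" as three values of one field, which mirrors the abstract's statement that either data type or both can be chosen at each stage.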

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aspergillus oryzae / genetics*
  • Base Sequence
  • Genome, Bacterial*
  • Genome, Fungal*
  • High-Throughput Nucleotide Sequencing / methods*
  • Molecular Sequence Data
  • Multigene Family
  • Open Reading Frames / genetics
  • Reference Standards
  • Reproducibility of Results
  • Streptomyces / genetics*

Associated data

  • BioProject/PRJNA277389
  • GENBANK/JZJK00000000
  • GENBANK/JZJM00000000
  • SRA/PRJNA277168

Grants and funding

Cift Corporation provided support in the form of a salary for author T. Inatsugi, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of the authors are articulated in the "author contributions" section.