Complete genome sequence of an aerobic thermoacidophilic crenarchaeon, Sulfolobus tokodaii strain7

DNA Res. 2001 Aug 31;8(4):123-40. doi: 10.1093/dnares/8.4.123.

Abstract

The complete genomic sequence of the aerobic thermoacidophilic crenarchaeon Sulfolobus tokodaii strain7, which grows optimally at 80 °C and low pH under aerobic conditions, has been determined by the whole-genome shotgun method with slight modifications. The genome was 2,694,756 bp long, with a G + C content of 32.8%. The following RNA-coding genes were identified: a single 16S-23S rRNA cluster, one 5S rRNA gene, and 46 tRNA genes (including 24 intron-containing tRNA genes). The repetitive sequences identified were SR-type repetitive sequences, long dispersed-type repetitive sequences, and Tn-like repetitive elements. The genome contained 2826 potential protein-coding regions (open reading frames, ORFs). In similarity searches against public databases, 911 (32.2%) ORFs were related to functionally assigned genes, 921 (32.6%) were related to conserved ORFs of unknown function, 145 (5.1%) contained some motifs, and the remaining 849 (30.0%) did not show any significant similarity to registered sequences. The ORFs with functional assignments included candidate genes involved in sulfide metabolism, the TCA cycle, and the respiratory chain. Sequence comparison provided evidence suggesting plasmid integration, rearrangement of genomic structure, and duplication of genomic regions, which may be responsible for the larger genome size of S. tokodaii strain7. The genome contained eukaryote-type genes that were not identified in other archaea, and its tRNA genes lacked the CCA sequence. These results suggest that this strain is the closest to eukaryotes among the archaeal strains sequenced so far. The data presented in this paper are also available on the internet homepage (http://www.bio.nite.go.jp/E-home/genome_list-e.html/).
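
The ORF breakdown reported in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the published analysis: the counts and the total are taken from the abstract, while the category labels, variable names, and the helper `orf_percentages` are hypothetical shorthand introduced here.

```python
# ORF counts as reported in the abstract; the category labels are shorthand
# chosen for this sketch, not terms defined by the authors.
TOTAL_ORFS = 2826
orf_counts = {
    "functionally assigned": 911,
    "conserved, unknown function": 921,
    "motif only": 145,
    "no significant similarity": 849,
}


def orf_percentages(counts, total):
    """Return each category's share of the total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = orf_percentages(orf_counts, TOTAL_ORFS)
categories_sum = sum(orf_counts.values())

# Print each category with its count and share, then the overall tally.
for name, pct in percentages.items():
    print(f"{name}: {orf_counts[name]} ORFs ({pct}%)")
print(f"sum of categories: {categories_sum} of {TOTAL_ORFS}")
```

Under these assumptions the four categories account for all 2826 ORFs, and the computed shares agree with the rounded percentages given in the abstract.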

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Archaeal Proteins / genetics
  • Base Sequence
  • Chromosome Mapping
  • Chromosomes, Archaeal / genetics
  • Codon / genetics
  • Conserved Sequence
  • DNA, Archaeal / genetics
  • Electron Transport / genetics
  • Gene Duplication
  • Genome, Archaeal*
  • Molecular Sequence Data
  • Open Reading Frames
  • Plasmids / genetics
  • RNA, Archaeal / genetics
  • Sulfides / metabolism
  • Sulfolobus / genetics*
  • Sulfolobus / metabolism

Substances

  • Archaeal Proteins
  • Codon
  • DNA, Archaeal
  • RNA, Archaeal
  • Sulfides

Associated data

  • GENBANK/AP000981
  • GENBANK/AP000982
  • GENBANK/AP000983
  • GENBANK/AP000984
  • GENBANK/AP000985
  • GENBANK/AP000986
  • GENBANK/AP000987
  • GENBANK/AP000988
  • GENBANK/AP000989
  • GENBANK/AP000990