Data mining of Mycobacterium tuberculosis complex genotyping results using mycobacterial interspersed repetitive units validates the clonal structure of spoligotyping-defined families

Res Microbiol. 2004 Oct;155(8):647-54. doi: 10.1016/j.resmic.2004.04.013.

Abstract

Recently, a combination of spoligotyping and bioinformatics was proposed as a potential tool for defining the major circulating clades of tuberculosis bacilli. In the present study, we attempted to validate this classification using a new high-throughput marker, mycobacterial interspersed repetitive units (MIRUs). Using 12 MIRU loci and spoligotyping, we performed data mining of results on clinical isolates of the Mycobacterium tuberculosis complex representative of global mycobacterial allelic diversity. Knowledge rules permitting automatic labeling of the major M. tuberculosis families were defined. Using this strategy, MIRU 24 appeared to be the most appropriate locus for classifying our dataset. The Bovis family was shown to be perfectly classified by a maximum of 3 MIRUs, followed by the Africanum and East African Indian (EAI) families by 4 MIRUs, the Beijing family by 6 MIRUs, the Haarlem and X families by 8 MIRUs, the T family by 9 MIRUs, and the Latin-American and Mediterranean (LAM) family by 10 MIRUs. Considering the hierarchy of family divergence, our results corroborate a recent suggestion that EAI is the ancestral family, followed by Africanum and Bovis. In contrast, the T, X, LAM and Haarlem families appear to have evolved more recently. These results indicate that data mining of MIRUs is a valuable new tool for analyzing the evolutionary dynamics of the M. tuberculosis complex, and for monitoring an infectious disease such as tuberculosis.
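
The abstract does not name the data-mining algorithm used, so the following is only an illustrative sketch of how a single locus could be ranked as "most appropriate" for classification. It assumes an entropy-based information-gain criterion, as used in decision-tree induction. The isolate records, repeat copy numbers, and function names (`entropy`, `information_gain`, `best_locus`) are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2

# Hypothetical toy records: (family label, {MIRU locus: repeat copy number}).
# These values are invented for illustration and are NOT data from the study.
TOY_ISOLATES = [
    ("Beijing", {"MIRU10": 3, "MIRU24": 1, "MIRU26": 5}),
    ("Beijing", {"MIRU10": 3, "MIRU24": 1, "MIRU26": 7}),
    ("EAI",     {"MIRU10": 5, "MIRU24": 2, "MIRU26": 5}),
    ("EAI",     {"MIRU10": 3, "MIRU24": 2, "MIRU26": 7}),
    ("Bovis",   {"MIRU10": 5, "MIRU24": 3, "MIRU26": 5}),
    ("Bovis",   {"MIRU10": 3, "MIRU24": 3, "MIRU26": 7}),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of family labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(isolates, locus):
    """Entropy reduction obtained by splitting the isolates on one locus."""
    labels = [family for family, _ in isolates]
    # Group family labels by the copy number observed at this locus.
    groups = {}
    for family, profile in isolates:
        groups.setdefault(profile[locus], []).append(family)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / len(isolates) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_locus(isolates):
    """Return the locus with the highest information gain."""
    loci = isolates[0][1].keys()
    return max(loci, key=lambda locus: information_gain(isolates, locus))


if __name__ == "__main__":
    for locus in TOY_ISOLATES[0][1]:
        print(locus, round(information_gain(TOY_ISOLATES, locus), 3))
    print("best:", best_locus(TOY_ISOLATES))
```

In this toy dataset the copy number at MIRU24 separates the three families completely, so it receives the highest gain. Applying the same ranking recursively to each resulting subset would yield a rule tree, and the depth needed to isolate a family would correspond loosely to the "maximum number of MIRUs" reported per family above.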

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacterial Typing Techniques*
  • Computational Biology
  • DNA, Bacterial / analysis
  • DNA, Intergenic / genetics
  • Databases, Nucleic Acid
  • Genotype
  • Minisatellite Repeats / genetics*
  • Mycobacterium tuberculosis / genetics*
  • Mycobacterium tuberculosis / isolation & purification
  • Software

Substances

  • DNA, Bacterial
  • DNA, Intergenic