This paper formulates a set of rules to classify genotypes of the Mycobacterium tuberculosis complex (MTBC) into major lineages using spoligotypes and MIRU-VNTR results. The rules synthesize prior literature that characterizes lineages by spacer deletions and variations in the number of repeats seen at locus MIRU24 (alias VNTR2687). A tool that efficiently and accurately implements this rule base is now freely available at http://tbinsight.cs.rpi.edu/run_tb_lineage.html. When MIRU24 data is not available, the system utilizes predictions made by a Naïve Bayes classifier based on spoligotype data. This website also provides a tool to generate spoligoforests in order to visualize the genetic diversity and relatedness of genotypes and their associated lineages. A detailed analysis of the application of these tools on a dataset collected by the CDC consisting of 3198 distinct spoligotypes and 5430 distinct MIRU-VNTR types from 37,066 clinical isolates is presented. The tools were also tested on four other independent datasets. The accuracy of automated classification using both spoligotypes and MIRU24 is >99%, and using spoligotypes alone is >95%. This online rule-based classification technique in conjunction with genotype visualization provides a practical tool that supports surveillance of TB transmission trends and molecular epidemiological studies.
Copyright © 2012 Elsevier B.V. All rights reserved.