Pitfalls of barcodes in the study of worldwide SARS-CoV-2 variation and phylodynamics

Zool Res. 2021 Jan 18;42(1):87-93. doi: 10.24272/j.issn.2095-8137.2020.364.

Abstract

Analysis of SARS-CoV-2 genome variation using a minimal number of selected informative sites that form a genetic barcode presents several drawbacks. We show that purely mathematical procedures for site selection should be supervised by known phylogeny (i) to ensure that solid tree branches are represented instead of mutational hotspots with poor phylogeographic properties, and (ii) to avoid phylogenetic redundancy. We propose a procedure that prevents information redundancy in site selection by considering the cumulative informativeness of previously selected sites (as a proxy for phylogenetic-based criteria). This procedure demonstrates that, for short barcodes (e.g., 11 sites), there are thousands of informative site combinations that improve on previous proposals. We also show that barcodes based on worldwide databases inevitably prioritize variants located at the basal nodes of the phylogeny, such that most representative genomes in these ancestral nodes are no longer in circulation. Consequently, coronavirus phylodynamics cannot be properly captured by universal genomic barcodes because most SARS-CoV-2 variation is generated in geographically restricted areas by the continuous introduction of domestic variants.
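
The idea of selecting sites by cumulative informativeness can be illustrated with a minimal sketch. The abstract does not state which informativeness measure or search strategy the authors used, so everything below is an assumption made for illustration: informativeness is taken to be the Shannon entropy of the haplotypes defined by the selected sites, the search is a simple greedy one, and the function names joint_entropy and greedy_barcode are hypothetical. A site that is perfectly linked to an already selected site adds no entropy and is therefore never picked, which is one way of avoiding redundant sites.

```python
from collections import Counter
from math import log2


def joint_entropy(sequences, sites):
    """Shannon entropy (bits) of the haplotypes defined by `sites`.

    Assumption: entropy stands in for "informativeness"; the source
    abstract does not specify the measure actually used.
    """
    counts = Counter(tuple(seq[i] for i in sites) for seq in sequences)
    total = len(sequences)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def greedy_barcode(sequences, k):
    """Greedily pick up to k alignment columns (0-based indices).

    Each step adds the site that most increases the joint entropy of
    the sites chosen so far, so redundant sites contribute nothing.
    Hypothetical helper, not the authors' published procedure.
    """
    length = len(sequences[0])
    selected = []
    current = 0.0
    for _ in range(k):
        best_site, best_h = None, current
        for site in range(length):
            if site in selected:
                continue
            h = joint_entropy(sequences, selected + [site])
            if h > best_h + 1e-12:  # require a real gain in information
                best_site, best_h = site, h
        if best_site is None:  # no remaining site adds information
            break
        selected.append(best_site)
        current = best_h
    return selected


# Toy alignment (made up): columns 0 and 1 carry identical information.
toy = ["AAAA", "AAAA", "CCAA", "CCAA", "CCGA", "CCGT"]
barcode = greedy_barcode(toy, 4)
```

In this toy alignment, column 1 is never selected because it duplicates column 0, and the search stops early once no remaining column adds information. A selection that ranks each site independently would have no reason to skip the duplicate column.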

Keywords: Barcode; COVID-19; Informative subtype markers; Phylodynamics; Phylogeny; SARS-CoV-2.

Publication types

  • Letter

MeSH terms

  • Algorithms
  • COVID-19 / virology*
  • DNA Barcoding, Taxonomic
  • Genetic Variation
  • Genome, Viral
  • Humans
  • Mutation
  • Phylogeny
  • Phylogeography
  • SARS-CoV-2 / classification*
  • SARS-CoV-2 / genetics*
  • SARS-CoV-2 / isolation & purification

Grants and funding

This study was supported by the GePEM (Instituto de Salud Carlos III (ISCIII)/PI16/01478/Cofinanciado FEDER), DIAVIR (Instituto de Salud Carlos III (ISCIII)/DTS19/00049/Cofinanciado FEDER; Proyecto de Desarrollo Tecnológico en Salud), Resvi-Omics (Instituto de Salud Carlos III (ISCIII)/PI19/01039/Cofinanciado FEDER), BI-BACVIR (PRIS-3; Agencia de Conocimiento en Salud (ACIS), Servicio Gallego de Salud (SERGAS), Xunta de Galicia; Spain), Programa Traslaciona Covid-19 (ACIS, Servicio Gallego de Salud (SERGAS), Xunta de Galicia; Spain) and Axencia Galega de Innovación (GAIN; IN607B 2020/08, Xunta de Galicia; Spain) to A.S.; and ReSVinext (Instituto de Salud Carlos III (ISCIII)/PI16/01569/Cofinanciado FEDER) and Enterogen (Instituto de Salud Carlos III (ISCIII)/PI19/01090/Cofinanciado FEDER) to F.M.-T.