Background: Diversity-generating retroelements (DGRs) are a family of genetic elements that introduce mutations into target genes. These target genes are often related to ligand-binding functions and possess a C-type lectin (CLec) domain that tolerates massive variation. DGRs were first identified in viruses and later in bacteria and archaea from human-associated and environmental genomes. This mechanism enables organisms to adapt rapidly to ever-changing environments. However, the occurrence, phylogenetic and structural diversity, and functions of DGRs across a wide range of environments remain largely unknown.
Results: Here we present a study of DGR systems based on metagenome-assembled genomes (MAGs) from host-associated, aquatic, terrestrial, and engineered environments. In total, we identified 861 non-redundant DGR reverse transcriptases (DGR-RTs), of which ~5.7% are new. We found that microbes associated with human hosts harbor the highest number of DGRs and also exhibit a higher prevalence of DGRs. After normalizing for genome size and including more genome data, we found that DGRs occur more frequently in organisms with smaller genomes. Overall, we identified nine main clades in the phylogenetic tree of RTs, some of which comprise specific phyla and cassette architectures. We identified 38 different cassette patterns, 6 of which occurred in at least 10 DGRs; these patterns differ in the number, arrangement, and orientation of their components. Finally, most target genes were related to ligand-binding and signaling functions, but we discovered a few cases in which the variable regions (VRs) were located in domains other than the CLec.
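The two counting steps summarized above (normalizing DGR counts by genome size, and keeping cassette patterns seen in at least 10 DGRs) can be illustrated with a minimal Python sketch. It assumes that normalization means DGRs per megabase, which the abstract does not specify. The genome records, the pattern labels (e.g. "TG-TR-RT" for target gene, template repeat, and RT), and the function names `dgrs_per_mb` and `frequent_patterns` are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy records: (genome_id, genome_size_bp, number_of_DGRs).
# Illustrative values only; these are NOT data from the study.
genomes = [
    ("mag_A", 1_200_000, 2),
    ("mag_B", 4_800_000, 1),
    ("mag_C", 2_400_000, 0),
]


def dgrs_per_mb(size_bp: int, n_dgrs: int) -> float:
    """Return the DGR count normalized by genome size (DGRs per megabase).

    Assumption: per-Mb normalization; the study's exact method may differ.
    """
    return n_dgrs / (size_bp / 1_000_000)


# Normalized DGR density for each hypothetical genome.
densities = {gid: dgrs_per_mb(size, n) for gid, size, n in genomes}


def frequent_patterns(patterns: list[str], min_count: int = 10) -> dict[str, int]:
    """Return cassette patterns observed in at least `min_count` DGRs."""
    counts = Counter(patterns)
    return {p: c for p, c in counts.items() if c >= min_count}


# Hypothetical cassette-pattern labels, one per DGR.
example_patterns = ["TG-TR-RT"] * 12 + ["TG-TR-Avd-RT"] * 3
common = frequent_patterns(example_patterns)
```

With these toy inputs, the smallest genome (`mag_A`) has the highest density, and only the pattern observed 12 times passes the 10-DGR threshold.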
Conclusions: Our research sheds light on the widespread prevalence of DGRs across environments and taxa and supports the phylogenetic divergence of DGRs among different organisms. This divergence might also extend to their structures, since some cassette architectures were common in specific underrepresented phyla. In addition, we suggest that VRs could be found in domains other than the CLec, which should be further explored in organisms from scarcely studied environments.
Keywords: C-type lectin (CLec) fold; Cassette structure; Diversity-generating retroelement (DGR); Domain annotation; Metagenome assembled genome (MAG); Target gene.
© 2024. The Author(s).