AOC: Analysis of Orthologous Collections - an application for the characterization of natural selection in protein-coding sequences

ArXiv [Preprint]. 2024 Jun 13:arXiv:2406.09522v1.

Abstract

Motivation: Modern molecular sequence analysis increasingly relies on automated and robust software tools for interpretation, annotation, and biological insight. The Analysis of Orthologous Collections (AOC) application automates the identification of genomic sites and species/lineages influenced by natural selection in coding sequences. AOC quantifies several types of selection: negative selection, diversifying or directional positive selection, and differential selection between groups of branches. It includes all steps necessary to go from unaligned homologous sequences to complete results, together with interactive visualizations designed to aid the interpretation and contextualization of those results.

Results: We are motivated by a desire to make evolutionary analyses as simple as possible, and to narrow the disparity in the literature between genes that draw a significant amount of interest and those that are largely overlooked and underexplored. We believe that such underappreciated and understudied genetic datasets can hold rich biological information and offer substantial insights into the diverse patterns and processes of evolution, especially if domain experts are able to perform the analyses themselves.

Availability and implementation: AOC is implemented as a Snakemake [Mölder et al., 2021] application. It is publicly available on GitHub at https://github.com/aglucaci/AnalysisOfOrthologousCollections and is accompanied by software documentation and a tutorial.

Publication types

  • Preprint