AGILE: an assembled genome mining pipeline

Graham M Hughes; Emma C Teeling

doi:10.1093/bioinformatics/bty781


Bioinformatics. 2019 Apr 1;35(7):1252-1254. doi: 10.1093/bioinformatics/bty781.

Authors

Graham M Hughes¹, Emma C Teeling¹

Affiliation

¹ School of Biology and Environmental Science, University College Dublin, Dublin 4, Ireland.

PMID: 30184049

Abstract

Summary: A number of limiting factors mean that traditional genome annotation tools either fail or perform sub-optimally when trying to detect coding sequences in poor-quality genome assemblies or genome reports. As a result, potentially useful data are accessible only to those with specific skills and expertise in assembly and annotation. We present an Assembled-Genome mIning pipeLinE (AGILE), written in Perl, that combines bioinformatics tools with a number of steps to overcome the limitations such assemblies impose when working with highly fragmented genomes. Our methodology uses user-specified query genes from a closely related species to mine and annotate coding sequences that standard annotation packages would traditionally miss. Although it was developed with a focus on mammalian genomes, its generalized implementation means that it may be applied to any genome assembly, giving non-specialists a means to gather gene sequences for downstream analyses.
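The abstract does not list AGILE's individual steps. As a hypothetical illustration of the core idea (using query genes from a related species to locate coding sequences scattered across a fragmented assembly), the Python sketch below assumes the query exons have already been searched against the assembly with a tool that emits BLAST-style tabular output (`-outfmt 6`, 12 standard columns). It keeps the best-scoring hit per exon and reports which scaffolds each gene's exons land on. The `GENE|exonN` naming convention, the 70% identity cutoff, and all function names are assumptions made for illustration; this is not AGILE's own code, which is written in Perl.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    query: str       # query exon ID; "GENE|exonN" naming is a hypothetical convention
    subject: str     # scaffold/contig ID in the target assembly
    pident: float    # percent identity of the alignment
    length: int      # alignment length
    bitscore: float  # bit score used to rank competing hits


def parse_outfmt6(lines):
    """Parse BLAST tabular (-outfmt 6) lines; columns 1-4 and 12 are used."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.split("\t")
        hits.append(Hit(query=f[0], subject=f[1], pident=float(f[2]),
                        length=int(f[3]), bitscore=float(f[11])))
    return hits


def best_hit_per_exon(hits, min_pident=70.0):
    """Keep the highest-bitscore hit for each query exon above an identity cutoff."""
    best = {}
    for h in hits:
        if h.pident < min_pident:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


def scaffolds_per_gene(best):
    """Group best exon hits by gene; a gene spread over several scaffolds
    indicates that the assembly fragmented it."""
    genes = defaultdict(set)
    for exon_id, h in best.items():
        gene = exon_id.split("|")[0]
        genes[gene].add(h.subject)
    return {g: sorted(s) for g, s in genes.items()}


if __name__ == "__main__":
    example = [
        "GENE1|exon1\tscaf_12\t92.1\t150\t11\t1\t1\t150\t2001\t2150\t1e-50\t210.0",
        "GENE1|exon1\tscaf_40\t75.0\t140\t35\t0\t1\t140\t500\t639\t1e-20\t120.0",
        "GENE1|exon2\tscaf_7\t88.4\t98\t10\t1\t1\t98\t8800\t8703\t1e-30\t150.0",
        "GENE2|exon1\tscaf_12\t65.0\t120\t42\t3\t1\t120\t40\t159\t1e-5\t60.0",
    ]
    best = best_hit_per_exon(parse_outfmt6(example))
    print(scaffolds_per_gene(best))  # {'GENE1': ['scaf_12', 'scaf_7']}
```

In this toy input, GENE1's two exons are recovered on different scaffolds, the kind of split gene that a standard annotation pass over a fragmented assembly can miss, while GENE2's only hit falls below the assumed identity cutoff and is discarded.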

Availability and implementation: Source code and associated files are available at https://github.com/batlabucd/GenomeMining and https://bitbucket.org/BatlabUCD/genomemining/src. Singularity and VirtualBox images are available at https://figshare.com/s/a0004bf93dc43484b0c0.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Data Mining
Exons / genetics
Genome* / genetics
Genomics* / methods
Software*