PaVE 2.0: behind the scenes of the Papillomavirus Episteme

J Mol Biol. 2024 Dec 26:168925. doi: 10.1016/j.jmb.2024.168925. Online ahead of print.

Abstract

The Papillomavirus Episteme (PaVE; https://pave.niaid.nih.gov/) was initiated by NIAID in 2008 to provide a highly curated bioinformatic and knowledge resource for the papillomavirus scientific community. It rapidly became a core resource for papillomavirus researchers and clinicians worldwide. Over time, the software infrastructure became severely outdated. In PaVE 2.0, the underlying libraries and hosting platform have been completely upgraded and rebuilt using Amazon Web Services (AWS) tools and automated continuous integration and deployment (CI/CD) pipelines that deploy both the application and its data (now held in AWS S3 cloud storage). PaVE 2.0 is hosted on three AWS ECS containers using the NIAID Operations & Engineering Branch's Monarch tech stack and Terraform. A new Celery queue supports longer-running tasks. The framework is Python Flask with a JavaScript/Jinja template front end, and the database was switched from MySQL to Neo4j. A Swagger API (application programming interface) performs database queries, executes jobs for BLAST, MAFFT, and the L1 typing tool, and will allow future programmatic data access. All major tools, including BLAST, the L1 typing tool, the genome locus viewer, the phylogenetic tree generator, multiple sequence alignment, and the protein structure viewer, were modernized and enhanced to support more users. Multiple sequence alignment now uses MAFFT instead of COBALT. The protein structure viewer was changed from Jmol to Mol*, the new embeddable viewer used by RCSB. In summary, PaVE 2.0 allows us to continue to provide this essential resource with an open-source framework that could be used as a template for molecular biology databases of other viruses.

Keywords: BLAST; Database; HPV; papillomavirus; phylogenetic tree; protein alignment; protein structure; virus.