Data-independent acquisition (DIA) mass spectrometry methods provide systematic and comprehensive quantification of the proteome; yet, relatively few open-source tools are available to analyze DIA proteomics experiments. Fewer still are tools that can leverage gas phase fractionated (GPF) chromatogram libraries to enhance the detection and quantification of peptides in these experiments. Here, we present nf-encyclopedia, an open-source NextFlow pipeline that connects three open-source tools, MSConvert, EncyclopeDIA, and MSstats, to analyze DIA proteomics experiments with or without chromatogram libraries. We demonstrate that nf-encyclopedia is reproducible when run on either a cloud platform or a local workstation and provides robust peptide and protein quantification. Additionally, we found that MSstats enhances protein-level quantitative performance over EncyclopeDIA alone. Finally, we benchmarked the ability of nf-encyclopedia to scale to large experiments in the cloud by leveraging the parallelization of compute resources. The nf-encyclopedia pipeline is available under a permissive Apache 2.0 license; run it on your desktop, cluster, or in the cloud: https://github.com/TalusBio/nf-encyclopedia.
Keywords: bioinformatics; cloud; data-independent acquisition; mass spectrometry; pipeline; proteomics; reproducibility; workflow.