ms-data-core-api: an open-source, metadata-oriented library for computational proteomics

Bioinformatics. 2015 Sep 1;31(17):2903-5. doi: 10.1093/bioinformatics/btv250. Epub 2015 Apr 24.

Abstract

The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and a common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types produced by common proteomics experimental workflows, from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most widely used Proteomics Standards Initiative standard file formats: mzML, mzIdentML and mzTab. In addition to mzML, it supports other common mass spectrometry data formats: dta, ms2, mgf, pkl and apl (text-based), and mzXML and mzData (XML-based). It can also read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates how simple it is to develop applications using the library.
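As a rough illustration of the format coverage listed above, the Java sketch below maps a file name to one of the named formats and records whether that format is XML-based or text-based. The enum `MsFormat` and the method `fromFileName` are hypothetical names introduced here; they are not part of ms-data-core-api, whose actual reader classes are not described in this abstract. The file suffixes are assumptions about common naming conventions. mzData and PRIDE XML are left out because they can share a generic `.xml` suffix, so the extension alone cannot tell them apart.

```java
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical sketch (NOT the ms-data-core-api API): maps a file name to one of
 * the formats named in the abstract and records whether it is XML- or text-based.
 */
enum MsFormat {
    MZML(".mzml", true),
    MZIDENTML(".mzid", true),
    MZXML(".mzxml", true),
    MZTAB(".mztab", false),
    DTA(".dta", false),
    MS2(".ms2", false),
    MGF(".mgf", false),
    PKL(".pkl", false),
    APL(".apl", false);

    /** Assumed lower-case file suffix; real files may follow other conventions. */
    final String extension;
    /** true for XML-based formats, false for plain-text ones. */
    final boolean xmlBased;

    MsFormat(String extension, boolean xmlBased) {
        this.extension = extension;
        this.xmlBased = xmlBased;
    }

    /** Returns the format whose suffix matches the file name, ignoring case. */
    static Optional<MsFormat> fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (MsFormat format : values()) {
            if (lower.endsWith(format.extension)) {
                return Optional.of(format);
            }
        }
        // Unknown or ambiguous suffix (e.g. ".xml" for mzData or PRIDE XML):
        // a real reader would have to inspect the file contents instead.
        return Optional.empty();
    }
}

/** Small driver printing the detected format for a few example file names. */
class FormatDemo {
    public static void main(String[] args) {
        String[] names = {"run01.mzML", "ids.mzid", "peaks.MGF", "sample.xml"};
        for (String name : names) {
            String label = MsFormat.fromFileName(name)
                    .map(f -> f + (f.xmlBased ? " (XML-based)" : " (text-based)"))
                    .orElse("unknown");
            System.out.println(name + " -> " + label);
        }
    }
}
```

Keeping the XML/text distinction as a field on each constant makes it easy to route a file to an XML parser or a line-based parser; a fuller, pluggable design would register a reader per format rather than hard-coding the list.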

Availability and implementation: The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: juan@ebi.ac.uk.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Databases, Protein*
  • Humans
  • Mass Spectrometry / methods*
  • Peptide Fragments / analysis
  • Proteins / analysis*
  • Proteomics / methods*
  • Software*
  • Workflow

Substances

  • Peptide Fragments
  • Proteins