NNAlign_MA; MHC Peptidome Deconvolution for Accurate MHC Binding Motif Characterization and Improved T-cell Epitope Predictions

Mol Cell Proteomics. 2019 Dec;18(12):2459-2477. doi: 10.1074/mcp.TIR119.001658. Epub 2019 Oct 2.

Abstract

The set of peptides presented on a cell's surface by MHC molecules is known as the immunopeptidome. Current mass spectrometry (MS) technologies allow for the identification of large peptidomes, and studies have proven these data to be a rich source of information for learning the rules of MHC-mediated antigen presentation. Immunopeptidomes are usually poly-specific, containing multiple sequence motifs matching the MHC molecules expressed in the system under investigation. Motif deconvolution, the process of associating each ligand with its presenting MHC molecule(s), is therefore a critical and challenging step in the analysis of MS-eluted MHC ligand data. Here, we describe NNAlign_MA, a computational method designed to address this challenge and fully benefit from large, poly-specific data sets of MS-eluted ligands. NNAlign_MA simultaneously performs the tasks of (1) clustering peptides into individual specificities; (2) automatically annotating each cluster with an MHC molecule; and (3) training a prediction model covering all MHCs present in the training set. NNAlign_MA was benchmarked on large and diverse data sets covering class I and class II data. In all cases, the method was shown to outperform state-of-the-art methods, effectively expanding the coverage of alleles for which accurate predictions can be made and resulting in improved identification of both eluted ligands and T-cell epitopes. Given its high flexibility and ease of use, we expect NNAlign_MA to serve as an effective tool to increase our understanding of the rules of MHC antigen presentation and to guide the development of novel T-cell-based therapeutics.
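The annotation step (2) can be illustrated with a toy sketch: each ligand comes from a sample with a known set of expressed alleles, and it is assigned to whichever of those alleles a per-allele model scores highest, which simultaneously groups ligands into allele-specific clusters. This is not the authors' implementation. The allele names, the hand-written anchor tables in `TOY_MOTIFS`, and the functions `score` and `deconvolute` are all hypothetical stand-ins for trained prediction models. The abstract describes clustering, annotation and training as simultaneous; this sketch freezes the model and shows a single annotation pass only.

```python
from collections import defaultdict

# Hypothetical toy "motif models": preferred residues at anchor positions
# (0-based index into a 9-mer) for each allele. These are illustrative
# assumptions, not real MHC motifs or NNAlign_MA's trained networks.
TOY_MOTIFS = {
    "ALLELE_A": {1: "LM", 8: "VL"},
    "ALLELE_B": {1: "PA", 8: "YF"},
    "ALLELE_C": {2: "DE", 8: "KR"},
}


def score(peptide: str, allele: str) -> int:
    """Count the anchor positions of `peptide` that match the allele's toy motif."""
    motif = TOY_MOTIFS[allele]
    return sum(
        1
        for pos, residues in motif.items()
        if pos < len(peptide) and peptide[pos] in residues
    )


def deconvolute(ligands):
    """Assign each (peptide, genotype) pair to its best-scoring expressed allele.

    Only alleles in the sample's genotype are considered. Ties go to the
    allele listed first, because max() returns the first maximal element.
    Returns a dict mapping allele -> list of assigned peptides (the clusters).
    """
    clusters = defaultdict(list)
    for peptide, genotype in ligands:
        best = max(genotype, key=lambda allele: score(peptide, allele))
        clusters[best].append(peptide)
    return dict(clusters)


if __name__ == "__main__":
    ligands = [
        ("SLYNTVATL", ["ALLELE_A", "ALLELE_B"]),
        ("APRGPHGGY", ["ALLELE_A", "ALLELE_B"]),
        ("AEDLKTSGK", ["ALLELE_A", "ALLELE_C"]),
    ]
    print(deconvolute(ligands))
```

Run as a script, this prints one cluster per allele, each holding the single peptide that matches its toy anchors. In a full training loop, the updated annotations would be fed back to retrain the models, and the cycle repeated.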

Keywords: Bioinformatics; algorithms; antigen presentation; bioinformatics software; immunoinformatics; immunology; immunopeptidomics; machine learning; mass spectrometry.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Animals
  • Benchmarking
  • Cattle
  • Cell Line
  • Computational Biology / methods*
  • Databases, Protein
  • Datasets as Topic
  • Epitopes, T-Lymphocyte / metabolism*
  • Histocompatibility Antigens Class I / metabolism*
  • Histocompatibility Antigens Class II / metabolism*
  • Humans
  • Ligands
  • Machine Learning
  • Mass Spectrometry
  • Peptides / metabolism
  • Protein Binding

Substances

  • Epitopes, T-Lymphocyte
  • Histocompatibility Antigens Class I
  • Histocompatibility Antigens Class II
  • Ligands
  • Peptides