Motivation: Receptor-ligand interactions play an important role in controlling many biological systems. One prominent example is the binding of peptides to major histocompatibility complex (MHC) molecules, which controls the onset of cellular immune responses. Thousands of MHC allelic variants exist, making experimental determination of the binding specificity of each variant infeasible. Here, we present a method that can extrapolate from variants with known binding specificity to those for which no experimental data are available.
Results: For each position in the peptide ligand, we extracted the polymorphic pocket residues in MHC molecules that are in close proximity to the peptide residue. For MHC molecules with known specificities, we established a library of pocket residues and their corresponding binding specificities. The binding specificity of a novel MHC molecule is calculated as the average of the specificities of the MHC molecules in this library, weighted by the similarity of their pocket residues to those of the query. The resulting method, PickPocket, is shown to accurately predict MHC-peptide binding for a broad range of MHC alleles, including both human and non-human species. In contrast to neural network-based pan-specific methods, PickPocket was shown to be robust both when data are scarce and when the similarity to MHC molecules with characterized binding specificity is low. A consensus method combining PickPocket and NetMHCpan was shown to achieve superior predictive performance relative to either method alone. This study demonstrates how integrating diverse algorithmic approaches can improve prediction. The method may also be used to make ligand-binding predictions for other types of receptors for which many variants exist.
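The core prediction step, a similarity-weighted average over a library of pocket residues with known specificities, can be illustrated with a minimal sketch. This is not the published PickPocket implementation. The abstract does not specify the similarity measure, so the hypothetical `pocket_similarity` below uses the fraction of identical pocket residues as a stand-in, and a specificity is represented (by assumption) as a 20-value amino-acid preference vector per peptide position. All function names and the example pocket sequences are illustrative.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical library entry for one peptide position: the pocket residues of an
# MHC molecule with known specificity, paired with its amino-acid preference
# vector (one value per residue in AMINO_ACIDS).
LibraryEntry = Tuple[str, List[float]]


def pocket_similarity(a: str, b: str) -> float:
    """Assumed stand-in metric: fraction of aligned pocket positions that are identical."""
    if len(a) != len(b) or not a:
        raise ValueError("pocket sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_position_specificity(query_pocket: str, library: List[LibraryEntry]) -> List[float]:
    """Average the library specificities, weighting each by pocket similarity to the query."""
    weights = [pocket_similarity(query_pocket, pocket) for pocket, _ in library]
    total = sum(weights)
    if total == 0:
        raise ValueError("query pocket shares no residues with any library pocket")
    averaged = [0.0] * len(AMINO_ACIDS)
    for weight, (_, spec) in zip(weights, library):
        for i, value in enumerate(spec):
            averaged[i] += weight * value
    # Dividing by the total weight makes this a weighted average, not a weighted sum.
    return [value / total for value in averaged]


def predict_specificity(
    query_pockets: List[str], libraries: List[List[LibraryEntry]]
) -> List[List[float]]:
    """Apply the per-position prediction to every peptide position (e.g. 9 for a 9-mer)."""
    if len(query_pockets) != len(libraries):
        raise ValueError("need one library per peptide position")
    return [predict_position_specificity(q, lib) for q, lib in zip(query_pockets, libraries)]


def one_hot(residue: str) -> List[float]:
    """Toy specificity that prefers a single amino acid (for illustration only)."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(residue)] = 1.0
    return vec


if __name__ == "__main__":
    # Two hypothetical library alleles: one prefers L, the other prefers V.
    library = [("YFT", one_hot("L")), ("YYS", one_hot("V"))]
    spec = predict_position_specificity("YFT", library)
    print(round(spec[AMINO_ACIDS.index("L")], 2), round(spec[AMINO_ACIDS.index("V")], 2))  # 0.75 0.25
```

Because the query pocket `YFT` matches the first library pocket exactly (weight 1) and the second only at one of three positions (weight 1/3), the prediction leans toward the first allele's preference. As the query's pocket residues diverge from every library entry, the weights even out and the prediction falls back toward a broad average of the library, which is one intuitive reading of why a nearest-neighbour-style approach can remain usable when similarity to characterized molecules is low.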