Distinguishing cancer-associated missense mutations from common polymorphisms

Cancer Res. 2007 Jan 15;67(2):465-73. doi: 10.1158/0008-5472.CAN-06-1736.

Abstract

Missense variants are commonly identified in genomic sequence but only a small fraction directly contribute to oncogenesis. The ability to distinguish those missense changes that contribute to cancer progression from those that do not is a difficult problem usually only accomplished through functional in vivo analyses. Using two computational algorithms, Sorting Intolerant from Tolerant (SIFT) and the Pfam-based LogR.E-value method, we have identified features that distinguish cancer-associated missense mutations from other classes of missense change. Our data reveal that cancer mutants behave similarly to Mendelian disease mutations, but are clearly distinct from either complex disease mutations or common single-nucleotide polymorphisms. We show that both activating and inactivating oncogenic mutations are predicted to be deleterious, although activating changes are likely to increase protein activity. Using the Gene Ontology and data from the SIFT and LogR.E-value metrics, a classifier was built that predicts cancer-associated missense mutations with a very low false-positive rate. The classifier does remarkably well in a number of different experiments designed to distinguish polymorphisms from true cancer-associated mutations. We also show that recurrently observed mutations are much more likely to be predicted to be cancer-associated than rare mutations, suggesting that our classifier will be useful in distinguishing causal from passenger mutations. In addition, from an expressed sequence tag-based screen, we identified a previously unknown germ line change (P1104A) in tumor tissues that is predicted to disrupt the function of the TYK2 protein. The data presented here show that this novel bioinformatics approach to classifying cancer-associated variants is robust and can be used for large-scale analyses.

MeSH terms

  • Algorithms
  • Alleles
  • Cell Line, Tumor
  • Computational Biology / methods
  • DNA, Neoplasm / genetics
  • Expressed Sequence Tags
  • Humans
  • Mutation, Missense*
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide*

Substances

  • DNA, Neoplasm