Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations

Bioinformatics. 2010 Apr 15;26(8):1007-14. doi: 10.1093/bioinformatics/btq088. Epub 2010 Feb 25.

Abstract

Motivation: High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed.

Results: We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them.

Availability: The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/ approximately gwong/DRECS/index.html.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms
  • DNA Copy Number Variations*
  • Genome, Human
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA / methods