What SNP genotyping errors are most costly for genetic association studies?

Genet Epidemiol. 2004 Feb;26(2):132-41. doi: 10.1002/gepi.10301.

Abstract

Which genotype misclassification errors are most costly, in terms of increased sample size necessary (SSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association? We answer this question for single-nucleotide polymorphisms (SNPs), using the 2 × 3 χ² test of independence. Our strategy is to expand the noncentrality parameter of the asymptotic distribution of the χ² test under a specified alternative hypothesis to approximate SSN, using a linear Taylor series in the error parameters. We consider two scenarios: the first assumes Hardy-Weinberg equilibrium (HWE) for the true genotypes in both cases and controls, and the second assumes HWE only in controls. The Taylor series approximation has a relative error of less than 1% when each error rate is less than 2%. The most costly error is recording the more common homozygote as the less common homozygote, with indefinitely increasing cost coefficient as minor SNP allele frequencies approach 0 in both scenarios. The cost of misclassifying the more common homozygote to the heterozygote also becomes indefinitely large as the minor SNP allele frequency goes to 0 under both scenarios. For the violation of HWE modeled here, the cost of misclassifying a heterozygote to the less common homozygote becomes large, although bounded. Therefore, the use of SNPs with a small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Furthermore, the design of automated genotyping should minimize those errors whose cost coefficients can become indefinitely large.
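The idea of measuring an error's cost as a sample-size inflation can be illustrated numerically. The sketch below is not the authors' derivation: it computes the exact ratio of per-subject noncentrality parameters rather than the linear Taylor expansion described above. It assumes equal numbers of cases and controls, HWE for the true genotypes in both groups (the first scenario), and the same error rate in cases and controls. The function names and the allele frequencies (0.08 in cases, 0.05 in controls) are hypothetical and chosen only for illustration.

```python
# Genotype order throughout: [common homozygote, heterozygote, rare homozygote].

def hwe_genotype_freqs(minor_af):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    major_af = 1.0 - minor_af
    return [major_af ** 2, 2.0 * major_af * minor_af, minor_af ** 2]


def single_error_matrix(true_idx, observed_idx, rate):
    """Matrix e[i][j] = Pr(observed j | true i) with one nonzero error rate."""
    matrix = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    matrix[true_idx][true_idx] -= rate
    matrix[true_idx][observed_idx] += rate
    return matrix


def observed_freqs(true_freqs, error):
    """Genotype frequencies after misclassification."""
    return [sum(true_freqs[i] * error[i][j] for i in range(3)) for j in range(3)]


def ncp_per_subject(case_freqs, control_freqs):
    """Noncentrality of the 2 x 3 chi-square test per case/control pair.

    Assumes equal numbers of cases and controls (an assumption of this sketch).
    """
    return sum(
        (p - q) ** 2 / (p + q)
        for p, q in zip(case_freqs, control_freqs)
        if p + q > 0.0
    )


def sample_size_inflation(case_af, control_af, error):
    """Ratio of sample sizes (with errors / without) giving equal noncentrality."""
    true_cases = hwe_genotype_freqs(case_af)
    true_controls = hwe_genotype_freqs(control_af)
    ncp_true = ncp_per_subject(true_cases, true_controls)
    ncp_obs = ncp_per_subject(
        observed_freqs(true_cases, error),
        observed_freqs(true_controls, error),
    )
    return ncp_true / ncp_obs


if __name__ == "__main__":
    # Hypothetical minor allele frequencies and a 1% rate for each single error.
    for label, (i, j) in {
        "common homozygote -> rare homozygote": (0, 2),
        "common homozygote -> heterozygote": (0, 1),
        "heterozygote -> common homozygote": (1, 0),
    }.items():
        ratio = sample_size_inflation(0.08, 0.05, single_error_matrix(i, j, 0.01))
        print(f"{label}: sample size inflation = {ratio:.4f}")
```

Rerunning the loop with smaller minor allele frequencies is one way to see how the inflation for errors out of the common homozygote grows as the minor allele becomes rare.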

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alleles
  • Case-Control Studies
  • Chromosome Mapping / statistics & numerical data*
  • Gene Frequency / genetics*
  • Genotype*
  • Humans
  • Linkage Disequilibrium / genetics
  • Mathematical Computing
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics*
  • Reproducibility of Results
  • Sample Size
  • Selection Bias
  • Software