Haplotype frequency estimation in patient populations: the effect of departures from Hardy-Weinberg proportions and collapsing over a locus in the HLA region

Genet Epidemiol. 2002 Feb;22(2):186-95. doi: 10.1002/gepi.0163.

Abstract

Haplotype analyses are an important area in the study of the genetic components of human disease. Associations between markers and disease loci that are not evident with a single marker locus may be identified in multi-locus marker analyses using estimated haplotype frequencies (HFs). Procedures that make use of the expectation-maximization (EM) algorithm to estimate HFs from unphased genotype data are in common use in genetic studies. The EM algorithm uses these unphased genotype frequencies along with the assumption of Hardy-Weinberg proportions (HWP) to converge on HF estimates. In this paper, we assess the accuracy of EM estimates of HFs in patients with type 1 diabetes for whom the true haplotypes are known, but the data are analyzed ignoring family information to allow comparison between estimated and true frequencies. The data consist of six HLA loci with high levels of polymorphism and a range of departures from HWP and linkage equilibrium. While the overall accuracy of the EM estimates is good, there can be large over- and underestimates of particular HFs, even for common haplotypes, especially when the loci involved deviate significantly from HWP. Estimating HFs for three or more loci and then collapsing over loci to generate two-locus haplotypes can improve the accuracy of the estimation. The collapsing procedure is most beneficial when one of the loci in the two-locus haplotype of interest deviates significantly from HWP and the locus collapsed over is in linkage disequilibrium with the other loci.
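To make the two procedures named in the abstract concrete, the following is a minimal Python sketch of EM haplotype frequency estimation under the HWP assumption, followed by collapsing over a locus. It is an illustration under stated assumptions, not the software or exact procedure used in the paper: the function names (`compatible_pairs`, `em_haplotype_freqs`, `collapse`), the genotype encoding, the uniform starting frequencies, and the convergence tolerance are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Return the distinct unordered haplotype pairs consistent with one
    unphased genotype, given as a tuple of (allele, allele) per locus."""
    pairs = set()
    # At each locus, choose which allele is placed on the first haplotype.
    for choice in product(*[((a, b), (b, a)) for a, b in genotype]):
        h1 = tuple(c[0] for c in choice)
        h2 = tuple(c[1] for c in choice)
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming HWP (illustrative).

    Starting values are uniform over all haplotypes compatible with the data;
    this starting point and the stopping rule are assumptions of the sketch.
    """
    options = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for opts in options for pair in opts for h in pair})
    freqs = {h: 1.0 / len(haps) for h in haps}
    for _ in range(n_iter):
        counts = defaultdict(float)
        for opts in options:
            # E-step: probability of each phase resolution under HWP
            # (p^2 for identical haplotypes, 2pq for distinct ones).
            weights = [freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in opts]
            total = sum(weights)
            for (h1, h2), w in zip(opts, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs


def collapse(freqs, keep):
    """Sum multi-locus haplotype frequencies over the loci not in `keep`."""
    out = defaultdict(float)
    for hap, f in freqs.items():
        out[tuple(hap[i] for i in keep)] += f
    return dict(out)


# Toy three-locus data, invented purely for illustration.
toy = (
    [(("A", "A"), ("x", "x"), ("B", "B"))] * 4
    + [(("a", "a"), ("y", "y"), ("b", "b"))] * 4
    + [(("A", "a"), ("x", "y"), ("B", "b"))] * 2
)
three_locus = em_haplotype_freqs(toy)
two_locus = collapse(three_locus, keep=(0, 2))  # collapse over the middle locus
```

In this sketch the two-locus frequencies are obtained by first estimating the three-locus haplotypes and then summing over the middle locus, which is the collapsing idea described above; running `em_haplotype_freqs` directly on the two loci of interest would be the non-collapsed alternative to compare against.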

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Alleles
  • Diabetes Mellitus, Type 1 / genetics*
  • Gene Frequency / genetics*
  • Genes, MHC Class I*
  • Genotype
  • Haplotypes / genetics*
  • Humans
  • Linkage Disequilibrium / genetics*
  • Models, Statistical
  • Polymorphism, Genetic