We describe an approach for picking haplotype-tagging single nucleotide polymorphisms (htSNPs) that is presently being taken in two large nested case-control studies within a multiethnic cohort (MEC), which are engaged in a search for associations between risk of prostate and breast cancer and common genetic variations in candidate genes. Based on a preliminary sample of 70 control subjects chosen at random from each of the 5 ethnic groups in the MEC we estimate haplotype frequencies using a variant of the Excoffier-Slatkin E-M algorithm after genotyping a high density of SNPs selected every 3-5 kb in and surrounding a candidate gene. In order to evaluate the performance of a candidate set of htSNPS (which will be genotyped in the much larger case-control sample) we treat the haplotype frequencies estimate above as known, and carry out a formal calculation of the uncertainty of the number of copies of common haplotypes carried by an individual, summarizing this calculation as a coefficient of determination, R2h. A candidate set of htSNPS of a given size is chosen so as to maximize the minimum value of R2h over the common haplotypes, h.
Copyright 2003 S. Karger AG, Basel