Practical considerations for dividing data into subsets prior to PPL analysis

Hum Hered. 2008;66(4):223-37. doi: 10.1159/000143405. Epub 2008 Jul 9.

Abstract

Objective: The PPL, a class of statistics for complex trait genetic mapping in humans, utilizes Bayesian sequential updating to accumulate evidence for or against linkage across potentially heterogeneous data (sub)sets. Here, we systematically explore the relative efficacy of alternative subsetting approaches for purposes of PPL calculation.
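As a rough illustration of sequential updating across subsets, the sketch below multiplies per-subset evidence ratios (linkage vs. no linkage) at each point of a shared recombination-fraction grid, averages over the grid, and converts the result to a posterior probability. It is a simplified sketch, not the authors' implementation. The function name `sequential_ppl`, the uniform weighting over the grid, the default prior probability of linkage of 0.02, and the input ratios are all assumptions made for illustration.

```python
from math import prod


def sequential_ppl(subset_ratios, prior=0.02):
    """Combine per-subset evidence ratios into one posterior probability.

    subset_ratios: one list per data subset; each inner list holds that
    subset's evidence ratio (linkage vs. no linkage) at each point of a
    shared grid of recombination fractions (hypothetical inputs).
    prior: assumed prior probability of linkage (illustrative default).
    """
    n_grid = len(subset_ratios[0])
    if any(len(r) != n_grid for r in subset_ratios):
        raise ValueError("all subsets must share the same grid")

    # Sequential updating: evidence from each subset multiplies in
    # at every grid point, so subsets are never pooled into one likelihood.
    combined = [prod(r[i] for r in subset_ratios) for i in range(n_grid)]

    # Average over the grid (uniform weighting is an assumption here).
    avg = sum(combined) / n_grid

    # Bayes' theorem with a two-point prior: linkage vs. no linkage.
    return prior * avg / (prior * avg + (1.0 - prior))
```

With uninformative ratios (all equal to 1) the function returns the prior unchanged, which is the behavior one would hope for at an unlinked locus.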

Methods: We simulated genotypes for three pedigree sets (sib pairs; 2-3 generations; ≥4 generations) based on families from an ongoing study. For each pedigree set, 100 replicates were generated under different levels of heterogeneity (1000 under 'no linkage'). Within each replicate, updating was performed across subsets defined randomly (RAND2, RAND4), by true (TRUE) linkage status, with a realistic (REAL) classification, by individual pedigree (PED), or without any subsetting (NONE).
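The subsetting schemes named above can be pictured as different ways of partitioning one list of pedigrees. The sketch below is hypothetical: the abstract does not say how the random splits were drawn or how the REAL classification was constructed, so the shuffle-and-deal split, the `labels` mapping, and the function name `make_subsets` are assumptions for illustration only.

```python
import random
from collections import defaultdict


def make_subsets(pedigrees, scheme, labels=None, seed=0):
    """Partition pedigree identifiers according to a named scheme.

    labels: mapping from pedigree id to a class label, needed for the
    TRUE and REAL schemes (how REAL labels arise is not modeled here).
    """
    pedigrees = list(pedigrees)

    if scheme == "NONE":
        # No subsetting: every pedigree in a single set.
        return [pedigrees]
    if scheme == "PED":
        # Each pedigree forms its own subset.
        return [[p] for p in pedigrees]
    if scheme in ("RAND2", "RAND4"):
        # Random split into 2 or 4 subsets (assumed near-equal sizes).
        k = int(scheme[-1])
        shuffled = pedigrees[:]
        random.Random(seed).shuffle(shuffled)
        return [shuffled[i::k] for i in range(k)]
    if scheme in ("TRUE", "REAL"):
        if labels is None:
            raise ValueError("TRUE and REAL schemes need class labels")
        groups = defaultdict(list)
        for p in pedigrees:
            groups[labels[p]].append(p)
        return list(groups.values())

    raise ValueError(f"unknown scheme: {scheme}")
```

Each returned subset would then contribute its own evidence term to the sequential update, with NONE reducing to a single pooled analysis.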

Results: Under 'linkage', REAL yields larger PPLs compared to NONE, RAND2, RAND4, or PED. Under 'no linkage', RAND2, RAND4 and PED yield PPLs close to NONE.

Conclusions: We have examined the impact of different subsetting strategies on the sampling behavior of the PPL. Our results underscore the utility of finding variables that can help delineate more homogeneous data subsets and demonstrate that, once such variables are found, sequential updating can be highly beneficial in the presence of appreciable heterogeneity at a linked locus, without inflation at an unlinked locus.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Chromosome Mapping
  • Genetic Heterogeneity
  • Genetic Linkage*
  • Genotype
  • Humans
  • Pedigree
  • Probability*
  • Quantitative Trait Loci
  • Quantitative Trait, Heritable*