The accuracy of gene localization, the reliability of locus-specific effect estimates, and the ability to replicate initial claims of linkage and/or association have emerged as major methodological concerns in genomewide studies of complex diseases and quantitative traits. To address the issue of multiple comparisons inherent in genomewide studies, the use of stringent criteria for assessing statistical significance has been generally acknowledged as a strategy to control type I error. However, applying genomewide significance criteria does not account for the selection bias introduced into parameter estimates, such as estimates of the locus-specific effect size at disease or trait loci. Some have argued that reliable locus-specific parameter estimates can only be obtained in an independent sample. In this report, we examine statistical resampling techniques, including cross-validation and the bootstrap, applied to the initial sample to improve the estimation of locus-specific effects. We compare them with the naive method, in which all data are used for both hypothesis testing and parameter estimation, as well as with the split-sample approach, in which part of the data is reserved for estimation. Upward bias of the naive estimator and inadequacy of the split-sample approach are derived analytically under a simple quantitative trait model. Simulation studies of the resampling methods are performed for both the simple model and a more realistic genomewide linkage analysis. Our results suggest that cross-validation and bootstrap methods can substantially reduce the estimation bias, especially when the effect size is small or there is no genetic effect.
© 2005 Wiley-Liss, Inc.
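The selection bias described above can be illustrated with a minimal simulation sketch. It is not the model or code used in the report: the toy setup (observations drawn from a normal distribution with unit variance, a one-sided z-test, and the names `simulate`, `true_effect`, and `z_threshold`) is assumed purely for illustration. The sketch contrasts the naive estimator, which reuses the tested data, with a split-sample estimator that tests on one half and estimates on the other.

```python
import random


def simulate(true_effect=0.1, n=100, z_threshold=3.0, n_reps=20000, seed=0):
    """Collect effect estimates from replicates that pass a significance test.

    Toy model (an assumption, not the report's): each observation is
    Normal(true_effect, 1), and a sample is declared significant when
    mean * sqrt(sample size) exceeds z_threshold.
    Returns two lists: naive estimates and split-sample estimates.
    """
    rng = random.Random(seed)
    half = n // 2
    naive, split = [], []
    for _ in range(n_reps):
        data = [rng.gauss(true_effect, 1.0) for _ in range(n)]

        # Naive: the same data are used for testing and for estimation.
        mean_all = sum(data) / n
        if mean_all * n ** 0.5 > z_threshold:
            naive.append(mean_all)

        # Split-sample: test on the first half, estimate on the second half.
        mean_test = sum(data[:half]) / half
        if mean_test * half ** 0.5 > z_threshold:
            split.append(sum(data[half:]) / (n - half))
    return naive, split
```

Averaging each returned list and comparing it with `true_effect` shows the pattern: every naive estimate that survives the test must exceed `z_threshold / sqrt(n)`, so the naive average is inflated by construction, whereas the split-sample estimates come from data that played no part in the test. The split-sample approach also declares significance less often, because only half of the data is available for testing.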