Reduction of selection bias in genomewide studies by resampling

Lei Sun; Shelley B Bull

doi:10.1002/gepi.20068

Genet Epidemiol. 2005 May;28(4):352-67. doi: 10.1002/gepi.20068.

Authors

Lei Sun¹, Shelley B Bull

Affiliation

¹ Department of Public Health Sciences, University of Toronto, Toronto, Canada. lei.sun@utoronto.ca

PMID: 15761913

Abstract

The accuracy of gene localization, the reliability of locus-specific effect estimates, and the ability to replicate initial claims of linkage and/or association have emerged as major methodological concerns in genomewide studies of complex diseases and quantitative traits. To address the issue of multiple comparisons inherent in genomewide studies, the use of stringent criteria for assessing statistical significance has been generally acknowledged as a strategy to control type I error. However, the application of genomewide significance criteria does not take account of the selection bias introduced into parameter estimates, e.g., estimates of locus-specific effect size of disease/trait loci. Some have argued that reliable locus-specific parameter estimates can only be obtained in an independent sample. In this report, we examine statistical resampling techniques, including cross-validation and the bootstrap, applied to the initial sample to improve the estimation of locus-specific effects. We compare them with the naive method in which all data are used for both hypothesis testing and parameter estimation, as well as with the split-sample approach in which part of the data are reserved for estimation. Upward bias of the naive estimator and inadequacy of the split-sample approach are derived analytically under a simple quantitative trait model. Simulation studies of the resampling methods are performed for both the simple model and a more realistic genomewide linkage analysis. Our results suggest that cross-validation and bootstrap methods can substantially reduce the estimation bias, especially when the effect size is small or there is no genetic effect.
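The contrast between the naive, split-sample, and resampling estimators can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a toy single-locus model (observations drawn from a normal distribution with mean `mu` and unit variance, tested with a one-sided z test), and the function names, the threshold, and the out-of-bag bias correction are illustrative choices made here. It reports the average effect estimate among replicates declared significant.

```python
import random
import statistics


def z_stat(xs):
    """z statistic for H0: mean = 0, assuming a known unit variance."""
    return statistics.fmean(xs) * len(xs) ** 0.5


def naive_estimate(xs, threshold):
    """Test and estimate on the same data; None if not significant."""
    return statistics.fmean(xs) if z_stat(xs) > threshold else None


def split_estimate(xs, threshold):
    """Test on the first half, estimate on the reserved second half."""
    half = len(xs) // 2
    test_part, est_part = xs[:half], xs[half:]
    return statistics.fmean(est_part) if z_stat(test_part) > threshold else None


def bootstrap_estimate(xs, threshold, n_boot, rng):
    """Naive estimate minus a bootstrap estimate of the selection bias.

    Assumed correction (illustrative only): for each bootstrap sample that
    is itself significant, take the gap between its in-sample mean and the
    mean of the observations left out of that sample; average the gaps.
    """
    naive = naive_estimate(xs, threshold)
    if naive is None:
        return None
    n = len(xs)
    gaps = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        chosen = set(idx)
        in_bag = [xs[i] for i in idx]
        out_bag = [xs[i] for i in range(n) if i not in chosen]
        if out_bag and z_stat(in_bag) > threshold:
            gaps.append(statistics.fmean(in_bag) - statistics.fmean(out_bag))
    bias = statistics.fmean(gaps) if gaps else 0.0
    return naive - bias


def simulate(mu=0.1, n=100, threshold=2.0, reps=1000, n_boot=50, seed=1):
    """Average estimate per method over the replicates it calls significant."""
    rng = random.Random(seed)
    found = {"naive": [], "split": [], "bootstrap": []}
    for _ in range(reps):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        estimates = {
            "naive": naive_estimate(xs, threshold),
            "split": split_estimate(xs, threshold),
            "bootstrap": bootstrap_estimate(xs, threshold, n_boot, rng),
        }
        for name, est in estimates.items():
            if est is not None:
                found[name].append(est)
    return {name: statistics.fmean(v) for name, v in found.items() if v}
```

Calling `simulate()` returns a dictionary of average estimates keyed by method, to be compared against the true `mu`. In this toy setting the naive average should sit well above `mu`, because only replicates with a large observed mean pass the threshold. The split-sample estimate avoids that selection effect but tests on half the data, so it detects the locus less often and estimates it from fewer observations. The bootstrap correction should pull the naive estimate back toward `mu` while still using the full sample.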

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Chromosome Mapping
Computer Simulation / statistics & numerical data
Genetic Linkage
Genetic Markers
Genome, Human*
Humans
Models, Genetic*
Sample Size
Selection Bias*
Siblings

Substances

Genetic Markers
