Comprehensive Assessment of Genotype Imputation Performance

Hum Hered. 2018;83(3):107-116. doi: 10.1159/000489758. Epub 2019 Jan 22.

Abstract

Genotype imputation is a process of estimating missing ge-notypes from the haplotype or genotype reference panel. It can effectively boost the power of detecting single nucleotide polymorphisms (SNPs) in genome-wide association studies, integrate multi-studies for meta-analysis, and be applied in fine-mapping studies. The performance of genotype imputation is affected by many factors, including software, reference selection, sample size, and SNP density/sequencing coverage. A systematical evaluation of the imputation performance of current popular software will benefit future studies. Here, we evaluate imputation performances of Beagle4.1, IMPUTE2, MACH+Minimac3, and SHAPEIT2+ IM-PUTE2 using test samples of East Asian ancestry and references of the 1000 Genomes Project. The result indicated the accuracy of IMPUTE2 (99.18%) is slightly higher than that of the others (Beagle4.1: 98.94%, MACH+Minimac3: 98.51%, and SHAPEIT2+IMPUTE2: 99.08%). To achieve good and stable imputation quality, the minimum requirement of SNP density needs to be > 200/Mb. The imputation accuracies of IMPUTE2 and Beagle4.1 were under the minor influence of the study sample size. The contribution extent of reference to genotype imputation performance relied on software selection. We assessed the imputation performance on SNPs generated by next-generation whole genome sequencing and found that SNP sets detected by sequencing with 15× depth could be mostly got by imputing from the haplotype reference panel of the 1000 Genomes Project based on SNP data detected by sequencing with 4× depth. All of the imputation software had a weaker performance in low minor allele frequency SNP regions because of the bias of reference or software. In the future, more comprehensive reference panels or new algorithm developments may rise up to this challenge.

Keywords: Genome-wide association study; Genotype imputation; Minor allele frequency; Whole genome sequencing.

Publication types

  • Meta-Analysis
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Chromosomes, Human, Pair 1 / genetics
  • Chromosomes, Human, Pair 22 / genetics
  • False Positive Reactions
  • Gene Frequency
  • Genotype
  • Genotyping Techniques / methods*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Polymorphism, Single Nucleotide / genetics
  • Sample Size
  • Software