Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction

Chia-Yen Chen; Jiali Han; David J Hunter; Peter Kraft; Alkes L Price

doi:10.1002/gepi.21906

Genet Epidemiol. 2015 Sep;39(6):427-38. doi: 10.1002/gepi.21906. Epub 2015 May 21.

Authors

Chia-Yen Chen¹, Jiali Han¹ ² ³, David J Hunter¹ ² ⁴, Peter Kraft¹ ² ⁴ ⁵, Alkes L Price¹ ⁴ ⁵

Affiliations

¹ Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.
² Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States of America.
³ Department of Epidemiology, Richard M. Fairbanks School of Public Health, Simon Cancer Center, Indiana University, Indianapolis, Indiana, United States of America.
⁴ Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America.
⁵ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, Broad Institute of Harvard and MIT, Cambridge.

Abstract

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample size from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (from 0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (from 0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (from 0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.

Keywords: basal cell carcinoma; genome-wide association study; pigmentation; polygenic prediction; principal component analysis.
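
The PRS comparison described in the abstract (prediction without ancestry correction versus prediction with ancestry as a separate model component) can be illustrated with a small simulation. The sketch below is not the authors' pipeline: the simulated two-population data, the function names (`residualize`, `marginal_effects`, `top_pcs`, and so on), the use of all SNPs without P-value thresholding or LD pruning, and the single train/test split in place of 10-fold cross-validation are all assumptions made for illustration.

```python
import numpy as np

def residualize(a, covariates):
    """Remove the least-squares projection of `a` onto [intercept, covariates]."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(X, a, rcond=None)
    return a - X @ coef

def marginal_effects(G, y, pcs=None):
    """Per-SNP marginal regression slopes.

    If `pcs` is given, genotypes and phenotype are first adjusted for
    ancestry, so ancestry effects do not enter each SNP effect.
    """
    if pcs is not None:
        G, y = residualize(G, pcs), residualize(y, pcs)
    else:
        G, y = G - G.mean(axis=0), y - y.mean()
    return (G.T @ y) / (G ** 2).sum(axis=0)

def top_pcs(G, k):
    """Top-k principal components of the standardized genotype matrix."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y, pred):
    """Squared correlation between observed and predicted phenotype."""
    return np.corrcoef(y, pred)[0, 1] ** 2

# Hypothetical structured population: two groups with slightly different
# allele frequencies and a phenotype shift that tracks group membership.
rng = np.random.default_rng(0)
n, m = 1000, 200
pop = rng.integers(0, 2, size=n)
p0 = rng.uniform(0.2, 0.8, size=m)
p1 = np.clip(p0 + rng.normal(0, 0.05, size=m), 0.05, 0.95)
G = rng.binomial(2, np.where(pop[:, None] == 0, p0, p1)).astype(float)
beta_true = rng.normal(0, 0.05, size=m)
y = G @ beta_true + 0.5 * pop + rng.normal(size=n)

pcs = top_pcs(G, k=2)  # simplification: PCs computed on all samples
train = np.arange(n) < 800
test = ~train

# Model A: PRS only, SNP effects estimated without ancestry correction.
prs_a = G @ marginal_effects(G[train], y[train])
coef_a = fit_linear(prs_a[train, None], y[train])
r2_a = r_squared(y[test], predict_linear(coef_a, prs_a[test, None]))

# Model B: ancestry (PCs) as a separate component plus an ancestry-adjusted PRS.
prs_b = G @ marginal_effects(G[train], y[train], pcs[train])
X_b = np.column_stack([pcs, prs_b])
coef_b = fit_linear(X_b[train], y[train])
r2_b = r_squared(y[test], predict_linear(coef_b, X_b[test]))

print(f"R^2 without ancestry correction:      {r2_a:.4f}")
print(f"R^2 with ancestry as its own component: {r2_b:.4f}")
```

The two printed values depend on the simulated data and are not comparable to the R² values reported in the abstract. The point of the sketch is the structure of Model B, in which the PCs receive their own regression weights instead of being absorbed into the individual SNP effects.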

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Carcinoma, Basal Cell / diagnosis
Carcinoma, Basal Cell / genetics
Genome-Wide Association Study*
Genotype
Humans
Models, Genetic
Multifactorial Inheritance
Phenotype
Polymorphism, Single Nucleotide
Principal Component Analysis
Risk
