Applying Ancestry and Sex Computation as a Quality Control Tool in Targeted Next-Generation Sequencing

Patrick C Mathias; Emily H Turner; Sheena M Scroggins; Stephen J Salipante; Noah G Hoffman; Colin C Pritchard; Brian H Shirts

doi:10.1093/ajcp/aqv098

Am J Clin Pathol. 2016 Mar;145(3):308-15. doi: 10.1093/ajcp/aqv098. Epub 2016 Feb 20.

Authors

Patrick C Mathias¹, Emily H Turner², Sheena M Scroggins², Stephen J Salipante², Noah G Hoffman², Colin C Pritchard², Brian H Shirts²

Affiliations

¹ From the Department of Laboratory Medicine, University of Washington, Seattle. pcm10@uw.edu.
² From the Department of Laboratory Medicine, University of Washington, Seattle.

PMID: 27124912

Abstract

Objectives: To apply techniques for ancestry and sex computation from next-generation sequencing (NGS) data as an approach to confirm sample identity and detect sample processing errors.

Methods: We combined a principal component analysis method with k-nearest neighbors classification to compute the ancestry of patients undergoing NGS testing. By combining this calculation with X chromosome copy number data, we determined the sex and ancestry of patients for comparison with self-report. We also modeled the sensitivity of this technique in detecting sample processing errors.
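
The classification step described in the Methods can be illustrated with a minimal sketch. It assumes each sample has already been projected onto principal components, so the PCA itself is not shown. The function names, the reference labels ("A", "B"), the toy coordinates, the value of k, and the copy number tolerance are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def knn_ancestry(query, reference, k=3):
    """Assign an ancestry label by majority vote among the k nearest
    reference samples in principal component space.

    query:     tuple of principal component coordinates for one patient
    reference: list of (coordinates, label) pairs from a reference panel
    """
    nearest = sorted(reference, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def infer_sex(x_copy_number, tolerance=0.3):
    """Map an estimated X chromosome copy number to a sex call.
    The tolerance is an assumed value, not one reported in the paper."""
    if abs(x_copy_number - 1) < tolerance:
        return "male"
    if abs(x_copy_number - 2) < tolerance:
        return "female"
    return "indeterminate"


def is_discordant(computed, reported):
    """Flag a sample whose computed (sex, ancestry) differs from self-report."""
    return computed != reported


# Hypothetical reference panel: two well-separated clusters in PC1/PC2 space.
reference_panel = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 4.9), "B"), ((4.8, 5.2), "B"),
]

computed = (infer_sex(1.05), knn_ancestry((0.15, 0.05), reference_panel))
flagged = is_discordant(computed, ("female", "A"))
```

In this toy case the computed call is male with ancestry label "A", so a self-report of female with ancestry "A" would be flagged for review.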

Results: We applied this technique to 859 patient samples with reliable self-report data. Our k-nearest neighbors ancestry screen had an accuracy of 98.7% for patients reporting a single ancestry. Visual inspection of principal component plots was consistent with self-report in 99.6% of single-ancestry and mixed-ancestry patients. Our model demonstrates that approximately two-thirds of potential sample swaps could be detected in our patient population using this technique.
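
One plausible way to reason about swap detectability, offered here as an assumption rather than the authors' actual model, is that a swap between two patients can be caught only when the two patients differ in their combined sex and ancestry category. Under that assumption, the detectable fraction is the probability that two distinct, randomly chosen patients fall into different categories. The function name and the category counts below are hypothetical.

```python
def swap_detection_rate(category_counts):
    """Fraction of random pairwise swaps that change the (sex, ancestry)
    category, assuming every pair of distinct patients is equally likely
    to be swapped.

    category_counts: dict mapping a category label to its patient count
    """
    total = sum(category_counts.values())
    if total < 2:
        raise ValueError("need at least two patients to model a swap")
    # Ordered pairs of distinct patients that share a category
    # (a swap between them would go undetected).
    same_category_pairs = sum(n * (n - 1) for n in category_counts.values())
    return 1 - same_category_pairs / (total * (total - 1))


# Hypothetical population: four patients split evenly across two categories.
toy_counts = {"female/A": 2, "male/A": 2}
toy_rate = swap_detection_rate(toy_counts)
```

Under this model the detectable fraction rises as the patient population becomes more evenly spread across categories and falls toward zero when nearly all patients share one category.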

Conclusions: Patient ancestry can be estimated from NGS data incidentally sequenced in targeted panels, enabling an inexpensive quality control method when coupled with patient self-report.

Keywords: Molecular diagnostics; Next-generation sequencing; Quality control.

MeSH terms

DNA Copy Number Variations
Diagnostic Errors*
Education, Medical, Continuing
Female
High-Throughput Nucleotide Sequencing / standards*
Humans
Male
Models, Theoretical*
Pathology, Molecular
Principal Component Analysis
Quality Control
Racial Groups / genetics*
Self Report
Sensitivity and Specificity
Sequence Analysis, DNA / standards
Sex Factors
Specimen Handling