VarSight: prioritizing clinically reported variants with binary classification algorithms

James M Holt; Brandon Wilk; Camille L Birch; Donna M Brown; Manavalan Gajapathy; Alexander C Moss; Nadiya Sosonkina; Melissa A Wilk; Julie A Anderson; Jeremy M Harris; Jacob M Kelly; Fariba Shaterferdosian; Angelina E Uno-Antonison; Arthur Weborg; Undiagnosed Diseases Network; Elizabeth A Worthey

doi:10.1186/s12859-019-3026-8

BMC Bioinformatics. 2019 Oct 15;20(1):496. doi: 10.1186/s12859-019-3026-8.

Authors

James M Holt¹, Brandon Wilk², Camille L Birch², Donna M Brown², Manavalan Gajapathy², Alexander C Moss², Nadiya Sosonkina²,³, Melissa A Wilk², Julie A Anderson², Jeremy M Harris², Jacob M Kelly², Fariba Shaterferdosian², Angelina E Uno-Antonison², Arthur Weborg²; Undiagnosed Diseases Network; Elizabeth A Worthey²

Collaborators

Undiagnosed Diseases Network:
Maria T Acosta, Margaret Adam, David R Adams, Pankaj B Agrawal, Mercedes E Alejandro, Patrick Allard, Justin Alvey, Laura Amendola, Ashley Andrews, Euan A Ashley, Mahshid S Azamian, Carlos A Bacino, Guney Bademci, Eva Baker, Ashok Balasubramanyam, Dustin Baldridge, Jim Bale, Michael Bamshad, Deborah Barbouth, Gabriel F Batzli, Pinar Bayrak-Toydemir, Anita Beck, Alan H Beggs, Gill Bejerano, Hugo J Bellen, Jimmy Bennet, Beverly Berg-Rood, Raphael Bernier, Jonathan A Bernstein, Gerard T Berry, Anna Bican, Stephanie Bivona, Elizabeth Blue, John Bohnsack, Carsten Bonnenmann, Devon Bonner, Lorenzo Botto, Lauren C Briere, Elly Brokamp, Elizabeth A Burke, Lindsay C Burrage, Manish J Butte, Peter Byers, John Carey, Olveen Carrasquillo, Ta Chen Peter Chang, Sirisak Chanprasert, Hsiao-Tuan Chao, Gary D Clark, Terra R Coakley, Laurel A Cobban, Joy D Cogan, F Sessions Cole, Heather A Colley, Cynthia M Cooper, Heidi Cope, William J Craigen, Michael Cunningham, Precilla D'Souza, Hongzheng Dai, Surendra Dasari, Mariska Davids, Jyoti G Dayal, Esteban C Dell'Angelica, Shweta U Dhar, Katrina Dipple, Daniel Doherty, Naghmeh Dorrani, Emilie D Douine, David D Draper, Laura Duncan, Dawn Earl, David J Eckstein, Lisa T Emrick, Christine M Eng, Cecilia Esteves, Tyra Estwick, Liliana Fernandez, Carlos Ferreira, Elizabeth L Fieg, Paul G Fisher, Brent L Fogel, Irman Forghani, Laure Fresard, William A Gahl, Ian Glass, Rena A Godfrey, Katie Golden-Grant, Alica M Goldman, David B Goldstein, Alana Grajewski, Catherine A Groden, Andrea L Gropman, Sihoun Hahn, Rizwan Hamid, Neil A Hanchard, Nichole Hayes, Frances High, Anne Hing, Fuki M Hisama, Ingrid A Holm, Jason Hom, Martha Horike-Pyne, Alden Huang, Yong Huang, Rosario Isasi, Fariha Jamal, Gail P Jarvik, Jeffrey Jarvik, Suman Jayadev, Yong-Hui Jiang, Jean M Johnston, Lefkothea Karaviti, Emily G Kelley, Dana Kiley, Isaac S Kohane, Jennefer N Kohler, Deborah Krakow, Donna M Krasnewich, Susan Korrick, Mary Koziura, Joel B Krier, Seema R Lalani, 
Byron Lam, Christina Lam, Brendan C Lanpher, Ian R Lanza, C Christopher Lau, Kimberly LeBlanc, Brendan H Lee, Hane Lee, Roy Levitt, Richard A Lewis, Sharyn A Lincoln, Pengfei Liu, Xue Zhong Liu, Nicola Longo, Sandra K Loo, Joseph Loscalzo, Richard L Maas, Ellen F Macnamara, Calum A MacRae, Valerie V Maduro, Marta M Majcherska, May Christine V Malicdan, Laura A Mamounas, Teri A Manolio, Rong Mao, Kenneth Maravilla, Thomas C Markello, Ronit Marom, Gabor Marth, Beth A Martin, Martin G Martin, Julian A Martínez-Agosto, Shruti Marwaha, Jacob McCauley, Allyn McConkie-Rosell, Colleen E McCormack, Alexa T McCray, Heather Mefford, J Lawrence Merritt, Matthew Might, Ghayda Mirzaa, Eva Morava-Kozicz, Paolo M Moretti, Marie Morimoto, John J Mulvihill, David R Murdock, Avi Nath, Stan F Nelson, John H Newman, Sarah K Nicholas, Deborah Nickerson, Donna Novacic, Devin Oglesbee, James P Orengo, Laura Pace, Stephen Pak, J Carl Pallais, Christina Gs Palmer, Jeanette C Papp, Neil H Parker, John A Phillips Iii, Jennifer E Posey, John H Postlethwait, Lorraine Potocki, Barbara N Pusey, Aaron Quinlan, Wendy Raskind, Archana N Raja, Genecee Renteria, Chloe M Reuter, Lynette Rives, Amy K Robertson, Lance H Rodan, Jill A Rosenfeld, Robb K Rowley, Maura Ruzhnikov, Ralph Sacco, Jacinda B Sampson, Susan L Samson, Mario Saporta, C Ron Scott, Judy Schaechter, Timothy Schedl, Kelly Schoch, Daryl A Scott, Lisa Shakachite, Prashant Sharma, Vandana Shashi, Jimann Shin, Rebecca Signer, Catherine H Sillari, Edwin K Silverman, Janet S Sinsheimer, Kathy Sisco, Kevin S Smith, Lilianna Solnica-Krezel, Rebecca C Spillmann, Joan M Stoler, Nicholas Stong, Jennifer A Sullivan, Angela Sun, Shirley Sutton, David A Sweetser, Virginia Sybert, Holly K Tabor, Cecelia P Tamburro, Queenie K-G Tan, Mustafa Tekin, Fred Telischi, Willa Thorson, Cynthia J Tifft, Camilo Toro, Alyssa A Tran, Tiina K Urv, Matt Velinder, Dave Viskochil, Tiphanie P Vogel, Colleen E Wahl, Stephanie Wallace, Nicole M Walley, Chris A Walsh, 
Melissa Walker, Jennifer Wambach, Jijun Wan, Lee-Kai Wang, Michael F Wangler, Patricia A Ward, Daniel Wegner, Mark Wener, Monte Westerfield, Matthew T Wheeler, Anastasia L Wise, Lynne A Wolfe, Jeremy D Woods, Shinya Yamamoto, John Yang, Amanda J Yoon, Guoyun Yu, Diane B Zastrow, Chunli Zhao, Stephan Zuchner

Affiliations

¹ HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806, USA. jholt@hudsonalpha.org.
² HudsonAlpha Institute for Biotechnology, Software Development and Informatics, 601 Genome Way, Huntsville, 35806, USA.
³ University of Alabama at Birmingham, Department of Genetics, 720 20th Street South, Birmingham, 35294, USA.

Abstract

Background: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient's phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance.

Methods: We tested the application of classification algorithms that ingest variant annotations along with phenotype information for predicting whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network.
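As a rough illustration of the idea in the Methods, the sketch below trains a binary classifier on per-variant features and returns the probability that a variant would be reported. It is not the authors' pipeline: the abstract does not name the algorithms or features used, so plain logistic regression stands in here, and the two features (a deleteriousness score and a phenotype-match score, both scaled to 0-1) and all training values are hypothetical.

```python
import math

def predict_proba(weights, bias, x):
    """Probability that a variant with feature vector x is reported."""
    z = bias + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (stand-in classifier)."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical features: [deleteriousness_score, phenotype_match_score]
TRAIN_X = [
    [0.9, 0.8], [0.8, 0.9], [0.7, 0.7],   # reported variants
    [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],   # unreported: low on both features
    [0.9, 0.1], [0.1, 0.9],               # unreported: high on only one feature
]
TRAIN_Y = [1, 1, 1, 0, 0, 0, 0, 0]        # 1 = clinically reported, 0 = not reported

weights, bias = train_logistic(TRAIN_X, TRAIN_Y)
p_high = predict_proba(weights, bias, [0.85, 0.85])
p_low = predict_proba(weights, bias, [0.2, 0.2])
```

The predicted probability, rather than a hard 0/1 label, is what allows a classifier to double as a prioritization system: variants can be sorted by it.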

Results: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. We showed that the trained classifiers outperformed all other tested methods with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20.
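The "top 20" figures can be read as a top-k metric over per-patient rankings. The sketch below shows one way such a number might be computed: sort each case's variants by classifier score, find where the reported variants land, and take the fraction at rank k or better. The exact evaluation protocol is not given in the abstract, so the pooling across cases, the field names (`score`, `reported`), and the toy data are all assumptions.

```python
def ranks_of_reported(scored_variants):
    """Return 1-based ranks of reported variants after sorting by descending score."""
    ordered = sorted(scored_variants, key=lambda v: v["score"], reverse=True)
    return [rank for rank, v in enumerate(ordered, start=1) if v["reported"]]

def top_k_fraction(cases, k=20):
    """Fraction of all reported variants, pooled across cases, ranked within the top k."""
    ranks = [r for case in cases for r in ranks_of_reported(case)]
    if not ranks:
        return 0.0
    return sum(r <= k for r in ranks) / len(ranks)

# Hypothetical toy data: one list of scored variants per patient case.
case_a = [
    {"score": 0.9, "reported": True},
    {"score": 0.5, "reported": False},
    {"score": 0.1, "reported": False},
]
case_b = [
    {"score": 0.8, "reported": False},
    {"score": 0.7, "reported": False},
    {"score": 0.6, "reported": True},
    {"score": 0.2, "reported": True},
]
cases = [case_a, case_b]
```

With real cases containing hundreds of filtered variants each, `top_k_fraction(cases, k=20)` would correspond to the kind of percentage quoted above, under the stated assumptions.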

Conclusions: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.

Keywords: Binary classification; Clinical genome sequencing; Variant prioritization.

MeSH terms

Algorithms*
Genetic Diseases, Inborn / diagnosis*
Genetic Diseases, Inborn / genetics
Genetic Predisposition to Disease
Genome, Human
Genomics / methods*
Humans
Mutation*
Phenotype
Polymorphism, Genetic
Precision Medicine / methods
Rare Diseases / diagnosis*
Rare Diseases / genetics
Retrospective Studies
Sequence Analysis, DNA / methods
Software
