Predicting SARS-CoV-2 infection among hemodialysis patients using multimodal data

Juntao Duan; Hanmo Li; Xiaoran Ma; Hanjie Zhang; Rachel Lasky; Caitlin K Monaghan; Sheetal Chaudhuri; Len A Usvyat; Mengyang Gu; Wensheng Guo; Peter Kotanko; Yuedong Wang

doi:10.3389/fneph.2023.1179342

Front Nephrol. 2023 Jun 2;3:1179342. doi: 10.3389/fneph.2023.1179342. eCollection 2023.

Authors

Juntao Duan¹, Hanmo Li¹, Xiaoran Ma¹, Hanjie Zhang², Rachel Lasky³, Caitlin K Monaghan³, Sheetal Chaudhuri³,⁴, Len A Usvyat³, Mengyang Gu¹, Wensheng Guo⁵, Peter Kotanko²,⁶, Yuedong Wang¹

Affiliations

¹ Department of Statistics and Applied Probability, University of California, Santa Barbara, CA, United States.
² Renal Research Institute, New York, NY, United States.
³ Fresenius Medical Care, Global Medical Office, Waltham, MA, United States.
⁴ Division of Nephrology, Maastricht University Medical Center, Maastricht, Netherlands.
⁵ Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States.
⁶ Icahn School of Medicine at Mount Sinai, New York, NY, United States.

Abstract

Background: The coronavirus disease 2019 (COVID-19) pandemic has caused more devastation among dialysis patients than among the general population. Patient-level prediction models for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection are crucial for identifying infected patients early, so that outbreaks within dialysis clinics can be prevented and mitigated. As the COVID-19 pandemic evolves, it remains unclear whether previously built prediction models are still sufficiently effective.

Methods: We developed a machine learning (XGBoost) model to predict, during the incubation period, SARS-CoV-2 infections that are subsequently diagnosed 3 or more days later. We used data from multiple sources: demographic, clinical, treatment, laboratory, and vaccination information from a national network of hemodialysis clinics; socioeconomic information from the Census Bureau; and county-level COVID-19 infection and mortality information from state and local health agencies. We created prediction models and evaluated their performance on a rolling basis to investigate how prediction power and risk factors evolved over time.
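The rolling evaluation described in the Methods can be sketched as follows: for each month, a model is fit on the preceding months and scored on that month. This is a minimal sketch that assumes monthly granularity and a fixed-length sliding training window; the helper names `month_starts` and `rolling_splits` and the window length are hypothetical, not details taken from the study, and the XGBoost fitting step itself is omitted.

```python
from datetime import date


def month_starts(first: date, last: date) -> list[date]:
    """Return the first day of every calendar month from `first` to `last`, inclusive."""
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(date(year, month, 1))
        # Advance one calendar month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def rolling_splits(months: list[date], train_window: int):
    """Yield (training months, test month) pairs on a rolling basis.

    Each test month is paired with the `train_window` months immediately
    before it (a sliding window). The window length is a hypothetical
    choice for illustration, not a value reported by the study.
    """
    for i in range(train_window, len(months)):
        yield months[i - train_window:i], months[i]


# Study period from the abstract: April 2020 to February 2022 (23 months).
months = month_starts(date(2020, 4, 1), date(2022, 2, 1))
for train, test in rolling_splits(months, train_window=3):
    # A model would be trained on `train` and evaluated on `test` here.
    pass
```

A sliding window lets older pandemic phases drop out of the training data; an expanding window (always starting at the first month) is an equally plausible alternative that the abstract does not rule out.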

Results: From April 2020 to August 2020, our machine learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.75, an improvement of more than 0.07 over a previously developed machine learning model published in Kidney360 in 2021. As the pandemic evolved, prediction performance deteriorated and fluctuated more, reaching its lowest AUROC of 0.6 in December 2021 and January 2022. Over the whole study period (April 2020 to February 2022), at a fixed false-positive rate of 20%, our model detected 40% of positive patients. Features derived from local infection information reported by the Centers for Disease Control and Prevention (CDC) were the most important predictors, and vaccination status was also a useful predictor. Whether a patient lives in a nursing home was an effective predictor before vaccination but became less predictive afterward.
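The two performance summaries reported in the Results, the AUROC and the share of positive patients detected at a fixed 20% false-positive rate, can be computed from per-patient risk scores as in the minimal pure-Python sketch below. The function names and the toy labels and scores are hypothetical; the abstract does not describe the study's own evaluation code.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive is scored higher than a randomly chosen negative,
    counting ties as one half. Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_fpr(labels: list[int], scores: list[float], max_fpr: float = 0.20) -> float:
    """Highest true-positive rate (share of positives detected) over all
    thresholds whose false-positive rate does not exceed `max_fpr`.
    A patient is flagged when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_tpr = 0.0
    # Lowering the threshold can only flag more patients, so TPR and FPR never
    # decrease; stop at the first threshold whose FPR exceeds max_fpr.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg > max_fpr:
            break
        best_tpr = tp / n_pos
    return best_tpr


# Hypothetical toy data: 1 = later diagnosed with SARS-CoV-2, 0 = not.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.85, 0.6, 0.3, 0.1, 0.05]
print(auroc(labels, scores))                     # 0.72
print(sensitivity_at_fpr(labels, scores, 0.20))  # 0.6
```

The pairwise loops favor readability over speed; for realistic cohort sizes, scikit-learn's `roc_auc_score` and `roc_curve` compute the same quantities efficiently.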

Conclusion: Our study found that the dynamics of the prediction model change frequently as the pandemic evolves. County-level infection information and vaccination information are crucial for the success of early COVID-19 prediction models. Our results show that the proposed model can effectively identify SARS-CoV-2 infections during the incubation period. Prospective studies are warranted to explore the application of such prediction models in daily clinical practice.

Keywords: COVID-19; XGBoost; hemodialysis; machine learning; prediction.

Grants and funding

R01 DK130067/DK/NIDDK NIH HHS/United States