Objective: Many models for predicting disease prognosis have achieved high performance without laboratory test results. However, whether laboratory test results can further improve performance remains unclear. This study aimed to investigate whether laboratory test results improve the performance of models that predict the prognosis of coronavirus disease 2019 (COVID-19).
Methods: Prediction models were developed using data from the electronic healthcare record database in Japan. Patients aged ≥18 years hospitalized for COVID-19 after February 11, 2020, were included. Their age, sex, comorbidities, laboratory test results, and number of days from February 11, 2020, were collected. We developed logistic regression, XGBoost, random forest, and neural network models and compared their performance with and without laboratory test results. Performance in predicting in-hospital death was evaluated using the area under the curve (AUC).
Results: Data from 8,288 hospitalized patients (females, 46.5%) were analyzed. The median patient age was 71 years. A total of 6,630 patients were included in the training dataset, of whom 312 (4.7%) died. In the logistic regression model, the AUC was 0.88 (95% confidence interval [CI] = 0.83-0.93) with laboratory test results and 0.75 (95% CI = 0.68-0.81) without them. Performance was not fundamentally different between the model types, and the laboratory test results improved performance in all cases. The variables useful for prediction were blood urea nitrogen, albumin, and lactate dehydrogenase.
Conclusions: Laboratory test results, such as blood urea nitrogen, albumin, and lactate dehydrogenase levels, along with background information, helped estimate the prognosis of patients hospitalized for COVID-19.
Keywords: Database; electronic health records; machine learning; model performance; prognosis.
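Illustration: The evaluation metric named in the Methods can be sketched as follows. This is a minimal, standard-library-only sketch and not the authors' code: the function names `auc` and `bootstrap_ci` are hypothetical, and the percentile bootstrap is an assumed way of obtaining a 95% CI, since the abstract does not state how the intervals were computed. The AUC is computed from rank sums (the Mann-Whitney formulation), with tied scores sharing their average rank.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case; tied scores count as one half."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data for illustration only (1 = in-hospital death, 0 = survived).
y = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]
print(auc(y, predicted_risk))  # 0.75
```

To compare a model with and without laboratory test results, one would call `auc` and `bootstrap_ci` on the same held-out labels, once with each model's predicted risks.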