Background: To better understand the different clinical phenotypes across the disease spectrum in patients with COVID-19 using an unsupervised machine learning clustering approach.
Materials and methods: A population-based retrospective study was conducted utilizing demographics, clinical characteristics, comorbidities, and clinical outcomes of 7,606 COVID-19-positive patients on admission to public hospitals in Hong Kong in the year 2020. An unsupervised machine learning clustering was used to explore this large cohort.
Results: Four clusters of differing clinical phenotypes based on data at initial admission was derived in which 86.6% of the deceased cases were aggregated in one of the clusters without prior knowledge of their clinical outcomes. Other distinctive clinical characteristics of this cluster were old age and high concurrent comorbidities as well as laboratory characteristics of lower hemoglobin/hematocrit levels, higher neutrophil, C-reactive protein, lactate dehydrogenase, and creatinine levels. The clinical patterns captured by the cluster analysis was validated on other temporally distinct cohorts in 2021. The phenotypes aligned with existing literature.
Conclusion: The study demonstrated the usefulness of unsupervised machine learning techniques with the potential to uncover latent clinical phenotypes. It could serve as a more robust classification for patient triaging and patient-tailored treatment strategies.
Keywords: COVID-19; clinical phenotypes; clustering; laboratory test; machine learning.
Copyright © 2022 Lau, Ng, Kwok, Tsia, Sin, Lam and Vardhanabhuti.