Rationale: Differential diagnosis of pleural effusion is challenging in clinical practice.
Objectives: We aimed to develop a machine learning model to classify the five common causes of pleural effusion.
Methods: This retrospective study collected 49 features from the clinical data, blood, and pleural fluid of adult patients who underwent diagnostic thoracentesis between October 2013 and December 2018. Pleural effusions were classified into five categories: transudative, malignant, parapneumonic, tuberculous, and other. Five classifiers, namely multinomial logistic regression, support vector machine, random forest, extreme gradient boosting, and light gradient boosting machine (LGB), were evaluated for accuracy and area under the receiver operating characteristic curve (AUC) using fivefold cross-validation. Hybrid feature selection was applied to identify the features most relevant for classifying pleural effusion.
Results: We analyzed 2,253 patients (training set, n = 1,459; validation set, n = 365; extra-validation set, n = 429); the LGB model achieved the best performance in both the validation and extra-validation sets. After feature selection, the LGB model using the 18 selected features achieved accuracy equivalent to that obtained with all 49 features: 0.818 ± 0.012 and 0.777 ± 0.007 (mean ± standard deviation) in the validation and extra-validation sets, respectively. The model's mean AUC was 0.930 ± 0.042 and 0.916 ± 0.044 in the validation and extra-validation sets, respectively. Pleural fluid lactate dehydrogenase, protein, and adenosine deaminase levels were the most important features in the model for classifying pleural effusions.
Conclusions: Our LGB model showed satisfactory performance in the differential diagnosis of the common causes of pleural effusion and could provide clinicians with valuable information regarding the major differential diagnoses of pleural diseases.
Keywords: differential diagnosis; machine learning; pleural effusion.
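The abstract reports accuracy and AUC as mean ± standard deviation under fivefold cross-validation, but it does not state how the AUC was extended to five classes. The following is a minimal, standard-library-only Python sketch of that kind of evaluation, assuming a macro-averaged one-vs-rest AUC and a sample (n - 1) standard deviation across folds. The helper names (`accuracy`, `binary_auc`, `macro_ovr_auc`, `summarise`) and the toy probabilities are hypothetical illustrations, not the authors' code or data.

```python
"""Hypothetical sketch: scoring a five-class classifier per fold with accuracy
and a macro one-vs-rest AUC (an assumed averaging scheme), then summarising
fold scores as mean +/- sample standard deviation. Toy numbers only."""
from statistics import mean, stdev

# Category names follow the abstract; indices 0-4 are used as labels.
CLASSES = ["transudative", "malignant", "parapneumonic", "tuberculous", "other"]


def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the true label.
    Ties are broken in favour of the lowest class index."""
    hits = sum(
        1 for label, p in zip(y_true, probs)
        if max(range(len(p)), key=p.__getitem__) == label
    )
    return hits / len(y_true)


def binary_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied pairs count as 0.5."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def macro_ovr_auc(y_true, probs, n_classes):
    """Unweighted mean of per-class one-vs-rest AUCs. Classes with no
    positives or no negatives in this fold are skipped (an assumption)."""
    aucs = []
    for k in range(n_classes):
        pos = [p[k] for label, p in zip(y_true, probs) if label == k]
        neg = [p[k] for label, p in zip(y_true, probs) if label != k]
        if pos and neg:
            aucs.append(binary_auc(pos, neg))
    return mean(aucs)


def summarise(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)


if __name__ == "__main__":
    # Two made-up folds: (true class indices, predicted class probabilities).
    toy_folds = [
        ([0, 1, 2, 3, 4],
         [[0.7, 0.1, 0.1, 0.05, 0.05],
          [0.1, 0.6, 0.1, 0.1, 0.1],
          [0.2, 0.1, 0.5, 0.1, 0.1],
          [0.1, 0.1, 0.2, 0.5, 0.1],
          [0.3, 0.2, 0.1, 0.1, 0.3]]),
        ([0, 1, 2, 3, 4],
         [[0.6, 0.2, 0.1, 0.05, 0.05],
          [0.3, 0.4, 0.1, 0.1, 0.1],
          [0.1, 0.1, 0.6, 0.1, 0.1],
          [0.1, 0.1, 0.1, 0.6, 0.1],
          [0.1, 0.1, 0.1, 0.1, 0.6]]),
    ]
    accs = [accuracy(y, p) for y, p in toy_folds]
    aucs = [macro_ovr_auc(y, p, len(CLASSES)) for y, p in toy_folds]
    acc_m, acc_s = summarise(accs)
    auc_m, auc_s = summarise(aucs)
    print(f"accuracy {acc_m:.3f} ± {acc_s:.3f}; macro AUC {auc_m:.3f} ± {auc_s:.3f}")
```

Real per-fold predicted probabilities from any of the five classifiers could be fed through the same functions. The pairwise AUC loop costs O(positives × negatives) per class and is written for clarity rather than speed; the choice of macro averaging and sample standard deviation are assumptions that the study may not share.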