Objective: This study investigated the preoperative prediction of cytokeratin 19 (CK19) expression in patients with hepatocellular carcinoma (HCC) using machine learning-based ultrasomics.
Methods: We retrospectively analyzed 214 patients with pathologically confirmed HCC who underwent CK19 immunohistochemical staining. Through stratified random sampling (ratio, 8:2), patients from institutions I and II were divided into a training dataset (n = 143) and a test dataset (n = 36), and patients from institution III served as an external validation dataset (n = 35). All gray-scale ultrasound images were preprocessed, and the regions of interest were then manually segmented by two sonographers. A total of 1409 ultrasomics features were extracted from the original and derived images. Next, the intraclass correlation coefficient, variance threshold, mutual information, and an embedded method were applied for feature dimension reduction. Finally, the clinical model, ultrasomics model, and combined model were constructed using the eXtreme Gradient Boosting algorithm. Model performance was assessed by area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy.
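The 8:2 stratified split described in the Methods can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `stratified_split` is hypothetical, and the class counts (20 CK19-positive, 159 CK19-negative) are made up for the example, since the abstract does not report the class balance.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class contributes about test_fraction to the test set."""
    rng = random.Random(seed)

    # Group patient indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random assignment within each class
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels for the 179 patients from institutions I and II:
# 1 = CK19-positive, 0 = CK19-negative (counts are assumptions, not source data).
labels = [1] * 20 + [0] * 159
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these assumed counts the split yields 143 training and 36 test indices, and the proportion of positive cases is the same in both subsets.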
Results: A total of 12 ultrasomics signatures were used to construct the ultrasomics model. In addition, 21 clinical features were used to construct the clinical model, including gender, age, Child-Pugh classification, hepatitis B surface antigen/hepatitis C virus antibody (positive/negative), cirrhosis (yes/no), splenomegaly (yes/no), tumor location, tumor maximum diameter, tumor number, alpha-fetoprotein, alanine aminotransferase, aspartate aminotransferase, alkaline phosphatase, gamma-glutamyl transpeptidase, albumin, total bilirubin, conjugated bilirubin, creatinine, prothrombin time, fibrinogen, and international normalized ratio. The AUC of the ultrasomics model was 0.789 (0.621 - 0.907) and 0.787 (0.616 - 0.907) in the test and external validation datasets, respectively. The combined model, which covered both clinical features and ultrasomics signatures, performed significantly better: its AUC (95% CI), sensitivity, specificity, and accuracy were 0.867 (0.712 - 0.957), 0.750, 0.875, and 0.861 in the test dataset and 0.862 (0.703 - 0.955), 0.833, 0.862, and 0.857 in the external validation dataset.
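For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, accuracy, and AUC are computed. The function names are hypothetical. The confusion-matrix counts are illustrative only: they were chosen because they are arithmetically consistent with the figures reported for the combined model in the 36-patient test dataset, but the abstract does not give the actual confusion matrix. The scores used for the AUC are toy values.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def auc_from_scores(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative counts (assumption): 4 positive and 32 negative cases, n = 36.
sens, spec, acc = binary_metrics(tp=3, fn=1, tn=28, fp=4)
print(f"{sens:.3f} {spec:.3f} {acc:.3f}")

# Toy predicted probabilities, unrelated to the study data.
toy_auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
print(f"{toy_auc:.3f}")
```

With the illustrative counts, the three metrics evaluate to 0.750, 0.875, and 0.861; the toy AUC is 11/12, about 0.917.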
Conclusion: Ultrasomics signatures could be used to predict CK19 expression in HCC patients. Combining clinical features with ultrasomics signatures performed well and significantly improved prediction accuracy and robustness.
Keywords: cytokeratin 19 (CK19); hepatocellular carcinoma; machine learning; radiomics; ultrasonography.
Copyright © 2022 Zhang, Qi, Li, Ren, Liu, Mao, Li, Wu, Yang, Liu, Li, Duan and Zhang.