Improved Diagnostic Accuracy of Thyroid Fine-Needle Aspiration Cytology with Artificial Intelligence Technology

Thyroid. 2024 Jun;34(6):723-734. doi: 10.1089/thy.2023.0384.

Abstract

Background: Artificial intelligence (AI) is increasingly being applied in pathology and cytology, with promising results. To develop an AI model, we collected a large dataset of whole slide images (WSIs) of thyroid fine-needle aspiration (FNA) cytology, incorporating z-stacking, from institutions across the nation.

Methods: We conducted a multicenter retrospective diagnostic accuracy study using a thyroid FNA dataset from the Open AI Dataset Project, which consists of digitized image samples collected from 3 university hospitals and 215 Korean institutions, with extensive quality checks during case selection, scanning, labeling, and review. Multiple z-layer images were captured using three different scanners; image patches were extracted from the WSIs and resized after focus fusion and color normalization. We pretested six AI models on a subset of the dataset, identified Inception ResNet v2 as the best-performing model, and then tested the final model on the total dataset. Additionally, we compared the performance of the AI model with that of cytopathologists using 1031 randomly selected image patches and reevaluated the cytopathologists' performance after they had referred to the AI results.

Results: A total of 10,332 image patches from 306 thyroid FNAs (78 malignant [papillary thyroid carcinoma] and 228 benign) from 86 institutions were used for AI training. Inception ResNet v2 achieved the highest accuracy: 99.7%, 97.7%, and 94.9% on the training, validation, and test datasets, respectively (sensitivity 99.9%, 99.6%, and 100%; specificity 99.6%, 96.4%, and 90.4%). In the comparison between AI and humans, the AI model showed higher accuracy and specificity than the expert cytopathologists on average, exceeding them by more than two standard deviations (accuracy 99.71% [95% confidence interval (CI), 99.38-100.00%] vs. 88.91% [95% CI, 86.99-90.83%], sensitivity 99.81% [95% CI, 99.54-100.00%] vs. 87.26% [95% CI, 85.22-89.30%], and specificity 99.61% [95% CI, 99.23-99.99%] vs. 90.58% [95% CI, 88.80-92.36%]). Moreover, after the experts referred to the AI results, the performance of every expert increased (accuracy 96%, 95%, and 96%, respectively), as did diagnostic agreement (from 0.64 to 0.84).

Conclusions: These results suggest that applying AI technology to thyroid FNA cytology may improve diagnostic accuracy and reduce intra- and inter-observer variability among pathologists. Further confirmatory research is needed.
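
For readers who want to see how the reported measures are defined, the following is a minimal Python sketch of accuracy, sensitivity, and specificity computed from a 2x2 confusion matrix, together with a normal-approximation (Wald) 95% confidence interval for a proportion. The function names and the confusion-matrix counts are hypothetical illustrations; the abstract reports neither the underlying counts nor the interval method the authors used. As an arithmetic check only, a Wald interval with n = 1031 reproduces the reported expert accuracy interval (86.99-90.83%).

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate (malignant called malignant)
        "specificity": tn / (tn + fp),  # true-negative rate (benign called benign)
    }


def wald_ci(p, n, z=1.96):
    """Normal-approximation 95% CI for a proportion p over n samples, clipped to [0, 1]."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)

# Reported average expert accuracy (88.91%) over the 1031 comparison patches.
lower, upper = wald_ci(0.8891, 1031)
print(metrics)
print(f"{lower:.2%} to {upper:.2%}")
```

The Wald interval is the simplest choice and can exceed 100% for proportions near 1, which is why the sketch clips the bounds; a Wilson or exact interval would behave better at the extremes.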

Keywords: artificial intelligence; cytology; deep learning; fine-needle aspiration; thyroid; thyroid neoplasms.

Publication types

  • Multicenter Study

MeSH terms

  • Artificial Intelligence*
  • Biopsy, Fine-Needle / methods
  • Cytodiagnosis
  • Humans
  • Reproducibility of Results
  • Retrospective Studies
  • Sensitivity and Specificity
  • Thyroid Cancer, Papillary / diagnosis
  • Thyroid Cancer, Papillary / pathology
  • Thyroid Gland / pathology
  • Thyroid Neoplasms* / diagnosis
  • Thyroid Neoplasms* / pathology
  • Thyroid Nodule / diagnosis
  • Thyroid Nodule / pathology