Evaluation of ChatGPT-4 for the detection of surgical site infections from electronic health records after colorectal surgery: A pilot diagnostic accuracy study

J Infect Public Health. 2025 Feb;18(2):102627. doi: 10.1016/j.jiph.2024.102627. Epub 2024 Dec 18.

Abstract

Background: Surveillance of surgical site infection (SSI) relies on manual methods that are time-consuming and prone to subjectivity. This study evaluates the diagnostic accuracy of ChatGPT for detecting SSI from electronic health records after colorectal surgery via comparison with the results of a nationwide surveillance programme.

Methods: This pilot, retrospective, multicentre analysis included 122 patients who underwent colorectal surgery. Patient records were reviewed by both manual surveillance and ChatGPT, which was tasked with identifying SSIs and categorizing them as superficial, deep, or organ-space infections. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated. Receiver operating characteristic (ROC) curve analysis determined the model's diagnostic performance.
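
As a minimal sketch of how the four accuracy measures named in the Methods are derived from a 2x2 table, with manual surveillance as the reference standard, consider the following. The function name `diagnostic_metrics` and the counts passed to it are hypothetical illustrations; the abstract does not report the underlying cell counts.

```python
import math


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a margin of the 2x2 table is empty.
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, fn, tn):
    """Compute accuracy measures from 2x2 counts against a reference standard.

    tp: index test positive, reference positive
    fp: index test positive, reference negative
    fn: index test negative, reference positive
    tn: index test negative, reference negative
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true SSIs that were flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-SSIs that were cleared
        "ppv": _ratio(tp, tp + fp),          # share of flagged cases that were true SSIs
        "npv": _ratio(tn, tn + fn),          # share of cleared cases that were truly non-SSI
    }


# Hypothetical counts for illustration only, NOT the study's data.
metrics = diagnostic_metrics(tp=40, fp=10, fn=0, tn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With zero false negatives, sensitivity and NPV are both 1.0 regardless of the other cells, which is the pattern the Results describe.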

Results: ChatGPT achieved a sensitivity of 100 %, correctly identifying all SSIs detected by manual methods. The specificity was 54 %, indicating the presence of false positives. The PPV was 67 %, and the NPV was 100 %. The area under the ROC curve was 0.77, indicating good overall accuracy for distinguishing between SSI and non-SSI cases. Minor differences in outcomes were observed between colon and rectal surgeries, as well as between the hospitals participating in the study.
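
The reported area under the ROC curve can be related to the reported sensitivity and specificity with a short calculation. The sketch below assumes the model output is a single binary label (SSI or no SSI), so that the ROC curve consists of one operating point joined to (0, 0) and (1, 1); the abstract does not state how the curve was constructed, and the function name `single_point_auc` is hypothetical. Under that assumption the area reduces to (sensitivity + specificity) / 2.

```python
def single_point_auc(sensitivity, specificity):
    """Area under a ROC curve made of one operating point.

    The curve runs (0, 0) -> (FPR, TPR) -> (1, 1); the area is the sum of
    the triangle to the left of the point and the trapezoid to its right.
    """
    fpr = 1.0 - specificity
    tpr = sensitivity
    left_triangle = 0.5 * fpr * tpr
    right_trapezoid = 0.5 * (1.0 - fpr) * (tpr + 1.0)
    return left_triangle + right_trapezoid


# Values as reported in the Results: sensitivity 100 %, specificity 54 %.
auc = single_point_auc(sensitivity=1.00, specificity=0.54)
print(round(auc, 2))  # 0.77
```

The result agrees with the reported value of 0.77, which is consistent with, but does not prove, a single-threshold ROC analysis.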

Conclusions: ChatGPT shows high sensitivity and good overall accuracy for detecting SSI. It appears to be a useful tool for initial screening and for reducing the manual review workload. The moderate specificity suggests a need for further refinement to reduce the rate of false positives. Integrating ChatGPT with electronic medical records, antibiotic consumption data, and imaging results for real-time analysis may further improve the surveillance of SSI.

ClinicalTrials.gov Identifier: NCT06556017.

Keywords: Accuracy; Artificial intelligence; ChatGPT; Diagnosis; LLM; Large Language Model; NLP; Natural language processing; OpenAI; Sensitivity and specificity; Surgical site infection.

Publication types

  • Multicenter Study
  • Evaluation Study

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Colorectal Surgery* / adverse effects
  • Electronic Health Records*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Pilot Projects
  • Predictive Value of Tests
  • ROC Curve
  • Retrospective Studies
  • Sensitivity and Specificity*
  • Surgical Wound Infection* / diagnosis
  • Surgical Wound Infection* / epidemiology

Associated data

  • ClinicalTrials.gov/NCT06556017