Objectives: To systematically review the comparative statistical performance (discrimination and/or calibration) of prognostic clinical prediction models (CPMs) and clinician judgment (CJ).
Study design and setting: We conducted a systematic review of observational studies identified in PubMed, Medline, Embase, and CINAHL. Eligible studies reported a direct statistical comparison between prognostic CPMs and CJ. Risk of bias was assessed using the PROBAST tool.
Results: We identified 41 studies, 39 of which were at high risk of bias. Of the 41 studies, 39 examined discrimination and 12 assessed calibration. Prognostic CPMs had a median AUC of 0.73 (IQR, 0.62-0.81), while CJ had a median AUC of 0.71 (IQR, 0.62-0.81). Twenty-nine studies provided 124 discrimination metrics suitable for comparative analysis. Of these metrics, 58 (46.8%) showed no significant difference between prognostic CPMs and CJ (p > 0.05), 31 (25.0%) favored prognostic CPMs, and 35 (28.2%) favored CJ. Four studies compared calibration, which was better for prognostic CPMs.
Conclusions: CJ frequently demonstrates discrimination comparable or superior to that of prognostic CPMs, although the models outperform CJ on calibration. Studies comparing the performance of prognostic CPMs and CJ need substantial improvements in reporting.
Keywords: Area Under Curve; Calibration; Clinical Decision Rules; Clinical Reasoning; Prognosis; Systematic Review.
Copyright © 2023 Elsevier Inc. All rights reserved.