Diagnostic accuracy and potential covariates of artificial intelligence for diagnosing orthopedic fractures: a systematic literature review and meta-analysis

Xiang Zhang; Yi Yang; Yi-Wei Shen; Ke-Rui Zhang; Ze-Kun Jiang; Li-Tai Ma; Chen Ding; Bei-Yu Wang; Yang Meng; Hao Liu

doi:10.1007/s00330-022-08956-4


Eur Radiol. 2022 Oct;32(10):7196-7216. doi: 10.1007/s00330-022-08956-4. Epub 2022 Jun 27.

Authors

Xiang Zhang¹, Yi Yang¹, Yi-Wei Shen¹, Ke-Rui Zhang¹, Ze-Kun Jiang², Li-Tai Ma¹, Chen Ding¹, Bei-Yu Wang¹, Yang Meng¹, Hao Liu³

Affiliations

¹ Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, No. 37 Guo Xue Rd, Chengdu, 610041, China.
² West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610000, China.
³ Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, No. 37 Guo Xue Rd, Chengdu, 610041, China. liuhao6304@126.com.

PMID: 35754091

Abstract

Objectives: To systematically quantify the diagnostic accuracy and identify potential covariates affecting the performance of artificial intelligence (AI) in diagnosing orthopedic fractures.

Methods: PubMed, Embase, Web of Science, and the Cochrane Library were systematically searched from inception to September 29, 2021, for studies applying AI to the diagnosis of orthopedic fractures. Pooled sensitivity, pooled specificity, and the area under the receiver operating characteristic curve (AUC) were calculated. This study was registered in the PROSPERO database before initiation (CRD 42021254618).
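
As a rough illustration of what "pooled sensitivity" means, the sketch below combines per-study sensitivities on the logit scale with a DerSimonian-Laird random-effects model. This is an assumption for illustration only: diagnostic accuracy meta-analyses often use a bivariate random-effects model that pools sensitivity and specificity jointly, and the abstract does not state which model the authors used. The names `StudyCounts`, `logit_with_variance`, `pool_dersimonian_laird`, and `pooled_sensitivity` are hypothetical, and the 0.5 continuity correction is a simplifying choice.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyCounts:
    """Hypothetical per-study counts among patients who truly have a fracture."""
    tp: int  # fractures the AI detected
    fn: int  # fractures the AI missed


def logit_with_variance(events: int, total: int) -> tuple[float, float]:
    # Add 0.5 to both cells (a simplification) so that 0% or 100% studies stay finite.
    e = events + 0.5
    ne = total - events + 0.5
    return math.log(e / ne), 1.0 / e + 1.0 / ne


def pool_dersimonian_laird(ys: list[float], vs: list[float]) -> tuple[float, float, float]:
    # Inverse-variance (fixed-effect) weights, used to estimate between-study variance tau^2.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in vs]
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sws
    return pooled, math.sqrt(1.0 / sws), tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pooled_sensitivity(studies: list[StudyCounts]) -> tuple[float, float, float, float]:
    """Return (pooled sensitivity, 95% CI lower, 95% CI upper, tau^2)."""
    pairs = [logit_with_variance(s.tp, s.tp + s.fn) for s in studies]
    pooled, se, tau2 = pool_dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se), tau2


# Purely hypothetical counts, not taken from the review.
example = [StudyCounts(90, 10), StudyCounts(180, 20), StudyCounts(45, 5)]
sens, lo, hi, tau2 = pooled_sensitivity(example)
```

Pooled specificity can be sketched the same way by passing true negatives and false positives in place of `tp` and `fn`.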

Results: Thirty-nine studies were eligible for quantitative analysis. The overall pooled AUC, sensitivity, and specificity were 0.96 (95% CI 0.94-0.98), 90% (95% CI 87-92%), and 92% (95% CI 90-94%), respectively. In subgroup analyses, multicenter studies yielded higher sensitivity (92% vs. 88%) and specificity (94% vs. 91%) than single-center studies. AI showed higher sensitivity with transfer learning than without it (92% vs. 87%), and with data augmentation than without it (92% vs. 87%). Using plain X-rays as input images for AI achieved results comparable to CT (AUC 0.96 vs. 0.96). Moreover, AI achieved results comparable to those of humans (AUC 0.97 vs. 0.97) and better results than non-expert human readers (AUC 0.98 vs. 0.96; sensitivity 95% vs. 88%).

Conclusions: AI demonstrated high accuracy in diagnosing orthopedic fractures from medical images. Larger-scale studies with higher design quality are needed to validate our findings.

Key points:
• Multicenter study design, application of transfer learning, and data augmentation are closely related to improved performance of artificial intelligence models in diagnosing orthopedic fractures.
• Using plain X-rays as input images for AI to diagnose fractures achieved results comparable to CT (AUC 0.96 vs. 0.96).
• AI achieved results comparable to those of humans (AUC 0.97 vs. 0.97) but was superior to non-expert human readers (AUC 0.98 vs. 0.96; sensitivity 95% vs. 88%) in diagnosing fractures.

Keywords: Artificial intelligence; Fractures, bone; Meta-analysis.

Publication types

Meta-Analysis
Review
Systematic Review

MeSH terms

Artificial Intelligence
Fractures, Bone* / diagnostic imaging
Humans
Multicenter Studies as Topic
Orthopedics*
ROC Curve
Sensitivity and Specificity

Grants and funding

21PJ037/Popularization and Application Project of the Sichuan Provincial Health Commission