Background: The integration of artificial intelligence into medicine has attracted increasing attention in recent years. ChatGPT has emerged as a promising tool for delivering evidence-based recommendations in various clinical domains. However, the application of ChatGPT to physical therapy for musculoskeletal conditions has yet to be investigated.
Methods: Thirty clinical questions related to spinal, lower extremity, and upper extremity conditions were posed to ChatGPT-4. Two reviewers assessed the accuracy of the responses against clinical practice guidelines (CPGs). Intra- and inter-rater reliability were measured using Fleiss' kappa (k).
Results: ChatGPT's responses were consistent with CPG recommendations for 80% of the questions. Performance was highest for upper extremity conditions (100%), intermediate for lower extremity conditions (87%), and lowest for spinal conditions (60%). Intra-rater reliability was good (k = 0.698 and k = 0.631 for the two reviewers), and inter-rater reliability was very good (k = 0.847).
Conclusion: ChatGPT shows promise as a supplementary decision-support tool for physical therapy, with good accuracy and reliability in aligning with clinical practice guideline recommendations. Further research is needed to evaluate its performance across a broader range of scenarios and to refine its clinical applicability.
Keywords: Clinical practice guideline; Evidence-based practice; Large language model; Natural language processing; Physiotherapy.
© 2025. The Author(s) under exclusive licence to Biomedical Engineering Society.