Machine learning-generated decision boundaries for prediction and exploration of patient-specific quality assurance failures in stereotactic radiosurgery plans

Med Phys. 2022 Mar;49(3):1955-1963. doi: 10.1002/mp.15454. Epub 2022 Feb 4.

Abstract

Introduction: Stereotactic radiosurgery (SRS) is a form of radiotherapy in which a high radiation dose is delivered in a single or a few fractions. These treatments require highly conformal plans with steep dose gradients, which can increase plan complexity and prompt the need for stringent pretreatment patient-specific quality assurance (QA) measurements to ensure that the planned and measured dose distributions agree within clinical standards. Complexity scores and machine learning (ML) techniques may help predict QA outcomes; however, the interpretability and usability of those results continue to be an area of study. This study investigates the use of plan complexity metrics as input to an ML model for predicting QA outcomes of SRS plans as measured via three-dimensional (3D) phantom dose verification. Interpretability and predictive ability were explored, and a prospective in-clinic implementation using the resulting model was performed.

Methods: Four hundred ninety-eight plans (1571 volumetric modulated arc therapy arcs) were processed via an in-house script to generate several complexity scores. 3D phantom dose verification measurement results were extracted and classified as pass or failure (with failures defined as less than 95% of voxels passing 3%/1-mm gamma criteria with a 10% threshold), and 1472 of the arcs were split into training and testing sets, with the remaining 99 arcs used as a sequential holdout set. A z-score scaler was trained on the training set and used to scale all other sets. Variations of multi-leaf collimator (MLC) leaf movement variability, aperture complexity, and leaf size, and monitor unit (MU) at control point weighted target area scores were used as input to a support vector classifier to generate a series of 1D, 2D, and 5D decision boundaries. The best performing 5D model was then used in a prospective in-clinic study that provided predictions to physicists prior to ordering 3D phantom dose verification measurements for 38 patient plans (112 arcs). The decision to order 3D phantom dose verification measurements was recorded before and after prediction.
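
The labeling and scaling steps in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's in-house script: the function names, the two-feature toy values, and the use of the population standard deviation are all hypothetical. Only the 95% pass threshold and the rule that the scaler is fitted on the training set alone come from the abstract.

```python
from statistics import fmean, pstdev

# From the abstract: a measurement fails when fewer than 95% of voxels
# pass the 3%/1-mm gamma criteria.
PASS_THRESHOLD = 95.0


def label_failure(gamma_pass_rate):
    """Return 1 for a QA failure (below the threshold), 0 for a pass."""
    return 1 if gamma_pass_rate < PASS_THRESHOLD else 0


def fit_zscore(train_rows):
    """Compute per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for a constant feature to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Scale any set (training, testing, holdout) with the training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical complexity scores (two features per arc), not study data.
train = [[0.2, 10.0], [0.4, 14.0], [0.6, 18.0]]
holdout = [[0.8, 22.0]]

means, stds = fit_zscore(train)
train_scaled = apply_zscore(train, means, stds)
holdout_scaled = apply_zscore(holdout, means, stds)

# Hypothetical gamma pass rates; 95.0 itself counts as a pass.
labels = [label_failure(rate) for rate in (97.3, 94.9, 95.0)]
```

Fitting the scaler on the training set and reusing its statistics for the other sets keeps information from the testing and holdout arcs out of the model inputs. The scaled features would then be passed to a support vector classifier, which is omitted here.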

Results: The best performing 1D threshold and 2D prediction models produced QA failure recall and QA passing recall of 1.00 and 0.55, and 0.82 and 0.82, respectively. The best performing 5D prediction model produced a QA failure recall (sensitivity) of 1.00 and a QA passing recall (specificity) of 0.72. This model was then used in a prospective in-clinic study that provided predictions to physicists prior to ordering 3D phantom dose verification measurements, where it achieved a QA failure recall of 1.00 and a QA passing recall of 0.58. The decision to order 3D phantom dose verification measurements was recorded before and after prediction. A single failing plan in the prospective cohort that was not initially identified was successfully predicted to fail by the model.
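
For readers less familiar with the two metrics reported above, the sketch below shows how QA failure recall (sensitivity) and QA passing recall (specificity) could be computed from true and predicted labels. The function name `class_recalls` and the toy labels are hypothetical and are not taken from the study; the encoding 1 = failure, 0 = pass is an assumption.

```python
def class_recalls(y_true, y_pred):
    """Return (failure recall, passing recall), with 1 = QA failure and 0 = QA pass."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # passes recognized
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # passes flagged as failures

    failure_recall = tp / (tp + fn) if (tp + fn) else float("nan")
    passing_recall = tn / (tn + fp) if (tn + fp) else float("nan")
    return failure_recall, passing_recall


# Toy labels only: two failing arcs and four passing arcs.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

failure_recall, passing_recall = class_recalls(y_true, y_pred)
```

A failure recall of 1.00 means no failing arc was predicted to pass. A passing recall below 1.00 means some passing arcs were flagged as likely failures and would still be measured.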

Conclusion: Implementation of complexity score-based prediction models for SRS would support clinicians' decisions, reducing time spent performing QA measurements and avoiding patient treatment delays (i.e., in case of QA failure).

Keywords: machine learning; stereotactic radiosurgery.

MeSH terms

  • Humans
  • Machine Learning
  • Prospective Studies
  • Radiosurgery* / methods
  • Radiotherapy Dosage
  • Radiotherapy Planning, Computer-Assisted / methods
  • Radiotherapy, Intensity-Modulated* / methods