The YOLO series of object detection algorithms, including YOLOv4 and YOLOv5, has shown superior performance in various medical diagnostic tasks, surpassing human ability in some cases. However, the black-box nature of these models has limited their adoption in medical applications that require trust in, and explainability of, model decisions. To address this issue, visual explanations for AI models, known as visual XAI, have been proposed in the form of heatmaps that highlight the regions of the input that contributed most to a particular decision. Gradient-based approaches, such as Grad-CAM [1], and non-gradient-based approaches, such as Eigen-CAM [2], are applicable to YOLO models and do not require implementing new layers. This paper evaluates the performance of Grad-CAM and Eigen-CAM on the VinDr-CXR Chest X-ray Abnormalities Detection dataset [3] and discusses the limitations of these methods for explaining model decisions to data scientists.
Keywords: Visual XAI; unreliability; YOLO.