KMeansGraphMIL: A Weakly Supervised Multiple Instance Learning Model for Predicting Colorectal Cancer Tumor Mutational Burden

Am J Pathol. 2025 Jan 10:S0002-9440(25)00002-1. doi: 10.1016/j.ajpath.2024.12.008. Online ahead of print.

Abstract

Colorectal cancer (CRC) is among the three most lethal malignancies worldwide and poses a significant threat to human health. Recently introduced immune checkpoint blockade therapies have proven effective for CRC, but their use depends on measuring specific biomarkers in patients. Among these biomarkers, tumor mutational burden (TMB) has emerged as a novel indicator; it is traditionally measured by next-generation sequencing, which is time-consuming, labor-intensive, and costly. To provide an economical and rapid way to predict patients' TMB, we propose KMeansGraphMIL, a model based on weakly supervised multiple-instance learning. Unlike previous weakly supervised multiple-instance learning models, KMeansGraphMIL leverages both the similarity of image patch feature vectors and the spatial relationships between patches. This approach improves the model's area under the receiver operating characteristic curve to 0.8334 and significantly increases the recall to 0.7556. Thus, we present an economical and rapid framework for predicting CRC TMB, which could help physicians develop treatment plans quickly and save patients substantial time and money.
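
The abstract does not describe the architecture in detail, so the following is only an illustrative sketch of how feature similarity and spatial relationships between patches might both be encoded as edges of a patch graph. It assumes that feature similarity is captured by k-means cluster membership (suggested by the model's name, not confirmed here) and that spatial relationships are captured by 8-neighborhood adjacency on the patch grid. The function names `kmeans_labels` and `build_adjacency`, the toy data, and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def kmeans_labels(features, k, n_iter=20, seed=0):
    """Tiny Lloyd's k-means (illustrative only); returns one cluster label per patch."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct patches (fancy indexing returns a copy).
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each patch to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def build_adjacency(features, coords, k=2):
    """Combine feature-similarity edges and spatial edges into one boolean adjacency matrix.

    Assumption: two patches are linked if they share a k-means cluster
    OR if they are neighbors on the patch grid (8-neighborhood).
    """
    labels = kmeans_labels(features, k)
    # Feature-similarity edges: patches assigned to the same cluster.
    same_cluster = labels[:, None] == labels[None, :]
    # Spatial edges: Chebyshev distance of exactly 1 between grid coordinates.
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    spatial = diff.max(axis=2) == 1
    adj = same_cluster | spatial
    np.fill_diagonal(adj, False)  # no self-loops
    return adj, labels


# Toy slide with four patches: grid coordinates and 2-D feature vectors.
coords = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
features = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.0, 10.1]])
adj, labels = build_adjacency(features, coords, k=2)
```

In this toy example, patches 0 and 2 are far apart on the slide but have similar features, so they are linked through their shared cluster, while patches 0 and 1 have dissimilar features but are linked because they are spatial neighbors. A graph-based multiple-instance learning model could then aggregate patch features over such a graph to produce a slide-level prediction; how KMeansGraphMIL actually does this is not specified in the abstract.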