Predicting cancer content in tiles of lung squamous cell carcinoma tumours with validation against pathologist labels

Comput Biol Med. 2024 Dec 4:185:109489. doi: 10.1016/j.compbiomed.2024.109489. Online ahead of print.

Abstract

Background: A growing body of research is using deep learning to explore the relationship between treatment biomarkers for lung cancer patients and cancer tissue morphology on digitized whole slide images (WSIs) of tumour resections. However, these WSIs typically contain non-cancer tissue, introducing noise during model training. As digital pathology models typically start with splitting WSIs into tiles, we propose a model that can be used to exclude non-cancer tiles from the WSIs of lung squamous cell carcinoma (SqCC) tumours.

Methods: We obtained 116 WSIs of tumours from 35 different centres from the Cancer Genome Atlas. A pathologist completed or reviewed cancer contours in four regions of interest (ROIs) within each WSI. We then split the ROIs into tiles labelled with the percentage of cancer tissue within them, trained VGG16 to predict this value, and calculated the regression error. To measure classification performance and visualize the classification results, we thresholded the predictions and calculated the area under the receiver operating characteristic curve (AUC).
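
As an illustration of the tile-labelling step, the following is a minimal sketch of how a per-tile cancer percentage might be derived from a binary cancer mask rasterized from the pathologist's contours. The function name `tile_cancer_percentages`, the non-overlapping tiling, and the choice to drop partial edge tiles are assumptions made for this example; the abstract does not specify the tile size or how tiles were laid out.

```python
import numpy as np


def tile_cancer_percentages(mask, tile_size):
    """Return the percentage of cancer pixels in each non-overlapping tile.

    `mask` is a 2-D binary array (True = cancer) covering one ROI.
    Partial tiles at the right and bottom edges are dropped; this is an
    assumption of the sketch, not a detail taken from the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = mask.shape[0] // tile_size
    cols = mask.shape[1] // tile_size
    cropped = mask[: rows * tile_size, : cols * tile_size]
    # Reshape to (tile row, pixel row, tile column, pixel column) and
    # average over the two pixel axes to get the cancer fraction per tile.
    tiles = cropped.reshape(rows, tile_size, cols, tile_size)
    return 100.0 * tiles.mean(axis=(1, 3))


# Toy 4x4 mask split into 2x2 tiles.
toy_mask = np.zeros((4, 4), dtype=bool)
toy_mask[:2, :2] = True  # top-left tile is entirely cancer
toy_mask[0, 2] = True    # top-right tile has 1 of 4 cancer pixels
labels = tile_cancer_percentages(toy_mask, tile_size=2)
print(labels)
```

Each value in `labels` would then serve as the regression target for the corresponding image tile.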

Results: The model's median regression error was 4% with a standard deviation of 35%. At a cancer threshold of 50%, the model had an AUC of 0.83. False positives tended to occur in tissues that surround cancer, tiles with <50% cancer, and areas with high immune activity. False negatives tended to coincide with microtomy defects.
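
Metrics of the kind reported here could be computed along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: it treats the regression error as the signed difference between predicted and labelled cancer percentage, and it reads "a cancer threshold of 50%" as binarizing the pathologist labels at 50% while using the continuous predictions as classification scores. The function names and the toy values are hypothetical.

```python
import numpy as np


def regression_error_summary(pred, true):
    """Median and standard deviation of the signed error (pred - true)."""
    err = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.median(err)), float(np.std(err))


def auc_at_threshold(pred, true, threshold=50.0):
    """AUC with tiles labelled >= `threshold` percent cancer as positives.

    Uses the pairwise (Mann-Whitney) definition of AUC: the fraction of
    positive/negative tile pairs in which the positive tile receives the
    higher prediction, counting ties as half.
    """
    pred = np.asarray(pred, dtype=float)
    positive = np.asarray(true, dtype=float) >= threshold
    pos, neg = pred[positive], pred[~positive]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative tile")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Toy values (percent cancer per tile), made up for illustration only.
true_pct = [0, 20, 60, 100]
pred_pct = [10, 70, 55, 90]
median_err, std_err = regression_error_summary(pred_pct, true_pct)
auc = auc_at_threshold(pred_pct, true_pct, threshold=50.0)
print(median_err, auc)  # 2.5 0.75
```

The pairwise formulation is quadratic in the number of tiles, which is fine for a small illustration; a rank-based implementation would be preferable for large tile sets.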

Conclusions: With further validation for each specific research application, the model we describe in this paper could facilitate the development of more effective research pipelines for predicting treatment biomarkers for lung SqCC.

Keywords: Convolutional neural network; H&E staining; Lung cancer; Lung squamous cell carcinoma; Tissue detection.