Purpose: This study aims to develop a robust, large-scale deep learning model for medical image segmentation, leveraging self-supervised learning to overcome the limitations of supervised learning and data variability in clinical settings.
Methods and materials: We curated a substantial multi-center CT dataset for self-supervised pre-training using masked image modeling with sparse submanifold convolution. We designed a series of Sparse Submanifold U-Nets (SS-UNets) of varying sizes and performed self-supervised pre-training. We fine-tuned the SS-UNets on the TotalSegmentator dataset. The evaluation encompassed robustness tests on four unseen datasets and transferability assessments on three additional datasets.
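As a rough illustration of the masking step in masked image modeling, the sketch below hides a random subset of non-overlapping 3D patches and keeps only the visible voxels, which are the active sites a sparse convolution would process. It is a minimal sketch, not the authors' implementation: the function name `random_patch_mask`, the patch size of 16, the mask ratio of 0.6, and the 64×64×64 synthetic volume are all assumptions chosen for illustration.

```python
import numpy as np


def random_patch_mask(volume_shape, patch_size=16, mask_ratio=0.6, seed=0):
    """Return a boolean voxel mask (True = hidden) built from randomly masked patches.

    patch_size and mask_ratio are illustrative values, not taken from the study.
    """
    if any(s % patch_size != 0 for s in volume_shape):
        raise ValueError("each volume dimension must be divisible by patch_size")

    rng = np.random.default_rng(seed)
    grid = tuple(s // patch_size for s in volume_shape)  # patches per axis
    n_patches = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to hide, without replacement.
    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patch_mask = patch_mask.reshape(grid)

    # Expand the patch-level mask to voxel resolution.
    voxel_mask = patch_mask
    for axis in range(len(grid)):
        voxel_mask = np.repeat(voxel_mask, patch_size, axis=axis)
    return voxel_mask


# Synthetic stand-in for a CT volume (hypothetical data).
volume = np.random.default_rng(1).normal(size=(64, 64, 64)).astype(np.float32)
mask = random_patch_mask(volume.shape)

# Only visible voxels are passed on; the hidden ones become reconstruction targets.
visible_coords = np.argwhere(~mask)
visible_values = volume[~mask]
hidden_targets = volume[mask]
```

Keeping the visible voxels as a coordinate list rather than a zero-filled dense volume is what lets a sparse encoder skip the masked regions entirely.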
Results: Our SS-UNets outperformed state-of-the-art self-supervised methods, achieving higher Dice Similarity Coefficient (DSC) and Surface Dice Coefficient (SDC) scores. SS-UNet-B achieved 84.3 % DSC and 88.0 % SDC on TotalSegmentator. We further demonstrated the scalability of our networks: segmentation performance increased with model size from 58 million to 1.4 billion parameters, with improvements of 4.6 % DSC and 3.2 % SDC on TotalSegmentator from SS-UNet-B to SS-UNet-H.
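For reference, the DSC between a predicted mask A and a reference mask B is conventionally 2|A∩B| / (|A| + |B|). The sketch below computes it for binary masks. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details taken from the study's evaluation code.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    denominator = pred.sum() + target.sum()
    if denominator == 0:
        # Both masks are empty; treating this as perfect agreement is one
        # common convention (an assumption here).
        return 1.0
    return float(2.0 * intersection / denominator)


# Toy example: an 8-voxel prediction fully contained in a 12-voxel reference.
pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
target[1:3, 1:3, 1:4] = True

dsc = dice_coefficient(pred, target)  # 2 * 8 / (8 + 12) = 0.8
```

In a multi-organ setting such as TotalSegmentator, a score like this would typically be computed per structure and then averaged, although the exact aggregation used in the study is not stated here.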
Conclusions: We demonstrate the efficacy of self-supervised learning for medical image segmentation in the CT, MRI and PET domains. Our approach significantly reduces reliance on extensively labeled data, mitigates risks of overfitting, and enhances model generalizability. Future applications may allow accurate segmentation of organs and lesions across several imaging domains, potentially streamlining cancer detection and radiotherapy treatment planning.
Keywords: Medical image segmentation; Self-supervised learning; Sparse submanifold convolution.
Copyright © 2025 Elsevier B.V. All rights reserved.