To date, no specific rules have been reported for systematically determining the augmented sample size that optimizes model performance when conducting data augmentation. In this paper, we report on the feasibility of synthetic data augmentation using generative adversarial networks (GANs) and propose an automated pipeline that finds the optimal multiple of data augmentation to achieve the best deep learning-based diagnostic performance on a limited dataset. We used Waters' view radiographs of patients diagnosed with chronic sinusitis to demonstrate the method. We show that our approach yields significantly better diagnostic performance than models trained using conventional data augmentation. The deep learning method proposed in this study could be implemented to assist radiologists in improving their diagnoses. Researchers and industry practitioners could overcome the lack of training data by employing the proposed automated pipeline for GAN-based synthetic data augmentation. This is anticipated to provide a new means of overcoming the shortage of image data for algorithm training.
© 2022. The Author(s).