The number of publications in endoscopy journals that present deep learning applications has risen sharply in recent years. Deep learning has shown great promise for automated detection, diagnosis and quality improvement in endoscopy. However, the interdisciplinary nature of this work makes it more difficult to estimate its value and applicability. This review discusses the pitfalls and common mistakes in training and validating deep learning systems. It proposes practical guidelines for acquiring and handling data, aimed at obtaining an unbiased system that generalizes to routine clinical practice. Finally, it presents considerations for the correct validation and comparison of AI systems.
Keywords: Benchmarking; Deep learning; Reproducibility of results; Supervised machine learning.
Copyright © 2020 Elsevier Ltd. All rights reserved.