Deep Learning Models for Anatomical Location Classification in Esophagogastroduodenoscopy Images and Videos: A Quantitative Evaluation with Clinical Data

Seong Min Kang; Gi Pyo Lee; Young Jae Kim; Kyoung Oh Kim; Kwang Gi Kim

doi:10.3390/diagnostics14212360

Diagnostics (Basel). 2024 Oct 23;14(21):2360. doi: 10.3390/diagnostics14212360.

Authors

Seong Min Kang¹, Gi Pyo Lee², Young Jae Kim³, Kyoung Oh Kim⁴, Kwang Gi Kim⁵

Affiliations

¹ Medical Device R&D Center, Gachon University Gil Hospital, Incheon 21565, Republic of Korea.
² Department of Biomedical Engineering, Gachon University, Seongnam-si 13120, Republic of Korea.
³ Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon 21565, Republic of Korea.
⁴ Department of Internal Medicine, Gachon University Gil Hospital, Incheon 21565, Republic of Korea.
⁵ Department of Biomedical Engineering, Gil Medical Center, Gachon University College of Medicine, Incheon 21565, Republic of Korea.

Abstract

Background/objectives: During gastroscopy, accurately identifying the anatomical locations of the gastrointestinal tract is crucial for developing diagnostic aids, such as lesion localization and blind spot alerts.

Methods: We annotated the anatomical locations in a dataset of 31,403 still images from 1000 patients with normal findings and used it to develop a classification model. The model was then applied to videos of 20 esophagogastroduodenoscopy procedures, where it was validated for real-time location prediction. Because assessing each frame independently makes the predictions unstable, we implemented a hard-voting-based post-processing algorithm that aggregates results from seven consecutive frames, improving the overall accuracy.
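A minimal sketch of hard voting over seven consecutive frames may make the post-processing step concrete. It is not the authors' implementation: the function name `smooth_predictions`, the label strings, the use of a trailing (causal) window, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, deque


def smooth_predictions(frame_labels, window=7):
    """Replace each per-frame label with the majority label of the last `window` frames.

    Assumptions (not taken from the paper): the window trails the current
    frame so it can run in real time, and ties go to the most recent label.
    """
    recent = deque(maxlen=window)  # holds at most `window` latest predictions
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        counts = Counter(recent)
        top = max(counts.values())
        # Walk backwards so that, among tied labels, the newest one wins.
        for candidate in reversed(recent):
            if counts[candidate] == top:
                smoothed.append(candidate)
                break
    return smoothed


# Hypothetical per-frame classifier output with one isolated misprediction.
raw = ["esophagus"] * 4 + ["stomach"] + ["esophagus"] * 3
stable = smooth_predictions(raw)
```

In this sketch a single stray "stomach" frame is outvoted by its neighbours, while a genuine transition is reported once the new label holds a majority of the window, which delays it by a few frames.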

Results: Among the tested models, InceptionV3 performed best on still images, achieving an F1 score of 79.79%, precision of 80.57%, and recall of 80.08%. For video data, the InceptionResNetV2 model performed best, achieving an F1 score of 61.37%, precision of 73.08%, and recall of 57.21%. These results indicate that the deep learning models achieved high accuracy in position recognition for still images and carried over to video data, albeit with lower scores. Additionally, the post-processing algorithm effectively stabilized the predictions, highlighting its potential for real-time endoscopic applications.
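The reported F1 scores are lower than the harmonic mean of the reported precision and recall (about 80.3% for the still-image figures), which is consistent with metrics averaged per class, although the abstract does not state the averaging scheme. The sketch below shows macro-averaged precision, recall, and F1 under that assumption; the function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (precision, recall, F1), each averaged with equal weight per class.

    Macro averaging is an assumption here; the abstract does not say how
    the per-class scores were combined.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy two-class example (labels are placeholders, not the study's classes).
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
precision, recall, f1 = macro_scores(y_true, y_pred)
```

Because F1 is computed per class before averaging, the macro F1 in this toy case falls below the harmonic mean of the macro precision and macro recall, the same pattern seen in the reported figures.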

Conclusions: This study demonstrates the feasibility of predicting the gastrointestinal tract locations during gastroscopy and suggests a promising path for the development of advanced diagnostic aids to assist clinicians. Furthermore, the location information generated by this model can be leveraged in future technologies, such as automated report generation and supporting follow-up examinations for patients.

Keywords: classification; deep learning; esophagogastroduodenoscopy; location inference; minimum shooting points.

Grants and funding

S0252-21-1001/National IT Industry Promotion Agency (NIPA)