Open Access
Challenges in Implementing Endoscopic Artificial Intelligence: The Impact of Real‐World Imaging Conditions on Barrett's Neoplasia Detection
Author(s) -
Jong M. R.,
Jaspers T. J. M.,
Kusters C. H. J.,
Jukema J. B.,
Eijck van Heslinga R. A. H.,
Fockens K. N.,
Boers T. G. W.,
Visser L. S.,
Putten J. A.,
Sommen F.,
With P. H.,
Groof A. J.,
Bergman J. J.
Publication year - 2025
Publication title - United European Gastroenterology Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.667
H-Index - 35
eISSN - 2050-6414
pISSN - 2050-6406
DOI - 10.1002/ueg2.12760
Subject(s) - medicine, robustness (evolution), artificial intelligence, image quality, test set, area under curve, machine learning, computer science, image (mathematics), biochemistry, chemistry, pharmacokinetics, gene
ABSTRACT
Background Endoscopic deep learning systems are often developed using high‐quality imagery obtained from expert centers. Therefore, they may underperform in community hospitals where image quality is more heterogeneous.
Objective This study aimed to quantify the performance degradation of a computer aided detection system for Barrett's neoplasia, trained on expert images, when exposed to more heterogeneous imaging conditions representative of daily clinical practice. Further, we evaluated strategies to mitigate this performance loss.
Methods We developed a computer aided detection system using 1011 high‐quality, expert‐acquired images from 373 Barrett's patients. We assessed its performance on high‐, moderate‐ and low‐quality test sets, each containing images from an independent group of 117 Barrett's patients. These test sets reflected the varied image quality of routine patient care and contained artefacts such as insufficient mucosal cleaning and inadequate esophageal expansion. We then applied three methods to improve the algorithm's robustness to data heterogeneity: inclusion of more diverse training data, domain‐specific pretraining and architectural optimization.
Results The computer aided detection system, when trained exclusively on high‐quality data, achieved area under the curve (AUC), sensitivity and specificity scores of 83%, 85% and 67% on the high‐quality test set. AUC and sensitivity were significantly lower, at 80% (p < 0.001) and 62% (p = 0.002) on the moderate‐quality test set and 71% (p < 0.001) and 47% (p = 0.002) on the low‐quality test set. Incorporating robustness‐enhancing strategies significantly improved the AUC, sensitivity and specificity to 92% (p = 0.004), 88% (p = 0.84) and 81% (p = 0.003) on the high‐quality test set, 93% (p = 0.006), 86% (p = 0.01) and 83% (p = 0.09) on the moderate‐quality test set and 84% (p = 0.001), 78% (p = 0.002) and 77% (p = 0.23) on the low‐quality test set.
Conclusion Endoscopic deep learning systems trained solely on high‐quality images may not perform well when exposed to heterogeneous imagery, as found in routine practice. Robustness‐enhancing training strategies can increase the likelihood of successful clinical implementation.
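
As an illustration of the metrics reported in the abstract, the following minimal sketch computes AUC, sensitivity and specificity separately for test sets of differing image quality. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all hypothetical assumptions chosen only to show how the three metrics are defined.

```python
from typing import Dict, Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called neoplastic."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (1 = neoplastic, 0 = non-dysplastic); NOT study data.
toy_sets: Dict[str, Tuple[Sequence[int], Sequence[float]]] = {
    "high": ([1, 1, 1, 1, 0, 0, 0, 0], [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]),
    "low": ([1, 1, 1, 1, 0, 0, 0, 0], [0.8, 0.45, 0.4, 0.2, 0.6, 0.55, 0.3, 0.1]),
}

# Evaluate each image-quality subset independently, as the abstract describes.
results = {}
for name, (labels, scores) in toy_sets.items():
    sens, spec = sensitivity_specificity(labels, scores)
    results[name] = {"auc": auc(labels, scores), "sensitivity": sens, "specificity": spec}
    print(name, results[name])
```

Reporting the metrics per quality stratum, rather than pooled, is what makes a drop such as the one described in the Results visible. Sensitivity and specificity depend on the chosen threshold, whereas AUC does not.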
