
Systematic Comparison of Heatmapping Techniques in Deep Learning in the Context of Diabetic Retinopathy Lesion Detection
Author(s) -
Toon Van Craenendonck,
Bart Elen,
Nele Gerrits,
Patrick De Boever
Publication year - 2020
Publication title -
Translational Vision Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.508
H-Index - 21
ISSN - 2164-2591
DOI - 10.1167/tvst.9.2.64
Subject(s) - diabetic retinopathy, artificial intelligence, ground truth, medicine, computer science, deep learning, metric, pattern recognition, ophthalmology, diabetes mellitus, biology, endocrinology
Purpose: Heatmapping techniques can support the explainability of deep learning (DL) predictions in medical image analysis. However, individual techniques have mainly been applied descriptively, without objective and systematic evaluation. We compared their performance using diabetic retinopathy lesion detection as a benchmark task.

Methods: The publicly available Indian Diabetic Retinopathy Image Dataset (IDRiD) contains fundus images of patients with diabetes, with pixel-level annotations of diabetic retinopathy (DR) lesions that served as the ground truth for this study. Three previously trained DL models (ResNet50, VGG16, and InceptionV3) were used for DR detection in these images. Explainability was then visualized with each of the 10 most commonly used heatmapping techniques. The quantitative correspondence between each heatmap and the ground truth was evaluated with the Explainability Consistency Score (ECS), a metric between 0 and 1 developed for this comparison.

Results: For overall DR lesion detection, the ECS ranged from 0.21 to 0.51 across all model/heatmapping combinations. The highest score was obtained with VGG16+Grad-CAM (ECS = 0.51; 95% confidence interval [CI]: [0.46; 0.55]). For individual lesion types, VGG16+Grad-CAM performed best on hemorrhages and hard exudates, ResNet50+SmoothGrad on soft exudates, and ResNet50+Guided Backpropagation on microaneurysms.

Conclusions: Our empirical evaluation on the IDRiD database demonstrated that the combination of DL model and heatmapping technique affects explainability for common DR lesions. Our approach also revealed considerable disagreement between the regions highlighted by heatmaps and expert annotations.

Translational Relevance: A more systematic investigation and analysis of heatmaps is warranted to obtain reliable explanations of image-based predictions by deep learning models.
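Grad-CAM, which gave the best overall ECS in combination with VGG16, is a standard heatmapping technique. To show how it works, the following is a minimal NumPy sketch of the usual Grad-CAM computation: channel weights from spatially averaged gradients, a weighted sum of feature maps, then a ReLU. It assumes the activations of one convolutional layer and the gradients of the class score with respect to them have already been extracted from the model (for example with framework hooks). That extraction step, the choice of layer, and upsampling to fundus-image resolution are not shown. The function name `grad_cam` is hypothetical and this is not the authors' code.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one convolutional layer (illustrative sketch).

    activations: feature maps A with shape (K, H, W) for K channels.
    gradients:   d(class score)/dA, same shape (K, H, W).
    Returns a non-negative (H, W) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights alpha_k: global average of the gradients over H and W.
    alphas = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    # Normalize to [0, 1]; an all-zero map is returned unchanged.
    return cam / peak if peak > 0 else cam


# Toy usage: channel 0 supports the class, channel 1 argues against it.
A = np.zeros((2, 2, 2))
A[0, 0, 0] = 1.0
A[1, 1, 1] = 1.0
G = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heatmap = grad_cam(A, G)
```

In the toy example only the location activated by the positively weighted channel survives the ReLU. In a real pipeline the coarse map would be resized to the input image before it is compared with lesion annotations.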
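The abstract states that the ECS measures correspondence between a heatmap and the pixel-level ground truth on a 0 to 1 scale, but it does not give the formula, so the sketch below does not reproduce the ECS. It illustrates, as an assumption, one generic way to score such agreement. The sketch keeps the `top_fraction` most salient heatmap pixels and computes the Dice coefficient with the expert lesion mask. The function `overlap_score` and its default threshold are hypothetical.

```python
import numpy as np


def overlap_score(heatmap: np.ndarray, lesion_mask: np.ndarray,
                  top_fraction: float = 0.05) -> float:
    """Hypothetical heatmap/annotation agreement in [0, 1] (NOT the paper's ECS).

    Binarizes the heatmap by keeping its `top_fraction` most salient pixels and
    returns the Dice coefficient between that binary map and the lesion mask.
    """
    if heatmap.shape != lesion_mask.shape:
        raise ValueError("heatmap and lesion_mask must have the same shape")
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    mask = lesion_mask.astype(bool)
    # Number of salient pixels to keep (at least one).
    k = max(1, int(round(top_fraction * heatmap.size)))
    # Flat indices of the k largest heatmap values (ties broken arbitrarily).
    flat_idx = np.argpartition(heatmap.ravel(), -k)[-k:]
    salient = np.zeros(heatmap.size, dtype=bool)
    salient[flat_idx] = True
    salient = salient.reshape(heatmap.shape)
    # Dice = 2|S & M| / (|S| + |M|); |S| = k >= 1, so no division by zero.
    intersection = np.logical_and(salient, mask).sum()
    return float(2.0 * intersection / (salient.sum() + mask.sum()))
```

Keeping a fixed fraction of pixels, rather than thresholding raw values, makes scores comparable across heatmapping techniques whose output scales differ. The chosen fraction strongly affects the result, especially for small lesions such as microaneurysms.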
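The abstract reports a 95% CI for the best ECS but does not say how the interval was computed. A common choice for a mean of per-image scores is a percentile bootstrap over images. The sketch below assumes that approach, and the function `bootstrap_ci` with its defaults (10,000 resamples, fixed seed) is hypothetical.

```python
import numpy as np


def bootstrap_ci(scores, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-image scores (assumed method).

    Returns (mean, lower, upper).
    """
    x = np.asarray(scores, dtype=float)
    if x.ndim != 1 or x.size == 0:
        raise ValueError("scores must be a non-empty 1-D sequence")
    rng = np.random.default_rng(seed)
    # Resample images with replacement; one row per bootstrap resample.
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    means = x[idx].mean(axis=1)
    alpha = 1.0 - confidence
    # Lower and upper percentiles of the bootstrap distribution of the mean.
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(x.mean()), float(lo), float(hi)
```

Resampling whole images, rather than pixels, treats each image as the independent unit, which matches how a per-image agreement score would be averaged.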