Open Access
Generalizability and usefulness of artificial intelligence for skin cancer diagnostics: An algorithm validation study
Author(s) -
Ternov Niels K.,
Christensen Anders N.,
Kampen Peter J. T.,
Als Gustav,
Vestergaard Tine,
Konge Lars,
Tolsgaard Martin,
Hölmich Lisbet R.,
Guitera Pascale,
Chakera Annette H.,
Hannemose Morten R.
Publication year - 2022
Publication title -
JEADV Clinical Practice
Language(s) - English
Resource type - Journals
ISSN - 2768-6566
DOI - 10.1002/jvc2.59
Subject(s) - triage, overfitting, skin cancer, convolutional neural network, generalizability theory, artificial intelligence, data set, melanoma diagnosis, computer science, artificial neural network, skin lesion, machine learning, medicine, algorithm, melanoma, cancer, dermatology, mathematics, emergency medicine, statistics, cancer research
Background: Artificial intelligence can be trained to outperform dermatologists in image‐based skin cancer diagnostics. However, the networks' sensitivity to biases and overfitting may hamper their clinical applicability.

Objectives: The aim of this study was to explain the potential consequences of implementing convolutional neural networks for stand‐alone melanoma diagnostics and skin lesion triage.

Methods: In this algorithm validation study on retrospective data, we reproduced and evaluated the performance of state‐of‐the‐art artificial intelligence (convolutional neural networks) for skin cancer diagnostics. The networks were trained on 25,331 annotated dermoscopic skin lesion images from an open‐source data set (ISIC‐2019) and tested on a novel data set (AISC‐2021) consisting of 26,591 annotated dermoscopic skin lesion images. We tested the trained algorithms' ability to generalize to new data and their diagnostic performance in two simulations (melanoma diagnostics and skin lesion triage).

Results: The trained algorithms were significantly less accurate on images of nevi, melanomas and actinic keratoses from the AISC‐2021 data set than on those from the ISIC‐2019 data set (p < 0.003). Almost one‐third (31.1%) of the melanomas were misclassified in the melanoma diagnostics simulation, irrespective of their Breslow thickness. Furthermore, the algorithms marked 92.7% of the lesions as 'suspicious' in the triage simulation, which yielded a triage sensitivity of 99.7% and a specificity of 8.2%.

Conclusions: Although state‐of‐the‐art artificial intelligence outperforms dermatologists in image‐based skin lesion classification within an artificial setting, additional data and technological advances are needed before clinical implementation.
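The triage result above is easier to interpret with the standard definitions of sensitivity and specificity in mind. The following is a minimal sketch that computes both from a two-by-two confusion matrix, along with the fraction of all lesions flagged 'suspicious'. The class name `ConfusionCounts`, the function names, and the counts in the usage lines are hypothetical and were invented for illustration. They are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical two-by-two triage outcome counts (illustrative only)."""
    tp: int  # malignant lesions flagged 'suspicious'
    fn: int  # malignant lesions not flagged
    tn: int  # benign lesions not flagged
    fp: int  # benign lesions flagged 'suspicious'


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly malignant lesions that the triage flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of truly benign lesions that the triage clears."""
    return c.tn / (c.tn + c.fp)


def flag_rate(c: ConfusionCounts) -> float:
    """Fraction of all lesions marked 'suspicious', whether malignant or benign."""
    total = c.tp + c.fn + c.tn + c.fp
    return (c.tp + c.fp) / total


# Invented counts: 100 malignant and 900 benign lesions.
counts = ConfusionCounts(tp=99, fn=1, tn=90, fp=810)
print(sensitivity(counts))  # 0.99
print(specificity(counts))  # 0.1
print(flag_rate(counts))    # 0.909
```

The invented numbers show the same pattern the study reports: when specificity is very low, nearly every benign lesion is flagged as well. Most lesions are then marked 'suspicious', and near-perfect sensitivity does little to reduce the number of lesions that still need review.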