Collective human intelligence outperforms artificial intelligence in a skin lesion classification task
Author(s) -
Winkler Julia K.,
Sies Katharina,
Fink Christine,
Toberer Ferdinand,
Enk Alexander,
Abassi Mohamed Souhayel,
Fuchs Tobias,
Blum Andreas,
Stolz Wilhelm,
Coras‐Stepanek Brigitte,
Cipic Robert,
Guther Stefanie,
Haenssle Holger A.
Publication year - 2021
Publication title -
JDDG: Journal der Deutschen Dermatologischen Gesellschaft
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.463
H-Index - 60
eISSN - 1610-0387
pISSN - 1610-0379
DOI - 10.1111/ddg.14510
Subject(s) - convolutional neural network, artificial intelligence, medical diagnosis, diagnostic accuracy, skin lesion, computer science, majority rule, multiclass classification, medicine, pattern recognition (psychology), dermatology, pathology, radiology, support vector machine
Summary -
Background and objectives - Convolutional neural networks (CNN) enable accurate diagnosis of medical images and perform on or above the level of individual physicians. Recently, collective human intelligence (CoHI) was shown to exceed the diagnostic accuracy of individuals. Thus, diagnostic performance of CoHI (120 dermatologists) versus individual dermatologists versus two state‐of‐the‐art CNN was investigated.
Patients and methods - Cross‐sectional reader study with presentation of 30 clinical cases to 120 dermatologists. Six diagnoses were offered and votes collected via remote voting devices (quizzbox®, Quizzbox Solutions GmbH, Stuttgart, Germany). Dermatoscopic images were classified by a binary and multiclass CNN (FotoFinder Systems GmbH, Bad Birnbach, Germany). Three sets of diagnostic classifications were scored against ground truth: (1) CoHI, (2) individual dermatologists, and (3) CNN.
Results - CoHI attained a significantly higher accuracy [95 % confidence interval] (80.0 % [62.7 %–90.5 %]) than individual dermatologists (75.7 % [73.8 %–77.5 %]) and CNN (70.0 % [52.1 %–83.3 %]; all P < 0.001) in binary classifications. Moreover, CoHI achieved a higher sensitivity (82.4 % [59.0 %–93.8 %]) and specificity (76.9 % [49.7 %–91.8 %]) than individual dermatologists (sensitivity 77.8 % [75.3 %–80.2 %], specificity 73.0 % [70.6 %–75.4 %]) and CNN (sensitivity 70.6 % [46.9 %–86.7 %], specificity 69.2 % [42.4 %–87.3 %]). The diagnostic accuracy of CoHI was superior to that of individual dermatologists (P < 0.001) in multiclass evaluation, with the accuracy of the latter comparable to multiclass CNN.
Conclusions - Our analysis revealed that the majority vote of an interconnected group of dermatologists (CoHI) outperformed individuals and CNN in a demanding skin lesion classification task.
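The two computations the summary relies on, a majority vote per case and a 95 % confidence interval around an accuracy, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the abstract: the vote counts and diagnosis labels are invented for illustration, the tie-breaking rule is arbitrary, and the Wilson score interval is one common choice for a binomial proportion (the abstract does not name the interval method the authors used). The count of 24 correct cases out of 30 is inferred from the reported 80.0 % accuracy on 30 cases.

```python
from collections import Counter
from math import sqrt


def majority_vote(votes):
    """Return the label that received the most votes.

    Ties are resolved in favour of the label seen first; this rule is an
    assumption, the abstract does not describe tie handling.
    """
    return Counter(votes).most_common(1)[0][0]


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95 %)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical votes of 120 readers on a single case (illustrative only).
votes = ["melanoma"] * 70 + ["nevus"] * 40 + ["basal cell carcinoma"] * 10
collective_diagnosis = majority_vote(votes)

# Assumed counts: 24 of 30 cases correct, i.e. the reported 80.0 % accuracy.
low, high = wilson_interval(24, 30)
print(collective_diagnosis, f"{low:.3f}", f"{high:.3f}")
```

With only 30 cases, any interval around a collective or CNN accuracy is wide, whereas the pooled individual readings (120 readers times 30 cases) yield a much narrower one; this is visible in the interval widths reported in the results.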