Comparison of error rates between four pretrained DenseNet convolutional neural network models and 13 board‐certified veterinary radiologists when evaluating 15 labels of canine thoracic radiographs
Author(s) -
Adrien-Maxence Hespel,
Emilie Boissady,
Alois De La Comble,
Michelle Acierno,
Kate Alexander,
Mylene Auger,
David Biller,
Marie de Swarte,
Jason Fuerst,
Eric Green,
Séamus Hoey,
Kevin Koernig,
Alison Lee,
Megan MacLellan,
Hester McAllister,
Jaime Rechy Jr,
Xiaojuan Zhu,
Micaela Zarelli,
Federica Morandi
Publication year - 2022
Publication title -
Veterinary Radiology & Ultrasound
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.541
H-Index - 60
eISSN - 1740-8261
pISSN - 1058-8183
DOI - 10.1111/vru.13069
Subject(s) - medicine, radiography, gold standard (test), convolutional neural network, institutional review board, radiology, veterinary medicine, artificial intelligence, surgery, computer science
Convolutional neural networks (CNNs) are commonly used as artificial intelligence (AI) tools for evaluating radiographs, but published studies testing their performance in veterinary patients are currently lacking. The purpose of this retrospective, secondary analysis, diagnostic accuracy study was to compare the error rates of four CNNs with those of 13 veterinary radiologists when evaluating canine thoracic radiographs against an independent gold standard. Radiographs acquired at a referral institution were used to evaluate the four CNNs, which shared a common architecture. Fifty radiographic studies were selected at random. The studies were evaluated independently by three board‐certified veterinary radiologists for the presence or absence of 15 thoracic labels, creating the gold standard through majority rule. The labels spanned cardiovascular, pulmonary, pleural, airway, and other categories. The error rates for each of the CNNs and for 13 additional board‐certified veterinary radiologists were then calculated on those same studies. There was no statistically significant difference in error rates among the four CNNs for the majority of the labels; however, the CNNs' training method affected the overall error rate for three of the 15 labels. The veterinary radiologists had a statistically lower error rate than all four CNNs overall and for five of the labels (33%). There was only one label (“esophageal dilation”) for which two CNNs were superior to the veterinary radiologists. Findings from the current study raise numerous questions that need to be addressed to further develop and standardize AI in the veterinary radiology environment and to optimize patient care.
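
A minimal Python sketch of the evaluation scheme described in the abstract (a majority-rule gold standard built from three readers, then per-label error rates for additional readers or models) is given below. It assumes binary present/absent annotations stored as NumPy arrays; the array names and the randomly generated data are hypothetical placeholders, not the authors' actual code or data.

import numpy as np

# Hypothetical data: 50 studies x 15 binary labels (values in {0, 1}).
rng = np.random.default_rng(0)

# Annotations from the 3 gold-standard radiologists: shape (3, 50, 15).
panel = rng.integers(0, 2, size=(3, 50, 15))

# Majority rule: a label counts as present when at least 2 of 3 readers agree.
gold = (panel.sum(axis=0) >= 2).astype(int)          # shape (50, 15)

# Predictions from the 13 test radiologists (or, analogously, the 4 CNNs):
# shape (n_readers, n_studies, n_labels).
readers = rng.integers(0, 2, size=(13, 50, 15))

# Per-label error rate for each reader: fraction of studies on which the
# reader's call disagrees with the gold standard.
error_rates = (readers != gold).mean(axis=1)          # shape (13, 15)

# Overall error rate per reader, averaged across all 15 labels.
print(error_rates.mean(axis=1).round(3))

Under this scheme, comparing reader groups (radiologists vs. CNNs) amounts to comparing the resulting error-rate matrices, overall and label by label; the statistical tests the authors used for those comparisons are not specified in the abstract.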