Evaluating Artificial Intelligence Applications in Clinical Settings
Author(s) - Elaine O. Nsoesie
Publication year - 2018
Publication title - JAMA Network Open
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.278
H-Index - 39
ISSN - 2574-3805
DOI - 10.1001/jamanetworkopen.2018.2658
Subject(s) - artificial intelligence, psychology, computer science
Artificial intelligence (AI)–based systems have been shown to reliably recognize cardiovascular disease risk1 and diagnose conditions such as diabetic retinopathy2,3 and melanoma4 from medical images. These advances in image-based medical diagnosis have been widely publicized in the media, and similar tools have been approved by the US Food and Drug Administration (FDA). In April 2018, the FDA approved the first AI device to provide a screening decision for a disease (ie, diabetic retinopathy) without assisted interpretation by a clinician.5 Kanagasingam et al6 evaluated a similar approach (a convolutional neural network algorithm, a deep learning method) for identifying diabetic retinopathy from medical images in a primary care setting in Midland, Western Australia. Their system correctly classified both severe cases captured in the data set of 193 patients with diabetes but misclassified 15 individuals as having diabetic retinopathy (false positives). Fewer than 10% of patients required review by an ophthalmologist. These findings demonstrate the potential of such systems to support efficient and improved care, while also highlighting the need for rigorous evaluation in clinical settings.

Most deep learning algorithms require large training data sets, usually consisting of thousands or millions of images. Medical data sets of this magnitude are typically expensive to produce and annotate. Individuals developing AI diagnostic tools might therefore rely on whatever data are available to produce initial results. However, because data sets used for training are often carefully curated to remove imperfect samples, certain deficiencies might not become evident until an AI diagnostic tool is evaluated in a clinical setting.
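The reported counts can be restated as standard screening metrics. The sketch below is an illustrative calculation only: it assumes the figures as reported (2 true severe cases, both detected; 15 false positives; 193 patients in total) and assumes all remaining patients were true negatives, which the commentary does not state explicitly.

```python
# Back-of-envelope screening metrics from the counts reported for
# Kanagasingam et al: 193 patients with diabetes, 2 severe cases
# (both correctly flagged), 15 false positives.
TP, FN, FP = 2, 0, 15
TOTAL = 193
TN = TOTAL - TP - FN - FP  # assumed: every remaining patient is a true negative

sensitivity = TP / (TP + FN)       # fraction of severe cases caught
specificity = TN / (TN + FP)       # fraction of negatives correctly cleared
ppv = TP / (TP + FP)               # chance a flagged patient truly has the disease
referral_rate = (TP + FP) / TOTAL  # patients sent for ophthalmologist review

print(f"sensitivity   = {sensitivity:.3f}")
print(f"specificity   = {specificity:.3f}")
print(f"PPV           = {ppv:.3f}")
print(f"referral rate = {referral_rate:.1%}")
```

Under these assumptions the referral rate works out to roughly 8.8%, consistent with the statement that fewer than 10% of patients needed specialist review, while the low positive predictive value illustrates why the 15 false positives matter in practice.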
For example, a system trained only on high-quality images might produce incorrect diagnoses when classifying low-quality images or images affected by sheen or other defects present in real-world clinical settings, as observed by Kanagasingam and colleagues.6 Evaluation of AI diagnostic tools in clinical settings will also enable researchers and clinicians to ascertain their potential effect on patient outcomes and health care decisions, and problems identified can be corrected prior to deployment. Findings from these evaluations should be published in the peer-reviewed literature to monitor progress and allow comparison of different systems. There is currently a dearth of published studies evaluating AI diagnostic tools in clinical settings.

Of course, evaluating an AI diagnostic tool in a clinical setting does not guarantee generalizability of findings. The article by Kanagasingam et al6 is based on a single algorithm evaluated at a single health care location, a limitation the authors acknowledge. Moving from good initial performance to a device that can be used across varied clinical settings might not be feasible in some cases. For example, AI diagnostic tools hold significant potential for improving health care in low-resource settings and regions lacking adequate medical infrastructure. However, observations made in a health care setting in a developed country might not be reproducible in a low-resource setting. This highlights the fact that different geographic regions and clinical settings might require tailored tools.
Furthermore, training an AI diagnostic tool on a single data set or in a single clinical setting might lead to outcomes that depend on the particular type of device used to capture images, or on overrepresentation of a particular symptom or demographic group.7 Although multiple studies have demonstrated that AI can perform on par with clinical experts in disease diagnosis, most of these tools have not been evaluated in controlled clinical studies to assess their effect on health care decisions and patient outcomes.