
Effect of patient‐contextual skin images in human‐ and artificial intelligence‐based diagnosis of melanoma: Results from the 2020 SIIM‐ISIC melanoma classification challenge


Abstract

Background: While the high accuracy of reported AI tools for melanoma detection is promising, the lack of holistic consideration of the patient is often criticized. Along with medical history, a dermatologist would also consider intra‐patient nevi patterns, such that nevi that differ from the others on a given patient are treated with suspicion.

Objective: To evaluate whether patient‐contextual lesion images improve diagnostic accuracy for melanoma in a dermoscopic image‐based AI competition and a human reader study.

Methods: An international online AI competition was held in 2020. The task was to classify dermoscopy images as melanoma or benign lesions. A multi‐source dataset of dermoscopy images grouped by patient was provided, and additional use of public datasets was permitted. Competitors were judged on area under the receiver operating characteristic curve (AUROC) on a private leaderboard. Concurrently, a human reader study was hosted using a subset of the test data. Participants gave their initial diagnosis of an index case (melanoma vs. benign) and were then presented with seven additional lesion images of that patient before giving a second prediction for the index case. Outcome measures were sensitivity and specificity.

Results: The top 50 of 3308 AI competition entries achieved AUROC scores ranging from 0.943 to 0.949. Few algorithms considered intra‐patient lesion patterns; most evaluated images independently. The median sensitivity and specificity of human readers were 60.0% and 86.7% before receiving contextual images, and 60.0% and 85.7% after. Human and AI algorithm performance varied by image source.

Conclusion: This study provided an open‐source state‐of‐the‐art algorithm for melanoma detection that has been evaluated at multiple centres. Patient‐contextual images did not improve the performance of AI algorithms or human readers. Providing seven contextual images and no total body image may have been insufficient to test the applicability of intra‐patient lesion patterns.
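
For readers less familiar with the outcome measures named above, the following is a minimal sketch of how sensitivity, specificity, and AUROC can be computed for a binary melanoma-vs-benign task. It is not the challenge's scoring code. The function names and the toy labels, predictions, and scores are hypothetical and are not taken from the study data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = melanoma, 0 = benign).

    Assumes at least one melanoma and one benign case are present.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanomas called melanoma
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanomas missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign called benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign called melanoma
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen melanoma receives a
    higher score than a randomly chosen benign lesion (ties count as half).

    Uses a simple O(n_pos * n_neg) pairwise comparison, fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two melanomas followed by three benign lesions.
labels = [1, 1, 0, 0, 0]
reader_calls = [1, 0, 0, 0, 1]            # a reader's binary diagnoses
model_scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # an algorithm's melanoma probabilities

sens, spec = sensitivity_specificity(labels, reader_calls)
area = auroc(labels, model_scores)
```

Sensitivity and specificity apply to binary calls, such as a reader's diagnosis, while AUROC summarizes how well a continuous score ranks melanomas above benign lesions across all thresholds. This is why the competition and the reader study report different measures.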

