Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks

Eur J Cancer. 2019 Sep:119:57-65. doi: 10.1016/j.ejca.2019.06.013. Epub 2019 Aug 14.

Abstract

Background: Recently, convolutional neural networks (CNNs) systematically outperformed dermatologists in distinguishing dermoscopic melanoma and nevi images. However, such a binary classification does not reflect the clinical reality of skin cancer screenings in which multiple diagnoses need to be taken into account.

Methods: Using 11,444 dermoscopic images, which covered the dermatologic diagnoses comprising the majority of pigmented skin lesions commonly faced in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories.

Findings: Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, outperformance (p < 0.001) was achieved except for basal cell carcinoma (on-par performance).
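The two quantities reported above, specificity "at equal sensitivity" and a two-sided McNemar test, can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names, the threshold-selection rule, and the use of the exact binomial form of the McNemar test are assumptions made for illustration, and the abstract does not state how dermatologist and algorithm decisions were paired.

```python
from math import comb


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for labels 1 = malignant, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Hypothetical helper: find the highest score threshold whose
    sensitivity reaches the target (e.g. the dermatologists' mean
    sensitivity) and return (threshold, sensitivity, specificity)."""
    # Lowering the threshold can only add positive calls, so sensitivity
    # is non-decreasing as we walk the thresholds from high to low.
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        if sens >= target_sensitivity:
            return threshold, sens, spec
    raise ValueError("target sensitivity must not exceed 1.0")


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (assumed variant).

    b = cases rater A classified correctly and rater B incorrectly,
    c = cases rater B classified correctly and rater A incorrectly.
    Under the null hypothesis the discordant pairs follow Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With real data, `target_sensitivity` would be set to the comparison group's sensitivity, and `b` and `c` would be counted from paired decisions on the same test images. Only discordant pairs enter the McNemar test, which is why it suits comparisons of two raters on an identical image set.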

Interpretation: Our findings show that automated classification of dermoscopic melanoma and nevi images is extendable to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still outperforming dermatologists at a significant level (p < 0.001).

Keywords: Artificial intelligence; Melanoma; Skin cancer; Skin cancer screening.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biopsy
  • Dermatologists / statistics & numerical data*
  • Dermoscopy / methods*
  • Diagnosis, Differential
  • Female
  • Hospitals, University
  • Humans
  • Male
  • Melanoma / diagnostic imaging*
  • Melanoma / pathology
  • Neural Networks, Computer*
  • Nevus / diagnostic imaging*
  • Nevus / pathology
  • Sensitivity and Specificity
  • Skin Neoplasms / classification
  • Skin Neoplasms / diagnostic imaging*
  • Skin Neoplasms / pathology
  • Surveys and Questionnaires