Dermatological diseases are among the most common disorders worldwide. This paper presents the first study of interpretability and imbalanced semi-supervised learning in a multiclass intelligent skin diagnosis framework (ISDL), using 58,457 skin images, of which 10,857 are unlabelled. In class-rebalancing self-training, pseudo-labelled samples from minority classes are selected with a higher probability at each iteration, so that the unlabelled samples are used to mitigate the class imbalance problem. ISDL achieved an accuracy of 0.979, sensitivity of 0.975, specificity of 0.973, macro-F1 score of 0.974 and area under the receiver operating characteristic curve (AUC) of 0.999 for multiclass skin disease classification. We combine the SHapley Additive exPlanations (SHAP) method with ISDL to explain how the deep learning model makes its predictions, and the resulting explanations are consistent with clinical diagnosis. We also propose ISDLplus, which uses a sampling distribution optimisation strategy to select pseudo-labelled samples more effectively. The framework has the potential to relieve the pressure on professional doctors and to help address the shortage of such doctors in rural areas.
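The abstract states only that minority-class pseudo-labels are more likely to be kept at each self-training iteration. The following is a minimal sketch of one such selection round. It assumes the class-rebalancing rate used in CReST, where the k-th most frequent class keeps a fraction mu_k = (N_{L+1-k} / N_1)^alpha of its pseudo-labels and classes are sorted by labelled count in descending order. The exponent `alpha`, the top-confidence ranking, the class names and the helpers `crest_sampling_rates` and `select_pseudo_labels` are illustrative assumptions, not ISDL's published implementation.

```python
from collections import defaultdict


def crest_sampling_rates(labeled_counts, alpha=0.5):
    """Per-class keep rate mu_k = (N_{L+1-k} / N_1) ** alpha (CReST-style assumption).

    Classes are ranked by labelled count, most frequent first. The k-th most
    frequent class receives the rate computed from the k-th *least* frequent
    count, so the rarest class keeps all of its pseudo-labels (rate 1.0) and the
    most frequent class keeps the smallest fraction.
    """
    ranked = sorted(labeled_counts, key=labeled_counts.get, reverse=True)
    n_max = labeled_counts[ranked[0]]
    reversed_counts = [labeled_counts[c] for c in reversed(ranked)]
    return {c: (reversed_counts[k] / n_max) ** alpha for k, c in enumerate(ranked)}


def select_pseudo_labels(predictions, rates):
    """Keep the most confident fraction mu_k of samples pseudo-labelled as class k.

    predictions: list of (sample_id, predicted_class, confidence) tuples produced
    by the current model on the unlabelled pool (hypothetical format).
    Returns a list of (sample_id, pseudo_label) pairs to add to the training set.
    """
    by_class = defaultdict(list)
    for sample_id, cls, conf in predictions:
        by_class[cls].append((conf, sample_id))
    selected = []
    for cls, items in by_class.items():
        items.sort(key=lambda t: t[0], reverse=True)  # most confident first
        n_keep = int(round(rates.get(cls, 0.0) * len(items)))
        selected.extend((sid, cls) for _, sid in items[:n_keep])
    return selected


if __name__ == "__main__":
    # Hypothetical labelled counts for three illustrative classes (not from the paper).
    counts = {"A": 400, "B": 100, "C": 25}
    rates = crest_sampling_rates(counts, alpha=0.5)
    print(rates)

    # Imbalanced pseudo-labels: 8 predicted as A, 4 as B, 2 as C.
    preds = [(i, "A", 0.9 - 0.01 * i) for i in range(8)]
    preds += [(100 + i, "B", 0.8 - 0.01 * i) for i in range(4)]
    preds += [(200 + i, "C", 0.7 - 0.01 * i) for i in range(2)]
    print(select_pseudo_labels(preds, rates))
```

With these made-up counts the rates are roughly 0.25, 0.5 and 1.0, so the 8/4/2 split of pseudo-labels becomes two samples per class. The selection favours the minority class, as the abstract describes. In a full self-training loop, the model would be retrained on the enlarged set and the round repeated.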