Dermatological diseases are among the most common disorders worldwide. This paper presents the first study of interpretability and imbalanced semi-supervised learning in a multiclass intelligent skin diagnosis framework (ISDL), using 58,457 skin images, including 10,857 unlabelled samples. At each iteration of class-rebalancing self-training, pseudo-labelled samples from minority classes have a higher probability of being selected, so the unlabelled samples are used to counter the class imbalance. ISDL achieved an accuracy of 0.979, a sensitivity of 0.975, a specificity of 0.973, a macro-F1 score of 0.974 and an area under the receiver operating characteristic curve (AUC) of 0.999 for skin disease classification. We combine the SHapley Additive exPlanations (SHAP) method with ISDL to explain how the deep learning model makes its predictions, and these explanations are consistent with clinical diagnosis. We also propose a sampling distribution optimisation strategy, used in ISDLplus, that selects pseudo-labelled samples more effectively. Furthermore, the framework has the potential to relieve the pressure on professional doctors and to help address the shortage of such doctors in rural areas.
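The abstract does not give the exact selection rule. The sketch below follows the published class-rebalancing self-training (CReST) scheme, which is an assumption about ISDL. First, rank classes by labelled-set frequency, with N_1 the largest count. The k-th most frequent class is then assigned a selection rate μ_k = (N_{L+1-k} / N_1)^α. Under this rule the rarest class keeps all of its pseudo-labels, and the most frequent class keeps the fewest. The function names, the exponent `alpha` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def crest_sampling_rates(class_counts: Sequence[int], alpha: float = 1.0) -> List[float]:
    """Per-class pseudo-label selection rates (CReST-style; assumed, not from the paper)."""
    n = len(class_counts)
    # Class indices ordered from most to least frequent in the labelled set.
    order = sorted(range(n), key=lambda c: -class_counts[c])
    n_max = class_counts[order[0]]
    rates = [0.0] * n
    for rank, c in enumerate(order):
        # Pair the rank-th most frequent class with the rank-th least frequent count,
        # so minority classes receive the largest rates.
        partner = order[n - 1 - rank]
        rates[c] = (class_counts[partner] / n_max) ** alpha
    return rates


def select_pseudo_labels(
    probs: Sequence[Sequence[float]], class_counts: Sequence[int], alpha: float = 1.0
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs kept for the next self-training round."""
    rates = crest_sampling_rates(class_counts, alpha)
    by_class = {c: [] for c in range(len(class_counts))}
    for i, p in enumerate(probs):
        pred = max(range(len(p)), key=lambda c: p[c])  # predicted class
        by_class[pred].append((p[pred], i))            # (confidence, index)
    selected = []
    for c, candidates in by_class.items():
        n_keep = math.floor(rates[c] * len(candidates))
        candidates.sort(reverse=True)  # most confident predictions first
        selected.extend((i, c) for _, i in candidates[:n_keep])
    selected.sort()
    return selected


# Toy example (hypothetical data): class 0 is the majority, class 1 the minority.
counts = [100, 10, 50]
probs = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.6, 0.2, 0.2],
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.1, 0.1, 0.8],
]
picked = select_pseudo_labels(probs, counts)
print(picked)  # [(3, 1), (5, 2)]
```

In the toy run, none of the three majority-class predictions survive (rate 0.1). The single minority-class prediction is kept (rate 1.0), along with the more confident of the two class-2 predictions (rate 0.5). This illustrates how rebalancing shifts the pseudo-labelled pool toward minority classes.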
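For multiclass classification, sensitivity and specificity are usually computed one class at a time (one-vs-rest) and then averaged. The abstract only states macro-averaging for F1, so applying the same one-vs-rest macro averaging to sensitivity and specificity is an assumption here. The helper name `macro_metrics` and the 2×2 confusion matrix are hypothetical, not results from the paper.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus one-vs-rest macro-averaged sensitivity, specificity and F1.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    sens, spec, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c predicted as something else
        fp = sum(cm[r][c] for r in range(k)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return {
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "macro_sensitivity": sum(sens) / k,
        "macro_specificity": sum(spec) / k,
        "macro_f1": sum(f1) / k,
    }


# Toy confusion matrix (hypothetical): 20 samples, 2 classes.
m = macro_metrics([[8, 2], [1, 9]])
print(m)
```

Macro averaging gives every disease class equal weight regardless of its size. This matters for an imbalanced dataset, where plain accuracy can be dominated by the majority classes.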