Deep neural networks have already been shown to perform on par with, if not better than, dermatologists in skin lesion classification. However, the performance of these models usually deteriorates when the test data differ significantly from the training data (i.e., under domain shift). This limitation poses a risk to patients when such models are deployed for real-world skin lesion classification. For example, a different image acquisition system or a previously unseen anatomical site on the patient can suffice to cause a domain shift. Mitigating the negative effect of such shifts is therefore crucial, but developing effective methods to address domain shift has proven challenging. In this study, we carry out an in-depth analysis of eight unsupervised domain adaptation methods to assess how effectively they improve generalization on dermoscopic datasets. To ensure the robustness of our findings, we test each method on ten distinct datasets, thereby covering a variety of possible domain shifts. In addition, we investigate which characteristics of the domain-shifted datasets influence the effectiveness of domain adaptation methods. Our findings show that all eight domain adaptation methods improve the AUPRC on the majority of the analyzed datasets. Altogether, these results indicate that unsupervised domain adaptation generally improves performance on the binary melanoma-nevus classification task, regardless of the nature of the domain shift. However, small or heavily imbalanced datasets make the results less consistent, because these factors influence the methods' performance.
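AUPRC, the metric reported above, is commonly estimated as average precision: the mean of the precision values at each rank where a positive (here, melanoma) case is retrieved. The abstract does not state which estimator was used, so the following is only a minimal, dependency-free sketch under that assumption; the function name `average_precision` is hypothetical, and tied scores are not grouped. For a classifier that ranks cases at random, the expected value is approximately the positive-class prevalence, so absolute AUPRC values are not directly comparable across datasets with different class balance.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate AUPRC as average precision (hypothetical helper, ties not grouped).

    labels: 1 for the positive class (e.g. melanoma), 0 otherwise.
    scores: classifier scores; higher means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive cases")

    # Rank cases from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive case is retrieved.
            precision_sum += true_positives / rank

    # Average the precision values over all positive cases.
    return precision_sum / n_pos


if __name__ == "__main__":
    # Positives at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6.
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In practice, scikit-learn's `sklearn.metrics.average_precision_score` computes the same quantity and handles tied scores by evaluating each distinct threshold once.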