Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, which hinders the development of accurate and reliable diagnostic systems. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-source, combined-source, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement observed when class imbalance is mitigated. In the multi-class task, its gains are less pronounced, and the imbalance problem must again be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error on multi-class tasks is strongly correlated with label shift, and that feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.