Deep learning has been reported to achieve high performance in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a previous study on Alzheimer's disease detection, which examined the robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection using the PAD-UFES-20 dataset, with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist) and with a pre-trained ResNet-50 as the CNN. We evaluate these models as in the replicated study: each model is trained on multiple training datasets with varied sex composition to determine its robustness. Our results show that both the LR and the CNN were robust to the sex distribution of the training data, but also that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for male patients than for female patients. The data and relevant scripts to reproduce our results are publicly available (https://github.com/nikodice4/Skin-cancer-detection-sex-bias).
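The evaluation protocol described above (training sets with varied sex composition, then ACC and AUROC reported separately for female and male patients) can be illustrated with a minimal sketch. This is not the code from the linked repository: the function names, the record layout (dictionaries with `sex` and `label` keys), and the 0.5 decision threshold are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def sample_by_sex_ratio(records, female_ratio, n, seed=0):
    """Draw n training records with the requested fraction of female patients.

    `records` is a hypothetical list of dicts with keys "sex" ("F"/"M") and "label".
    """
    rng = random.Random(seed)
    females = [r for r in records if r["sex"] == "F"]
    males = [r for r in records if r["sex"] == "M"]
    n_f = round(n * female_ratio)
    n_m = n - n_f
    if n_f > len(females) or n_m > len(males):
        raise ValueError("not enough records for the requested composition")
    subset = rng.sample(females, n_f) + rng.sample(males, n_m)
    rng.shuffle(subset)
    return subset


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores (assumed threshold)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def metrics_by_sex(test_records, scores):
    """Compute ACC and AUROC separately for female and male test patients."""
    out = {}
    for sex in ("F", "M"):
        idx = [i for i, r in enumerate(test_records) if r["sex"] == sex]
        labels = [test_records[i]["label"] for i in idx]
        sex_scores = [scores[i] for i in idx]
        out[sex] = {"acc": accuracy(labels, sex_scores),
                    "auroc": auroc(labels, sex_scores)}
    return out
```

In such a setup, one would call `sample_by_sex_ratio` once per target composition (the ratios are a free choice here), fit a model on each subset, score a fixed test set, and compare the output of `metrics_by_sex` across compositions. Testing whether a difference between sexes is statistically significant would require an additional step that this sketch does not cover.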