Effect of Demographic Bias on Skin Lesion Classification

We evaluate skin lesion classification with ResNet-based convolutional models, focusing on how demographic bias in the training data, specifically the distribution of patient sex and age, affects performance. We use linear programming to generate datasets with controlled demographic characteristics, which allows a systematic investigation of bias effects. Three learning strategies are evaluated: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our sex-based analysis indicates that sex-specific training datasets optimise model performance. Notably, including male patients in the training data improved performance for the male subgroup, even in female-majority cases. The reinforcing and adversarial learning schemes narrowed or eliminated bias gaps on balanced and female-majority datasets. They were less effective in male-majority settings, where they reduced bias only marginally relative to the baseline model and models continued to perform better for males than for females. The age-based analysis shows comparable baseline performance across the three model approaches, with performance declining from younger to older age categories. Younger groups consistently achieve the highest performance, regardless of the training data distribution. Balanced training yields the best results for the youngest age category, but performance still decreases in the older categories. We find that sex biases arise mainly from data imbalances, whereas age biases consistently favour younger groups regardless of distribution. These distinct mechanisms require targeted mitigation strategies. Additionally, cross-dataset validation on two external datasets revealed that domain shifts notably affect both performance and patterns of demographic bias.
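
To make the dataset-generation step concrete, the following is a minimal sketch of how linear programming can pick per-stratum sample counts that satisfy a target sex ratio. It is not the formulation used in the study: the strata, the available counts, the fixed malignant fraction per sex, the function name `select_counts`, and the choice of SciPy's `linprog` are all assumptions made for illustration.

```python
from scipy.optimize import linprog

# Hypothetical strata (sex, diagnosis); illustrative only, not from the study.
STRATA = [
    ("male", "benign"),
    ("male", "malignant"),
    ("female", "benign"),
    ("female", "malignant"),
]


def select_counts(available, female_fraction, malignant_fraction):
    """Return the largest per-stratum counts that meet both target fractions.

    available: dict mapping each stratum to the number of images on hand.
    """
    # linprog minimises, so a cost of -1 per variable maximises the total size.
    cost = [-1.0] * len(STRATA)

    # Sex constraint: sum(female) - female_fraction * sum(all) = 0.
    sex_row = [
        (1.0 if sex == "female" else 0.0) - female_fraction
        for sex, _ in STRATA
    ]

    # Class constraint per sex: malignant - malignant_fraction * (sex total) = 0.
    rows = [sex_row]
    for target_sex in ("male", "female"):
        row = []
        for sex, label in STRATA:
            if sex != target_sex:
                row.append(0.0)
            elif label == "malignant":
                row.append(1.0 - malignant_fraction)
            else:
                row.append(-malignant_fraction)
        rows.append(row)

    # Each count is bounded by what is actually available in that stratum.
    bounds = [(0, available[stratum]) for stratum in STRATA]

    result = linprog(
        cost,
        A_eq=rows,
        b_eq=[0.0] * len(rows),
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise ValueError(result.message)

    # The LP is continuous; rounding may perturb the ratios very slightly.
    return {s: int(round(x)) for s, x in zip(STRATA, result.x)}


# Made-up availability figures for a sex-balanced training set.
available = {
    ("male", "benign"): 600,
    ("male", "malignant"): 150,
    ("female", "benign"): 500,
    ("female", "malignant"): 100,
}
counts = select_counts(available, female_fraction=0.5, malignant_fraction=0.2)
print(counts)
```

Changing `female_fraction` (for example to 0.25 or 0.75) would give male-majority or female-majority sets under the same assumptions. Holding the class balance fixed per sex is one way to keep the demographic shift from being confounded with a shift in lesion prevalence.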
