AI-based diagnostic systems have demonstrated dermatologist-level performance in classifying skin cancer. However, such systems tend to underperform when tested on data from minority groups that are underrepresented in the training sets. Although data collection and annotation offer the best means of improving the representation of minority groups, these processes are costly and time-consuming. Prior work has suggested that data from majority groups can serve as a valuable source of information to supplement the training of diagnostic tools for minority groups. In this work, we propose an effective diffusion-based augmentation framework that maximizes the use of rich information from majority groups to benefit minority groups. Using groups with different skin types as a case study, our results show that the proposed framework can generate synthetic images that improve diagnostic results for minority groups, even when there is little or no reference data from these target groups. This work has practical value for medical image analysis, where under-diagnosis persists for certain groups due to insufficient representation.