Artificial intelligence (AI) for skin disease diagnosis has improved significantly, but a major concern is that these models frequently show biased performance across subgroups, especially with respect to sensitive attributes such as skin color. To address this issue, we propose a novel generative AI-based framework, the Dermatology Diffusion Transformer (DermDiT), which leverages text prompts generated by vision-language models and multimodal text-image learning to synthesize new dermoscopic images. We use large vision-language models to generate accurate, informative prompts for each dermoscopic image; these prompts guide the generation of synthetic images that improve the representation of underrepresented groups (by patient attribute, disease, etc.) in highly imbalanced clinical diagnosis datasets. Our extensive experiments show that large vision-language models provide much more insightful representations, enabling DermDiT to generate high-quality images. Our code is available at https://github.com/Munia03/DermDiT
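The rebalancing goal described above can be made concrete with a minimal sketch: given per-image subgroup labels (e.g., disease class paired with a skin-tone attribute), decide how many synthetic images to generate per subgroup so that every subgroup reaches the size of the largest one. The helper below is hypothetical and not taken from the paper; the subgroup names are illustrative.

```python
from collections import Counter

def synthetic_counts(labels, attrs):
    """Return how many synthetic images to generate per (label, attribute)
    subgroup so that every subgroup matches the largest subgroup's size.
    Hypothetical rebalancing helper, not the paper's actual method."""
    counts = Counter(zip(labels, attrs))
    target = max(counts.values())
    return {group: target - n for group, n in counts.items()}

# Toy imbalanced dataset: (malignant, dark) is heavily underrepresented.
labels = ["benign"] * 8 + ["malignant"] * 2
attrs = ["light"] * 7 + ["dark"] + ["light", "dark"]
plan = synthetic_counts(labels, attrs)
print(plan)
# e.g. ("benign", "dark") needs 6 synthetic images to match ("benign", "light")
```

In the framework described above, each such synthetic image would then be generated by the diffusion transformer conditioned on a VLM-produced prompt for that subgroup.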