With the continued advancement of vision-language model (VLM) technology, notable research results have emerged in dermatology, where skin diseases rank as the fourth most prevalent category of human disease. Despite this progress, VLMs still suffer from "hallucination" in dermatological diagnosis, and because skin conditions are inherently complex, existing tools offer limited support for user comprehension. We propose SkinGEN, a diagnosis-to-generation framework that uses Stable Diffusion (SD) to generate reference demonstrations from the diagnosis results provided by a VLM, thereby enhancing visual explainability for users. Through extensive experiments with Low-Rank Adaptation (LoRA), we identify optimal strategies for skin condition image generation. We conduct a user study with 32 participants that evaluates both system performance and explainability. The results demonstrate that SkinGEN significantly improves users' comprehension of VLM predictions and fosters increased trust in the diagnostic process. This work paves the way for more transparent and user-centric VLM applications in dermatology and beyond.