AI-based skin cancer screening models suffer a severe performance drop when shifted from expert dermoscopic (source) images to consumer-grade clinical (target) images, which hinders real-world deployment. Existing domain adaptation methods often ignore crucial semantic invariants, such as clinical concepts. New foundation models like MONET can provide this semantic information as dense, probabilistic scores, but this metadata is unavailable at test time, creating a deployment paradox for practical image-only screening tools. We address this gap by proposing CoFiDA-M, a privileged information framework that learns from concepts at training time but deploys as an image-only model. Our method trains a teacher network that uses MONET concept probabilities to guide a FiLM modulator, transforming visual features into a semantically ``edited'' feature space. A lightweight, image-only student is then trained to reproduce this edited representation, not just the teacher's final predictions. This distillation ``bakes'' the clinical reasoning into the student's weights. On a challenging multi-dataset benchmark, our image-only student significantly outperforms state-of-the-art approaches, especially in melanoma recall. Our work provides a practical and generalizable framework for leveraging noisy, probabilistic metadata as privileged information, demonstrating strong cross-dataset robustness and potential for real-world deployment beyond dermatology. Implementation code is available at: https://github.com/mmu-dermatology-research/CoFiDA.git
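The teacher-side feature editing and the student's feature-level distillation target can be illustrated with a minimal sketch. It is not the released implementation: the function names (`linear`, `film_edit`, `feature_distillation_loss`), the use of plain affine maps to produce the FiLM scale and shift, and the choice of mean squared error as the feature-matching loss are all assumptions made for illustration. The abstract does not specify these details.

```python
def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows (hypothetical helper)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film_edit(features, concepts, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: per-channel scale (gamma) and shift (beta) computed from concept scores.

    Assumption: gamma and beta each come from a single affine layer over the
    concept probabilities; the actual modulator may be deeper.
    """
    gamma = linear(w_gamma, b_gamma, concepts)
    beta = linear(w_beta, b_beta, concepts)
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


def feature_distillation_loss(student_features, edited_features):
    """Mean squared error between student features and the teacher's edited features.

    Assumption: MSE stands in for whatever feature-matching loss is actually used.
    """
    n = len(edited_features)
    return sum((s - t) ** 2
               for s, t in zip(student_features, edited_features)) / n


# Toy example: 2 feature channels, 2 concept probabilities.
visual_features = [1.0, 2.0]
concept_probs = [0.5, 1.0]          # stand-in for MONET concept scores
w_gamma, b_gamma = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]
w_beta, b_beta = [[0.0, 0.0], [0.0, 0.0]], [0.5, -1.0]

# Teacher: concept-conditioned "edited" features (uses privileged concepts).
edited = film_edit(visual_features, concept_probs,
                   w_gamma, b_gamma, w_beta, b_beta)

# Student: image-only features, trained to match the edited representation.
student_features = [1.5, 2.0]
loss = feature_distillation_loss(student_features, edited)
```

At test time only the student's image-only path would be evaluated, so `concept_probs` is needed solely while training the teacher and computing the distillation target.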