A pre-trained vision-language model, contrastive language-image pre-training (CLIP), successfully performs various downstream tasks with text prompts, such as retrieving images or localizing regions within an image. Despite CLIP's strong multi-modal capabilities, it remains limited in specialized environments such as medical applications. Many CLIP variants (e.g., BioMedCLIP and MedCLIP-SAMv2) have emerged to address this, but false positives on normal regions persist. We therefore pursue a simple yet important goal: reducing false positives in medical anomaly detection. We introduce a Contrastive LAnguage Prompting (CLAP) method that leverages both positive and negative text prompts. This straightforward approach identifies potential lesion regions via visual attention to the positive prompts in the given image; to reduce false positives, it attenuates attention on normal regions using negative prompts. Extensive experiments on the BMAD dataset, comprising six biomedical benchmarks, demonstrate that CLAP improves anomaly detection performance. Our future work includes developing an automated fine-prompting method for more practical usage.
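The core idea of contrastive prompting can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes CLIP-style patch embeddings and text embeddings (random stand-ins here), computes cosine-similarity attention for a positive prompt, and subtracts the attention for a negative (normal-tissue) prompt to suppress false positives. The function name `clap_anomaly_map` and the weighting factor `alpha` are hypothetical.

```python
# Sketch of Contrastive LAnguage Prompting (CLAP): attention from a
# positive prompt highlights candidate lesions, and attention from a
# negative (normal-tissue) prompt is subtracted to suppress false
# positives on normal regions. Embeddings are random stand-ins for
# CLIP patch/text features; shapes and the subtraction rule are
# illustrative assumptions.
import numpy as np

def cosine_sim(patches, text):
    # Cosine similarity between each patch embedding and one text embedding.
    patches = patches / np.linalg.norm(patches, axis=-1, keepdims=True)
    text = text / np.linalg.norm(text)
    return patches @ text

def clap_anomaly_map(patch_emb, pos_emb, neg_emb, alpha=1.0):
    # Positive-prompt attention minus alpha * negative-prompt attention,
    # clipped at zero so normal regions do not go negative.
    pos_attn = cosine_sim(patch_emb, pos_emb)   # lesion-like regions
    neg_attn = cosine_sim(patch_emb, neg_emb)   # normal-tissue regions
    return np.clip(pos_attn - alpha * neg_attn, 0.0, None)

rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(196, 512))  # e.g. a 14x14 patch grid, 512-d
pos_emb = rng.normal(size=512)           # e.g. text for "a lesion"
neg_emb = rng.normal(size=512)           # e.g. text for "healthy tissue"
amap = clap_anomaly_map(patch_emb, pos_emb, neg_emb)
print(amap.shape)
```

In a real pipeline, the per-patch scores would be reshaped to the patch grid and upsampled to image resolution to produce an anomaly heatmap.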