EasyLens: A Training-Free Plug-and-Play Subtle-Lesion Representation Amplifier for Medical Vision-Language Models

Medical vision-language models (VLMs) have shown increasing potential for clinical image interpretation, including lesion detection and report generation. However, their practical utility remains limited by insufficient sensitivity to subtle lesions, whose visual evidence is often sparse, low-contrast, and embedded within complex anatomical context. As local visual tokens are aggregated, these weak lesion cues can become underrepresented in global image representations, making them difficult for medical VLMs to recognize. Existing efforts to improve lesion sensitivity mainly rely on medical-domain vision-encoder pre-training, clinical-term-guided alignment, or trainable pathological representation enhancement. Although effective, these approaches usually require additional training or model-specific adaptation and may overfit to particular disease morphologies, limiting their applicability to frozen medical VLMs. To address these limitations, we propose EasyLens, a training-free plug-and-play subtle-lesion representation amplifier for medical VLMs. EasyLens first constructs EasyBank, a pathology-anatomy prototype space that provides lesion-related prototypes and anatomy-aware normal references for comparing suspicious patches against both pathological and normal anatomical patterns. To avoid blindly amplifying normal tissues, EasyTag selects lesion-relevant patches through counterfactual prototype reasoning. To counteract the dilution of subtle lesion cues in global image representations, EasyAmplifier strengthens the selected lesion-relevant patch representations through morphology-guided residual enhancement, thereby increasing their contribution to the global image embedding. Experiments on multiple medical image datasets and frozen medical VLM backbones show that EasyLens improves subtle-lesion detection and outperforms existing encoder-enhancement baselines.
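The three stages described above (prototype bank, patch selection, residual amplification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give formulas, so the selection score (best lesion-prototype similarity minus best normal-prototype similarity), the `margin` and `alpha` parameters, the mean pooling, and the function name `easylens_sketch` are all assumptions chosen for illustration. The morphology guidance mentioned in the abstract is not modeled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def easylens_sketch(patches, lesion_protos, normal_protos, margin=0.1, alpha=0.5):
    """Hypothetical training-free amplifier over frozen patch embeddings.

    patches:       (N, D) patch embeddings from a frozen vision encoder
    lesion_protos: (K, D) lesion-related prototypes (stand-in for EasyBank)
    normal_protos: (M, D) anatomy-aware normal references (stand-in for EasyBank)
    Returns (global_embedding of shape (D,), boolean selection mask of shape (N,)).
    """
    p = l2_normalize(patches)
    les = l2_normalize(lesion_protos)
    nor = l2_normalize(normal_protos)

    # Cosine similarity of every patch to every prototype.
    sim_les = p @ les.T  # (N, K)
    sim_nor = p @ nor.T  # (N, M)

    # Assumed stand-in for EasyTag: a patch is lesion-relevant only if it is
    # closer to some lesion prototype than to every normal reference by `margin`.
    score = sim_les.max(axis=1) - sim_nor.max(axis=1)
    selected = score > margin

    # Assumed stand-in for EasyAmplifier: add a residual along the nearest
    # lesion prototype, scaled by the score and the patch's own norm.
    nearest = les[sim_les.argmax(axis=1)]  # (N, D)
    gain = alpha * np.clip(score, 0.0, None) * selected  # zero for unselected patches
    patch_norm = np.linalg.norm(patches, axis=1, keepdims=True)
    enhanced = patches + gain[:, None] * patch_norm * nearest

    # Assumed global pooling: plain mean over patch tokens.
    return enhanced.mean(axis=0), selected
```

Unselected patches pass through unchanged, so under these assumptions the only shift in the pooled embedding comes from patches that resemble a lesion prototype more than any normal reference.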
