Image-text contrastive learning has proven effective for pretraining medical image models. When targeting localized downstream tasks such as semantic segmentation or object detection, additional local contrastive losses that align image regions with sentences have shown promising results. We study how local contrastive losses relate to global (per-sample) contrastive losses and what effects they have on localized medical downstream tasks. Based on a theoretical comparison, we propose to remove some components of local losses and replace others with a novel distribution prior that enforces uniformity of representations within each sample. We empirically study this approach on chest X-ray tasks and find that it outperforms methods without local losses on 12 of 18 tasks.
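To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the global (per-sample) loss is a standard symmetric InfoNCE objective over paired image and report embeddings. It also assumes, purely for illustration, that within-sample uniformity can be measured with a pairwise Gaussian-potential term of the kind used in general representation-learning work; the abstract does not specify the form of the proposed distribution prior. All function names, the temperature, and the scale `t` are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def global_contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired (image, report) embeddings.

    Row k of image_embs is assumed to be paired with row k of text_embs.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled cosine similarity between image i and text j
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    # image-to-text direction: softmax over each row
    i2t = sum(log_sum_exp(logits[k]) - logits[k][k] for k in range(n)) / n
    # text-to-image direction: softmax over each column
    t2i = sum(
        log_sum_exp([logits[r][k] for r in range(n)]) - logits[k][k]
        for k in range(n)
    ) / n
    return 0.5 * (i2t + t2i)


def within_sample_uniformity(region_embs, t=2.0):
    """Assumed uniformity term for the region embeddings of ONE sample.

    Computes log(mean(exp(-t * squared distance))) over all distinct pairs
    of normalized region embeddings. Lower values mean the regions are
    more spread out; collapsed (identical) regions give 0.
    """
    regions = [l2_normalize(v) for v in region_embs]
    terms = []
    for a in range(len(regions)):
        for b in range(a + 1, len(regions)):
            sq_dist = sum((x - y) ** 2 for x, y in zip(regions[a], regions[b]))
            terms.append(-t * sq_dist)
    return log_sum_exp(terms) - math.log(len(terms))
```

In a training loop under these assumptions, the uniformity term would be averaged over the samples in a batch and added to the global loss with a weighting coefficient, which would need to be tuned.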