Identifying unique polyps in colon capsule endoscopy (CCE) images is a critical yet challenging task for medical personnel, owing to the large volume of images, the cognitive load this places on clinicians, and the ambiguity involved in labeling specific frames. We formulate this problem as a multi-instance learning (MIL) task, in which a query polyp image is compared against a target bag of images to determine uniqueness. We employ a multi-instance verification (MIV) framework that incorporates attention mechanisms, namely variance-excited multi-head attention (VEMA) and distance-based attention (DBA), to enhance the model's ability to extract meaningful representations. Additionally, we investigate the impact of self-supervised pretraining with SimCLR on the robustness of the learned embeddings. Experimental results on a dataset of 1912 polyps from 754 patients demonstrate that attention mechanisms significantly improve performance, with DBA L1 achieving the highest test accuracy of 86.26\% and a test AUC of 0.928 using a ConvNeXt backbone with SimCLR pretraining. This study underscores the potential of MIL and self-supervised learning for advancing automated analysis of CCE images, with implications for broader medical imaging applications.
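To make the DBA L1 idea concrete, the following is a minimal sketch, not the paper's actual implementation: it assumes distance-based attention weights each bag instance by its negative L1 distance to the query embedding, pools the bag with those weights, and scores query-bag similarity (cosine similarity is an assumed choice here; the function and parameter names are hypothetical).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dba_l1_score(query, bag, tau=1.0):
    """Hypothetical distance-based attention (L1 variant):
    weight each instance in the bag by its negative L1 distance
    to the query, pool the bag, and score query-bag similarity."""
    dists = np.abs(bag - query).sum(axis=1)   # L1 distance per instance
    weights = softmax(-dists / tau)           # closer instances get larger weight
    pooled = weights @ bag                    # attention-pooled bag embedding
    # cosine similarity as the verification score (assumption)
    return pooled @ query / (np.linalg.norm(pooled) * np.linalg.norm(query) + 1e-8)

# Toy usage: a bag containing a near-duplicate of the query
# should score higher than a bag of unrelated embeddings.
q = np.array([1.0, 0.0, 0.0])
bag_match = np.stack([q + 0.01, np.array([0.0, 1.0, 0.0])])
bag_other = np.stack([np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])])
print(dba_l1_score(q, bag_match) > dba_l1_score(q, bag_other))  # True
```

In a trained MIV model the embeddings would come from the (SimCLR-pretrained) ConvNeXt backbone rather than raw vectors, and the verification score would feed a learned classification head.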