IBoxCLA: Towards Robust Box-supervised Segmentation of Polyp via Improved Box-dice and Contrastive Latent-anchors

Box-supervised polyp segmentation is attracting increasing attention because of its low annotation cost. Existing solutions often rely on learning-free methods or pretrained models to laboriously generate pseudo masks, which are then used to apply a Dice constraint. In this paper, we find that a model guided by the simplest box-filled masks can accurately predict polyp locations and sizes, but suffers from shape collapse. In response, we propose two learning schemes, Improved Box-dice (IBox) and Contrastive Latent-Anchors (CLA), and combine them to train a robust box-supervised model, IBoxCLA. The core idea behind IBoxCLA is to decouple the learning of location/size from the learning of shape, so that each can be constrained separately. Specifically, IBox transforms the segmentation map into a proxy map by applying shape decoupling and then confusion-region swapping. Within the proxy map, shapes are disentangled, while locations and sizes are encoded as box-like responses. Because the constraint is applied to the proxy map rather than to the raw prediction, the box-filled mask can supervise IBoxCLA without misleading its shape learning. CLA, in turn, contributes to shape learning by generating two types of latent anchors, which are learned and updated using momentum and segmented polyps so that they steadily represent polyp and background features. The latent anchors help IBoxCLA capture discriminative features inside and outside the boxes in a contrastive manner, yielding clearer boundaries. We benchmark IBoxCLA on five public polyp datasets. The experimental results show that IBoxCLA performs competitively with recent fully-supervised polyp segmentation methods and outperforms other state-of-the-art box-supervised methods, with relative gains of at least 6.5% in overall mDice and at least 7.5% in overall mIoU.
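
To make the starting point concrete, the following is a minimal NumPy sketch of the naive baseline the abstract describes (a plain Dice constraint against a box-filled mask), together with a generic momentum update of a feature anchor. It is not the authors' implementation: the function names, the end-exclusive `(x0, y0, x1, y1)` box format, the weighted-mean update rule, and the momentum value of 0.9 are all assumptions made for illustration. The shape decoupling and confusion-region swapping steps of IBox are not specified in the abstract and are deliberately left out.

```python
import numpy as np


def box_filled_mask(shape, boxes):
    """Binary mask in which every box is filled with ones.

    Assumed box format (hypothetical): (x0, y0, x1, y1), end-exclusive.
    """
    mask = np.zeros(shape, dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask


def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice between a probability map and a binary mask."""
    inter = float((pred * target).sum())
    denom = float(pred.sum() + target.sum())
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def momentum_update(anchor, features, weights, momentum=0.9):
    """Exponential moving average of an anchor toward a weighted mean feature.

    features: (N, C) pixel embeddings; weights: (N,) non-negative weights,
    e.g. predicted polyp probabilities. This is a generic EMA rule assumed
    for illustration, not the update rule from the paper.
    """
    total = weights.sum()
    if total <= 0:
        return anchor  # nothing to learn from; keep the anchor unchanged
    mean_feat = (features * weights[:, None]).sum(axis=0) / total
    return momentum * anchor + (1.0 - momentum) * mean_feat


# A 2x2 box inside a 4x4 image.
box_mask = box_filled_mask((4, 4), [(1, 1, 3, 3)])

# A prediction that covers only the diagonal of the box (a non-box shape).
diag_pred = np.zeros((4, 4), dtype=np.float32)
diag_pred[1, 1] = 1.0
diag_pred[2, 2] = 1.0

loss_box_shaped = soft_dice_loss(box_mask, box_mask)
loss_diag_shaped = soft_dice_loss(diag_pred, box_mask)

# One EMA step of a 2-D anchor toward features that all equal [1, 0].
anchor = momentum_update(
    np.zeros(2), np.array([[1.0, 0.0], [1.0, 0.0]]), np.array([1.0, 1.0])
)
```

In this toy case the box-shaped prediction has a lower Dice loss than the diagonal one, even though a real polyp rarely fills its bounding box. This is one plausible reading of why a raw Dice constraint against box-filled masks can distort shape, and why the abstract argues for constraining a proxy map instead.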
