IBoxCLA: Towards Robust Box-supervised Segmentation of Polyp via Improved Box-dice and Contrastive Latent-anchors

Box-supervised polyp segmentation is attracting increasing attention for its cost-effectiveness. Existing solutions often rely on learning-free methods or pretrained models to laboriously generate pseudo masks, which are then used to impose a Dice constraint. In this paper, we find that a model guided by the simplest box-filled masks can accurately predict polyp locations and sizes but suffers from shape collapse. In response, we propose two learning schemes, Improved Box-dice (IBox) and Contrastive Latent-Anchors (CLA), and combine them to train a robust box-supervised model, IBoxCLA. The core idea behind IBoxCLA is to decouple the learning of location/size from the learning of shape, so that each can be constrained separately. Specifically, IBox transforms the segmentation map into a proxy map by applying shape decoupling and then confusion-region swapping. Within the proxy map, shapes are disentangled, while locations and sizes are encoded as box-like responses. Because the constraint is applied to the proxy map rather than to the raw prediction, the box-filled mask can supervise IBoxCLA without misleading its shape learning. Furthermore, CLA contributes to shape learning by generating two types of latent anchors, which are learned and updated using momentum and segmented polyps so that they steadily represent polyp and background features. The latent anchors help IBoxCLA capture discriminative features inside and outside the boxes in a contrastive manner, yielding clearer boundaries. We benchmark IBoxCLA on five public polyp datasets. The experimental results demonstrate that IBoxCLA is competitive with recent fully-supervised polyp segmentation methods and outperforms other state-of-the-art box-supervised methods, with relative increases in overall mDice and mIoU of at least 6.5% and 7.5%, respectively.
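To make two ingredients mentioned above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows (a) why a plain Dice constraint against a box-filled mask penalizes a correctly shaped prediction, and (b) one common way to update a feature anchor with momentum. The abstract does not specify the proxy-map transform or the exact anchor formulation, so every name here (`box_filled_mask`, `dice_loss`, `ema_update`), the masked-average pooling, and the momentum value 0.9 are assumptions chosen purely for illustration.

```python
import numpy as np

def box_filled_mask(shape, boxes):
    """Build a mask with every (x0, y0, x1, y1) box filled with ones.
    x1 and y1 are exclusive (an assumed convention)."""
    mask = np.zeros(shape, dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1.0
    return mask

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P * T| / (|P| + |T|)."""
    inter = float((pred * target).sum())
    total = float(pred.sum()) + float(target.sum())
    return 1.0 - (2.0 * inter + eps) / (total + eps)

def ema_update(anchor, features, mask, momentum=0.9):
    """Momentum (exponential moving average) update of a feature anchor.
    features has shape (C, H, W); mask has shape (H, W).
    The anchor is left unchanged when the mask is empty."""
    area = float(mask.sum())
    if area == 0.0:
        return anchor
    pooled = (features * mask).sum(axis=(1, 2)) / area  # masked average, shape (C,)
    return momentum * anchor + (1.0 - momentum) * pooled

# A 4x4 box in an 8x8 image, and a rounder "polyp" that touches the box
# edges but does not fill its four corners (12 of the 16 box pixels).
box = box_filled_mask((8, 8), [(2, 2, 6, 6)])
polyp = box.copy()
for y, x in [(2, 2), (2, 5), (5, 2), (5, 5)]:
    polyp[y, x] = 0.0

loss_box_shaped = dice_loss(box, box)     # prediction that fills the whole box
loss_true_shape = dice_loss(polyp, box)   # prediction with the rounder shape

# Hypothetical 2-channel feature map of ones, pooled over the polyp region.
features = np.ones((2, 8, 8), dtype=np.float32)
polyp_anchor = ema_update(np.zeros(2), features, polyp)
```

In this toy case the box-shaped prediction attains the lower loss, which illustrates how supervising the raw prediction with box-filled masks can pull shapes toward boxes. Constraining a proxy map instead, as IBox does, is meant to avoid that pressure.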
