Limited by expensive pixel-level labels, polyp segmentation models suffer from data shortage and impaired generalization. In contrast, polyp bounding box annotations are much cheaper and more accessible. Thus, to reduce labeling cost, we propose WeakPolyp, a weakly supervised polyp segmentation model trained entirely on bounding box annotations. However, coarse bounding boxes contain too much noise. To avoid this interference, we introduce the mask-to-box (M2B) transformation. By supervising the outer box mask of the prediction instead of the prediction itself, M2B greatly mitigates the mismatch between the coarse label and the precise prediction. However, M2B provides only sparse supervision, leading to non-unique predictions. Therefore, we further propose a scale consistency (SC) loss for dense supervision. By explicitly aligning predictions of the same image at different scales, the SC loss largely reduces the variation of predictions. Note that WeakPolyp is a plug-and-play model that can be easily ported to other backbones. Moreover, the proposed modules are used only during training and add no computation cost at inference. Extensive experiments demonstrate the effectiveness of WeakPolyp, which surprisingly achieves performance comparable to that of a fully supervised model while requiring no mask annotations at all.
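The two training-time components described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "outer box mask" of a soft prediction is obtained by taking the per-row and per-column maxima and combining them with an element-wise minimum, that the box mask is compared to the box label with binary cross-entropy, and that the SC loss is the mean absolute difference between two predictions after nearest-neighbour resizing. The names `mask_to_box`, `box_bce`, `resize_nearest`, and `scale_consistency_loss` are hypothetical.

```python
import numpy as np


def mask_to_box(pred):
    """Map a soft mask of shape (H, W) with values in [0, 1] to its outer box mask.

    Assumption: the box mask is min(row-wise max, column-wise max), which
    fills the tightest axis-aligned box around a single predicted region.
    """
    row_max = pred.max(axis=1, keepdims=True)  # (H, 1): strongest response per row
    col_max = pred.max(axis=0, keepdims=True)  # (1, W): strongest response per column
    return np.minimum(row_max, col_max)        # broadcasts to (H, W)


def box_bce(pred, box_label, eps=1e-7):
    """Binary cross-entropy between the box mask of `pred` and a box label.

    Assumption: BCE is used here only for illustration; the supervision is
    applied to the box mask, never to `pred` directly.
    """
    p = np.clip(mask_to_box(pred), eps, 1.0 - eps)
    loss = box_label * np.log(p) + (1.0 - box_label) * np.log(1.0 - p)
    return float(-np.mean(loss))


def resize_nearest(pred, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w)."""
    h, w = pred.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return pred[rows[:, None], cols[None, :]]


def scale_consistency_loss(pred_a, pred_b):
    """Mean absolute difference between two predictions of the same image.

    `pred_b` comes from a rescaled copy of the input and is resized back to
    the shape of `pred_a` before the comparison (assumed form of the SC loss).
    """
    h, w = pred_a.shape
    return float(np.mean(np.abs(pred_a - resize_nearest(pred_b, h, w))))


# Toy prediction: an L-shaped region whose outer box covers rows 2-3, cols 2-3.
pred = np.zeros((6, 6))
pred[2, 3] = pred[3, 2] = pred[3, 3] = 1.0
box = mask_to_box(pred)
```

In this toy case the pixel at row 2, column 2 is empty in `pred` but filled in `box`, so a box-shaped label can match `box` without pushing the prediction itself toward a rectangle. That same freedom is why box supervision alone leaves the shape inside the box underdetermined, and why a dense term such as the SC loss is added.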