Contextual information is critical for various computer vision tasks, and previous works commonly design plug-and-play modules and structural losses to extract and aggregate global context. These methods use fine-grained labels to optimize the model but ignore that well-trained features are also valuable training resources, which can provide a preferable distribution for hard pixels (i.e., misclassified pixels). Inspired by contrastive learning in the unsupervised paradigm, we apply the contrastive loss in a supervised manner and re-design the loss function to discard conventions inherited from unsupervised learning (e.g., the imbalance between positives and negatives and the confusion in computing anchors). To this end, we propose the Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of the positive embedding on the anchor and treats positive and negative sample pairs equally. The PNE loss can be plugged directly into existing semantic segmentation frameworks and leads to excellent performance with negligible extra computational cost. We use a number of classic segmentation methods (e.g., DeepLabV3, HRNetV2, OCRNet, UperNet) and backbones (e.g., ResNet, HRNet, Swin Transformer) to conduct comprehensive experiments and achieve state-of-the-art performance on three benchmark datasets (Cityscapes, COCO-Stuff, and ADE20K). Our code will be publicly available soon.
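For readers unfamiliar with supervised contrastive learning, the following is a minimal sketch of a generic supervised InfoNCE-style loss over pixel embeddings. It is not the PNE loss itself, whose exact formulation is not given in this abstract; it only illustrates the baseline idea that PNE re-designs: embeddings sharing a class label with the anchor act as positives, and all others act as negatives. The function names, the temperature value, and the toy inputs are assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, NOT the PNE loss).

    embeddings: list of feature vectors, e.g. one per sampled pixel.
    labels:     list of class ids, one per embedding.
    temperature: assumed value; controls how sharply similarities are weighted.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other embeddings with the same class label as the anchor.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled cosine similarity between the anchor and every other embedding.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over positives and negatives together.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(feats, [0, 0, 1, 1]))  # labels agree with clusters
    print(supervised_contrastive_loss(feats, [0, 1, 0, 1]))  # labels cut across clusters
```

In this toy setting the loss is small when labels agree with the geometric clusters and large when they cut across them, which is the signal a contrastive term adds on top of the usual per-pixel classification loss. In this baseline each anchor is typically contrasted against far more negatives than positives, which corresponds to the imbalance the abstract refers to.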