This paper presents a new mechanism that facilitates the training of mask transformers for efficient panoptic segmentation, democratizing their deployment. We observe that, owing to its high complexity, the training objective of panoptic segmentation inevitably leads to a much higher penalization of false positives. Such an unbalanced loss makes end-to-end mask-transformer-based architectures difficult to train, especially efficient models. To address this, we present ReMaX, which adds relaxation to mask predictions and class predictions during training for panoptic segmentation. We demonstrate that these simple training-time relaxation techniques consistently improve our model by a clear margin \textbf{without} any extra computational cost at inference. When combined with efficient backbones such as MobileNetV3-Small, our method achieves new state-of-the-art results for efficient panoptic segmentation on COCO, ADE20K and Cityscapes. Code and pre-trained checkpoints will be available at \url{https://github.com/google-research/deeplab2}.