Predicting a binary mask for an object is more accurate than predicting a bounding box, but also more computationally expensive. Polygonal masks, as developed in CenterPoly, can be a good compromise. In this paper, we improve over CenterPoly by enhancing the classical L1 regression loss with a novel region-based loss and a novel order loss, as well as with a new training process for the vertex prediction head. Moreover, previous methods that predict polygonal masks use different coordinate systems, but it is not clear whether one is better than another once architecture requirements are abstracted away. We therefore investigate the impact of the coordinate system on the prediction. We also use a new evaluation protocol with oracle predictions for the detection head, to further isolate the segmentation process and better compare polygonal masks with binary masks. Our instance segmentation method is trained and tested on challenging datasets of urban scenes with a high density of road users. Experiments show, in particular, that combining a regression loss with a region-based loss yields significant improvements over CenterPoly on the Cityscapes and IDD test sets. Moreover, the inference stage remains fast enough to reach real-time performance, with an average of 0.045 s per frame (roughly 22 frames per second) for 2048$\times$1024 images on a single RTX 2070 GPU. The code is available $\href{https://github.com/KatiaJDL/CenterPoly-v2}{\text{here}}$.