Panoptic segmentation is one of the most challenging scene parsing tasks, combining semantic segmentation and instance segmentation. While much progress has been made, few works focus on the real-time application of panoptic segmentation methods. In this paper, we revisit the recently introduced K-Net architecture. We propose key changes to the architecture, training, and inference procedure, which substantially reduce latency and improve performance. The resulting RT-K-Net sets a new state of the art for real-time panoptic segmentation methods on the Cityscapes dataset and shows promising results on the challenging Mapillary Vistas dataset. On Cityscapes, RT-K-Net reaches 60.2 % PQ with an average inference time of 32 ms for full-resolution 1024×2048 pixel images on a single Titan RTX GPU. On Mapillary Vistas, RT-K-Net reaches 33.2 % PQ with an average inference time of 69 ms. Source code is available at https://github.com/markusschoen/RT-K-Net.
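For readers unfamiliar with the PQ (panoptic quality) figures quoted above, the following is a minimal sketch of how the metric is commonly computed for a single class, together with the conversion from latency to frame rate. It is not taken from the RT-K-Net code base. The function names `panoptic_quality` and `latency_to_fps` are hypothetical, and the sketch assumes that predicted and ground-truth segments have already been matched (a match requires IoU > 0.5).

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Per-class PQ: sum of IoUs over matched segment pairs,
    divided by (|TP| + 0.5 * |FP| + 0.5 * |FN|).

    matched_ious: IoU of each matched (prediction, ground truth) pair,
                  assumed to have been matched beforehand with IoU > 0.5.
    num_fp:       number of unmatched predicted segments.
    num_fn:       number of unmatched ground-truth segments.
    """
    num_tp = len(matched_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # No segments of this class at all; return 0.0 by convention here.
        return 0.0
    return sum(matched_ious) / denominator


def latency_to_fps(latency_ms):
    """Convert an average per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


# Illustrative numbers only, not results from the paper.
pq = panoptic_quality([0.8, 0.9], num_fp=1, num_fn=1)
fps_cityscapes = latency_to_fps(32)  # the 32 ms reported for Cityscapes
```

The reported dataset-level PQ is typically the average of this per-class value over all classes. By this conversion, 32 ms per image corresponds to roughly 31 frames per second, and 69 ms to roughly 14.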