Panoptic segmentation is one of the most challenging scene parsing tasks, combining semantic segmentation and instance segmentation. While much progress has been made, few works focus on real-time applications of panoptic segmentation methods. In this paper, we revisit the recently introduced K-Net architecture. We propose vital changes to the architecture, training, and inference procedure, which massively decrease latency and improve performance. Our resulting RT-K-Net sets a new state of the art for real-time panoptic segmentation on the Cityscapes dataset and shows promising results on the challenging Mapillary Vistas dataset. On Cityscapes, RT-K-Net reaches 60.2 % PQ with an average inference time of 32 ms for full-resolution 1024×2048-pixel images on a single Titan RTX GPU. On Mapillary Vistas, RT-K-Net reaches 33.2 % PQ with an average inference time of 69 ms. Source code is available at https://github.com/markusschoen/RT-K-Net.
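The PQ figures quoted above refer to the panoptic quality metric. The following is a minimal sketch of the standard per-class definition, PQ = (sum of IoUs over true positives) / (|TP| + 0.5|FP| + 0.5|FN|), which factors into segmentation quality (SQ) times recognition quality (RQ). It is illustrative only. It assumes that segment matching (a prediction and a ground-truth segment match when their IoU exceeds 0.5) has already been done. The function names `panoptic_quality` and `mean_pq` are hypothetical and are not taken from the RT-K-Net code base.

```python
from typing import Optional, Sequence, Tuple


def panoptic_quality(matched_ious: Sequence[float], num_fp: int, num_fn: int) -> Optional[float]:
    """Per-class PQ (hypothetical helper, not from the RT-K-Net repository).

    matched_ious: IoU of each matched (true positive) prediction/ground-truth pair.
    num_fp: unmatched predicted segments; num_fn: unmatched ground-truth segments.
    Returns None when the class has no predictions and no ground truth.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("true-positive matches must have IoU > 0.5")
    tp = len(matched_ious)
    if tp + num_fp + num_fn == 0:
        return None  # class absent from both prediction and ground truth
    sq = sum(matched_ious) / tp if tp else 0.0  # segmentation quality
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)  # recognition quality
    return sq * rq


def mean_pq(per_class: Sequence[Tuple[Sequence[float], int, int]]) -> float:
    """Average PQ over classes, skipping classes that are absent entirely."""
    scores = [s for s in (panoptic_quality(*c) for c in per_class) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Two matched segments, one false positive and one false negative:
    # PQ = (0.9 + 0.7) / (2 + 0.5 + 0.5)
    print(panoptic_quality([0.9, 0.7], num_fp=1, num_fn=1))
```

Skipping absent classes when averaging mirrors common practice in panoptic evaluation scripts; a dataset-level PQ such as the 60.2 % reported above is the mean of per-class scores, expressed as a percentage.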