Video panoptic segmentation is a challenging task that underpins numerous downstream applications, including video editing and autonomous driving. We believe that the decoupling strategy proposed by DVIS enables more effective use of temporal information for both "thing" and "stuff" objects. In this report, we validate the effectiveness of the decoupling strategy in video panoptic segmentation. Our method achieved VPQ scores of 51.4 and 53.7 in the development and test phases, respectively, and ranked 1st in the VPS track of the 2nd PVUW Challenge. The code is available at https://github.com/zhang-tao-whu/DVIS.