Referring Video Object Segmentation (RVOS) is a challenging task due to its requirement for temporal understanding. Because of computational constraints, many state-of-the-art models are trained on short clips. At test time, while these models can effectively process information over short temporal windows, they struggle to maintain consistent perception over prolonged sequences, leading to inconsistencies in the resulting segmentation masks. To address this challenge, we leverage the tracking capability of the newly introduced Segment Anything Model version 2 (SAM-v2) to enhance the temporal consistency of the referring object segmentation model. Our method achieves a $\mathcal{J}\&\mathcal{F}$ score of 60.40 on the test set of the MeViS dataset, placing 2nd in the final ranking of the RVOS Track at the ECCV 2024 LSVOS Challenge.