Tracking Anything in High Quality

Visual object tracking is a fundamental video task in computer vision. Recently, the notable increase in the power of perception algorithms has allowed the unification of single/multi-object and box/mask-based tracking. Among these algorithms, the Segment Anything Model (SAM) has attracted much attention. In this report, we propose HQTrack, a framework for High Quality Tracking of anything in videos. HQTrack mainly consists of a video multi-object segmenter (VMOS) and a mask refiner (MR). Given the object to be tracked in the initial frame of a video, VMOS propagates the object masks to the current frame. The mask results at this stage are not accurate enough, since VMOS is trained on several closed-set video object segmentation (VOS) datasets and therefore has limited ability to generalize to complex and corner-case scenes. To further improve the quality of the tracking masks, a pretrained MR model is employed to refine the tracking results. As a testament to the effectiveness of our paradigm, without employing any tricks such as test-time data augmentation or model ensembling, HQTrack ranks 2nd in the Visual Object Tracking and Segmentation (VOTS2023) challenge. Code and models are available at https://github.com/jiawen-zhu/HQTrack.
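The two-stage flow described above (propagate masks, then refine them) can be sketched roughly as follows. This is an illustrative outline only, not the HQTrack implementation: `track_video`, `toy_propagate`, and `toy_refine` are hypothetical names, the masks are plain nested lists, and the toy stand-ins merely take the places where a VMOS-like segmenter and an MR-like refiner would be called. Feeding the coarse (unrefined) masks forward to the next frame is also an assumption of this sketch; the abstract does not say which masks are propagated.

```python
from typing import Callable, Dict, List

Image = List[List[int]]   # toy grayscale frame: rows of pixel values
Mask = List[List[int]]    # binary mask: rows of 0/1
Masks = Dict[int, Mask]   # object id -> mask


def track_video(
    frames: List[Image],
    init_masks: Masks,
    propagate: Callable[[Image, Masks], Masks],
    refine: Callable[[Image, Mask], Mask],
) -> List[Masks]:
    """Run a propagate-then-refine loop over a video.

    `propagate` stands in for a VMOS-like segmenter and `refine` for an
    MR-like refiner; both are hypothetical callables supplied by the caller.
    """
    results = [init_masks]   # frame 0 uses the given masks unchanged
    prev = init_masks
    for frame in frames[1:]:
        # Stage 1: carry every object's mask to the current frame.
        coarse = propagate(frame, prev)
        # Stage 2: refine each object's coarse mask independently.
        refined = {obj_id: refine(frame, m) for obj_id, m in coarse.items()}
        results.append(refined)
        # Assumption: coarse masks, not refined ones, are propagated onward.
        prev = coarse
    return results


def toy_propagate(frame: Image, prev: Masks) -> Masks:
    """Toy stand-in: assume objects do not move, so copy the previous masks."""
    return {obj_id: [row[:] for row in m] for obj_id, m in prev.items()}


def toy_refine(frame: Image, mask: Mask) -> Mask:
    """Toy stand-in: keep a mask pixel only where the frame pixel is nonzero."""
    return [
        [1 if m and f else 0 for m, f in zip(mask_row, frame_row)]
        for mask_row, frame_row in zip(mask, frame)
    ]
```

A possible call on a two-frame toy video would be `track_video(frames, {1: first_mask}, toy_propagate, toy_refine)`, which returns one dictionary of per-object masks per frame. Passing the two stages in as callables keeps the loop independent of any particular segmenter or refiner.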

