In recent years, Video Object Segmentation (VOS) has emerged as a complementary method to Video Object Tracking (VOT). VOS classifies every pixel around the target, which allows precise shape labeling, whereas VOT primarily estimates the approximate region where the target is likely to be. However, traditional segmentation modules usually classify pixels frame by frame and disregard the information shared between adjacent frames. In this paper, we propose a new algorithm that addresses this limitation by analyzing the target's motion pattern through the inherent tensor structure. This tensor structure, obtained through Tucker2 tensor decomposition, proves effective in describing the target's motion. By incorporating this information, we achieve results competitive with the state of the art (SOTA) on four benchmarks: LaSOT\cite{fan2019lasot}, AVisT\cite{noman2022avist}, OTB100\cite{7001050}, and GOT-10k\cite{huang2019got}. Furthermore, the proposed tracker runs in real time, which adds to its practical value.