Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field of view and severe spherical distortion introduced by 360° images. To alleviate these problems, we introduce a novel representation, the extended bounding field-of-view (eBFoV), for target localization, and use it as the foundation of a general 360 tracking framework applicable to both omnidirectional visual object tracking and segmentation. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporate a new component, omnidirectional video object segmentation (360VOS). The 360VOS dataset comprises 290 sequences with dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset of 170 sequences and a testing subset of 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset. Homepage: https://360vots.hkustvgd.com/