Over the past decade, significant progress has been made in visual object tracking, largely driven by the availability of large-scale training datasets. However, existing tracking datasets primarily focus on open-air scenarios, which greatly limits the development of object tracking in underwater environments. To address this gap, we take a step forward by proposing the first large-scale underwater camouflaged object tracking dataset, namely UW-COT. Based on the proposed dataset, this paper presents an experimental evaluation of several advanced visual object tracking methods alongside the latest advances in image and video segmentation. Specifically, we compare the performance of the Segment Anything Model (SAM) and its updated version, SAM 2, in challenging underwater environments. Our findings highlight the improvements of SAM 2 over SAM, demonstrating its enhanced capability to handle the complexities of underwater camouflaged objects. Compared with current advanced visual object tracking methods, the latest video segmentation foundation model, SAM 2, also exhibits significant advantages, providing valuable insights into the development of more effective tracking technologies for underwater scenarios. The dataset will be accessible at \textcolor{magenta}{https://github.com/983632847/Awesome-Multimodal-Object-Tracking}.