Video Camouflaged Object Detection (VCOD) is a challenging task that aims to identify objects seamlessly concealed within the background of a video. The dynamic nature of video makes it possible to detect camouflaged objects through motion cues or changing viewpoints. Previous VCOD datasets primarily contain animals, limiting research to wildlife scenarios. However, the applications of VCOD extend beyond wildlife, with significant implications for security, art, and medicine. To address this limitation, we construct MSVCOD, a new large-scale multi-domain VCOD dataset. To obtain high-quality annotations, we design a semi-automatic iterative annotation pipeline that reduces cost while maintaining annotation accuracy. MSVCOD is the largest VCOD dataset to date; it introduces multiple object categories (human, animal, medical, and vehicle) for the first time and expands background diversity across a range of environments. This broader scope increases the practical applicability of the VCOD task. Alongside the dataset, we introduce a one-stream video camouflaged object detection model that performs both feature extraction and information fusion without additional motion feature fusion modules. Our framework achieves state-of-the-art results on the existing animal VCOD dataset and on the proposed MSVCOD. The dataset and code will be made publicly available.