Traffic videos differ inherently from generic videos in that they are captured by stationary cameras, which provides a strong motion prior: objects tend to move in a specific direction over a short time interval. Existing works predominantly employ generic video object detection frameworks for traffic video object detection, which offer advantages such as broad applicability and robustness to diverse scenarios. However, they fail to harness the motion prior to enhance detection accuracy. In this work, we propose two methods that exploit the motion prior to boost the performance of both fully-supervised and semi-supervised traffic video object detection. Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration in the fully-supervised setting. Secondly, we utilise the motion prior to develop a pseudo-labelling mechanism that eliminates noisy pseudo labels in the semi-supervised setting. Both of our motion-prior-centred methods consistently demonstrate superior performance, outperforming existing state-of-the-art approaches by a margin of 2% in terms of mAP.