This report presents our team's technical solution for Track 3 of the 2024 ECCV ROAD++ Challenge. Track 3 targets atomic activity recognition: identifying 64 types of atomic activities in road scenes from video content. Our approach addresses three main challenges in this task: small objects, distinguishing a single object from a group of objects, and model overfitting. First, we construct a multi-branch activity recognition framework that separates not only the different object categories but also the single-object and object-group recognition tasks, thereby improving recognition accuracy. Second, we develop several model ensembling strategies that combine models trained with multiple frame sampling sequences, different frame sampling sequence lengths, multiple training epochs, and different backbone networks. Third, we propose a data augmentation method for atomic activity recognition that substantially expands the sample space by flipping both the video frames and the road topology, which effectively mitigates model overfitting. Our method ranks first on the Track 3 test set of the ROAD++ Challenge 2024, achieving 69% mAP.
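The flip-based augmentation described above can be illustrated with a minimal sketch. It assumes each atomic activity label is a tuple of agent type, start region, and end region, and that mirroring a clip swaps the left and right regions of the road topology. The region names (`Z1` to `Z4`), the `FLIP_REGION` mapping, and the function names are hypothetical and chosen only for illustration; they are not the challenge's official label layout or our exact implementation.

```python
import numpy as np

# Hypothetical mirror mapping for road-topology regions (an assumption):
# Z1 and Z3 lie on the vertical axis and stay put, Z2 and Z4 swap sides.
FLIP_REGION = {"Z1": "Z1", "Z2": "Z4", "Z3": "Z3", "Z4": "Z2"}


def flip_clip(frames):
    """Mirror a clip of shape (T, H, W, C) along the width axis."""
    return frames[:, :, ::-1, :].copy()


def flip_labels(labels):
    """Remap the start and end region of every activity to its mirrored region."""
    return {(agent, FLIP_REGION[start], FLIP_REGION[end])
            for agent, start, end in labels}


def augment(frames, labels):
    """Return the mirrored clip together with its consistently remapped labels."""
    return flip_clip(frames), flip_labels(labels)
```

The key point is that the frames and the labels must be flipped together: mirroring the pixels alone would leave the labels describing movements that no longer match the video.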