When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially relevant in scenarios where the traditional geometric descriptions and heuristics used in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its poor adaptability to noisy, unclipped, uncurated, raw RGB data has limited its application in other fields. To advance the adoption of action recognition in AVs, this work proposes a novel two-stage action recognition system, termed RALACs. RALACs formulates the problem of action recognition for road scenes and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can be useful for encoding the relations across agents, and stresses how such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs introduces a novel approach for adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also considers the problem of active agent detection, and fuses optical flow maps in a novel way to discern relevant agents in a road scene. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset. By deploying it on a real vehicle platform, we also provide preliminary insight into the usefulness of action recognition in decision making.
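As a rough illustration of how attention can encode relations across agents without reference to their class, a minimal sketch follows. It is not the RALACs implementation: the function name `agent_self_attention`, the use of unprojected features as queries, keys, and values (no learned weights), and the toy feature vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def agent_self_attention(features):
    """Scaled dot-product self-attention over per-agent feature vectors.

    `features` is a list of equal-length float lists, one per agent.
    Assumption: queries, keys, and values are the raw features themselves;
    a trained model would apply learned projections first.
    Returns (outputs, weights), where weights[i][j] is how strongly
    agent i attends to agent j.
    """
    if not features:
        return [], []
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this agent to every agent, including itself.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        row = softmax(scores)
        # Relation-aware feature: attention-weighted mix of all agents.
        mixed = [
            sum(row[j] * features[j][d] for j in range(len(features)))
            for d in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Hypothetical toy features for three agents; no class label is used.
agents = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mixed_features, attention = agent_self_attention(agents)
```

Because the computation depends only on the feature vectors, nothing in it refers to whether an agent is a pedestrian, a cyclist, or a vehicle, which is one way to read the class-agnostic property mentioned in the abstract. Reordering the agents simply reorders the outputs in the same way.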