When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially valuable in scenarios where the traditional geometric descriptions and heuristics used in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its limited adaptability to noisy, untrimmed, unprocessed raw RGB data has restricted its application in other fields. To advance the adoption of action recognition in AVs, this work proposes RALACs, a novel two-stage action recognition system. RALACs formulates the problem of action recognition for road scenes and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can be used to encode the relations across agents, and highlights that such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs introduces a novel approach for adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also addresses the problem of active agent detection, using a novel fusion of optical flow maps to discern the relevant agents in a road scene. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset, and by deploying it on a real vehicle platform, we provide preliminary insight into the usefulness of action recognition in decision making.