Accurate understanding and prediction of human behavior are critical prerequisites for autonomous vehicles, especially in highly dynamic and interactive scenarios such as intersections in dense urban areas. In this work, we aim to identify crossing pedestrians and predict their future trajectories. Achieving these goals requires not only context information about road geometry and other traffic participants but also fine-grained information about human pose, motion, and activity, which can be inferred from human keypoints. We propose a novel multi-task learning framework for pedestrian crossing action recognition and trajectory prediction that uses 3D human keypoints extracted from raw sensor data to capture rich information on human pose and activity. Moreover, we introduce two auxiliary tasks and contrastive learning as additional supervision signals that improve the learned keypoint representation, which in turn enhances performance on the main tasks. We validate our approach on a large-scale in-house dataset and a public benchmark dataset, and show that it achieves state-of-the-art performance across a wide range of evaluation metrics. A detailed ablation study validates the effectiveness of each model component.
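The abstract does not specify the loss functions, so the following is only a minimal sketch of how a multi-task objective with auxiliary and contrastive supervision could be combined. It assumes binary cross-entropy for crossing action recognition, average displacement error (ADE) as a stand-in trajectory loss, and a standard InfoNCE contrastive term over keypoint embeddings, where positive pairs are, hypothetically, two views of the same pedestrian's keypoint sequence. The function names and the weights `w_aux` and `w_con` are hypothetical illustrations, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors (eps guards zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)


def info_nce(anchors: List[Sequence[float]],
             positives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Standard InfoNCE: positives[i] is the positive for anchors[i];
    every other positive in the batch acts as a negative."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / n


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """BCE for a single crossing / not-crossing prediction."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def ade(pred: List[Point], gt: List[Point]) -> float:
    """Average displacement error over the predicted (x, y) time steps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def multitask_loss(crossing_prob: float, crossing_label: int,
                   pred_traj: List[Point], gt_traj: List[Point],
                   aux_losses: List[float], contrastive_loss: float,
                   w_aux: float = 0.5, w_con: float = 0.1) -> float:
    """Hypothetical weighted sum: main-task losses plus auxiliary and contrastive terms."""
    return (binary_cross_entropy(crossing_prob, crossing_label)
            + ade(pred_traj, gt_traj)
            + w_aux * sum(aux_losses)
            + w_con * contrastive_loss)
```

For example, well-aligned embedding pairs such as `info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])` give a loss close to zero, while swapping the positives gives a loss near 10 at temperature 0.1. In practice the weights would be tuned, and the auxiliary losses would come from whatever auxiliary heads the model defines.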