Pedestrian intention prediction is crucial for autonomous driving. In particular, knowing whether pedestrians are going to cross in front of the ego-vehicle is core to performing safe and comfortable maneuvers. Creating accurate and fast models that predict such intentions from sequential images is challenging. One contributing factor is the lack of datasets with diverse crossing and non-crossing (C/NC) scenarios. We address this scarcity by introducing a framework, named ARCANE, which allows synthetic datasets of C/NC video clip samples to be generated programmatically. As an example, we use ARCANE to generate a large and diverse dataset named PedSynth. We show how PedSynth complements widely used real-world datasets such as JAAD and PIE, thereby enabling more accurate models for C/NC prediction. Considering the onboard deployment of C/NC prediction models, we also propose a deep model named PedGNN, which is fast and has a very low memory footprint. PedGNN is based on a GNN-GRU architecture that takes a sequence of pedestrian skeletons as input to predict crossing intentions.
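
To make the GNN-GRU idea concrete, the following is a minimal sketch of how a sequence of skeletons could be mapped to a crossing probability. It is not the PedGNN implementation: the abstract does not specify layer types or sizes, so the single graph convolution, the mean pooling over joints, the bias-free GRU cell, the 5-joint toy skeleton, and all names (`predict_crossing`, `init_params`, `EDGES`) are assumptions made for illustration. Weights are random and untrained, so the output carries no meaning beyond showing the data flow.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Small random weight matrix (illustrative initialization only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def graph_conv(a_hat, x, w):
    """One graph convolution over the joints: ReLU(A_hat X W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One GRU update (biases omitted for brevity)."""
    def affine(w, u, state):
        wx = matmul([x], w)[0]
        us = matmul([state], u)[0]
        return [a + b for a, b in zip(wx, us)]
    z = [sigmoid(v) for v in affine(p["Wz"], p["Uz"], h)]      # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], p["Ur"], h)]      # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in affine(p["Wh"], p["Uh"], reset_h)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def init_params(gnn_dim, hidden_dim, seed=0):
    """Hypothetical parameter container; sizes are arbitrary."""
    rng = random.Random(seed)
    gru = {}
    for gate in "zrh":
        gru["W" + gate] = rand_matrix(gnn_dim, hidden_dim, rng)
        gru["U" + gate] = rand_matrix(hidden_dim, hidden_dim, rng)
    return {"w_gnn": rand_matrix(2, gnn_dim, rng),
            "gru": gru,
            "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]}

def predict_crossing(skeletons, edges, params):
    """Map a clip (frames x joints x 2 coordinates) to a crossing probability."""
    a_hat = normalized_adjacency(len(skeletons[0]), edges)
    h = [0.0] * len(params["w_out"])
    for joints in skeletons:
        node_feats = graph_conv(a_hat, joints, params["w_gnn"])
        # Mean-pool joint features into a single vector per frame.
        frame_vec = [sum(col) / len(node_feats) for col in zip(*node_feats)]
        h = gru_step(frame_vec, h, params["gru"])
    logit = sum(w * v for w, v in zip(params["w_out"], h))
    return sigmoid(logit)

# Toy input: 8 frames of a hypothetical 5-joint skeleton with random 2D coordinates.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
_rng = random.Random(1)
clip = [[[_rng.random(), _rng.random()] for _ in range(5)] for _ in range(8)]
params = init_params(gnn_dim=4, hidden_dim=3)
prob = predict_crossing(clip, EDGES, params)
print(f"crossing probability (untrained): {prob:.3f}")
```

The per-frame graph convolution captures spatial relations between joints, while the recurrent state carries temporal context across frames. Because the input is a handful of joint coordinates rather than full images, a model of this shape can remain small, which is consistent with the low memory footprint the abstract emphasizes.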