Pedestrian intention prediction is one of the key technologies in the transition from Level 3 to Level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features should be taken into consideration to make the roads of tomorrow safer for everybody. We introduce transformer and video vision transformer based algorithms of different sizes, which use different data modalities. We evaluated our algorithms on a popular pedestrian behaviour dataset, JAAD, reaching and surpassing state-of-the-art (SOTA) performance on metrics such as accuracy, AUC and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.