The pedestrian crossing intention prediction problem is to estimate whether a target pedestrian will cross the street. State-of-the-art techniques depend heavily on visual data captured by the front camera of the ego-vehicle to predict the pedestrian's crossing intention. Consequently, the performance of current methods degrades notably when the visual input is imprecise, for instance, when the pedestrian is far from the ego-vehicle or the illumination is poor. To address this limitation, we present the design, implementation, and evaluation of the first pedestrian crossing intention prediction model that integrates motion sensor data gathered from the pedestrian's smartwatch (or smartphone). We propose a novel machine learning framework that integrates motion sensor data with visual input to improve prediction accuracy significantly, particularly in scenarios where visual data may be unreliable. Moreover, we conduct an extensive data collection campaign and introduce the first pedestrian intention prediction dataset that features synchronized motion sensor data. The dataset comprises 255 video clips that cover diverse distances and lighting conditions. We train our model on the widely used JAAD dataset and our own dataset and compare its performance with that of a state-of-the-art model. The results demonstrate that our model outperforms the current state-of-the-art method, particularly when the distance between the pedestrian and the observer is considerable (more than 70 meters) and the lighting conditions are inadequate.