Beyond RMSE: Do machine-learned models of road user interaction produce human-like behavior?

Aravinda Ramakrishnan Srinivasan, Yi-Shin Lin, Morris Antonello, Anthony Knittel, Mohamed Hasan, Majd Hawasly, John Redford, Subramanian Ramamoorthy, Matteo Leonetti, Jac Billington, Richard Romano, Gustav Markkula

From arXiv. This work was accepted for publication in IEEE Transactions on Intelligent Transportation Systems on 13 March 2023.

Autonomous vehicles use a variety of sensors and machine-learned models to predict the behavior of surrounding road users. Most machine-learned models in the literature rely on quantitative error metrics such as the root mean square error (RMSE) both to train the models and to report their capabilities. This focus on quantitative error metrics tends to ignore the more important behavioral aspect of the models, raising the question of whether these models really predict human-like behavior. We therefore propose to analyze the output of machine-learned models much as we would analyze human data in conventional behavioral research. We introduce quantitative metrics to demonstrate the presence of three behavioral phenomena in a naturalistic highway driving dataset: 1) the kinematics-dependence of who passes a merging point first; 2) lane changes by an on-highway vehicle to accommodate an on-ramp vehicle; and 3) lane changes by vehicles on the highway to avoid lead vehicle conflicts. We then analyze the behavior of three machine-learned models using the same metrics. Even though the models' RMSE values differed, all of the models captured the kinematics-dependent merging behavior but struggled, to varying degrees, to capture the more nuanced courtesy lane change and highway lane change behaviors. Additionally, the collision aversion analysis during lane changes showed that the models struggled to capture a physical aspect of human driving: leaving an adequate gap between vehicles. Our analysis thus highlights the inadequacy of simple quantitative metrics and the need to take a broader behavioral perspective when analyzing machine-learned models of human driving predictions.
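To make the contrast between an error metric and a behavioral quantity concrete, the following is a minimal sketch under stated assumptions, not the metrics used in the paper. The helper names `rmse` and `min_gap` and all position values are hypothetical and chosen only for illustration. It shows two predicted trajectories with identical RMSE against the observed trajectory that nevertheless leave different gaps to a lead vehicle.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length position sequences (metres)."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def min_gap(follower, leader):
    """Smallest longitudinal gap (leader position minus follower position)."""
    return min(l - f for f, l in zip(follower, leader))


# Hypothetical 1-D longitudinal positions in metres (illustrative numbers only).
leader = [20.0, 30.0, 40.0, 50.0]
observed = [0.0, 10.0, 20.0, 30.0]           # observed follower, 20 m behind the leader
pred_behind = [o - 5.0 for o in observed]    # prediction lagging the observation by 5 m
pred_ahead = [o + 5.0 for o in observed]     # prediction leading the observation by 5 m

# Both predictions have the same RMSE against the observed follower...
rmse_behind = rmse(pred_behind, observed)
rmse_ahead = rmse(pred_ahead, observed)

# ...but they differ in the gap they leave to the lead vehicle.
gap_behind = min_gap(pred_behind, leader)
gap_ahead = min_gap(pred_ahead, leader)

print(rmse_behind, rmse_ahead, gap_behind, gap_ahead)
```

In this toy case the RMSE cannot distinguish the two predictions, while the minimum gap can, which is the kind of discrepancy the behavioral analysis described above is meant to expose.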
