Effectively capturing the joint distribution of all agents in a scene is relevant for predicting how the scene will actually evolve and, in turn, for providing more accurate information to the decision processes of autonomous vehicles. While new models have been developed for this purpose in recent years, it remains unclear how best to represent the joint distribution, particularly with respect to the interactions between agents. There is so far no clear consensus on whether these interactions should be learned implicitly from data by neural networks or modeled explicitly using spatial and temporal relations that are more grounded in human decision-making. This paper studies various means of describing interactions within the same network structure and their effect on the final learned joint distributions. Our findings show that, more often than not, simply allowing a network to establish interactive connections between agents based on data has a detrimental effect on performance. In contrast, well-defined interactions (such as which agent of an agent pair passes first at an intersection) can often bring a clear boost in performance.