Deep reinforcement learning (DRL) offers a promising way for intelligent agents (e.g., autonomous vehicles) to learn to navigate complex scenarios. However, DRL with neural networks as function approximators is typically regarded as a black box with little explainability, and it often performs suboptimally, especially for autonomous navigation in highly interactive multi-agent environments. To address these issues, we propose three auxiliary tasks with spatio-temporal relational reasoning and integrate them into the standard DRL framework, which improves decision-making performance and provides explainable intermediate indicators. Specifically, we explicitly infer the internal states (i.e., traits and intentions) of surrounding agents (e.g., human drivers) and, through counterfactual reasoning, predict their future trajectories in two situations: with and without the ego agent. These auxiliary tasks provide additional supervision signals for inferring the behavior patterns of other interactive agents. We compare multiple variants of framework integration strategies. We also employ a spatio-temporal graph neural network to encode relations between dynamic entities, which enhances both the internal state inference and the decision making of the ego agent. Moreover, we propose an interactivity estimation mechanism based on the difference between the trajectories predicted in these two situations, which indicates the degree of influence of the ego agent on other agents. To validate the proposed method, we design an intersection driving simulator based on the Intelligent Intersection Driver Model (IIDM) that simulates vehicles and pedestrians. Our approach achieves robust, state-of-the-art performance on standard evaluation metrics and provides explainable intermediate indicators (i.e., internal states and interactivity scores) for decision making.
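
The interactivity estimate described above rests on comparing two predicted trajectories of the same surrounding agent: one predicted with the ego agent present and one predicted without it. The following is a minimal sketch of that idea, not the authors' implementation. The function name `interactivity_score`, the 2D point representation, and the choice of mean Euclidean displacement as the difference measure are all assumptions made for illustration; the abstract does not specify the exact formula.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) position at one future time step


def interactivity_score(with_ego: Sequence[Point],
                        without_ego: Sequence[Point]) -> float:
    """Mean Euclidean distance between two predicted trajectories.

    ASSUMPTION: the source only says the score is "based on the difference
    between predicted trajectories"; mean displacement is one simple choice.
    """
    if len(with_ego) != len(without_ego) or not with_ego:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step displacement between the factual and counterfactual predictions.
    dists = [
        math.hypot(x1 - x2, y1 - y2)
        for (x1, y1), (x2, y2) in zip(with_ego, without_ego)
    ]
    return sum(dists) / len(dists)


# Hypothetical example: the agent's predicted path deviates only at the last step.
with_ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
without_ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]
score = interactivity_score(with_ego, without_ego)
```

Under this assumed measure, a score of zero means the ego agent's presence does not change the predicted behavior of the other agent, and larger values indicate stronger influence.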