Acquiring driving policies that transfer to unseen environments is challenging in dense traffic flows. The design of the traffic flow is essential, yet previous studies fail to balance interaction and safety-criticality. To tackle this problem, we propose a socially adversarial traffic flow. We model traffic flow as a Contextual Partially-Observable Stochastic Game and assign Social Value Orientation (SVO) as the context. We then adopt a two-stage framework. In Stage 1, each agent in our socially-aware traffic flow is driven by a hierarchical policy in which the upper-level policy communicates the genuine SVOs of all agents, which the lower-level policy takes as input. In Stage 2, each agent in the socially adversarial traffic flow is driven by the same hierarchical policy, except that the upper-level policy communicates mistaken SVOs, which are taken as input by the lower-level policy trained in Stage 1. The driving policy is adversarially trained against the upper-level policies through a zero-sum game formulation, resulting in a policy with enhanced zero-shot transfer capability to unseen traffic flows. Comprehensive cross-validation experiments verify the superior zero-shot transfer performance of our method.
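
The two-stage idea can be illustrated with a minimal sketch. The abstract does not specify how SVO enters the reward or how the upper-level policy is parameterized, so everything below is an assumption made for illustration: the angle-based reward blend is a convention common in the SVO literature, and the names `svo_reward`, `genuine_upper_level`, `mistaken_upper_level`, and `zero_sum_payoffs` are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def svo_reward(own_reward: float, others_reward: float, svo_angle: float) -> float:
    """Blend own and others' reward by an SVO angle in radians.

    Assumed convention (not from the abstract): 0 is purely egoistic,
    pi/2 is purely altruistic.
    """
    return math.cos(svo_angle) * own_reward + math.sin(svo_angle) * others_reward


def genuine_upper_level(true_svos: Sequence[float]) -> List[float]:
    """Stage 1 (sketch): communicate every agent's genuine SVO unchanged."""
    return list(true_svos)


def mistaken_upper_level(true_svos: Sequence[float],
                         offsets: Sequence[float]) -> List[float]:
    """Stage 2 (sketch): communicate mistaken SVOs.

    The per-agent offsets stand in for the output of an adversarially
    trained upper-level policy; here they are simply given.
    """
    return [svo + offset for svo, offset in zip(true_svos, offsets)]


def zero_sum_payoffs(driving_return: float) -> Tuple[float, float]:
    """Zero-sum pairing: the adversary's payoff is the negated driving return."""
    return driving_return, -driving_return


if __name__ == "__main__":
    true_svos = [0.0, math.pi / 4]
    print(genuine_upper_level(true_svos))
    print(mistaken_upper_level(true_svos, offsets=[0.3, -0.3]))
    print(zero_sum_payoffs(1.5))
```

In this sketch the lower-level policy would consume the communicated SVOs in both stages; only the upper level changes between them, which mirrors the reuse of the Stage 1 lower-level policy described above.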