To solve motion planning problems safely and efficiently in multi-agent settings, most approaches attempt to solve a joint optimization that explicitly accounts for the responses triggered in other agents. This often yields solutions with exponential computational complexity, making these methods intractable for complex scenarios with many agents. While sequential predict-and-plan approaches are more scalable, they tend to perform poorly in highly interactive environments. This paper proposes a method to improve the interactive capabilities of sequential predict-and-plan methods in multi-agent navigation problems by introducing predictability as an optimization objective. We interpret predictability through general prediction models, allowing agents to predict themselves and estimate how well they align with these external predictions. We formalize this behavior through the free energy of the system, which reduces under appropriate bounds to the Kullback-Leibler divergence between plan and prediction, and use this divergence as a penalty on unpredictable trajectories. The proposed interpretation of predictability allows agents to leverage prediction models more robustly and fosters a soft social convention that accelerates agreement on coordination strategies without the need for explicit high-level control or communication. We show that this predictability-aware planning leads to lower-cost trajectories and reduces planning effort in a set of multi-robot problems, including autonomous driving experiments with human driver data, where the benefits of considering predictability apply even when only the ego-agent uses this strategy.
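As a rough illustration of the penalty described above, the following sketch adds a Kullback-Leibler term between a planned trajectory distribution and a predicted one to a task cost. It is not the paper's implementation: the diagonal-Gaussian form of both distributions, the direction KL(plan || prediction), and the names `gaussian_kl`, `predictability_aware_cost`, and `weight` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians, summed over the last axis.

    Assumption: plan (p) and prediction (q) are diagonal Gaussians at each
    time step; the abstract does not specify the distribution family.
    """
    mu_p, var_p = np.asarray(mu_p, dtype=float), np.asarray(var_p, dtype=float)
    mu_q, var_q = np.asarray(mu_q, dtype=float), np.asarray(var_q, dtype=float)
    per_dim = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * np.sum(per_dim, axis=-1)


def predictability_aware_cost(task_cost, plan_mean, plan_var,
                              pred_mean, pred_var, weight=1.0):
    """Task cost plus a weighted KL penalty accumulated over the horizon.

    `weight` is a hypothetical trade-off parameter, not taken from the paper.
    """
    kl_per_step = gaussian_kl(plan_mean, plan_var, pred_mean, pred_var)
    return task_cost + weight * float(np.sum(kl_per_step))


if __name__ == "__main__":
    horizon = 5
    # Prediction: straight-line motion along x with unit variance.
    pred_mean = np.stack([np.arange(horizon, dtype=float),
                          np.zeros(horizon)], axis=1)
    pred_var = np.ones((horizon, 2))
    # Plan A follows the prediction; plan B swerves by 1 unit in y.
    plan_a = pred_mean.copy()
    plan_b = pred_mean + np.array([0.0, 1.0])
    for name, plan in [("aligned", plan_a), ("swerving", plan_b)]:
        cost = predictability_aware_cost(0.0, plan, pred_var,
                                         pred_mean, pred_var)
        print(name, cost)
```

Under these assumptions, a plan that matches the prediction incurs no penalty, while a plan that deviates from it is charged in proportion to the squared deviation scaled by the prediction's variance, so an optimizer would favor trajectories that other agents' prediction models can anticipate.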