Pedestrian simulation using multi-agent reinforcement learning (MARL) has been studied in recent years. This study modeled roads in a grid-world environment and implemented pedestrians as MARL agents using an echo-state network and the least squares policy iteration method. In this environment, we investigated whether the agents could learn to move forward while avoiding other agents. Specifically, we considered two types of tasks: the choice between a narrow direct route and a broad detour, and bidirectional pedestrian flow in a corridor. The simulation results indicated that learning was successful when the agent density was not too high.