We investigate the feasibility of deploying reinforcement learning (RL) policies for constrained crowd navigation using a low-fidelity simulator. We introduce a representation of the dynamic environment that separates humans from static obstacles: humans are represented by their detected states, while obstacles are represented as point clouds computed from maps and robot localization. This representation allows RL policies trained in a low-fidelity simulator to be deployed in the real world with a reduced sim2real gap. Additionally, we propose a spatio-temporal graph to model the interactions between agents and obstacles, and apply attention mechanisms over this graph to capture robot-human, human-human, and human-obstacle interactions. Our method significantly improves navigation performance in both simulated and real-world environments. Video demonstrations can be found at https://sites.google.com/view/constrained-crowdnav/home.
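To make the attention idea concrete, the following is a minimal sketch (not the paper's exact architecture) of how a robot's query could attend over detected human states and obstacle points via scaled dot-product attention; all feature vectors, dimensions, and function names here are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Aggregate `values`, weighted by the scaled dot-product
    similarity between `query` and each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors -> fixed-size context for the robot.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Illustrative example: the robot's query attends over two detected
# humans and one obstacle point (features are made up for the sketch).
robot_q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vals = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
context = attend(robot_q, keys, vals)
```

In the same spirit, separate attention modules can share this form while operating on different edge types of the spatio-temporal graph (robot-human, human-human, human-obstacle).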