To navigate in human spaces, autonomous mobile robots must abide by human social norms. Reinforcement learning (RL) has emerged as an effective method for training robot navigation policies that respect these norms. However, a large portion of existing work in the field conducts both RL training and testing in simplistic environments. This limits both the generalization potential of these models to unseen environments and the meaningfulness of their reported results. We propose a method to improve the generalization performance of RL social navigation methods using curriculum learning. By employing multiple environment types and by modeling pedestrians using multiple dynamics models, we are able to progressively diversify training and escalate its difficulty. Our results show that curriculum learning achieves better generalization performance than previous training methods. We also show that many existing state-of-the-art RL social navigation works do not evaluate their methods outside of their training environments, so their reported results do not reflect their policies' failure to adequately generalize to out-of-distribution scenarios. In response, we validate our training approach on larger and more crowded testing environments than those used in training, allowing for more meaningful measurements of model performance.
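The curriculum idea described above, widening the set of environment types and pedestrian dynamics models while raising difficulty, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Stage` fields, the environment and pedestrian model labels, the crowd sizes, and the fixed episodes-per-stage schedule are hypothetical and are not taken from the paper, which may advance stages by a different criterion.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    env_types: tuple      # hypothetical environment layout labels
    ped_models: tuple     # hypothetical pedestrian dynamics model labels
    max_pedestrians: int  # upper bound on crowd size in this stage


# Hypothetical curriculum: each stage adds variety and allows denser crowds.
CURRICULUM = (
    Stage(("open",), ("model_a",), 3),
    Stage(("open", "corridor"), ("model_a", "model_b"), 6),
    Stage(("open", "corridor", "intersection"),
          ("model_a", "model_b", "model_c"), 10),
)


def stage_for(episode, episodes_per_stage):
    """Return the stage for an episode index; the last stage is kept once reached."""
    index = min(episode // episodes_per_stage, len(CURRICULUM) - 1)
    return CURRICULUM[index]


def sample_scenario(stage, rng):
    """Draw one training scenario from the settings the stage allows."""
    return {
        "env_type": rng.choice(stage.env_types),
        "ped_model": rng.choice(stage.ped_models),
        "num_pedestrians": rng.randint(1, stage.max_pedestrians),
    }
```

In a training loop, one would call `stage_for(episode, episodes_per_stage)` at the start of each episode and pass the result to `sample_scenario` to configure the simulator. Evaluating on larger and more crowded scenarios than any stage allows would then probe out-of-distribution behavior, in the spirit of the validation described above.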