Deep reinforcement learning (RL) has shown promising results in robot motion planning, with first attempts in human-robot collaboration (HRC). However, a fair comparison of RL approaches in HRC under the constraint of guaranteed safety has yet to be made. We therefore present human-robot gym, a benchmark for safe RL in HRC. Our benchmark provides eight challenging, realistic HRC tasks in a modular simulation framework. Most importantly, human-robot gym includes a safety shield that provably guarantees human safety. We are thereby the first to provide a benchmark for training RL agents that adhere to the safety specifications of real-world HRC, which bridges a critical gap between theoretical RL research and its real-world deployment. Our evaluation of six environments led to three key results: (a) the diverse nature of the tasks offered by human-robot gym creates a challenging benchmark for state-of-the-art RL methods, (b) incorporating expert knowledge into RL training in the form of an action-based reward can yield agents that outperform the expert, and (c) our agents show negligible overfitting to the training data.