Deploying reinforcement learning algorithms in real-world robotics applications requires ensuring the safety of both the robot and its environment. Safe robot reinforcement learning (SRRL) is a crucial step towards achieving human-robot coexistence. In this paper, we envision a human-centered SRRL framework consisting of three stages: safe exploration, safety value alignment, and safe collaboration. We examine the research gaps in these areas and propose leveraging interactive behaviors for SRRL. Interactive behaviors enable bi-directional information transfer between humans and robots, as exemplified by the conversational robot ChatGPT. We argue that interactive behaviors need further attention from the SRRL community. We discuss four open challenges related to the robustness, efficiency, transparency, and adaptability of SRRL with interactive behaviors.