Reinforcement learning-based policies for continuous-control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (\textbf{RE}quest help and \textbf{MOVE} on), which uses language-based feedback to adjust trained policies to real-time changes in the environment. Specifically, we enable the trained policy to decide \emph{when to ask for feedback} and \emph{how to incorporate feedback into trained policies}. RE-MOVE uses epistemic uncertainty to determine the optimal time to request feedback from humans and uses language-based feedback for real-time adaptation. We perform extensive synthetic and real-world evaluations to demonstrate the benefits of our proposed approach in several test-time dynamic navigation scenarios. Our approach enables robots to learn from human feedback and adapt to previously unseen adversarial situations.