Reinforcement learning-based policies for continuous-control robotic navigation tasks often fail to adapt to changes in the environment during real-time deployment, which may result in catastrophic failures. To address this limitation, we propose a novel approach called RE-MOVE (REquest help and MOVE on) that adapts an already trained policy to real-time changes in the environment, without re-training, by using language-based feedback. The approach addresses two main challenges: (1) when to ask for feedback and (2) how to incorporate that feedback, once received, into the trained policy. For the first challenge, RE-MOVE uses an epistemic uncertainty-based framework to determine the optimal time to request instruction-based feedback. For the second challenge, we employ a zero-shot learning natural language processing (NLP) paradigm with efficient prompt design and leverage the state-of-the-art GPT-3.5 and Llama-2 language models. To show the efficacy of the proposed approach, we performed extensive synthetic and real-world evaluations in several test-time dynamic navigation scenarios. RE-MOVE yields up to an 80% improvement in the rate of successfully reached goals and a 13.50% reduction in normalized trajectory length compared with alternative approaches, particularly in demanding real-world environments with perceptual challenges.
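The abstract does not state how the epistemic uncertainty is estimated, so the following is only an illustrative sketch of an uncertainty-triggered help request, not the RE-MOVE implementation. It assumes that uncertainty is approximated by disagreement among an ensemble of policies, which is one common proxy. The names `epistemic_uncertainty` and `should_request_help`, the ensemble itself, and the threshold value are all hypothetical.

```python
from statistics import pvariance
from typing import Callable, Sequence

# Hypothetical types: a policy maps a state vector to an action vector.
State = Sequence[float]
Action = Sequence[float]
Policy = Callable[[State], Action]


def epistemic_uncertainty(ensemble: Sequence[Policy], state: State) -> float:
    """Mean per-dimension variance of the actions proposed by the ensemble.

    Assumption: disagreement between ensemble members stands in for
    epistemic uncertainty. The source does not specify its estimator.
    """
    actions = [policy(state) for policy in ensemble]
    # Group the i-th action component across all ensemble members.
    per_dimension = list(zip(*actions))
    variances = [pvariance(component) for component in per_dimension]
    return sum(variances) / len(variances)


def should_request_help(
    ensemble: Sequence[Policy], state: State, threshold: float
) -> bool:
    """Ask for language feedback only when disagreement exceeds the threshold."""
    return epistemic_uncertainty(ensemble, state) > threshold


# Two toy ensembles: one whose members agree, one whose members disagree.
agreeing = [lambda s: [1.0, 0.0], lambda s: [1.0, 0.0]]
disagreeing = [lambda s: [1.0, 0.0], lambda s: [-1.0, 0.0]]
state = [0.0, 0.0]

ask_when_agreeing = should_request_help(agreeing, state, threshold=0.25)
ask_when_disagreeing = should_request_help(disagreeing, state, threshold=0.25)
```

In this sketch the agreeing ensemble never triggers a request, while the disagreeing one does. In practice the threshold would need to be tuned so that the robot asks for help rarely but early enough to avoid failure.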