Deep Reinforcement Learning (DRL) has demonstrated strong performance in robotic control but remains susceptible to out-of-distribution (OOD) states, which often lead to unreliable actions and task failure. Previous methods have focused on minimizing or preventing OOD occurrences and largely neglect recovery once an agent encounters such a state. Although recent work has attempted to guide agents back to in-distribution states, its reliance on uncertainty estimation hinders scalability in complex environments. To overcome this limitation, we introduce Language Models for Out-of-Distribution Recovery (LaMOuR), which enables recovery learning without relying on uncertainty estimation. Leveraging the image-description, logical-reasoning, and code-generation capabilities of large vision-language models (LVLMs), LaMOuR generates dense reward code that guides the agent back to a state from which it can successfully perform its original task. Experimental results show that LaMOuR substantially improves recovery efficiency across diverse locomotion tasks and generalizes effectively to complex environments, including humanoid locomotion and mobile manipulation, where existing methods struggle. Code and supplementary materials are available at https://lamour-rl.github.io/.