Inverse Reinforcement Learning aims to recover reward models from expert demonstrations, but traditional methods yield "black-box" models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method that uses Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. The resulting reward function is executable code that can be inspected and verified. We empirically validate GRACE on the BabyAI and AndroidWorld benchmarks, where it efficiently learns highly accurate rewards, even in complex multi-task settings. Further, we demonstrate that the resulting rewards lead to strong policies compared to both competitive Imitation Learning baselines and online RL approaches that use ground-truth rewards. Finally, we show that GRACE can build complex reward APIs in multi-task setups.
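
To make the notion of an executable, inspectable reward concrete, the following is a minimal sketch of what a code-based reward function could look like for a BabyAI-style task. The observation fields, the task, and the shaping terms are hypothetical illustrations chosen for this example, not output produced by GRACE.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    """Minimal stand-in for a BabyAI-style observation (hypothetical fields)."""
    agent_pos: Tuple[int, int]   # (x, y) grid position of the agent
    target_pos: Tuple[int, int]  # (x, y) grid position of the goal cell
    carrying: Optional[str]      # name of the object the agent holds, if any


def reward_fn(obs: Observation) -> float:
    """Hypothetical code-based reward for a 'pick up the key, then reach the door' task.

    Because the reward is plain code, each term can be read, unit-tested,
    and debugged directly, which is the interpretability property highlighted above.
    """
    reward = 0.0
    # Dense shaping term: lightly penalize Manhattan distance to the target.
    dist = abs(obs.agent_pos[0] - obs.target_pos[0]) + abs(obs.agent_pos[1] - obs.target_pos[1])
    reward -= 0.01 * dist
    # Sub-goal bonus: the agent is holding the key.
    if obs.carrying == "key":
        reward += 0.5
        # Terminal bonus: at the door cell while holding the key.
        if obs.agent_pos == obs.target_pos:
            reward += 1.0
    return reward


# Example: agent standing on the door cell while carrying the key.
print(reward_fn(Observation(agent_pos=(2, 3), target_pos=(2, 3), carrying="key")))  # 1.5
```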