Inverse Reinforcement Learning aims to recover reward models from expert demonstrations, but traditional methods yield black-box models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method that uses Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. The resulting reward function is executable code that can be inspected and verified. We empirically validate GRACE on the MuJoCo, BabyAI, and AndroidWorld benchmarks, where it efficiently learns highly accurate rewards, even in complex, multi-task settings. Further, we demonstrate that the resulting rewards yield policies competitive with both strong Imitation Learning baselines and online RL approaches trained on ground-truth rewards. Finally, we show that GRACE can build complex reward APIs in multi-task setups.
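To make the core idea concrete, the following is a minimal, self-contained sketch of an evolutionary search over reward functions expressed as code, scored by how well they separate expert trajectories from random ones. Everything here is an illustrative assumption, not the paper's actual implementation: the toy 1-D task, the candidate pool, and the fitness measure are invented for exposition, and a fixed pool of candidate code strings stands in for the LLM proposer that GRACE would query.

```python
# Hedged toy sketch of GRACE-style search (all names/values are assumptions):
# candidate reward functions are Python code strings, evaluated by how often
# they rank expert trajectories above random ones; a real system would ask an
# LLM to propose and refine candidates instead of drawing from a fixed pool.
import random

random.seed(0)

# Toy 1-D task: a state is (position, velocity); the "expert" steadily
# drives position toward an assumed goal at x = 1.0.
def expert_traj(n=20):
    return [(i / n, 0.1) for i in range(n)]

def random_traj(n=20):
    return [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

# Candidate rewards as executable code strings (stand-ins for LLM proposals).
CANDIDATES = [
    "lambda s: -abs(s[0] - 1.0)",  # distance-to-goal shaping (correct)
    "lambda s: s[1]",              # rewards velocity (spurious)
    "lambda s: -s[0] ** 2",        # pulls toward 0, not the goal (wrong)
    "lambda s: 1.0",               # constant reward (uninformative)
]

def traj_return(reward_fn, traj):
    return sum(reward_fn(s) for s in traj)

def fitness(code, experts, randoms):
    """Fraction of (expert, random) pairs the candidate ranks correctly."""
    fn = eval(code)  # the reward is executable code; eval is fine for a toy
    wins = sum(
        traj_return(fn, e) > traj_return(fn, r)
        for e in experts for r in randoms
    )
    return wins / (len(experts) * len(randoms))

def evolve(generations=2):
    experts = [expert_traj() for _ in range(5)]
    randoms = [random_traj() for _ in range(5)]
    pop = list(CANDIDATES)  # initial "LLM proposals"
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, experts, randoms), reverse=True)
        # Elitism: keep the two best, refill with fresh proposals (stand-in
        # for asking the LLM to mutate/refine the top-scoring reward code).
        pop = pop[:2] + random.sample(CANDIDATES, 2)
    pop.sort(key=lambda c: fitness(c, experts, randoms), reverse=True)
    return pop[0], fitness(pop[0], experts, randoms)

best_code, best_fit = evolve()
print(best_code, best_fit)  # the distance-to-goal reward wins on this toy task
```

Because the winning candidate is plain code, it can be read, unit-tested, and edited directly, which is the interpretability argument the abstract makes; the evolutionary loop only supplies selection pressure toward rewards that explain the expert's behavior.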