Reinforcement learning (RL) algorithms are highly sensitive to reward function specification, which remains a central challenge limiting their broad applicability. We present ARM-FM: Automated Reward Machines via Foundation Models, a framework for automated, compositional reward design in RL that leverages the high-level reasoning capabilities of foundation models (FMs). Reward machines (RMs) -- an automata-based formalism for reward specification -- serve as the mechanism for specifying RL objectives, and are constructed automatically by FMs. The structured formalism of RMs yields effective task decompositions, while FMs enable objectives to be specified in natural language. Concretely, we (i) use FMs to automatically generate RMs from natural-language specifications; (ii) associate a language embedding with each RM automata state to enable generalization across tasks; and (iii) provide empirical evidence of ARM-FM's effectiveness in a diverse suite of challenging environments, including evidence of zero-shot generalization.
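To make the formalism concrete, the sketch below shows one plausible way to represent such a reward machine with a natural-language description and embedding attached to each automata state, in the spirit of points (i) and (ii). It is a simplified illustration under stated assumptions, not the paper's implementation: all identifiers (`RewardMachine`, `step`, the example propositions `has_key` and `door_open`) are hypothetical, transitions here match exact label sets rather than propositional formulas, and the FM that actually generates the machine is omitted.

```python
# Minimal reward-machine sketch with per-state language annotations.
# Hypothetical names throughout; not the ARM-FM implementation.
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[str]  # propositions currently true in the environment


@dataclass
class RewardMachine:
    states: Tuple[str, ...]                   # automata states U
    initial: str                              # initial state u_0
    terminal: FrozenSet[str]                  # accepting/terminal states
    delta_u: Dict[Tuple[str, Labels], str]    # state-transition function
    delta_r: Dict[Tuple[str, Labels], float]  # reward function
    # Point (ii): a natural-language subgoal and its embedding per state,
    # which a policy can condition on to generalize across tasks.
    descriptions: Dict[str, str] = field(default_factory=dict)
    embeddings: Dict[str, List[float]] = field(default_factory=dict)

    def step(self, u: str, labels: Labels) -> Tuple[str, float]:
        """Advance the RM on the labels emitted by the environment.

        Unmatched (state, labels) pairs self-loop with zero reward.
        """
        key = (u, labels)
        return self.delta_u.get(key, u), self.delta_r.get(key, 0.0)


# Example: "collect the key, then open the door" decomposed into two subtasks.
rm = RewardMachine(
    states=("u0", "u1", "u_acc"),
    initial="u0",
    terminal=frozenset({"u_acc"}),
    delta_u={
        ("u0", frozenset({"has_key"})): "u1",
        ("u1", frozenset({"door_open"})): "u_acc",
    },
    delta_r={
        ("u0", frozenset({"has_key"})): 0.5,
        ("u1", frozenset({"door_open"})): 1.0,
    },
    descriptions={
        "u0": "find and pick up the key",
        "u1": "use the key to open the door",
    },
)
u, r = rm.step(rm.initial, frozenset({"has_key"}))  # -> ("u1", 0.5)
```

In such a design, the automaton handles the task decomposition while the per-state descriptions are what an FM would emit alongside the transition structure; embedding them gives states from different tasks a shared representation space, which is one way the abstract's zero-shot generalization claim could be operationalized.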