We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to a reinforcement learning (RL) agent. Unlike existing RL DSLs, which ground to \textit{single} elements of a decision-making formalism (e.g., the reward function or policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser that grounds RLang programs to an algorithm-agnostic \textit{partial} world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs demonstrating how different RL methods can exploit the resulting knowledge; the methods covered include model-free and model-based tabular algorithms, policy gradient and value-based methods, hierarchical approaches, and deep methods.
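To make the idea of an algorithm-agnostic \textit{partial} world model and policy concrete, the following is a minimal Python sketch of how a tabular agent might exploit such knowledge. It is not RLang syntax and not the RLang parser's API: the names `PartialKnowledge`, `init_q_table`, and `select_action` are hypothetical, and the sketch assumes deterministic, dictionary-based knowledge in which any entry may simply be missing.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


@dataclass
class PartialKnowledge:
    """Hypothetical container for partial domain knowledge.

    Each mapping may cover only part of the state-action space;
    a missing entry means "nothing is known here".
    """
    reward: Dict[Tuple[State, Action], float] = field(default_factory=dict)
    policy: Dict[State, Action] = field(default_factory=dict)


def init_q_table(
    states: List[State],
    actions: List[Action],
    knowledge: PartialKnowledge,
    default: float = 0.0,
) -> Dict[Tuple[State, Action], float]:
    """Seed Q-values with known rewards; use `default` where nothing is known."""
    return {
        (s, a): knowledge.reward.get((s, a), default)
        for s in states
        for a in actions
    }


def select_action(
    knowledge: PartialKnowledge,
    q: Dict[Tuple[State, Action], float],
    state: State,
    actions: List[Action],
) -> Action:
    """Follow the advised policy where it is defined, else act greedily on Q."""
    if state in knowledge.policy:
        return knowledge.policy[state]
    return max(actions, key=lambda a: q[(state, a)])


# Example: knowledge about one reward and one advised action only.
knowledge = PartialKnowledge(
    reward={("s0", "right"): 1.0},
    policy={"s1": "left"},
)
states = ["s0", "s1"]
actions = ["left", "right"]
q_table = init_q_table(states, actions, knowledge)
```

Because the knowledge is held separately from the learning rule, the same structure could in principle seed a model-based planner or bias a policy-gradient method instead; the tabular Q-table here is only one illustrative consumer.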