Humans can leverage both symbolic reasoning and intuitive reactions. In contrast, reinforcement learning policies are typically encoded either in opaque systems such as neural networks or in symbolic systems that rely on predefined symbols and rules. This disjointed approach severely limits agents' capabilities: they often lack either the flexible low-level reactivity of neural agents or the interpretable reasoning of symbolic agents. To overcome this challenge, we introduce BlendRL, a neuro-symbolic RL framework that harmoniously integrates both paradigms within agents that use mixtures of logic and neural policies. We empirically demonstrate that BlendRL agents outperform both neural and symbolic baselines in standard Atari environments, and we showcase their robustness to environmental changes. Additionally, we analyze the interaction between neural and symbolic policies, illustrating how their hybrid use lets each compensate for the other's limitations.
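To make the idea of mixing logic and neural policies concrete, the following is a minimal sketch of blending two action distributions with a scalar weight. All names here (`neural_probs`, `logic_probs`, the weight `w`) are illustrative assumptions, not BlendRL's actual API, and the simple convex combination stands in for whatever blending mechanism the framework uses.

```python
# Hypothetical sketch: combining a neural policy's and a logic policy's
# action distributions via a convex combination. Not BlendRL's actual code.

def blend_policies(neural_probs, logic_probs, w):
    """Mix two categorical action distributions.

    w = 0 recovers the pure neural policy; w = 1 the pure logic policy.
    """
    assert len(neural_probs) == len(logic_probs)
    mixed = [(1 - w) * n + w * l for n, l in zip(neural_probs, logic_probs)]
    total = sum(mixed)
    return [p / total for p in mixed]  # renormalize for numerical safety

# Example: the neural policy favors action 0, the logic policy action 2;
# an equal blend keeps both preferences available to the agent.
probs = blend_policies([0.7, 0.2, 0.1], [0.1, 0.1, 0.8], w=0.5)
```

A learned, state-dependent weight (rather than the fixed `w` above) would let an agent lean on reactive neural control in fast-paced situations and on interpretable logic rules when abstract reasoning applies, which is the division of labor the abstract describes.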