While reinforcement learning has achieved remarkable successes in several domains, its real-world application remains limited because many methods fail to generalise to unfamiliar conditions. In this work, we consider the problem of generalising to new transition dynamics, that is, to cases in which the environment responds differently to the agent's actions. For example, the gravitational force exerted on a robot depends on its mass and changes the robot's mobility. In such cases, an agent's actions must be conditioned both on extrinsic state information and on pertinent contextual information that reflects how the environment responds. While the need for context-sensitive policies has been established, the manner in which context is incorporated architecturally has received less attention. We therefore investigate how context information should be incorporated into behaviour learning to improve generalisation. To this end, we introduce a neural network architecture, the Decision Adapter, which generates the weights of an adapter module from the context information and thereby conditions the agent's behaviour on that context. We show that the Decision Adapter is a useful generalisation of a previously proposed architecture, and we demonstrate empirically that it generalises better than previous approaches in several environments. Beyond this, the Decision Adapter is more robust to irrelevant distractor variables than several alternative methods.
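To make the idea of generating adapter weights from context concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The class name `ContextAdapter`, the bottleneck layout, the bias-free linear hypernetwork, the `tanh` nonlinearity, and the residual connection are all assumptions chosen for illustration; the abstract specifies only that the adapter's weights are generated and that they condition behaviour on the context.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def reshape(flat, n_rows, n_cols):
    """Split a flat list into n_rows rows of n_cols entries each."""
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


class ContextAdapter:
    """Hypothetical adapter whose weights are produced from the context.

    Assumption: the hypernetwork is a single bias-free linear map, so an
    all-zero context yields an all-zero adapter.
    """

    def __init__(self, feature_dim, context_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.feature_dim = feature_dim
        self.bottleneck_dim = bottleneck_dim
        # The hypernetwork emits every adapter weight (down- and up-projection).
        n_generated = 2 * feature_dim * bottleneck_dim
        self.hyper_weights = [
            [rng.gauss(0.0, 0.1) for _ in range(context_dim)]
            for _ in range(n_generated)
        ]

    def generate(self, context):
        """Map a context vector to the adapter's two weight matrices."""
        flat = matvec(self.hyper_weights, context)
        split = self.feature_dim * self.bottleneck_dim
        down = reshape(flat[:split], self.bottleneck_dim, self.feature_dim)
        up = reshape(flat[split:], self.feature_dim, self.bottleneck_dim)
        return down, up

    def __call__(self, features, context):
        """Apply the context-specific adapter with a residual connection."""
        down, up = self.generate(context)
        hidden = [math.tanh(h) for h in matvec(down, features)]
        delta = matvec(up, hidden)
        return [f + d for f, d in zip(features, delta)]


# Usage: the same policy features are adapted differently per context.
adapter = ContextAdapter(feature_dim=4, context_dim=2, bottleneck_dim=3)
policy_features = [1.0, -0.5, 0.25, 2.0]
adapted_light = adapter(policy_features, [1.0, 0.0])  # e.g. a light robot
adapted_heavy = adapter(policy_features, [0.0, 1.0])  # e.g. a heavy robot
```

In this sketch the context never enters the policy as an extra input; it only determines the adapter's weights, which is one way to read "conditions the behaviour of an agent on the context information". In a real agent the features would come from a trained policy layer and the hypernetwork would be trained jointly with it.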