Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency, which limits its applicability to the ever-changing real world. Yet the ability to reuse previously learned knowledge is essential for AI agents to adapt quickly to novelties. Often, spatial information that the agent observed in previous interactions can be leveraged to infer task-specific rules. The inferred rules can then help the agent avoid potentially dangerous situations in previously unseen states and guide the learning process, increasing the agent's novelty adaptation speed. In this work, we propose a general framework that is applicable to deep reinforcement learning agents. Our framework provides the agent with an autonomous way to discover task-specific rules in novel environments and to self-supervise its learning. We provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of that framework. We show that RDQ successfully extracts task-specific rules as it interacts with the world and uses them to drastically increase its learning efficiency. In our experiments, the RDQ agent is significantly more resilient to novelties than the baseline agents and is able to detect and adapt to novel situations faster.