The visual world provides an abundance of information, but many of the input pixels an agent receives contain distracting stimuli. Autonomous agents need to distinguish useful information from task-irrelevant perceptions so that they can generalize to unseen environments with new distractions. Existing works approach this problem using data augmentation or large auxiliary networks with additional loss functions. We introduce MaDi, a novel algorithm that learns to mask distractions using only the reward signal. In MaDi, the conventional actor-critic structure of deep reinforcement learning agents is complemented by a small third sibling, the Masker. This lightweight neural network generates a mask that determines what the actor and critic receive, so that they can focus on learning the task. The masks are created dynamically, depending on the current input. We run experiments on the DeepMind Control Generalization Benchmark, the Distracting Control Suite, and a real UR5 Robotic Arm. Our algorithm improves the agent's focus with useful masks, while its efficient Masker network adds only 0.2% more parameters to the original structure, in contrast to the large auxiliary networks of previous work. MaDi consistently achieves generalization results better than or competitive with state-of-the-art methods.
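The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-pixel two-layer network, the hidden width, the sigmoid output, and the names `Masker`, `masked_observation`, and `num_params` are all assumptions made for illustration, and training is omitted entirely. The sketch only shows the data flow, in which a small network produces a soft mask from the current observation and the actor and critic would receive the masked observation instead of the raw pixels.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class Masker:
    """Hypothetical lightweight masker (illustrative, not the paper's architecture).

    A tiny two-layer network is applied independently to the channels of
    every pixel and outputs one soft mask value per pixel.
    """

    def __init__(self, channels, hidden=4, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized weights; a real agent would train these.
        self.w1 = rng.normal(0.0, 0.5, size=(channels, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, obs):
        # obs has shape (H, W, C) with values in [0, 1].
        h = np.maximum(obs @ self.w1 + self.b1, 0.0)  # ReLU, shape (H, W, hidden)
        return sigmoid(h @ self.w2 + self.b2)         # mask, shape (H, W, 1)

    def num_params(self):
        """Total number of weights and biases in the masker."""
        return sum(p.size for p in (self.w1, self.b1, self.w2, self.b2))


def masked_observation(masker, obs):
    """Return the observation multiplied by its input-dependent mask, plus the mask."""
    mask = masker(obs)
    # The single-channel mask broadcasts over the colour channels.
    return mask * obs, mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    obs = rng.random((8, 8, 3))  # stand-in for an image observation
    masker = Masker(channels=3)
    masked, mask = masked_observation(masker, obs)
    print(masked.shape, mask.shape, masker.num_params())
```

Because the mask is computed from the observation itself, it changes with every input, which matches the dynamic masking described in the abstract. In this sketch the masker is deliberately tiny relative to any realistic actor and critic, which conveys the intent behind the small parameter overhead without reproducing the reported 0.2% figure.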