The ability to separate signal from noise, and to reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real-world tasks without considering all possible nuisance factors. How can artificial agents do the same? What kind of information can agents safely discard as noise? In this work, we categorize information out in the wild into four types based on controllability and relation with reward, and formulate useful information as that which is both controllable and reward-relevant. This framework clarifies the kinds of information removed by various prior works on representation learning in reinforcement learning (RL), and leads to our proposed approach of learning a Denoised MDP that explicitly factors out certain noise distractors. Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression.
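The four-way categorization described in the abstract can be sketched as a 2x2 grid over two binary properties. The snippet below is only an illustration of that grid, not the Denoised MDP learning procedure itself; the names `InfoType`, `is_useful`, and `ALL_TYPES` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class InfoType:
    """One cell of the 2x2 grid (hypothetical name, for illustration only)."""
    controllable: bool      # can the agent's actions influence this information?
    reward_relevant: bool   # does this information affect the reward?

    @property
    def is_useful(self) -> bool:
        # Per the abstract: useful information is both controllable
        # and reward-relevant; the other three cells are treated as noise.
        return self.controllable and self.reward_relevant


# Enumerate all four combinations of the two binary properties.
ALL_TYPES = [InfoType(c, r) for c, r in product([True, False], repeat=2)]

if __name__ == "__main__":
    for t in ALL_TYPES:
        label = "useful" if t.is_useful else "noise"
        print(f"controllable={t.controllable!s:5} "
              f"reward_relevant={t.reward_relevant!s:5} -> {label}")
```

Only one of the four cells is marked useful under this formulation; the remaining three are the candidates for information an agent could discard.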