Despite the significant advances in Deep Reinforcement Learning (RL) observed in the last decade, the amount of training experience necessary to learn effective policies remains one of the primary concerns in both simulated and real environments. Previous work addressing this issue has shown that efficiency can be improved by modeling the agent and the environment separately, but doing so usually requires a supervisory signal. In contrast to RL agents, humans can perfect a new skill from a small number of trials and often do so without a supervisory signal, which makes neuroscientific studies of human development a valuable source of inspiration for RL. In particular, we explore the idea of motor prediction, which states that humans develop an internal model of themselves and of the consequences that their motor commands have on the immediate sensory inputs. Our insight is that the movement of the agent provides a cue that allows the duality between the agent and the environment to be learned. To instantiate this idea, we present Ego-Foresight (EF), a self-supervised method for disentangling agent information based on motion and prediction. Our main finding is that, when used as an auxiliary task in feature learning, self-supervised agent awareness improves the sample-efficiency and performance of the underlying RL algorithm. To test our approach, we first study the ability of EF to predict agent movement and disentangle agent information. We then integrate EF with model-free and model-based RL algorithms to solve simulated control tasks, showing improved sample-efficiency and performance.