In this work, we investigate Active Reinforcement Learning (Active-RL), where an embodied agent learns an action policy for the task while simultaneously controlling its visual observations in partially observable environments. We denote the former as the motor policy and the latter as the sensory policy. For example, humans solve real-world tasks by hand manipulation (motor policy) together with eye movements (sensory policy). Active-RL poses the challenge of coordinating the two policies given their mutual influence. We propose SUGARL, Sensorimotor Understanding Guided Active Reinforcement Learning, a framework that models the motor and sensory policies separately but learns them jointly using an intrinsic sensorimotor reward. This learnable reward, assigned by a sensorimotor reward module, incentivizes the sensory policy to select observations that are optimal for inferring the agent's own motor action, inspired by the sensorimotor stage of human development. Through a series of experiments, we show the effectiveness of our method across a range of observability conditions and its adaptability to existing RL algorithms. The sensory policies learned through our method are observed to exhibit effective active vision strategies.
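To make the idea of an intrinsic sensorimotor reward concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a discrete motor action space and a hypothetical predictor that outputs logits over motor actions given the observations selected by the sensory policy. The reward form (log-probability of the motor action actually taken), the names `sensorimotor_reward` and `sensory_learning_signal`, and the weight `beta` are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sensorimotor_reward(predicted_logits, motor_action):
    """Hypothetical intrinsic reward (an assumption, not the paper's formula).

    Returns the log-probability that a predictor assigns to the motor
    action actually taken. The value is at most 0 and grows as the
    chosen observation makes the motor action easier to infer.
    """
    probs = softmax(predicted_logits)
    return math.log(probs[motor_action])


def sensory_learning_signal(env_reward, predicted_logits, motor_action, beta=1.0):
    """Assumed combination: task reward plus weighted intrinsic reward."""
    return env_reward + beta * sensorimotor_reward(predicted_logits, motor_action)


# An informative view lets the predictor identify motor action 0;
# an uninformative view leaves all three actions equally likely.
informative = sensorimotor_reward([4.0, 0.0, 0.0], motor_action=0)
uninformative = sensorimotor_reward([0.0, 0.0, 0.0], motor_action=0)
```

Under these assumptions, the sensory policy receives a larger learning signal for views from which the motor action can be inferred, which is the coordination effect described in the abstract.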