In this work, we investigate Active Vision Reinforcement Learning (ActiveVision-RL), in which an embodied agent learns an action policy for the task while simultaneously controlling its visual observations in partially observable environments. We call the former the motor policy and the latter the sensory policy. For example, humans solve real-world tasks through hand manipulation (motor policy) together with eye movements (sensory policy). ActiveVision-RL poses the challenge of coordinating the two policies given their mutual influence. We propose SUGARL, Sensorimotor Understanding Guided Active Reinforcement Learning, a framework that models the motor and sensory policies separately but learns them jointly using an intrinsic sensorimotor reward. This learnable reward, assigned by a sensorimotor reward module, incentivizes the sensory policy to select observations that are optimal for inferring the agent's own motor action, inspired by the sensorimotor stage of human development. Through a series of experiments, we show the effectiveness of our method across a range of observability conditions and its adaptability to existing RL algorithms. The sensory policies learned through our method are observed to exhibit effective active vision strategies.
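To make the idea of an intrinsic sensorimotor reward concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the form of the reward, so everything here is an assumption made for illustration: we suppose a hypothetical predictor that sees only the selected observations and outputs logits over motor actions, and we score the sensory policy by the log-probability that predictor assigns to the motor action actually taken. The function names and the weighting coefficient `beta` are likewise hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sensorimotor_reward(action_logits, motor_action):
    """Hypothetical intrinsic reward (an assumption, not the paper's formula).

    action_logits: logits over motor actions from an assumed predictor
        that only sees the observations chosen by the sensory policy.
    motor_action: index of the motor action the agent actually took.

    Returns the log-probability of the taken action, so the reward is
    higher when the chosen view makes the motor action easier to infer.
    """
    probs = softmax(action_logits)
    return math.log(probs[motor_action])


def sensory_reward(env_reward, action_logits, motor_action, beta=0.1):
    """Task reward plus the weighted intrinsic term (beta is assumed)."""
    return env_reward + beta * sensorimotor_reward(action_logits, motor_action)


# An informative view yields confident logits for the taken action (index 0);
# an uninformative view leaves the predictor at chance.
informative = sensorimotor_reward([4.0, 0.0, 0.0], 0)
uninformative = sensorimotor_reward([0.0, 0.0, 0.0], 0)
print(informative > uninformative)
```

Under these assumptions, a view from which the motor action can be inferred confidently earns the sensory policy a larger reward than a view that leaves the predictor at chance, which matches the incentive described in the abstract.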