Learning control from pixels is difficult for reinforcement learning (RL) agents because representation learning and policy learning are intertwined. Previous approaches remedy this issue with auxiliary representation learning tasks, but they either do not consider the temporal aspect of the problem or consider only single-step transitions, which may cause learning inefficiencies if important environmental changes take many steps to manifest. We propose Hierarchical $k$-Step Latent (HKSL), an auxiliary task that learns multiple representations via a hierarchy of forward models that learn to communicate and an ensemble of $n$-step critics, all of which operate at varying magnitudes of step skipping. We evaluate HKSL on a suite of 30 robotic control tasks, with and without distractors, and on a task of our own design. We find that HKSL converges more quickly to higher or optimal episodic returns than several alternative representation learning approaches. Furthermore, we find that HKSL's representations capture task-relevant details accurately across timescales (even in the presence of distractors) and that communication channels between hierarchy levels organize information based on both sides of the communication process, both of which improve sample efficiency.
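The abstract does not say how each critic in the ensemble forms its training target. A common choice for an $n$-step critic is the bootstrapped $n$-step TD target $\sum_{i=0}^{n-1}\gamma^i r_{t+i} + \gamma^n V(s_{t+n})$. The sketch below assumes that standard target and computes one value per horizon $n$, which illustrates how critics with different step-skipping magnitudes see different targets from the same trajectory. The function `n_step_targets` and its arguments are hypothetical illustrations and are not taken from HKSL.

```python
from typing import Dict, Sequence


def n_step_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    horizons: Sequence[int],
    gamma: float = 0.99,
) -> Dict[int, float]:
    """Compute one standard bootstrapped n-step TD target per horizon n.

    Hypothetical helper (assumption, not HKSL's code):
      rewards[i] is the reward r_{t+i};
      values[k] is a critic's value estimate for state s_{t+k}.
    Target for horizon n:
      sum_{i=0}^{n-1} gamma**i * rewards[i] + gamma**n * values[n]
    """
    targets: Dict[int, float] = {}
    for n in horizons:
        # Need n rewards and a bootstrap estimate at index n.
        if n < 1 or n > len(rewards) or n >= len(values):
            raise ValueError(f"horizon {n} exceeds the available trajectory")
        discounted_rewards = sum(gamma ** i * rewards[i] for i in range(n))
        targets[n] = discounted_rewards + gamma ** n * values[n]
    return targets


if __name__ == "__main__":
    # Three "critics" with horizons 1, 2, and 4 on the same toy trajectory.
    print(n_step_targets([1, 1, 1, 1], [0, 0, 0, 0, 10], [1, 2, 4], gamma=0.5))
    # prints {1: 1.0, 2: 1.5, 4: 2.5}
```

Only the longest-horizon critic here reaches the large value at $s_{t+4}$. This is one way to see why a longer step skip can propagate information about delayed environmental changes faster than a single-step target.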