Low-level sensory and motor signals in deep reinforcement learning live in high-dimensional spaces, such as image observations or motor torques, which makes them inherently difficult to interpret or to use directly in downstream tasks. While sensory representations have been studied extensively, representations of motor actions remain an area of active exploration. Our work shows that a space of meaningful action representations emerges when a multi-task policy network takes both states and task embeddings as inputs. We add moderate constraints to improve the representation ability of this space. Interpolated or composed embeddings can then serve as a high-level interface within the space, instructing the agent to execute meaningful action sequences. Empirical results demonstrate that the proposed action representations support intra-action interpolation and inter-action composition with limited or no additional learning. Furthermore, our approach exhibits better task adaptation than strong baselines on MuJoCo locomotion tasks. Our work sheds light on the promising direction of learning action representations for efficient, adaptable, and composable RL, forming a basis for abstract action planning and for understanding the motor signal space. Project page: https://sites.google.com/view/emergent-action-representation/
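To make the interface described above concrete, the following is a minimal sketch of a policy conditioned on a state and a task embedding, together with interpolation between two embeddings. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the "moderate constraints", so unit-norm embeddings are assumed here; the names `normalize`, `interpolate`, `TinyPolicy`, `z_walk`, and `z_run` are hypothetical; and the policy weights are random rather than trained.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (an assumed stand-in for the constraint)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def interpolate(z_a, z_b, alpha):
    """Linearly blend two task embeddings, then project back to unit length."""
    mixed = [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]
    return normalize(mixed)


class TinyPolicy:
    """Hypothetical policy: one tanh layer over the concatenated [state, embedding]."""

    def __init__(self, state_dim, embed_dim, action_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + embed_dim
        # Untrained random weights; a real policy would be learned with multi-task RL.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(action_dim)
        ]

    def act(self, state, embedding):
        x = list(state) + list(embedding)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.weights]


# Two hypothetical task embeddings and their midpoint in the embedding space.
z_walk = normalize([1.0, 0.0, 0.0])
z_run = normalize([0.0, 1.0, 0.0])
z_mid = interpolate(z_walk, z_run, 0.5)

policy = TinyPolicy(state_dim=4, embed_dim=3, action_dim=2)
action = policy.act([0.1, -0.2, 0.0, 0.3], z_mid)
```

In this sketch the embedding is the only thing that changes between tasks, so passing `z_mid` instead of `z_walk` or `z_run` is how a higher-level controller would request an intermediate behavior. Whether such an intermediate embedding yields a meaningful action sequence depends on training, which the sketch does not cover.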