In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.