In reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but it is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI Gym benchmark tasks, TD7 achieves average performance gains of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.
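To make the idea of a state-action embedding concrete, the following is a minimal sketch and not the SALE implementation. It assumes one plausible setup that the abstract does not spell out: a state encoder produces an embedding of the state, a joint encoder maps that embedding plus the action to a state-action embedding, and the joint embedding is trained to match the embedding of the next state. All names (`make_matrix`, `linear`, `embedding_loss`, `update_joint_encoder`), the linear encoders, the dimensions, and the learning rate are hypothetical choices made for illustration; for brevity only the joint encoder is updated.

```python
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows (hypothetical initialiser)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Matrix-vector product; stands in for a learned encoder."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def embedding_loss(w_state, w_joint, state, action, next_state):
    """Mean squared error between the state-action embedding and the
    embedding of the next state (an assumed training objective)."""
    z_s = linear(w_state, state)                # state embedding
    z_sa = linear(w_joint, z_s + action)        # state-action embedding
    target = linear(w_state, next_state)        # treated as a fixed target
    return sum((p - t) ** 2 for p, t in zip(z_sa, target)) / len(target)


def update_joint_encoder(w_state, w_joint, state, action, next_state, lr=0.05):
    """One gradient-descent step on the joint encoder only, updating it in place."""
    z_s = linear(w_state, state)
    joint_input = z_s + action                  # list concatenation
    z_sa = linear(w_joint, joint_input)
    target = linear(w_state, next_state)
    n = len(target)
    for i, row in enumerate(w_joint):
        error = z_sa[i] - target[i]
        for j in range(len(row)):
            # d(MSE)/d(w_joint[i][j]) = 2 * error * input[j] / n
            row[j] -= lr * 2.0 * error * joint_input[j] / n


if __name__ == "__main__":
    rng = random.Random(0)
    state_dim, action_dim, embed_dim = 3, 2, 4  # illustrative sizes
    w_state = make_matrix(embed_dim, state_dim, rng)
    w_joint = make_matrix(embed_dim, embed_dim + action_dim, rng)

    # A single made-up transition (state, action, next state).
    state, action, next_state = [0.1, -0.4, 0.7], [0.5, -0.2], [0.2, -0.3, 0.6]

    print("loss before:", embedding_loss(w_state, w_joint, state, action, next_state))
    for _ in range(50):
        update_joint_encoder(w_state, w_joint, state, action, next_state)
    print("loss after: ", embedding_loss(w_state, w_joint, state, action, next_state))
```

In a sketch like this, the state-action embedding would be passed to the value function and policy alongside the raw state and action. How SALE actually designs and trains these embeddings is the subject of the paper itself.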