Real-world reinforcement learning (RL) environments, whether in robotics or industrial settings, often involve non-visual observations. They call for RL approaches that are not only efficient but also reliable, which in turn means interpretable and flexible. To improve efficiency, agents that perform state representation learning with auxiliary tasks have been widely studied for visual observations. For real-world problems, however, dedicated representation learning modules that are decoupled from the RL agent are better suited to meet these requirements. This study compares common auxiliary tasks using what is, to the best of our knowledge, the only decoupled representation learning method for low-dimensional non-visual observations. We evaluate potential improvements in sample efficiency and returns in environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks yields performance gains only in sufficiently complex environments, and that learning environment dynamics is preferable to predicting rewards. These insights can inform the future development of interpretable representation learning approaches for non-visual observations and advance the use of RL solutions in real-world scenarios.