Reinforcement Learning (RL) has been widely used to solve tasks where the environment consistently provides a dense reward signal. In real-world scenarios, however, rewards are often poorly defined or sparse, and auxiliary signals become indispensable for discovering efficient exploration strategies and aiding the learning process. In this work, inspired by intrinsic motivation theory, we postulate that the intrinsic stimuli of novelty and surprise can improve exploration in complex, sparsely rewarded environments. We introduce \textit{NaSA-TD3}, a novel sample-efficient method that learns directly from pixels: an image-based extension of TD3 augmented with an autoencoder. Our experiments demonstrate that NaSA-TD3 is easy to train and effective at tackling complex continuous-control robotic tasks, both in simulated environments and in real-world settings. NaSA-TD3 outperforms existing state-of-the-art image-based RL methods in final performance without requiring pre-trained models or human demonstrations.