Reinforcement learning (RL) algorithms suffer from limited data efficiency, particularly in high-dimensional state spaces and large-scale problems. Most RL methods rely solely on state-transition information from within the same episode when updating the agent's critic, which leads to low data efficiency and long training times. Inspired by human analogical reasoning, we introduce a novel mesh information propagation mechanism, termed the 'Imagination Mechanism (IM)', designed to significantly enhance the data efficiency of RL algorithms. Specifically, IM allows the information generated by a single sample to be broadcast to different states across episodes, rather than being propagated only within the same episode. This capability improves the model's understanding of state interdependencies and enables more efficient learning from limited samples. To promote versatility, we extend IM into a plug-and-play module that can be seamlessly integrated into other widely adopted RL algorithms. Our experiments demonstrate that IM consistently improves four mainstream state-of-the-art RL algorithms (SAC, PPO, DDPG, and DQN) by a considerable margin, yielding better performance than the unmodified baselines across various tasks. Our code and data are available at https://github.com/OuAzusaKou/imagination_mechanism