Advances in reinforcement learning have produced sophisticated models capable of learning complex decision-making tasks. However, efficiently integrating world models with decision transformers remains a challenge. In this paper, we introduce a novel approach that combines the Dreamer algorithm's ability to generate imagined, anticipatory trajectories with the adaptive learning strengths of the Online Decision Transformer. Our methodology enables parallel training in which Dreamer-produced trajectories enhance the contextual decision-making of the transformer, creating a bidirectional enhancement loop. We empirically demonstrate the efficacy of our approach on a suite of challenging benchmarks, achieving notable improvements in sample efficiency and reward maximization over existing methods. Our results indicate that the proposed integrated framework not only accelerates learning but also remains robust in diverse and dynamic scenarios, marking a significant step forward in model-based reinforcement learning.
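The bidirectional enhancement loop described in the abstract can be sketched in miniature as follows. This is a hedged, illustrative toy only: `DreamWorldModel` and `DecisionTransformerStub` are hypothetical stand-ins with toy dynamics and a scalar policy, not the paper's actual Dreamer or Online Decision Transformer implementations. The point is the data flow: the world model imagines trajectories under the current policy, and the policy is then updated on that imagined data, closing the loop.

```python
import random

random.seed(0)  # make the toy run reproducible


class DreamWorldModel:
    """Hypothetical stand-in world model: rolls out imagined trajectories
    under a given policy (analogous to Dreamer's imagination phase)."""

    def imagine(self, policy, horizon=5):
        state, traj = 0.0, []
        for _ in range(horizon):
            # Exploration noise around the policy's action.
            action = policy(state) + random.gauss(0.0, 0.1)
            # Toy reward: best action is 1.0 (purely illustrative).
            reward = 1.0 - abs(action - 1.0)
            state = 0.5 * state + 0.5 * action  # toy latent dynamics
            traj.append((state, action, reward))
        return traj


class DecisionTransformerStub:
    """Hypothetical stand-in sequence policy: a single scalar action
    estimate updated from imagined trajectories."""

    def __init__(self):
        self.bias = 0.0

    def __call__(self, state):
        return self.bias

    def update(self, trajectories):
        # Move toward the mean action of the highest-return imagined
        # trajectory (a crude proxy for return-conditioned training).
        returns = [sum(r for _, _, r in t) for t in trajectories]
        best = trajectories[returns.index(max(returns))]
        target = sum(a for _, a, _ in best) / len(best)
        self.bias += 0.5 * (target - self.bias)


def train(iterations=10, rollouts=4):
    model, policy = DreamWorldModel(), DecisionTransformerStub()
    best_returns = []
    for _ in range(iterations):
        # World-model side: imagine trajectories under the current policy.
        trajs = [model.imagine(policy) for _ in range(rollouts)]
        # Transformer side: learn from the imagined data, closing the loop.
        policy.update(trajs)
        best_returns.append(max(sum(r for _, _, r in t) for t in trajs))
    return best_returns


returns = train()
```

In this sketch the imagined returns improve over iterations because each policy update is driven by the world model's rollouts, while each new batch of rollouts reflects the updated policy, mirroring (in toy form) the bidirectional enhancement loop the abstract describes.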