Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose $\Delta$-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, $\Delta$-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.
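The two components named above can be sketched as a minimal interface: a discrete autoencoder that turns the change between consecutive frames into a handful of token indices, and an autoregressive predictor that proposes the next delta tokens from a summary of the trajectory so far. This is an illustrative sketch only; all class and method names are assumptions, and the learned networks are replaced by deterministic stand-ins rather than the released $\Delta$-IRIS implementation.

```python
import numpy as np


class DeltaWorldModel:
    """Hypothetical sketch of a delta-based world model.

    A discrete autoencoder encodes the stochastic delta between time steps
    as a few discrete tokens; an autoregressive model, conditioned on a
    (continuous) summary of past observations and actions, predicts the
    next delta tokens. Real neural networks are replaced by placeholders.
    """

    def __init__(self, codebook_size=512, tokens_per_delta=4):
        self.codebook_size = codebook_size
        self.tokens_per_delta = tokens_per_delta

    def encode_delta(self, frame_t, frame_tp1):
        """Map the change between two consecutive frames to token indices.

        Stand-in for a learned encoder with vector quantization: pooled
        delta statistics are hashed into codebook indices.
        """
        delta = frame_tp1.astype(np.float64) - frame_t.astype(np.float64)
        pooled = float(np.abs(delta).mean())
        base = int(pooled * 1000) % self.codebook_size
        return [(base + k) % self.codebook_size
                for k in range(self.tokens_per_delta)]

    def predict_next_delta(self, context_tokens):
        """Stand-in for the autoregressive transformer: proposes the next
        delta tokens from the token context summarizing the trajectory."""
        seed = sum(context_tokens) % self.codebook_size
        return [(seed + k) % self.codebook_size
                for k in range(self.tokens_per_delta)]


# Usage: encode one transition, then roll the model forward one step.
wm = DeltaWorldModel()
f0 = np.zeros((8, 8))
f1 = np.ones((8, 8))
tokens = wm.encode_delta(f0, f1)       # discrete delta tokens for f0 -> f1
next_tokens = wm.predict_next_delta(tokens)
```

Because only deltas are tokenized, each simulated step costs a few tokens instead of a full-frame token grid, which is the source of the speedup the abstract claims over prior attention-based world models.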