This paper explores the use of long model rollouts in model-based offline reinforcement learning. Although parts of the literature caution against this approach because model errors can compound over the course of a rollout, practitioners have nonetheless applied it successfully in real-world settings. The paper aims to demonstrate that long rollouts do not necessarily suffer from exponentially growing errors and can in fact yield better Q-value estimates than model-free methods. These findings may help improve model-based reinforcement learning in practice.
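To make the setting concrete, below is a minimal sketch of how a long model rollout can be used to form a Monte Carlo Q-value estimate. The interfaces `model(s, a)` (a learned dynamics model returning the next state and reward) and `policy(s)`, as well as the horizon and discount values, are illustrative assumptions, not details taken from the paper.

```python
def rollout_q_estimate(model, policy, s0, a0, horizon=200, gamma=0.99):
    """Monte Carlo estimate of Q(s0, a0) from one long model rollout.

    model(s, a) -> (next_state, reward)   # learned dynamics model (assumed interface)
    policy(s)   -> action                 # policy being evaluated
    """
    q, discount, s, a = 0.0, 1.0, s0, a0
    for _ in range(horizon):
        s, r = model(s, a)   # step the *learned* model, never the real environment
        q += discount * r    # accumulate discounted reward along the rollout
        discount *= gamma
        a = policy(s)        # continue the rollout on-policy
    return q
```

In practice one would average several such rollouts to reduce variance; the question the paper addresses is whether the model error accumulated over the `horizon` steps degrades this estimate as badly as the compounding-error argument suggests.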