Accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and planning algorithms, but achieving it is challenging: inaccuracies in model estimates can compound, so prediction errors grow over long horizons. We approach this problem through the lens of Koopman theory, in which the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages yield significant improvements over existing approaches in both the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL, and we report promising experimental results.
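To illustrate why linear latent dynamics turn a sequential rollout into a convolution, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a diagonal latent transition with real eigenvalues `lam` and an action matrix `B`, so that z_{t+1} = lam * z_t + B u_t. Unrolling gives z_t = lam^t z_0 + sum_{k<t} lam^{t-1-k} B u_k, a causal convolution of the action sequence with a kernel of powers of `lam`. All names (`lam`, `B`, `rollout_sequential`, `rollout_convolution`) are hypothetical, and the dense kernel is used only for clarity; an FFT-based convolution would be the usual choice for efficiency.

```python
import numpy as np

def rollout_sequential(lam, B, z0, actions):
    """Step-by-step rollout of z_{t+1} = lam * z_t + B @ u_t (assumed diagonal dynamics)."""
    z = z0
    out = []
    for u in actions:
        z = lam * z + B @ u
        out.append(z)
    return np.stack(out)  # shape (T, d): z_1 ... z_T

def rollout_convolution(lam, B, z0, actions):
    """Same rollout, computed for all time steps at once as a causal convolution."""
    T = actions.shape[0]
    # powers[p] = lam ** p for p = 0 ... T, shape (T + 1, d)
    powers = lam[None, :] ** np.arange(T + 1)[:, None]
    Bu = actions @ B.T  # action contributions, shape (T, d)
    # lag[t, k] = t - k; only k <= t contributes (causality)
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]
    kernel = np.where((lag >= 0)[:, :, None], powers[np.clip(lag, 0, None)], 0.0)
    forced = np.einsum("tkd,kd->td", kernel, Bu)  # sum_k lam^(t-k) * B u_k
    free = powers[1:] * z0[None, :]               # lam^(t+1) * z_0
    return free + forced

# Illustrative check on random data (sizes and value ranges are arbitrary assumptions).
rng = np.random.default_rng(0)
d, m, T = 4, 2, 16
lam = rng.uniform(-0.9, 0.9, size=d)  # |lam| < 1 keeps the rollout stable
B = rng.normal(size=(d, m))
z0 = rng.normal(size=d)
actions = rng.normal(size=(T, m))
assert np.allclose(rollout_sequential(lam, B, z0, actions),
                   rollout_convolution(lam, B, z0, actions))
```

In this simplified setting, the eigenvalue magnitudes also make the stability remark concrete: with every |lam| below 1, the contribution of z_0 and of early actions decays geometrically, while magnitudes above 1 would make it grow.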