Accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could advance Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging: inaccuracies in model estimates compound, so prediction errors grow over long horizons. We approach this problem through the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution, while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over existing approaches in both the efficiency and the accuracy of modeling dynamics over extended horizons. We also report promising experimental results in dynamics modeling for both model-based planning and model-free RL.
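To make the convolution idea concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a latent linear system of the form z_{t+1} = Λ z_t + B u_t with a diagonal Λ; the diagonal form, the function names, and the dimensions are illustrative assumptions not stated above. Unrolling the recurrence gives z_{t+1} = Λ^{t+1} z_0 + Σ_{k≤t} Λ^{t-k} B u_k, which is a causal convolution of the action sequence with the kernel Λ^j, so all time steps can be computed at once.

```python
import numpy as np

def rollout_sequential(lam, B, z0, actions):
    """Step-by-step rollout of z_{t+1} = lam * z_t + B @ u_t (lam is the diagonal of Lambda)."""
    z = z0
    states = []
    for u in actions:
        z = lam * z + B @ u
        states.append(z)
    return np.stack(states)                      # (T, d): z_1 .. z_T

def rollout_convolution(lam, B, z0, actions):
    """Same rollout, computed for all time steps at once as a causal convolution."""
    T = actions.shape[0]
    # powers[j] = lam**j for j = 0..T, shape (T + 1, d)
    powers = lam[None, :] ** np.arange(T + 1)[:, None]
    v = actions @ B.T                            # (T, d): B u_k for every k
    lag = np.arange(T)[:, None] - np.arange(T)[None, :]   # lag[t, k] = t - k
    # kernel[t, k] = lam**(t - k) when k <= t, else 0 (causality)
    kernel = np.where((lag >= 0)[:, :, None], powers[np.clip(lag, 0, T)], 0.0)
    forced = np.einsum("tkd,kd->td", kernel, v)  # action-driven part
    free = powers[1:] * z0[None, :]              # lam**(t+1) * z0, initial-state part
    return free + forced                         # (T, d): z_1 .. z_T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, T = 4, 2, 16                           # latent dim, action dim, horizon (illustrative)
    lam = rng.uniform(-0.9, 0.9, size=d)         # |lam| < 1 keeps the rollout stable
    B = rng.normal(size=(d, m))
    z0 = rng.normal(size=d)
    actions = rng.normal(size=(T, m))
    same = np.allclose(rollout_sequential(lam, B, z0, actions),
                       rollout_convolution(lam, B, z0, actions))
    print("sequential and convolutional rollouts agree:", same)
```

The sketch builds the full T × T kernel for readability, which costs O(T²) per latent dimension; an FFT-based convolution would be one way to lower that cost. With a diagonal Λ, stability reduces to checking that every entry satisfies |λ| ≤ 1, which illustrates why a linear latent model makes stability analysis tractable.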