Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting in which an agent repeatedly learns and controls a dynamical system whose transition dynamics evolve across episodes. We analyze the problem using Gaussian process dynamics models under frequentist variation-budget assumptions. Our analysis shows that persistent non-stationarity requires explicitly limiting the influence of outdated data to maintain calibrated uncertainty and meaningful dynamic regret guarantees. Motivated by these insights, we propose a practical optimistic model-based reinforcement learning algorithm with adaptive data buffer mechanisms and demonstrate improved performance on continuous control benchmarks with non-stationary dynamics.
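To illustrate why limiting the influence of outdated data matters, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes a scalar system whose slope drifts linearly across episodes, an RBF-kernel Gaussian process with fixed hyperparameters, and a fixed-length sliding window standing in for the adaptive data buffer. The names `SlidingWindowGP` and `run_demo`, and every numeric setting, are hypothetical choices made for illustration only.

```python
from collections import deque

import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


class SlidingWindowGP:
    """GP regression over the most recent `window` samples (None = keep all).

    Assumption: a fixed-length window is a simple stand-in for an adaptive
    buffer; the paper's actual mechanism is not reproduced here.
    """

    def __init__(self, window=None, noise_var=1e-2):
        self.buffer = deque(maxlen=window)  # old samples drop out automatically
        self.noise_var = noise_var

    def add(self, x, y):
        self.buffer.append((np.asarray(x, dtype=float), float(y)))

    def predict(self, x_query):
        """Return posterior mean and variance at x_query of shape (m, d)."""
        X = np.stack([x for x, _ in self.buffer])
        y = np.array([target for _, target in self.buffer])
        K = rbf_kernel(X, X) + self.noise_var * np.eye(len(X))
        K_star = rbf_kernel(x_query, X)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        mean = K_star @ alpha
        v = np.linalg.solve(L, K_star.T)
        var = 1.0 - (v**2).sum(axis=0)  # prior variance is 1.0
        return mean, np.maximum(var, 0.0)


def run_demo(n_episodes=20, samples_per_episode=10, window=20, seed=0):
    """Compare a full-history GP with a windowed GP on drifting dynamics."""
    rng = np.random.default_rng(seed)
    full_gp = SlidingWindowGP(window=None)
    windowed_gp = SlidingWindowGP(window=window)

    # Hypothetical drift: the slope moves linearly from +1 to -1.
    slopes = np.linspace(1.0, -1.0, n_episodes)
    for slope in slopes:
        for _ in range(samples_per_episode):
            x = rng.uniform(-1.0, 1.0, size=1)
            y = slope * x[0] + 0.05 * rng.standard_normal()
            full_gp.add(x, y)
            windowed_gp.add(x, y)

    # Evaluate both models against the dynamics of the final episode.
    x_test = np.linspace(-1.0, 1.0, 21)[:, None]
    y_true = slopes[-1] * x_test[:, 0]
    err_full = np.abs(full_gp.predict(x_test)[0] - y_true).mean()
    err_windowed = np.abs(windowed_gp.predict(x_test)[0] - y_true).mean()
    return err_full, err_windowed


if __name__ == "__main__":
    err_full, err_windowed = run_demo()
    print(f"full-history error: {err_full:.3f}")
    print(f"windowed error:     {err_windowed:.3f}")
```

In this toy setting the full-history model averages over all past slopes, so its predictions lag the current dynamics, while the windowed model tracks the most recent episodes. The window length trades tracking speed against the amount of data available, which is the tension an adaptive buffer is meant to manage.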