By framing reinforcement learning as a sequence modeling problem, recent work has enabled the use of generative models, such as diffusion models, for planning. While these models are effective in predicting long-horizon state trajectories in deterministic environments, they face challenges in dynamic settings with moving obstacles. Effective collision avoidance demands continuous monitoring and adaptive decision-making. While replanning at every timestep could ensure safety, it introduces substantial computational overhead due to the repetitive prediction of overlapping state sequences -- a process that is particularly costly with diffusion models, known for their intensive iterative sampling procedure. We propose an adaptive generative planning approach that dynamically adjusts replanning frequency based on the uncertainty of action predictions. Our method minimizes the need for frequent, computationally expensive, and redundant replanning while maintaining robust collision avoidance performance. In experiments, we obtain a 13.5% increase in the mean trajectory length and a 12.7% increase in mean reward over long-horizon planning, indicating a reduction in collision rates and an improved ability to navigate the environment safely.
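The core control loop described above — execute a cached plan and replan only when the uncertainty of the predicted actions grows too large — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_plans` is a hypothetical stand-in for a diffusion planner, and sample variance across candidate action sequences is used as a simple uncertainty proxy.

```python
import numpy as np

def sample_plans(state, horizon, n_samples, rng):
    # Hypothetical stand-in for a generative (e.g., diffusion) planner:
    # returns n_samples candidate action sequences, shape (n_samples, horizon).
    return rng.normal(loc=np.tanh(state), scale=0.1, size=(n_samples, horizon))

def adaptive_replan(env_step, state, total_steps, threshold=0.05,
                    horizon=16, n_samples=8, seed=0):
    """Execute a cached plan, replanning only when the variance of the
    next planned action (an uncertainty proxy) exceeds the threshold,
    or when the cached plan is exhausted."""
    rng = np.random.default_rng(seed)
    plan, t, replans = None, 0, 0
    for _ in range(total_steps):
        need_plan = plan is None or t >= plan.shape[1]
        if not need_plan:
            # Uncertainty proxy: disagreement among sampled plans at step t.
            need_plan = plan[:, t].var() > threshold
        if need_plan:
            plan, t = sample_plans(state, horizon, n_samples, rng), 0
            replans += 1
        action = plan[:, t].mean()  # execute the consensus action
        state = env_step(state, action)
        t += 1
    return state, replans
```

With a low-variance planner, replanning triggers mainly when the cached plan runs out, so the expensive sampling step runs far less often than once per timestep; raising the planner's noise (or lowering `threshold`) pushes the loop toward per-step replanning.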