Robust robot planning in dynamic, human-centric environments remains challenging due to multimodal uncertainty, the need for real-time adaptation, and safety requirements. Optimization-based planners enable explicit constraint handling but can be sensitive to initialization and struggle in dynamic settings. Learning-based planners capture multimodal solution spaces more naturally, but often lack reliable constraint satisfaction. In this paper, we introduce a unified generation-refinement framework that combines reward-guided conditional flow matching (CFM) with model predictive path integral (MPPI) control. Our key idea is a bidirectional information exchange between generation and optimization: reward-guided CFM produces diverse, informed trajectory priors for MPPI refinement, while the optimized MPPI trajectory warm-starts the next CFM generation step. Using autonomous social navigation as a motivating application, we demonstrate that the proposed approach improves the trade-off between safety, task performance, and computation time, while adapting to dynamic environments in real time. The source code is publicly available at https://cfm-mppi.github.io.
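The bidirectional generation-refinement loop described above can be illustrated with a toy sketch. This is not the paper's implementation: the CFM sampler is replaced by a simple noise-perturbation placeholder, the cost is a hypothetical goal-distance term, and all function names (`reward_guided_cfm_sample`, `mppi_refine`, `plan`) are illustrative assumptions. Only the control flow — CFM-style sampling around a warm start, MPPI-style exponentially weighted refinement, and feeding the refined trajectory back as the next warm start — mirrors the stated idea.

```python
import numpy as np

HORIZON, STATE_DIM, N_SAMPLES = 20, 2, 32
LAMBDA = 0.1  # MPPI temperature (illustrative value)

def reward_guided_cfm_sample(warm_start, n_samples, rng):
    # Placeholder for reward-guided conditional flow matching:
    # perturb the warm-start trajectory to mimic a multimodal prior.
    noise = rng.normal(scale=0.1, size=(n_samples, HORIZON, STATE_DIM))
    return warm_start[None] + noise

def trajectory_cost(trajs, goal):
    # Toy cost: distance of each trajectory's final state to the goal.
    return np.linalg.norm(trajs[:, -1] - goal, axis=-1)

def mppi_refine(samples, goal):
    # MPPI-style refinement: exponentially weighted average of samples,
    # with lower-cost trajectories receiving higher weight.
    costs = trajectory_cost(samples, goal)
    weights = np.exp(-(costs - costs.min()) / LAMBDA)
    weights /= weights.sum()
    return np.tensordot(weights, samples, axes=(0, 0))

def plan(goal, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    warm_start = np.zeros((HORIZON, STATE_DIM))  # initial guess
    for _ in range(n_iters):
        # Generation: CFM stand-in proposes diverse priors around the warm start.
        samples = reward_guided_cfm_sample(warm_start, N_SAMPLES, rng)
        # Refinement: MPPI-style update optimizes over the proposals.
        refined = mppi_refine(samples, goal)
        # Bidirectional exchange: the refined plan warm-starts the next round.
        warm_start = refined
    return warm_start

goal = np.array([1.0, 1.0])
traj = plan(goal)
```

Under this toy setup, each iteration pulls the warm-start trajectory toward lower-cost samples, so the final state drifts toward the goal; in the actual framework the CFM prior would additionally keep the proposals multimodal and reward-informed rather than purely Gaussian.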