Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
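To make the planning idea concrete, the following is a minimal, generic sketch of one MPPI-style planning step over policy-proposed action sequences. It is not the FlowMPC implementation: the abstract does not specify the planner's details, so the function `mppi_plan`, its parameters (`num_samples`, `horizon`, `temperature`), and the toy stand-ins for the FM policy (`toy_propose`), the world model (`toy_dynamics`), and the reward head (`toy_reward`) are all assumptions introduced purely for illustration.

```python
import numpy as np


def mppi_plan(obs, propose, dynamics, reward, num_samples=64, horizon=5,
              temperature=0.5, rng=None):
    """One MPPI-style planning step (illustrative sketch, not FlowMPC itself)."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidate action sequences from the policy, shape (num_samples, horizon, action_dim).
    candidates = propose(obs, num_samples, horizon, rng)
    # Roll every candidate through the world model and accumulate predicted reward.
    states = np.repeat(obs[None, :], num_samples, axis=0)
    returns = np.zeros(num_samples)
    for t in range(horizon):
        returns += reward(states, candidates[:, t])
        states = dynamics(states, candidates[:, t])
    # Path-integral weights: a softmax over predicted returns (max-shifted for stability).
    weights = np.exp((returns - returns.max()) / temperature)
    weights /= weights.sum()
    # Return-weighted average of the candidate sequences, shape (horizon, action_dim).
    plan = np.einsum("n,nha->ha", weights, candidates)
    return plan, weights


# Hypothetical toy stand-ins: a 2-D point that should reach GOAL.
GOAL = np.array([1.0, -1.0])


def toy_propose(obs, num_samples, horizon, rng):
    # Stand-in for sampling action chunks from a trained FM policy.
    return rng.normal(0.0, 0.5, size=(num_samples, horizon, obs.shape[0]))


def toy_dynamics(states, actions):
    # Stand-in for a learned latent dynamics model.
    return states + actions


def toy_reward(states, actions):
    # Stand-in for a learned reward head: negative distance to GOAL after acting.
    return -np.linalg.norm(states + actions - GOAL, axis=-1)


rng = np.random.default_rng(0)
plan, weights = mppi_plan(np.zeros(2), toy_propose, toy_dynamics, toy_reward, rng=rng)
first_action = plan[0]  # receding horizon: execute this action, then replan
```

One design caveat of this sketch: averaging candidates can blur distinct modes when the proposals are strongly multimodal. Sampling a single candidate in proportion to `weights`, or taking the highest-return candidate, are alternatives that keep the selected sequence on one mode.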