Diffusion models offer powerful generative capabilities for robot trajectory planning, but their practical deployment on robots is hindered by a critical bottleneck: reliance on imitation learning from expert demonstrations. This paradigm is often impractical for specialized robots, for which data is scarce, and it yields an inefficient, theoretically suboptimal training pipeline. To overcome this, we introduce PegasusFlow, a hierarchical rolling-denoising framework that samples trajectory score gradients directly and in parallel from environmental interaction, bypassing the need for expert data entirely. Our core innovation is a sampling algorithm, Weighted Basis Function Optimization (WBFO), which uses spline basis representations to achieve higher sample efficiency and faster convergence than traditional methods such as MPPI. The framework is embedded in a scalable, asynchronous parallel simulation architecture that supports massively parallel rollouts for efficient data collection. Extensive experiments on trajectory optimization and robotic navigation tasks show that our approach, in particular Action-Value WBFO (AVWBFO) combined with a reinforcement learning warm-start, significantly outperforms baselines. In a challenging barrier-crossing task, our method achieved a 100% success rate and was 18% faster than the next-best method, validating its effectiveness for complex terrain locomotion planning. https://masteryip.github.io/pegasusflow.github.io/
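To make the idea of sampling-based optimization over a spline basis more concrete, the following is a minimal illustrative sketch and not the WBFO algorithm itself, whose details are not given in the abstract. It assumes a toy 1-D tracking cost, a piecewise-linear (first-order B-spline) basis, and an MPPI-style exponentially weighted update applied to the control-point weights. All function names and parameter values (`hat_basis`, `weighted_update`, `sigma`, `temperature`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def hat_basis(num_ctrl, num_steps):
    """Return a num_steps x num_ctrl matrix of linear B-spline (hat) basis values."""
    rows = []
    for t in range(num_steps):
        s = t * (num_ctrl - 1) / (num_steps - 1)  # time expressed in knot units
        rows.append([max(0.0, 1.0 - abs(s - k)) for k in range(num_ctrl)])
    return rows


def decode(weights, basis):
    """Expand a short vector of control-point weights into a dense trajectory."""
    return [sum(b * w for b, w in zip(row, weights)) for row in basis]


def cost(traj, target):
    """Toy rollout cost (assumption): squared tracking error against a target curve."""
    return sum((x - y) ** 2 for x, y in zip(traj, target))


def weighted_update(weights, basis, target, rng,
                    num_samples=64, sigma=0.3, temperature=0.1):
    """One MPPI-style step: perturb the weights, score each rollout, softmin-average."""
    noises = [[rng.gauss(0.0, sigma) for _ in weights] for _ in range(num_samples)]
    costs = [
        cost(decode([w + n for w, n in zip(weights, noise)], basis), target)
        for noise in noises
    ]
    c_min = min(costs)  # subtract the minimum for numerical stability
    scores = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(scores)
    return [
        w + sum(s * noise[k] for s, noise in zip(scores, noises)) / total
        for k, w in enumerate(weights)
    ]


def optimize(num_iters=50, seed=0):
    """Run the toy optimizer and return the final weights and the cost history."""
    rng = random.Random(seed)
    num_ctrl, num_steps = 6, 41
    basis = hat_basis(num_ctrl, num_steps)
    target = [math.sin(math.pi * t / (num_steps - 1)) for t in range(num_steps)]
    weights = [0.0] * num_ctrl
    history = [cost(decode(weights, basis), target)]
    for _ in range(num_iters):
        weights = weighted_update(weights, basis, target, rng)
        history.append(cost(decode(weights, basis), target))
    return weights, history


if __name__ == "__main__":
    final_weights, cost_history = optimize()
    print("initial cost:", cost_history[0])
    print("final cost:", cost_history[-1])
```

In this sketch the search runs over 6 control-point weights rather than 41 per-step values, which is one plausible reason a basis-function parameterization could need fewer samples than perturbing every time step independently. The independent rollouts inside `weighted_update` are also the part that a parallel simulation backend would distribute.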