Diffusion models have become a powerful tool for generative modeling in robotics, with diffusion policies excelling at modeling multimodal action-trajectory distributions. However, when demonstrations are limited, standard sampling often reproduces dominant behaviors while neglecting valid but rare modes, limiting the discovery of novel solutions. Existing approaches, such as guidance methods or combinations of reinforcement learning with diffusion, either push samples into infeasible regions or struggle to escape local minima, and so fail to systematically uncover diverse behaviors. To address these challenges, we propose a framework that combines Feynman-Kac correctors with a novel guiding potential that steers diffusion policy samples towards promising yet underrepresented trajectories. These trajectories are then refined using sampling-based trajectory optimization and added back to the training set to retrain the diffusion policy. Our method effectively mines and repairs novel trajectories, enabling the systematic discovery of diverse and executable behaviors. We demonstrate the framework across a range of manipulation environments, where it consistently discovers new behaviors.