Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode-averaging problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/
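To make the three ingredients concrete, the sketch below illustrates one way the explore/prune/resample loop could sit inside a compositional denoising process. It is a minimal illustration, not the authors' implementation: `denoise_step`, `segment_log_prob`, the population size, and the overlap-averaging rule are all hypothetical placeholders standing in for the paper's local diffusion models, likelihood-based filter, and local-to-global message passing.

```python
# Minimal sketch (not the CDGS reference code) of search embedded in
# compositional diffusion denoising. All names and update rules here are
# illustrative assumptions, not the paper's actual components.
import torch

def denoise_step(x, t):
    # Placeholder reverse-diffusion update applied to each local segment.
    return x - 0.05 * torch.randn_like(x)

def segment_log_prob(x, t):
    # Placeholder per-segment feasibility score; a real system might use the
    # local diffusion model's likelihood (e.g., an ELBO estimate) instead.
    return -(x ** 2).mean(dim=(-1, -2))

def cdgs_denoise(num_segments=4, horizon=16, dim=8, pop=32, steps=50, overlap=4):
    # Population of candidate plans, each split into overlapping local segments.
    x = torch.randn(pop, num_segments, horizon, dim)
    for t in reversed(range(steps)):
        # 1) Explore: denoise every candidate's segments, keeping diverse
        #    combinations of local modes alive in the population.
        x = denoise_step(x, t)
        # 2) Prune: drop candidates whose segments look locally infeasible,
        #    then refill the population from the survivors.
        scores = segment_log_prob(x, t).sum(dim=1)          # (pop,)
        keep = scores.topk(pop // 2).indices
        x = torch.cat([x[keep], x[keep]], dim=0)
        # 3) Enforce consistency: resample the overlapping frames shared by
        #    neighboring segments (a simple stand-in for message passing).
        shared = 0.5 * (x[:, :-1, -overlap:] + x[:, 1:, :overlap])
        x[:, :-1, -overlap:] = shared
        x[:, 1:, :overlap] = shared
    return x[0]

plan = cdgs_denoise()
print(plan.shape)  # (num_segments, horizon, dim)
```

In this toy version, pruning and overlap resampling happen at every denoising step; the actual method's schedule and consistency mechanism are described in the paper itself.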