This paper proposes ProEdit, a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Motivated by the key observation that multi-view inconsistency in scene editing is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of the FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an adaptive 3D Gaussian splatting (3DGS) training strategy, ensuring high quality and efficiency in performing each subtask. Extensive evaluation shows that ProEdit achieves state-of-the-art results across various scenes and challenging editing tasks, all through a simple framework without expensive or sophisticated add-ons such as distillation losses, components, or training procedures. Notably, ProEdit also provides a new way to control, preview, and select the "aggressivity" of the editing operation during the editing process.