We present TRACE, a mesh-guided 3D Gaussian Splatting (3DGS) editing framework that achieves automated, high-fidelity scene transformation. By anchoring video diffusion with explicit 3D geometry, TRACE uniquely enables fine-grained, part-level manipulation (such as local pose shifting or component replacement) while preserving the structural integrity of the central subject, a capability largely absent from existing editing methods. Our approach comprises three key stages: (1) Multi-view 3D-Anchor Synthesis, which leverages a sparse-view editor trained on our MV-TRACE dataset (the first multi-view consistent dataset dedicated to scene-coherent object addition and modification) to generate spatially consistent 3D anchors; (2) Tangible Geometry Anchoring (TGA), which ensures precise spatial alignment between inserted meshes and the 3DGS scene via two-phase registration; and (3) Contextual Video Masking (CVM), which integrates 3D projections into an autoregressive video pipeline to achieve temporally stable, physically grounded rendering. Extensive experiments demonstrate that TRACE consistently outperforms existing methods, especially in editing versatility and structural integrity.
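The abstract does not specify how TGA's two-phase registration works. Purely as a generic illustration of what a coarse-to-fine mesh-to-scene alignment can look like, the sketch below assumes that mesh vertices and 3DGS Gaussian centres are available as plain 3D point arrays. It uses two swapped-in standard techniques, not the paper's method: a coarse phase that matches centroid and RMS scale, and a fine phase of rigid point-to-point ICP solved with the Kabsch algorithm. All function names (`coarse_align`, `icp_refine`, `nn_mse`, etc.) are hypothetical.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray):
    """Proper rotation R and translation t minimising sum ||R @ src_i + t - dst_i||^2 over paired rows."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)            # 3x3 cross-covariance of the paired points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # flip the last axis if SVD yields a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t


def coarse_align(mesh_pts: np.ndarray, scene_pts: np.ndarray) -> np.ndarray:
    """Hypothetical phase 1: match the mesh's centroid and RMS radius to those of the scene points."""
    mu_m, mu_s = mesh_pts.mean(axis=0), scene_pts.mean(axis=0)
    rms_m = np.sqrt(((mesh_pts - mu_m) ** 2).sum(axis=1).mean())
    rms_s = np.sqrt(((scene_pts - mu_s) ** 2).sum(axis=1).mean())
    return (mesh_pts - mu_m) * (rms_s / rms_m) + mu_s


def nearest(src: np.ndarray, dst: np.ndarray):
    """Brute-force nearest neighbour in dst for each row of src, with squared distances."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return dst[idx], d2[np.arange(len(src)), idx]


def nn_mse(src: np.ndarray, dst: np.ndarray) -> float:
    """Mean squared distance from each src point to its nearest dst point."""
    return float(nearest(src, dst)[1].mean())


def icp_refine(src: np.ndarray, dst: np.ndarray, iters: int = 30) -> np.ndarray:
    """Hypothetical phase 2: rigid point-to-point ICP (closest-point matching + Kabsch update)."""
    cur = src.copy()
    for _ in range(iters):
        matched, _ = nearest(cur, dst)
        R, t = kabsch(cur, matched)
        cur = cur @ R.T + t                      # apply x -> R x + t to every row
    return cur


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.normal(size=(200, 3))            # stand-in for Gaussian centres
    th = np.deg2rad(10.0)
    Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th), np.cos(th), 0.0],
                   [0.0, 0.0, 1.0]])
    mesh = 0.5 * scene @ Rz.T + np.array([1.0, 2.0, 3.0])  # rotated, scaled, shifted copy
    coarse = coarse_align(mesh, scene)
    fine = icp_refine(coarse, scene)
    print(nn_mse(coarse, scene), nn_mse(fine, scene))
```

Splitting the problem this way is a common design choice: the coarse phase resolves scale and translation, which ICP handles poorly, and leaves ICP only a small residual rotation. Each ICP step cannot increase the closest-point error, because the identity transform is always a candidate solution for the Kabsch step.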