We present Edit3DGS, a unified framework for dynamic 3D head editing that integrates 2D instruction-guided diffusion with 3D Gaussian splatting. Unlike prior approaches, which address frame-based editing and static 3D reconstruction separately, our method couples semantic controllability in the image domain with a photorealistic, temporally consistent 3D representation. Given an input video, editable facial regions are masked and modified by a text-conditioned diffusion model, supporting fine-grained operations such as expression transformation, attribute modification, and appearance refinement. The edited frames are then aggregated through 3D Gaussian splatting into a coherent, high-fidelity avatar that preserves both identity and motion dynamics. To enforce consistency, Edit3DGS combines multi-view batch editing with lightweight inpainting strategies that recover expressions lost across timesteps. Experimental results demonstrate that our framework enables controllable, artifact-free head editing with smooth temporal transitions, with practical applications in virtual avatars, immersive communication, film production, and interactive media.
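The masked editing step can be illustrated with a minimal sketch. The abstract does not specify how edited pixels are merged back into each frame, so the alpha compositing below is an assumption rather than the Edit3DGS implementation. The names `composite_masked_edit`, `edit_video`, and `edit_fn` are hypothetical, and `edit_fn` stands in for whatever text-conditioned diffusion editor is used.

```python
import numpy as np


def composite_masked_edit(original, edited, mask):
    """Blend an edited frame into the original inside a soft mask.

    original, edited: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W) in [0, 1]; 1 marks the editable region.
    Assumption: simple alpha compositing, not necessarily the paper's method.
    """
    if original.shape != edited.shape:
        raise ValueError("original and edited frames must share a shape")
    if mask.shape != original.shape[:2]:
        raise ValueError("mask must match the frame height and width")
    # Add a trailing axis so the mask broadcasts over the RGB channels.
    alpha = np.clip(mask, 0.0, 1.0)[..., None]
    return alpha * edited + (1.0 - alpha) * original


def edit_video(frames, masks, edit_fn):
    """Run a hypothetical per-frame editor, keeping pixels outside each mask."""
    return [
        composite_masked_edit(frame, edit_fn(frame), mask)
        for frame, mask in zip(frames, masks)
    ]
```

Under this assumption, pixels outside the mask are returned unchanged, so any downstream 3D Gaussian splatting fit would see edits only in the masked facial regions.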