Generating and inserting new objects into 3D content is a compelling approach to versatile scene recreation. Existing methods, which rely on Score Distillation Sampling (SDS) optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine the Gaussian Splatting reconstruction from these sparse inpainted views. By combining these techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces higher object quality. Extensive experiments demonstrate that our approach outperforms existing methods.
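The abstract does not spell out how the mask-aware 3D reconstruction weights its supervision, so the following is only a minimal sketch of one plausible ingredient: a photometric L1 loss whose per-pixel weight depends on the inpainting mask. The function name `mask_aware_l1` and the parameters `inside_weight` and `outside_weight` are hypothetical and are not taken from the paper; the actual method may differ substantially.

```python
import numpy as np


def mask_aware_l1(rendered, target, mask, inside_weight=1.0, outside_weight=0.1):
    """Weighted mean absolute error between a rendered view and an inpainted view.

    Hypothetical illustration, not the paper's loss.
    rendered, target: float arrays of shape (H, W, 3).
    mask: array of shape (H, W), 1 inside the inpainted region, 0 elsewhere.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    # Per-pixel weight: inside_weight where mask == 1, outside_weight where mask == 0.
    weights = inside_weight * mask + outside_weight * (1.0 - mask)

    # Absolute error averaged over the color channels, giving shape (H, W).
    per_pixel = np.abs(rendered - target).mean(axis=-1)

    # Normalize by the total weight so the value is a weighted mean.
    return float((weights * per_pixel).sum() / weights.sum())
```

In this sketch, errors inside the inserted-object region dominate the loss, while the surrounding scene contributes only weakly. Whether the paper weights the two regions this way is an assumption made purely for illustration.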