The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, which generates high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh into a panoramic depth map and leverages ControlNet to generate a room panorama, which serves as a coarse reference ensuring global texture consistency. In the fine stage, guided by the panoramic image and perspective depth maps, RoomTex iteratively refines and textures each object in the room along a series of selected camera views until the object is completely painted. Moreover, we maintain close alignment between the RGB and depth spaces via subtle edge detection methods. Extensive experiments show that our method generates high-quality and diverse room textures and, more importantly, supports interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
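The fine stage described above can be illustrated with a minimal sketch of the iterative view-selection idea: pick camera views one at a time, painting the object faces each view reveals, until the object is fully covered. All names here are hypothetical, and visibility is mocked with a simple lookup; the actual method paints via depth-conditioned inpainting rather than set bookkeeping.

```python
# Hypothetical sketch: greedy view selection for iterative object texturing.
# visible_faces_from_view is an assumed callback returning the set of face
# indices visible from a given candidate camera view.

def texture_object(num_faces, visible_faces_from_view, candidate_views):
    """Greedily select views until every face of the object is painted."""
    painted = set()
    chosen_views = []
    while len(painted) < num_faces:
        # Pick the view that reveals the most still-unpainted faces.
        best = max(candidate_views,
                   key=lambda v: len(visible_faces_from_view(v) - painted))
        newly = visible_faces_from_view(best) - painted
        if not newly:  # no remaining view adds coverage; stop to avoid looping
            break
        painted |= newly  # "paint" (e.g., inpaint) the newly visible faces
        chosen_views.append(best)
    return chosen_views, painted
```

In the real pipeline, each selected view would be rendered to a perspective depth map and textured by inpainting conditioned on that depth and the panoramic reference; the greedy loop simply formalizes "iterate along selected views until the object is completely painted."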