We propose Text2Scene, a method that automatically creates realistic textures for virtual scenes composed of multiple objects. Guided by a reference image and text descriptions, our pipeline adds detailed textures to labeled 3D geometries in a room such that the generated colors respect the hierarchical structure or semantic parts that are often composed of similar materials. Instead of applying flat stylization to the entire scene in a single step, we obtain weak semantic cues from geometric segmentation, which are further clarified by assigning initial colors to the segmented parts. We then add texture details to individual objects such that their projections in image space exhibit feature embeddings aligned with the embedding of the input. This decomposition makes the entire pipeline tractable with a moderate amount of computational resources and memory. Because our framework uses existing image and text embedding resources, it does not require dedicated datasets of high-quality textures designed by skilled artists. To the best of our knowledge, it is the first practical and scalable approach that can create detailed and realistic textures of a desired style while maintaining structural context for scenes with multiple objects.
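The embedding-alignment idea mentioned above can be illustrated with a minimal sketch. It is not the Text2Scene implementation: the abstract does not specify the encoder, the renderer, or the exact objective, so the code assumes that each rendered view of an object and the input (reference image or text) have already been mapped to plain feature vectors, and it uses a mean cosine distance as one plausible alignment measure. The names `cosine_similarity` and `alignment_loss` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def alignment_loss(
    view_embeddings: Sequence[Sequence[float]],
    target_embedding: Sequence[float],
) -> float:
    """Mean cosine distance between rendered-view embeddings and the target.

    Assumption: view_embeddings holds one feature vector per rendered
    projection of a textured object, and target_embedding is the feature
    vector of the input. Lower values mean closer alignment.
    """
    if not view_embeddings:
        raise ValueError("at least one view embedding is required")
    distances = [
        1.0 - cosine_similarity(view, target_embedding)
        for view in view_embeddings
    ]
    return sum(distances) / len(distances)
```

In an optimization setting, a quantity of this kind would be minimized with respect to the texture parameters of one object at a time, which is in keeping with the per-object decomposition described in the abstract.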