Controllable 3D indoor scene synthesis has applications in gaming, film, and augmented/virtual reality. The ability to decouple and stylize individual objects within a scene is crucial, because it gives fine-grained control during editing. This control covers not only geometric attributes such as translation and scaling but also appearance, such as stylization. Current scene stylization methods can only apply a style to the entire scene; they cannot separate and customize individual objects. To address this limitation, we introduce a pipeline for synthesizing 3D indoor scenes. Our approach places objects within the scene using information from professionally designed bounding boxes. The pipeline also maintains style consistency across multiple objects in the scene, so that the result is cohesive and matches the desired aesthetic. It generates 3D scenes that are photorealistic, multi-view consistent, and diverse, in response to a variety of natural language prompts.