DreamAnywhere: Object-Centric Panoramic 3D Scene Generation

Edoardo Alberto Dominici, Jozef Hladky, Floor Verhoeven, Lukas Radl, Thomas Deixelberger, Stefan Ainetter, Philipp Drescher, Stefan Hauswiesner, Arno Coomans, Giacomo Nazzaro, Konstantinos Vardis, Markus Steinberger

From arXiv; WACV 2026 (Oral)

Recent advances in text-to-3D scene generation have shown significant potential to transform content creation across multiple industries. Although the research community has made impressive progress on this complex task, existing methods often generate environments that are only front-facing, lack visual fidelity, exhibit limited scene understanding, and are typically fine-tuned for either indoor or outdoor settings. In this work, we address these issues and propose DreamAnywhere, a modular system for the fast generation and prototyping of 3D scenes. Our system synthesizes a 360° panoramic image from text, decomposes it into background and objects, constructs a complete 3D representation through hybrid inpainting, and lifts object masks to detailed 3D objects that are placed in the virtual environment. DreamAnywhere supports immersive navigation and intuitive object-level editing, making it ideal for scene exploration, visual mock-ups, and rapid prototyping, all with minimal manual modeling. These features make our system particularly suitable for low-budget movie production, enabling quick iteration on scene layout and visual tone without the overhead of traditional 3D workflows. Our modular pipeline is highly customizable because each component can be replaced independently. Compared to current state-of-the-art text- and image-based 3D scene generation approaches, DreamAnywhere significantly improves coherence in novel view synthesis and achieves competitive image quality, demonstrating its effectiveness across diverse and challenging scenarios. A comprehensive user study shows a clear preference for our method over existing approaches, validating both its technical robustness and its practical usefulness.
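
The abstract describes the system only as a sequence of stages (text to 360° panorama, decomposition into background and objects, hybrid inpainting of the background, lifting object masks to 3D objects, and placement in the scene) with independently replaceable components. The following is a minimal Python sketch, not the authors' implementation, of one way to wire such a modular pipeline so that any single stage can be swapped. All names (`Pipeline`, `Panorama`, `Decomposition`, `Scene`, and the `stub_*` functions) are hypothetical, and every stage is a string-producing stub standing in for the generative, segmentation, inpainting, and image-to-3D models, which the abstract does not name.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List


# Hypothetical containers: the abstract does not specify the real data representations.
@dataclass
class Panorama:
    prompt: str
    width: int   # equirectangular 360° image, so width == 2 * height
    height: int


@dataclass
class Decomposition:
    background: Panorama
    object_masks: List[str]  # placeholder: one label per segmented object


@dataclass
class Scene:
    background_3d: str
    objects_3d: List[str] = field(default_factory=list)


@dataclass
class Pipeline:
    # Each stage is a plain callable, so one can be replaced without touching the others.
    text_to_panorama: Callable[[str], Panorama]
    decompose: Callable[[Panorama], Decomposition]
    inpaint_background: Callable[[Panorama], str]
    lift_object: Callable[[str], str]

    def run(self, prompt: str) -> Scene:
        pano = self.text_to_panorama(prompt)          # text -> 360° panorama
        parts = self.decompose(pano)                  # panorama -> background + object masks
        scene = Scene(background_3d=self.inpaint_background(parts.background))
        for mask in parts.object_masks:               # lift each mask and place it in the scene
            scene.objects_3d.append(self.lift_object(mask))
        return scene


# Stub stages (hypothetical) that only record what a real model would produce.
def stub_text_to_panorama(prompt: str) -> Panorama:
    return Panorama(prompt=prompt, width=2048, height=1024)


def stub_decompose(pano: Panorama) -> Decomposition:
    # Pretend a segmenter found two objects in the panorama.
    return Decomposition(background=pano, object_masks=["sofa", "lamp"])


def stub_inpaint(background: Panorama) -> str:
    return f"background3d({background.prompt})"


def stub_lift(mask: str) -> str:
    return f"mesh({mask})"


pipeline = Pipeline(stub_text_to_panorama, stub_decompose, stub_inpaint, stub_lift)
scene = pipeline.run("a cozy living room")
print(scene.objects_3d)  # ['mesh(sofa)', 'mesh(lamp)']

# Swap only the object-lifting stage; all other stages are reused unchanged.
swapped = replace(pipeline, lift_object=lambda mask: f"gaussians({mask})")
print(swapped.run("a cozy living room").objects_3d)  # ['gaussians(sofa)', 'gaussians(lamp)']
```

Keeping stages as plain callables with fixed input and output types is one simple way to get the independent replaceability the abstract claims: `dataclasses.replace` builds a new pipeline that differs in a single stage, as in the last two lines.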

翻译：近年来，文本到三维场景生成技术的进展已展现出变革多个行业内容创作的巨大潜力。尽管研究界在这一复杂任务的挑战应对上取得了显著进步，但现有方法通常仅生成正对视角的环境，存在视觉保真度不足、场景理解有限的问题，且通常仅针对室内或室外场景进行微调。本研究针对这些问题提出了DreamAnywhere——一个用于快速生成与原型构建三维场景的模块化系统。该系统从文本合成360°全景图像，将其分解为背景与对象，通过混合修复技术构建完整的三维表征，并将对象掩码提升为细节丰富的三维物体后置入虚拟环境。DreamAnywhere支持沉浸式导航与直观的对象级编辑，使其成为场景探索、视觉原型与快速迭代的理想工具——所有这些都只需极少量的人工建模。这些特性使我们的系统特别适用于低预算电影制作，能够快速迭代场景布局与视觉基调，而无需传统三维工作流的额外开销。我们的模块化流程具有高度可定制性，允许独立替换各组件。与当前基于文本和图像的最先进三维场景生成方法相比，DreamAnywhere在新视角合成的连贯性方面展现出显著提升，并实现了具有竞争力的图像质量，证明了其在多样化和挑战性场景中的有效性。一项全面的用户研究表明，相较于现有方法，用户对我们的方法表现出明确偏好，这验证了其技术鲁棒性与实际应用价值。