Inspired by cognitive theories, we introduce AnyHome, a framework that translates any text into well-structured, textured indoor scenes at house scale. By prompting Large Language Models (LLMs) with designed templates, our approach converts the provided textual narratives into amodal structured representations. These representations ensure consistent and realistic spatial layouts by directing the synthesis of a geometry mesh within defined constraints. A Score Distillation Sampling process then refines the geometry, and an egocentric inpainting process adds lifelike textures. AnyHome stands out for its editability, customizability, diversity, and realism. The structured scene representations allow extensive editing at varying levels of granularity. Capable of interpreting inputs ranging from simple labels to detailed narratives, AnyHome generates detailed geometries and textures that outperform existing methods in both quantitative and qualitative measures.
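To make the idea of a structured scene representation that can be edited at several levels of granularity more concrete, the following is a minimal sketch and not AnyHome's actual data model. The class names (`SceneObject`, `Room`, `House`), their fields, and the helper methods `connect` and `swap_category` are all hypothetical, chosen only to illustrate house-level and object-level edits on the same structure.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass
class SceneObject:
    category: str                  # e.g. "sofa" (hypothetical label vocabulary)
    position: Tuple[float, float]  # (x, y) in metres, room-local (assumed convention)


@dataclass
class Room:
    name: str
    size: Tuple[float, float]      # (width, depth) in metres
    objects: List[SceneObject] = field(default_factory=list)


@dataclass
class House:
    rooms: Dict[str, Room] = field(default_factory=dict)
    # Each door is an unordered pair of room names.
    doors: Set[FrozenSet[str]] = field(default_factory=set)

    def add_room(self, room: Room) -> None:
        self.rooms[room.name] = room

    def connect(self, a: str, b: str) -> None:
        """House-level edit: add a door between two existing rooms."""
        if a not in self.rooms or b not in self.rooms:
            raise KeyError(f"unknown room in ({a!r}, {b!r})")
        self.doors.add(frozenset((a, b)))

    def swap_category(self, room_name: str, old: str, new: str) -> int:
        """Object-level edit: relabel objects in one room; returns how many changed."""
        count = 0
        for obj in self.rooms[room_name].objects:
            if obj.category == old:
                obj.category = new
                count += 1
        return count


# Build a two-room house, then edit it at two different granularities.
house = House()
house.add_room(Room("living room", (5.0, 4.0), [SceneObject("sofa", (1.0, 0.5))]))
house.add_room(Room("kitchen", (3.0, 3.0)))
house.connect("living room", "kitchen")                 # house-level edit
house.swap_category("living room", "sofa", "armchair")  # object-level edit
```

In a pipeline of the kind described, a structure like this would sit between the LLM output and mesh synthesis, so edits could be made on the representation itself before any geometry is regenerated.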