Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions

What constitutes the "vibe" of a particular scene? What should one find in "a busy, dirty city street", "an idyllic countryside", or "a crime scene in an abandoned living room"? The translation from abstract scene descriptions to stylized scene elements cannot be done with any generality by extant systems trained on rigid and limited indoor datasets. In this paper, we propose to leverage the knowledge captured by foundation models to accomplish this translation. We present a system that can serve as a tool to generate stylized assets for 3D scenes described by a short phrase, without the need to enumerate the objects to be found within the scene or give instructions on their appearance. Additionally, it is robust to open-world concepts in a way that traditional methods trained on limited data are not, affording more creative freedom to the 3D artist. Our system demonstrates this using a foundation model "team" composed of a large language model, a vision-language model, and several image diffusion models, which communicate using an interpretable and user-editable intermediate representation, thus allowing for more versatile and controllable stylized asset generation. We introduce novel metrics for this task, and show through human evaluations that in 91% of cases, our system's outputs are judged more faithful to the semantics of the input scene description than the baseline's, highlighting the potential of this approach to radically accelerate the 3D content creation process for 3D artists.
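
To make the idea of an "interpretable and user-editable intermediate representation" concrete, here is a minimal sketch in Python. It is not the authors' format: the `AssetSpec` fields, the JSON layout, the helper names, and the sample entries are all assumptions made for illustration, and no model is actually called. It only shows how a plan could pass between stages as plain text that a person can read and change.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AssetSpec:
    """One entry of a hypothetical intermediate representation."""
    name: str        # assumed field: an object expected in the scene
    appearance: str  # assumed field: short description of its look


def to_editable_text(specs):
    """Serialize the plan as JSON so a person can read and edit it."""
    return json.dumps([asdict(s) for s in specs], indent=2)


def from_editable_text(text):
    """Parse the (possibly edited) JSON back into AssetSpec objects."""
    return [AssetSpec(**item) for item in json.loads(text)]


def to_image_prompts(specs):
    """Turn each entry into a text prompt for a downstream image model."""
    return [f"{s.name}, {s.appearance}" for s in specs]


# Placeholder plan written by hand; in a real system a language model
# would propose these entries from the scene description.
plan = [
    AssetSpec("trash can", "dented metal, overflowing"),
    AssetSpec("brick wall", "grimy, covered in graffiti"),
]

text = to_editable_text(plan)
# Simulate a user edit to the plain-text plan before image generation.
edited = from_editable_text(text.replace("dented metal", "rusted metal"))
prompts = to_image_prompts(edited)
print(prompts)
```

Keeping the representation as plain JSON is one possible design choice: it lets the artist inspect or override any entry before the image-generation stage runs.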

