State of the Art on Diffusion Models for Visual Computing

Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has grown exponentially, with relevant papers published across the computer graphics, computer vision, and AI communities and new works appearing daily on arXiv. This rapid growth makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point for researchers, artists, and practitioners alike to explore this exciting topic.
