The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth, with relevant papers published across the computer graphics, computer vision, and AI communities and new works appearing daily on arXiv. This rapid growth makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point for researchers, artists, and practitioners alike to explore this exciting topic.