Modeling and producing lifelike clothed human images has attracted researchers' attention from different areas for decades, owing to the complexity of highly articulated and structured content. Rendering algorithms decompose and simulate the imaging process of a camera, but are limited by the accuracy of the modeled variables and the efficiency of computation. Generative models can produce impressively vivid human images, yet still lack controllability and editability. This paper studies photorealism enhancement of rendered images, leveraging the generative power of diffusion models on the controlled basis of rendering. We introduce a novel framework that translates rendered images into their realistic counterparts, consisting of two stages: Domain Knowledge Injection (DKI) and Realistic Image Generation (RIG). In DKI, we adopt positive (real) domain finetuning and negative (rendered) domain embedding to inject knowledge into a pretrained text-to-image (T2I) diffusion model. In RIG, we generate the realistic image corresponding to the input rendered image with a Texture-preserving Attention Control (TAC), which preserves fine-grained clothing textures by exploiting the decoupled features encoded in the UNet structure. Additionally, we introduce the SynFashion dataset, featuring high-quality digital clothing images with diverse textures. Extensive experimental results demonstrate the superiority and effectiveness of our method in rendered-to-real image translation.