In this paper, we focus on the task of conditional image generation, where an image is synthesized according to user instructions. The central challenge underpinning this task is ensuring both the fidelity of the generated images and their semantic alignment with the provided conditions. To tackle this issue, previous studies have employed supervised perceptual losses derived from pre-trained models, i.e., reward models, to enforce alignment between the condition and the generated result. However, we observe an inherent shortcoming: given the diversity of synthesized images, the reward model usually provides inaccurate feedback when encountering newly generated data, which can undermine the training process. To address this limitation, we propose an uncertainty-aware reward modeling method, called Ctrl-U, comprising uncertainty estimation and uncertainty-aware regularization, designed to reduce the adverse effects of imprecise feedback from the reward model. Owing to the inherent cognitive uncertainty within reward models, even images generated under identical conditions often yield relatively large discrepancies in reward loss. Inspired by this observation, we explicitly leverage such prediction variance as an uncertainty indicator. Based on the uncertainty estimate, we regularize model training by adaptively rectifying the reward: rewards with lower uncertainty receive higher loss weights, while those with higher uncertainty are assigned reduced weights to allow for larger variability. The proposed uncertainty regularization facilitates reward fine-tuning through consistency construction. Extensive experiments validate the effectiveness of our method in improving controllability and generation quality, as well as its scalability across diverse conditional scenarios. Code is publicly available at https://grenoble-zhang.github.io/Ctrl-U-Page/.
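The core idea above — using the reward discrepancy between images generated under the same condition as an uncertainty signal, then down-weighting high-uncertainty rewards — can be sketched as follows. This is a minimal illustration only: the function name, the exponential weighting, the `gamma` temperature, and the assumption that rewards lie in [0, 1] are all ours, not the paper's exact formulation.

```python
import numpy as np

def uncertainty_weighted_reward_loss(rewards_a, rewards_b, gamma=1.0):
    """Illustrative uncertainty-aware reward regularization.

    rewards_a, rewards_b: reward-model scores for two images generated
    under the identical condition. Their discrepancy serves as a proxy
    for the reward model's prediction variance (its uncertainty).
    """
    rewards_a = np.asarray(rewards_a, dtype=float)
    rewards_b = np.asarray(rewards_b, dtype=float)
    # Per-sample uncertainty: disagreement between the two reward predictions.
    uncertainty = np.abs(rewards_a - rewards_b)
    # Lower uncertainty -> weight near 1; higher uncertainty -> weight decays.
    weights = np.exp(-gamma * uncertainty)
    # A simple reward loss, assuming scores in [0, 1] (higher reward = lower loss).
    reward_loss = 1.0 - 0.5 * (rewards_a + rewards_b)
    return float(np.mean(weights * reward_loss))
```

For example, two confident, agreeing predictions contribute their full loss, while a strongly disagreeing pair is discounted, so unreliable feedback perturbs training less.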