In this paper, we focus on the task of conditional image generation, where an image is synthesized according to user instructions. The central challenge underpinning this task is ensuring both the fidelity of the generated images and their semantic alignment with the provided conditions. To tackle this issue, previous studies have employed supervised perceptual losses derived from pre-trained models, i.e., reward models, to enforce alignment between the condition and the generated result. However, we observe an inherent shortcoming: given the diversity of synthesized images, the reward model usually provides inaccurate feedback when encountering newly generated data, which can undermine the training process. To address this limitation, we propose an uncertainty-aware reward modeling method, called Ctrl-U, comprising uncertainty estimation and uncertainty-aware regularization, designed to reduce the adverse effects of imprecise feedback from the reward model. Owing to the inherent cognitive uncertainty within reward models, even images generated under identical conditions often yield relatively large discrepancies in reward loss. Inspired by this observation, we explicitly leverage such prediction variance as an uncertainty indicator. Based on the uncertainty estimate, we regularize model training by adaptively rectifying the reward: rewards with lower uncertainty receive higher loss weights, while those with higher uncertainty are assigned reduced weights to allow for larger variability. The proposed uncertainty regularization facilitates reward fine-tuning through consistency construction. Extensive experiments validate the effectiveness of our method in improving controllability and generation quality, as well as its scalability across diverse conditional scenarios. Code is publicly available at https://grenoble-zhang.github.io/Ctrl-U-Page/.
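The core idea above — using the reward discrepancy between images generated under the same condition as an uncertainty signal, then down-weighting high-uncertainty rewards — can be sketched as follows. This is a minimal illustration only: the function name, the exponential weighting, the `gamma` temperature, and the assumption that rewards lie in [0, 1] are all ours, not the paper's exact formulation.

```python
import numpy as np

def uncertainty_weighted_reward_loss(rewards_a, rewards_b, gamma=1.0):
    """Illustrative uncertainty-aware reward regularization.

    rewards_a, rewards_b: reward-model scores for two images generated
    under the identical condition. Their discrepancy serves as a proxy
    for the reward model's prediction variance (its uncertainty).
    """
    rewards_a = np.asarray(rewards_a, dtype=float)
    rewards_b = np.asarray(rewards_b, dtype=float)
    # Per-sample uncertainty: disagreement between the two reward predictions.
    uncertainty = np.abs(rewards_a - rewards_b)
    # Lower uncertainty -> weight near 1; higher uncertainty -> weight decays.
    weights = np.exp(-gamma * uncertainty)
    # A simple reward loss, assuming scores in [0, 1] (higher reward = lower loss).
    reward_loss = 1.0 - 0.5 * (rewards_a + rewards_b)
    return float(np.mean(weights * reward_loss))
```

For example, two confident, agreeing predictions contribute their full loss, while a strongly disagreeing pair is discounted, so unreliable feedback perturbs training less.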