DP$^2$O-SR: Direct Perceptual Preference Optimization for Real-World Image Super-Resolution

Benefiting from pre-trained text-to-image (T2I) diffusion models, real-world image super-resolution (Real-ISR) methods can synthesize rich and realistic details. However, due to the inherent stochasticity of T2I models, different noise inputs often lead to outputs with varying perceptual quality. Although this randomness is sometimes seen as a limitation, it also introduces a wider perceptual quality range, which can be exploited to improve Real-ISR performance. To this end, we introduce Direct Perceptual Preference Optimization for Real-ISR (DP$^2$O-SR), a framework that aligns generative models with perceptual preferences without requiring costly human annotations. We construct a hybrid reward signal by combining full-reference and no-reference image quality assessment (IQA) models trained on large-scale human preference datasets. This reward encourages both structural fidelity and natural appearance. To better utilize perceptual diversity, we move beyond the standard best-vs-worst selection and construct multiple preference pairs from outputs of the same model. Our analysis reveals that the optimal selection ratio depends on model capacity: smaller models benefit from broader coverage, while larger models respond better to stronger contrast in supervision. Furthermore, we propose hierarchical preference optimization, which adaptively weights training pairs based on intra-group reward gaps and inter-group diversity, enabling more efficient and stable learning. Extensive experiments across both diffusion- and flow-based T2I backbones demonstrate that DP$^2$O-SR significantly improves perceptual quality and generalizes well to real-world benchmarks.
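The pair-construction idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names (`zscore`, `hybrid_reward`, `build_pairs`), the equal-weight blend of standardized full-reference and no-reference scores, and the top-k versus bottom-k pairing rule controlled by a hypothetical `select_ratio` parameter. The abstract does not specify which IQA models are used or how their scores are combined, so the scores here are plain lists of numbers where higher means better.

```python
from itertools import product
from statistics import mean, pstdev


def zscore(values):
    """Standardize scores within one group of candidates for the same input."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def hybrid_reward(fr_scores, nr_scores, fr_weight=0.5):
    """Blend standardized full-reference and no-reference scores.

    Assumption: both score lists are oriented so that higher is better,
    and a simple weighted sum stands in for the paper's hybrid reward.
    """
    fr_z, nr_z = zscore(fr_scores), zscore(nr_scores)
    return [fr_weight * f + (1.0 - fr_weight) * n for f, n in zip(fr_z, nr_z)]


def build_pairs(rewards, select_ratio=0.25):
    """Pair every top-k candidate with every bottom-k candidate.

    Returns (winner_index, loser_index, reward_gap) tuples. A small
    select_ratio approaches best-vs-worst selection; a larger one gives
    broader coverage with weaker contrast per pair.
    """
    n = len(rewards)
    if n < 2:
        return []
    # Keep k within [1, n // 2] so winners and losers never overlap.
    k = min(max(1, int(n * select_ratio)), n // 2)
    order = sorted(range(n), key=lambda i: rewards[i], reverse=True)
    winners, losers = order[:k], order[n - k:]
    return [(w, l, rewards[w] - rewards[l]) for w, l in product(winners, losers)]


# Hypothetical scores for four outputs sampled from different noise inputs.
fr = [0.9, 0.7, 0.8, 0.6]
nr = [0.8, 0.9, 0.5, 0.2]
rewards = hybrid_reward(fr, nr)
pairs = build_pairs(rewards, select_ratio=0.5)
```

With `select_ratio=0.25` the same four candidates yield a single best-vs-worst pair, while `select_ratio=0.5` yields four pairs. The returned reward gap is the kind of quantity a pair-weighting scheme could consume, although the weighting used by hierarchical preference optimization is not specified in the abstract and is not reproduced here.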

