Blind face restoration remains a persistent challenge due to the inherent ill-posedness of reconstructing holistic facial structure from severely degraded observations. Current generative approaches, while capable of synthesizing realistic textures, often suffer from information asymmetry -- the intrinsic disparity between information-sparse low-quality inputs and information-dense high-quality outputs. This imbalance induces a one-to-many mapping in which insufficient constraints produce stochastic uncertainty and hallucinatory artifacts. To bridge this gap, we present \textbf{Pref-Restore}, a hierarchical framework that integrates discrete semantic logic with continuous texture generation to achieve deterministic, preference-aligned restoration. Our methodology fundamentally addresses this information disparity through two complementary strategies: (1) Augmenting Input Density: we employ an auto-regressive integrator that reformulates textual instructions into dense latent queries, injecting high-level semantic priors that stabilize the degraded signals; (2) Pruning the Output Distribution: we pioneer the integration of on-policy reinforcement learning directly into the diffusion restoration loop. By transforming human preferences into differentiable constraints, we explicitly penalize stochastic deviations, thereby sharpening the posterior distribution toward the desired high-fidelity outcomes. Extensive experiments demonstrate that Pref-Restore achieves state-of-the-art performance across synthetic and real-world benchmarks. Furthermore, empirical analysis confirms that our preference-aligned strategy significantly reduces solution entropy, establishing a robust pathway toward reliable and deterministic blind restoration.
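The intuition behind the second strategy (preference feedback sharpening a one-to-many output distribution) can be illustrated with a minimal toy sketch. All names here are hypothetical and the setup is deliberately simplified: a 1-D Gaussian policy stands in for the stochastic diffusion restorer, a scalar distance-based reward stands in for a learned human-preference model, and a REINFORCE-style on-policy update plays the role of the preference-aligned fine-tuning. The point of the sketch is only to show the mechanism claimed in the abstract: preference rewards both move the output toward the preferred solution and shrink its variance, i.e. reduce solution entropy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the "restorer" is a Gaussian policy over a 1-D
# output; the "preference model" rewards proximity to a fixed preferred
# restoration. This is an illustration, not the paper's actual pipeline.
target = 2.0

def preference_reward(x):
    # Higher reward for outputs closer to the preferred restoration.
    return -np.abs(x - target)

# Policy parameters: mean and log-std of the output distribution.
mu, log_sigma = 0.0, np.log(1.0)
lr = 0.05

for step in range(400):
    sigma = np.exp(log_sigma)
    samples = mu + sigma * rng.standard_normal(64)   # on-policy rollouts
    rewards = preference_reward(samples)
    adv = rewards - rewards.mean()                   # baseline-subtracted

    # REINFORCE gradients of log N(x; mu, sigma) w.r.t. mu and log_sigma:
    #   d/dmu        = (x - mu) / sigma^2
    #   d/dlog_sigma = ((x - mu) / sigma)^2 - 1
    z = (samples - mu) / sigma
    grad_mu = np.mean(adv * z / sigma)
    grad_log_sigma = np.mean(adv * (z**2 - 1.0))

    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

# The mean converges toward the preferred output while the standard
# deviation collapses: the posterior is "sharpened", mirroring the
# entropy-reduction effect described above.
print(mu, np.exp(log_sigma))
```

Under this toy model, penalizing low-preference samples is what drives the variance term down: rollouts far from the preferred solution receive negative advantages, which pushes `log_sigma` toward smaller values once the mean has converged.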