Recent diffusion and flow-based generative models have achieved remarkable success in image restoration tasks, delivering better perceptual quality than traditional deep learning approaches. However, these methods either require many sampling steps to generate high-quality images, which incurs significant computational overhead, or rely on standard model distillation, which usually imposes a fixed fidelity-realism trade-off and thus lacks flexibility. In this paper, we introduce OFTSR, a novel flow-based framework for one-step image super-resolution that can produce outputs with tunable levels of fidelity and realism. Our approach first trains a conditional flow-based super-resolution model to serve as a teacher model. We then distill this teacher model under a specialized constraint: we force the predictions of our one-step student model for the same input to lie on the same sampling ODE trajectory of the teacher model. This alignment ensures that the student model's single-step predictions from initial states match the teacher's predictions from a closer intermediate state. Through extensive experiments on datasets including FFHQ (256$\times$256), DIV2K, and ImageNet (256$\times$256), we demonstrate that OFTSR achieves state-of-the-art performance for one-step image super-resolution while retaining the ability to flexibly tune the fidelity-realism trade-off. Code: \href{https://github.com/yuanzhi-zhu/OFTSR}{https://github.com/yuanzhi-zhu/OFTSR}.
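The trajectory constraint described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general idea under loudly stated assumptions: the teacher is replaced by a hypothetical analytic velocity field `teacher_velocity`, time runs from the initial state at t = 0 to the output at t = 1, a one-step prediction is a single Euler jump to t = 1, and the names `one_step_prediction`, `integrate`, and `alignment_loss` are invented for illustration.

```python
import numpy as np


def teacher_velocity(x, t):
    """Hypothetical stand-in for a trained teacher velocity field (toy, analytic)."""
    return 0.8 * x


def one_step_prediction(velocity, x, t):
    """Jump from state x at time t straight to t = 1 with one Euler step."""
    return x + (1.0 - t) * velocity(x, t)


def integrate(velocity, x0, t_end, num_steps=100):
    """Follow the ODE from t = 0 to t_end with small Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = t_end / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)
    return x


def alignment_loss(student_pred, x0, t, num_steps=100):
    """Assumed form of the constraint: the student's one-step output from x0
    should match the teacher's one-step output from an intermediate state x_t
    that lies on the same teacher trajectory starting at x0."""
    x_t = integrate(teacher_velocity, x0, t, num_steps)
    target = one_step_prediction(teacher_velocity, x_t, t)
    return float(np.mean((np.asarray(student_pred, dtype=float) - target) ** 2))


if __name__ == "__main__":
    x0 = np.ones(4)
    # A student that simply copies the teacher's one-step jump from t = 0.
    naive_student = one_step_prediction(teacher_velocity, x0, 0.0)
    print("loss of naive student vs. target from t=0.5:",
          alignment_loss(naive_student, x0, 0.5))
```

In this toy setting the teacher's one-step prediction from an intermediate state is closer to the true trajectory endpoint than its one-step prediction from the initial state, which is why matching the former gives the student a better target than naive one-step sampling.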