Fast, faithful and photorealistic diffusion-based image super-resolution with enhanced Flow Map models

Diffusion-based image super-resolution (SR) has recently attracted significant attention by leveraging the expressive power of large pre-trained text-to-image diffusion models (DMs). A central practical challenge is resolving the trade-off between reconstruction faithfulness and photorealism. To improve inference efficiency, many recent works have explored knowledge distillation strategies specifically tailored to SR, enabling one-step diffusion-based approaches. However, these teacher-student formulations are inherently constrained by information compression, which can degrade perceptual cues such as lifelike textures and depth of field, even when overall perceptual quality is high. In parallel, self-distillation DMs, known as Flow Map models, have emerged as a promising alternative for image generation tasks, enabling fast inference while preserving the expressivity and training stability of standard DMs. Building on these developments, we propose FlowMapSR, a novel diffusion-based framework for image super-resolution explicitly designed for efficient inference. Beyond adapting Flow Map models to SR, we introduce two complementary enhancements: (i) positive-negative prompting guidance, based on a generalization of the classifier-free guidance paradigm to Flow Map models, and (ii) adversarial fine-tuning using Low-Rank Adaptation (LoRA). Among the considered Flow Map formulations (Eulerian, Lagrangian, and Shortcut), we find that the Shortcut variant consistently achieves the best performance when combined with these enhancements. Extensive experiments show that FlowMapSR achieves a better balance between reconstruction faithfulness and photorealism than recent state-of-the-art methods for both ×4 and ×8 upscaling, while maintaining competitive inference time. Notably, a single model is used for both upscaling factors, without any scale-specific conditioning or degradation-guided mechanisms.
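
For orientation, the standard classifier-free guidance rule that enhancement (i) builds on can be sketched as below. This is only the usual formulation for ordinary diffusion or flow models, with a negative prompt taking the place of the unconditional branch; it is not the Flow Map generalization proposed in this work, whose exact form is not given here. The function name `guided_prediction` and its arguments are hypothetical, and model outputs are represented as plain lists of floats for simplicity.

```python
from typing import List, Sequence


def guided_prediction(
    pred_pos: Sequence[float],
    pred_neg: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two model predictions with the standard guidance rule.

    pred_pos: prediction conditioned on the positive prompt (assumed input).
    pred_neg: prediction conditioned on the negative prompt (assumed input).
    scale:    guidance scale; 0 returns pred_neg, 1 returns pred_pos,
              values above 1 extrapolate away from the negative prompt.
    """
    if len(pred_pos) != len(pred_neg):
        raise ValueError("predictions must have the same length")
    # Standard rule: start from the negative branch and move along the
    # direction (positive - negative), scaled by the guidance weight.
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


if __name__ == "__main__":
    pos = [1.0, 2.0]
    neg = [0.0, 1.0]
    print(guided_prediction(pos, neg, 2.0))
```

With a scale of 1 the rule reduces to the positive-prompt prediction, and larger scales push the output further from what the negative prompt describes.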
