Recent advances in Low-Light Image Enhancement (LLIE) have focused heavily on Diffusion Probabilistic Models, which achieve high perceptual quality but incur significant computational latency (often 2-4 seconds or more per image). Conversely, traditional CNN-based baselines offer real-time inference but suffer from "over-smoothing," failing to recover fine structural details in extreme low-light conditions. This leaves a practical gap in the literature: no existing model provides generative-level texture recovery at edge-deployable speeds. In this paper, we address this trade-off by proposing a hybrid Attention U-Net GAN. We demonstrate that the heavy iterative sampling of diffusion models is not strictly necessary for texture recovery. Instead, by integrating Attention Gates into a lightweight U-Net backbone and training it within a conditional adversarial framework, we approximate the high-frequency fidelity of generative models in a single forward pass. Extensive experiments on the SID dataset show that our method achieves an LPIPS score of 0.112 (lower is better), the best among efficient models and significantly better than the efficient baselines (SID, EnlightenGAN), while maintaining an inference latency of 0.06 s per image. This represents a 40x speedup over latent diffusion models, making our approach suitable for near real-time applications.
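To make the role of the Attention Gates concrete, the following is a minimal sketch of an additive attention gate applied at a single spatial location. It assumes the commonly used additive formulation (a ReLU over the summed projections, followed by a sigmoid); the abstract does not specify the exact gate design, so the function names, the weight names (`W_x`, `W_g`, `psi`), and the toy values are all illustrative assumptions rather than the configuration used in our model.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attention_gate(x, g, W_x, W_g, b, psi, b_psi):
    """Additive attention gate for one spatial location (assumed formulation).

    x     : encoder (skip-connection) feature vector
    g     : gating vector from the coarser decoder level
    W_x   : projection of x into the intermediate space
    W_g   : projection of g into the intermediate space
    b     : bias of the intermediate layer
    psi   : weights that collapse the intermediate vector to a scalar
    b_psi : bias of that scalar projection

    Returns (alpha, gated_x), where alpha in (0, 1) rescales x.
    """
    px = matvec(W_x, x)
    pg = matvec(W_g, g)
    # ReLU over the summed projections.
    hidden = [max(0.0, a + c + bias) for a, c, bias in zip(px, pg, b)]
    # Scalar attention coefficient for this location.
    alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + b_psi)
    # The skip feature is scaled before being passed to the decoder.
    return alpha, [alpha * xi for xi in x]


# Toy values, chosen only for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, gated = attention_gate(
    x=[1.0, -2.0], g=[0.5, 0.5],
    W_x=identity, W_g=identity, b=[0.0, 0.0],
    psi=[1.0, 1.0], b_psi=0.0,
)
```

In a full network the same computation would run at every spatial location of a skip connection (typically as 1x1 convolutions), so that the decoder receives encoder features weighted toward regions the gating signal marks as relevant.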
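The latency figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above (0.06 s per image, a 2-4 s diffusion range, and a 40x speedup); the variable and function names are illustrative.

```python
OURS_S = 0.06                    # reported inference latency per image
DIFFUSION_RANGE_S = (2.0, 4.0)   # quoted diffusion latency range
CLAIMED_SPEEDUP = 40             # reported speedup over latent diffusion


def speedup(baseline_s, ours_s=OURS_S):
    """Ratio of a baseline latency to our latency."""
    return baseline_s / ours_s


# Throughput implied by a 0.06 s latency (roughly 16.7 images per second).
fps = 1.0 / OURS_S

# Speedup at each end of the quoted diffusion range (roughly 33x to 67x).
low, high = (speedup(t) for t in DIFFUSION_RANGE_S)

# Diffusion latency implied by the 40x claim (about 2.4 s).
implied_diffusion_s = CLAIMED_SPEEDUP * OURS_S

print(f"throughput: {fps:.1f} images/s")
print(f"speedup over 2-4 s range: {low:.0f}x to {high:.0f}x")
print(f"diffusion latency implied by 40x: {implied_diffusion_s:.1f} s")
```

The 40x figure corresponds to a diffusion latency of about 2.4 s, which sits inside the quoted 2-4 s range, and a 0.06 s latency corresponds to roughly 16.7 images per second, consistent with the "near real-time" characterization.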