Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets, and therefore struggle with the variability of noise distributions encountered in real-world scenarios. In this work, we propose a new perspective on the denoising challenge by highlighting the distinct separation between noise priors and image priors. This insight forms the basis of a conditional optimization framework designed to overcome the constraints of traditional denoising frameworks. To this end, we introduce a Locally Noise Prior Estimation (LoNPE) algorithm, which accurately estimates the noise prior directly from a single raw noisy image. This estimate acts as an explicit prior representation of the camera sensor's imaging environment, distinct from the image prior of scenes. Additionally, we design an auxiliary learnable LoNPE network tailored for practical application to sRGB noisy images. Leveraging the estimated noise prior, we present a novel Conditional Denoising Transformer (Condformer) that incorporates the noise prior into a conditional self-attention mechanism. This integration allows the Condformer to partition the optimization process into multiple explicit subspaces, significantly enhancing the model's generalization and flexibility. Extensive experimental evaluations on both synthetic and real-world datasets demonstrate that the proposed method achieves superior performance over current state-of-the-art methods. The source code is available at https://github.com/YuanfeiHuang/Condformer.
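As a rough illustration of what estimating a noise prior from a single raw image can involve, the sketch below fits a signal-dependent noise model, var = a * mean + b, to local patch statistics. It is not the LoNPE algorithm itself: the function name `estimate_noise_prior`, the patch size, the linear variance model, and the assumption that each patch is roughly flat are all illustrative choices made here, and the noise is simulated with a Gaussian approximation.

```python
import numpy as np

def estimate_noise_prior(noisy, patch=8):
    """Fit var = a * mean + b from non-overlapping patch statistics.

    Hypothetical helper for illustration only. It assumes each patch is
    roughly flat, so its sample variance is dominated by noise.
    """
    h, w = noisy.shape
    h, w = h - h % patch, w - w % patch  # crop to a multiple of the patch size
    blocks = noisy[:h, :w].reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    means = blocks.mean(axis=1)
    variances = blocks.var(axis=1, ddof=1)  # unbiased per-patch variance
    a, b = np.polyfit(means, variances, deg=1)  # least-squares line fit
    return a, b

# Synthetic check: a piecewise-constant "raw" image with known parameters.
rng = np.random.default_rng(0)
true_a, true_b = 0.01, 0.001
levels = rng.uniform(0.05, 0.95, size=(32, 32))
clean = np.kron(levels, np.ones((8, 8)))  # 256x256 image of flat 8x8 patches
noise_std = np.sqrt(true_a * clean + true_b)  # signal-dependent noise level
noisy = clean + rng.normal(size=clean.shape) * noise_std
a_hat, b_hat = estimate_noise_prior(noisy)
print(f"a: {a_hat:.4f} (true {true_a}), b: {b_hat:.4f} (true {true_b})")
```

On real photographs, textured patches inflate the measured variance, so a practical estimator would need some way of selecting flat regions before fitting; this sketch skips that step.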