We present the RAW-domain diffusion model (RDDM), an end-to-end diffusion model that restores photo-realistic images directly from sensor RAW data. While recent sRGB-domain diffusion methods achieve impressive results, they face a trade-off between high fidelity and generative quality. These models operate on lossy sRGB inputs and overlook the availability of sensor RAW images in many scenarios, e.g., image and video capture on edge devices, leading to sub-optimal performance. RDDM sidesteps this limitation by restoring images directly in the RAW domain, replacing the conventional two-stage image signal processing (ISP) → image restoration (IR) pipeline. However, naively adapting pre-trained diffusion models to the RAW domain poses several challenges. To address them, we propose: (1) a RAW-domain VAE (RVAE) that encodes sensor RAW data and decodes it into an enhanced linear-domain image, resolving the out-of-distribution (OOD) gap between the two domains; and (2) a configurable multi-Bayer (CMB) LoRA module that adapts to diverse RAW Bayer patterns such as RGGB and BGGR. To compensate for the scarcity of RAW training data, we develop a scalable data-synthesis pipeline that generates RAW LQ-HQ pairs from existing sRGB datasets for large-scale training. Extensive experiments demonstrate RDDM's superiority over state-of-the-art sRGB diffusion methods, yielding higher-fidelity results with fewer artifacts. Code will be publicly available at https://github.com/YanCHEN-fr/RDDM.
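To make the Bayer-pattern diversity concrete: the four common patterns (RGGB, GRBG, GBRG, BGGR) are related by horizontal and/or vertical flips of the sampling grid, so a common pre-processing baseline canonicalises any mosaic to RGGB before feeding it to a model. The sketch below is a hypothetical illustration of that flip trick (assuming even image dimensions), not the paper's learned CMB LoRA adaptation; the function names are ours.

```python
import numpy as np

# (flip_vertical, flip_horizontal) needed to bring each pattern to RGGB phase.
# This is a standard canonicalisation baseline, not the paper's CMB LoRA.
FLIPS = {
    "RGGB": (False, False),
    "GRBG": (False, True),   # horizontal flip -> RGGB
    "GBRG": (True, False),   # vertical flip   -> RGGB
    "BGGR": (True, True),    # both flips      -> RGGB
}

def to_rggb(raw: np.ndarray, pattern: str) -> np.ndarray:
    """Rearrange a single-channel Bayer mosaic (even H, W) to RGGB phase."""
    flip_v, flip_h = FLIPS[pattern]
    if flip_v:
        raw = raw[::-1, :]
    if flip_h:
        raw = raw[:, ::-1]
    return raw

def make_mosaic(pattern: str, h: int = 4, w: int = 4) -> np.ndarray:
    """Build a toy mosaic whose entries are the channel letters ('R','G','B')."""
    tile = np.array([[pattern[0], pattern[1]],
                     [pattern[2], pattern[3]]])
    return np.tile(tile, (h // 2, w // 2))

# Every pattern maps to the same RGGB 2x2 phase after its flips.
for p in FLIPS:
    top_left = to_rggb(make_mosaic(p), p)[:2, :2]
    assert top_left.tolist() == [["R", "G"], ["G", "B"]], p
```

A learned module such as CMB LoRA goes beyond this geometric trick by adapting network weights per pattern, but the flip view explains why the four layouts can share one backbone.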