Beyond Learned Metadata-based Raw Image Reconstruction

Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, but general users rarely adopt them because of their substantial storage requirements. Recent studies propose to compress raw images by designing sampling masks in the pixel space of the raw image. However, these approaches leave room for more effective image representations and more compact metadata. In this work, we propose a novel framework that learns, in an end-to-end manner, a compact representation in the latent space that serves as metadata. We analyze how raw image reconstruction differs intrinsically from lossy image compression, a difference caused by the rich information available from the sRGB image. Based on this analysis, we propose a novel backbone design with asymmetric and hybrid spatial feature resolutions, which significantly improves the rate-distortion performance. In addition, we propose a novel context model design that better predicts the order masks of encoding/decoding based on both the sRGB image and the masks of already processed features. By better modeling the correlation between order masks, the model makes better use of the already processed information. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns different levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, building on the iterative properties of the proposed context model, we propose a novel strategy to achieve variable bit rates using a single model. This strategy allows a wide range of bit rates to be covered continuously. Extensive experimental results demonstrate that the proposed method achieves better reconstruction quality with a smaller metadata size.
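
The idea of region-dependent quantization precision can be illustrated with a minimal sketch. It is not the proposed learned method: the abstract does not specify how precision is assigned, so the code below assumes that precision is realized as a per-location quantization step and substitutes a hand-written heuristic (local sRGB luma activity) for the learned sRGB guidance. The names `local_activity`, `step_map`, and `adaptive_quantize`, as well as the `fine`, `coarse`, and `threshold` values, are hypothetical.

```python
def local_activity(luma):
    """Sum of absolute forward differences (horizontal + vertical) per pixel."""
    h, w = len(luma), len(luma[0])
    act = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Clamp at the border so the last row/column uses a zero difference.
            dx = luma[i][min(j + 1, w - 1)] - luma[i][j]
            dy = luma[min(i + 1, h - 1)][j] - luma[i][j]
            act[i][j] = abs(dx) + abs(dy)
    return act


def step_map(luma, fine=0.25, coarse=1.0, threshold=0.1):
    """Assumed heuristic: textured regions get the finer quantization step."""
    return [[fine if a > threshold else coarse for a in row]
            for row in local_activity(luma)]


def adaptive_quantize(latent, steps):
    """Uniform quantization with a spatially varying step size."""
    return [[round(v / s) * s for v, s in zip(lat_row, step_row)]
            for lat_row, step_row in zip(latent, steps)]


# Toy example: the left part of the guide is flat, the right part is textured.
luma = [[0.0, 0.0, 0.5, 0.9],
        [0.0, 0.0, 0.2, 0.7]]
latent = [[0.6, 0.6, -1.3, 2.2],
          [0.4, 0.4, 0.1, -0.6]]
steps = step_map(luma)
quantized = adaptive_quantize(latent, steps)
```

With this rule, each quantized value deviates from its latent value by at most half of the local step, so flat regions tolerate a larger error and need fewer distinct symbols, while textured regions keep finer precision. The sketch also assumes that the guide and the latent share the same spatial resolution.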
