High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dynamic-range (LDR) output due to the lack of large-scale HDR training data. In this work, we show that existing pretrained diffusion models can be easily adapted to HDR generation without retraining from scratch. A key challenge is that HDR images are natively represented in linear RGB, whose intensity and color statistics differ substantially from those of sRGB-encoded LDR images. This gap, however, can be effectively bridged by converting HDR inputs into perceptually uniform encodings (e.g., using PU21 or PQ). Empirically, we find that LDR-pretrained variational autoencoders (VAEs) reconstruct PU21-encoded HDR inputs with fidelity comparable to LDR data, whereas linear RGB inputs cause severe degradations. Motivated by this finding, we describe an efficient adaptation strategy that freezes the VAE and finetunes only the denoiser via low-rank adaptation in a perceptually uniform space. This results in a unified computational method that supports both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction. Experiments demonstrate that our perceptually encoded adaptation consistently improves perceptual fidelity, text-image alignment, and effective dynamic range, relative to previous techniques.
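To make the "perceptually uniform encoding" step concrete, here is a minimal sketch of the SMPTE ST 2084 perceptual quantizer (PQ) inverse EOTF, one of the two encodings named above (the paper's experiments use PU21, whose fitted parameters differ). The function name `pq_encode` is illustrative, not from the paper; the constants are the standard ST 2084 values.

```python
# SMPTE ST 2084 (PQ) constants, expressed as the exact rationals from the standard.
M1 = 2610 / 16384        # = 0.1593017578125
M2 = 2523 / 4096 * 128   # = 78.84375
C1 = 3424 / 4096         # = 0.8359375
C2 = 2413 / 4096 * 32    # = 18.8515625
C3 = 2392 / 4096 * 32    # = 18.6875

def pq_encode(luminance_nits: float) -> float:
    """Map absolute linear luminance (0..10000 cd/m^2) to a PQ code value in [0, 1]."""
    # Normalize to the 10,000-nit PQ ceiling, clamping negative inputs.
    y = max(luminance_nits, 0.0) / 10000.0
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2
```

Under this curve, the 10,000-nit peak maps to 1.0 while a 100-nit diffuse white already lands near code value 0.51, so highlights are heavily compressed. This is why linear-RGB HDR statistics, which concentrate most of the signal range in rarely used highlight values, look so unlike the LDR inputs an sRGB-pretrained VAE expects, and why a perceptually uniform encoding bridges the gap.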