High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality. The code is available at: https://github.com/eezkni/WMNet
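To make the W-MIM idea concrete, the following is a minimal sketch (not the paper's implementation) of masking in the wavelet domain: a one-level 2-D Haar transform splits an image into a low-frequency LL subband and three detail subbands (LH, HL, HH), and a random fraction of the detail coefficients is zeroed before inverting the transform. The function names and the masking policy (keeping LL intact, masking only detail subbands) are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar transform: returns (LL, LH, HL, HH) subbands."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: reassembles the image from its subbands."""
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def mask_detail_subbands(x, mask_ratio=0.5, rng=None):
    """Zero a random fraction of coefficients in the detail subbands
    (LH, HL, HH) while leaving the low-frequency LL subband intact,
    producing an input for self-reconstruction pre-training."""
    rng = rng or np.random.default_rng(0)
    ll, lh, hl, hh = haar_dwt2(x)
    masked = [band * (rng.random(band.shape) >= mask_ratio)
              for band in (lh, hl, hh)]
    return haar_idwt2(ll, *masked)
```

A pre-training objective would then ask the network to recover the original frame from `mask_detail_subbands(frame)`; a curriculum scheme could, for example, gradually increase `mask_ratio` over the course of training.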